WO2023172319A1 - Direct Swap Caching with Noisy Neighbor Mitigation and Dynamic Address Range Assignment - Google Patents
- Publication number
- WO2023172319A1 (PCT/US2022/052607)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- memory
- swappable
- range
- host
- far
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0842—Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
- G06F12/0848—Partitioned cache, e.g. separate instruction and operand caches
- G06F12/0855—Overlapped cache accessing, e.g. pipeline
- G06F12/0859—Overlapped cache accessing, e.g. pipeline with reload from main memory
- G06F12/109—Address translation for multiple virtual address spaces, e.g. segmentation
- G06F12/126—Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
- G06F12/0868—Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
- G06F12/0873—Mapping of cache memory to specific storage devices or parts thereof
- G06F12/0888—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
- G06F2009/45583—Memory management, e.g. access or allocation
- G06F2212/1008—Correctness of operation, e.g. memory ordering
- G06F2212/1016—Performance improvement
- G06F2212/1044—Space efficiency improvement
- G06F2212/152—Virtualized environment, e.g. logically partitioned system
- G06F2212/2542—Non-uniform memory access [NUMA] architecture
- G06F2212/502—Control mechanisms for virtual memory, cache or TLB using adaptive policy
- G06F2212/657—Virtual address space management
Definitions
- Computing systems may include the public cloud, the private cloud, or a hybrid cloud having both public and private portions.
- The public cloud includes a global network of servers that perform a variety of functions, including storing and managing data, running applications, and delivering content or services, such as streaming videos, provisioning electronic mail, providing office productivity software, or handling social media.
- The servers and other components may be located in data centers across the world. While the public cloud offers services to the public over the Internet, businesses may use private clouds or hybrid clouds. Both private and hybrid clouds also include a network of servers housed in data centers.
- Multiple tenants may use compute, storage, and networking resources associated with the servers in the cloud.
- The compute, storage, and networking resources may be provisioned using a host operating system (OS) installed on a compute node (e.g., a server) in a data center.
- Each host OS may allow multiple tenants, such as virtual machines, to access the compute and memory resources associated with a respective compute node.
- Each tenant may be allocated a certain amount of memory reflective of a certain number of cache lines.
- In some cases, conflicting cache lines in the near memory (e.g., the DRAM) may be allocated to different tenants.
- This may cause one tenant's activities to create issues for another tenant, including reduced memory bandwidth and capacity.
- The present disclosure relates to a system including a compute node providing access to both near memory and far memory.
- The system may further include a host operating system (OS) configured to support a first set of tenants associated with the compute node, where the host OS has access to: (1) a first swappable range of memory addresses associated with the near memory and (2) a second swappable range of memory addresses associated with the far memory, to allow for swapping of cache lines between the near memory and the far memory.
- The host OS may further be configured to allocate memory in a granular fashion to any of the first set of tenants such that each allocation of memory to a tenant includes memory addresses corresponding to a conflict set having a conflict set size.
- The conflict set may include a first conflicting region associated with the first swappable range of memory addresses associated with the near memory and a second conflicting region associated with the second swappable range of memory addresses associated with the far memory, where the first conflicting region and the second conflicting region have the same size, selected to be equal to or less than half of the conflict set size.
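The sizing rule in the last item can be checked mechanically. The sketch below is a minimal Python model, assuming a hypothetical `ConflictSet` record that holds the conflict set size and the sizes of its near-memory and far-memory conflicting regions; the 1 GB and 512 MB values follow the example given later for system address map 300.

```python
from dataclasses import dataclass

MB = 1 << 20
GB = 1 << 30


@dataclass(frozen=True)
class ConflictSet:
    """Hypothetical model of one conflict set (not an API from the disclosure)."""
    size: int         # conflict set size in bytes, e.g. 1 GB
    near_region: int  # conflicting region in the near-memory swappable range
    far_region: int   # conflicting region in the far-memory swappable range

    def is_valid(self) -> bool:
        # Both conflicting regions must have the same size, and that size
        # must be equal to or less than half of the conflict set size.
        return (self.near_region == self.far_region
                and 2 * self.near_region <= self.size)


print(ConflictSet(1 * GB, 512 * MB, 512 * MB).is_valid())  # True
print(ConflictSet(1 * GB, 512 * MB, 256 * MB).is_valid())  # False: sizes differ
print(ConflictSet(1 * GB, 768 * MB, 768 * MB).is_valid())  # False: more than half
```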
- The present disclosure also relates to a method including provisioning a compute node with both near memory and far memory.
- The method may further include granting to a host operating system (OS), configured to support a first set of tenants associated with the compute node, access to: (1) a first swappable range of memory addresses associated with the near memory and (2) a second swappable range of memory addresses associated with the far memory, to allow for swapping of cache lines between the near memory and the far memory.
- The method may further include allocating memory in a granular fashion to any of the first set of tenants such that each allocation of memory to a tenant includes memory addresses corresponding to a conflict set having a conflict set size.
- The conflict set may include a first conflicting region associated with the first swappable range of memory addresses associated with the near memory and a second conflicting region associated with the second swappable range of memory addresses associated with the far memory, where the first conflicting region and the second conflicting region have the same size, selected to be equal to or less than half of the conflict set size.
- The present disclosure further relates to a method including provisioning a compute node with both near memory and far memory, where a host operating system (OS) associated with the compute node is granted access to a first system address map configuration and a second system address map configuration different from the first system address map configuration.
- The method may further include granting to the host OS, configured to support a first set of tenants, access to a first non-swappable address range associated with the near memory.
- The method may further include granting to the host OS, configured to support a second set of tenants different from the first set of tenants, access to: (1) a first swappable address range associated with the near memory and (2) a second swappable address range associated with the far memory, to allow for swapping of cache lines between the near memory and the far memory.
- The method may further include increasing the size of the first non-swappable address range by switching from the first system address map configuration to the second system address map configuration.
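The effect of switching configurations can be sketched as follows. The disclosure does not give the sizes of the two configurations here, so the numbers below are hypothetical, and the code further assumes that total near memory is fixed, so near memory moved into the non-swappable range is taken from the swappable range.

```python
from dataclasses import dataclass

GB = 1 << 30


@dataclass(frozen=True)
class AddressMapConfig:
    """Hypothetical system address map configuration (sizes are illustrative)."""
    name: str
    near_memory: int    # total near memory, assumed fixed for the compute node
    non_swappable: int  # near memory reserved as the non-swappable range

    @property
    def swappable_near(self) -> int:
        # Assumption: the rest of near memory backs the swappable range.
        return self.near_memory - self.non_swappable


def grow_non_swappable(current: AddressMapConfig,
                       other: AddressMapConfig) -> AddressMapConfig:
    """Switch to `other` only if it enlarges the non-swappable range."""
    return other if other.non_swappable > current.non_swappable else current


config_a = AddressMapConfig("A", near_memory=1024 * GB, non_swappable=512 * GB)
config_b = AddressMapConfig("B", near_memory=1024 * GB, non_swappable=768 * GB)

active = grow_non_swappable(config_a, config_b)
print(active.name, active.non_swappable // GB, active.swappable_near // GB)  # B 768 256
```

The trade-off the sketch makes visible is that tenants relying on the non-swappable range gain capacity only at the expense of the swappable range available to the second set of tenants.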
- FIG. 1 is a block diagram of a system including compute nodes coupled with a far memory system in accordance with one example;
- FIG. 2 shows a block diagram of an example far memory system;
- FIG. 3 shows an example system address map for use with the system of FIG. 1;
- FIG. 4 is a diagram showing a transaction flow related to a read operation and a write operation when the data is located in the near memory in accordance with one example;
- FIG. 5 is a diagram showing a transaction flow relating to the transactions that occur when the data associated with a read operation is located in the far memory in accordance with one example;
- FIG. 6 is a diagram showing a transaction flow relating to the transactions that occur when the data associated with a write operation is located in the far memory in accordance with one example;
- FIG. 7 shows a block diagram of an example system for implementing at least some of the methods for direct swap caching with noisy neighbor mitigation and dynamic address range assignment;
- FIG. 8 shows a data center for implementing a system for direct swap caching with noisy neighbor mitigation and dynamic address range assignment;
- FIG. 9 shows a flow chart of an example method for direct swap caching with noisy neighbor mitigation;
- FIG. 10 shows configuration A of a system address map for use with the system of FIG. 1;
- FIG. 11 shows configuration B of a system address map for use with the system of FIG. 1; and
- FIG. 12 shows a flow chart of another example method for direct swap caching with noisy neighbor mitigation.
- Examples described in this disclosure relate to systems and methods for direct swap caching with noisy neighbor mitigation and dynamic address range assignment. Certain examples relate to leveraging direct swap caching for use with a host operating system (OS) in a computing system or a multi-tenant computing system.
- The multi-tenant computing system may be a public cloud, a private cloud, or a hybrid cloud.
- The public cloud includes a global network of servers that perform a variety of functions, including storing and managing data, running applications, and delivering content or services, such as streaming videos, electronic mail, office productivity software, or social media.
- The servers and other components may be located in data centers across the world. While the public cloud offers services to the public over the Internet, businesses may use private clouds or hybrid clouds. Both private and hybrid clouds also include a network of servers housed in data centers.
- Compute entities may be executed using compute and memory resources of the data center.
- The term “compute entity” encompasses, but is not limited to, any executable code (in the form of hardware, firmware, software, or any combination of the foregoing) that implements a functionality, a virtual machine, an application, a service, a microservice, a container, or a unikernel for serverless computing.
- Compute entities may be executed on hardware associated with an edge-compute device, on-premises servers, or other types of systems, including communications systems, such as base stations (e.g., 5G or 6G base stations).
- A host OS may have access to a combination of near memory (e.g., the local DRAM) and an allocated portion of a far memory (e.g., pooled memory or non-pooled memory that is at least one level removed from the near memory).
- The far memory may include any physical memory that is shared by multiple compute nodes.
- The near memory may correspond to double data rate (DDR) dynamic random access memory (DRAM) that operates at a higher data rate (e.g., DDR2, DDR3, DDR4, or DDR5 DRAM), and the far memory may correspond to DRAM that operates at a lower data rate (e.g., DRAM or DDR DRAM).
- The near memory includes any memory that is used for storing data or instructions evicted from the system-level cache(s) associated with a CPU, and the far memory includes any memory that is used for storing data or instructions swapped out from the near memory.
- The distinction between the near memory and the far memory may also relate to the relative number of physical links between the CPU and the memory. As an example, if the near memory is coupled via a near memory controller, and is thus at least one physical link away from the CPU, then the far memory is coupled via a far memory controller, which is at least one more physical link away from the CPU.
- FIG. 1 is a block diagram of a system 100 including compute nodes 110, 140, and 170 coupled with a far memory system 180 in accordance with one example.
- Each compute node may include compute and memory resources.
- Compute node 110 may include a central processing unit (CPU) 112; compute node 140 may include a CPU 142; and compute node 170 may include a CPU 172.
- Although each compute node in FIG. 1 is shown as having a single CPU, each compute node may include additional CPUs and other devices, such as graphics processing units (GPUs), field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or other devices.
- Each compute node may include near memory, which may be organized as memory modules.
- Compute node 110 may include memory modules 122, 124, 126, 128, 130, and 132.
- Compute node 140 may include memory modules 152, 154, 156, 158, 160, and 162.
- Compute node 170 may include memory modules 182, 184, 186, 188, 190, and 192. Examples of such memory modules include, but are not limited to, dual-in-line memory modules (DIMMs) or single-in-line memory modules (SIMMs).
- Memory included in these modules may be dynamic random access memory (DRAM), flash memory, static random access memory (SRAM), phase change memory, magnetic random access memory, or any other type of memory technology that can allow the memory to act as local memory.
- Each compute node may include one or more memory controllers.
- Compute node 110 may include memory controller 118, compute node 140 may include memory controller 148, and compute node 170 may include memory controller 178.
- The memory controller included in such nodes may be a double data rate (DDR) DRAM controller in case the memory modules include DDR DRAM.
- Each compute node may be configured to execute several compute entities.
- Compute node 110 may have host OS 114 installed on it, compute node 140 may have host OS 144 installed on it, and compute node 170 may have host OS 174 installed on it.
- Far memory system 180 may include pooled memory (or non-pooled memory), which may include several memory modules.
- Examples of such memory modules include, but are not limited to, dual-in-line memory modules (DIMMs) or single-in-line memory modules (SIMMs).
- Memory included in these modules may be dynamic random access memory (DRAM), flash memory, static random access memory (SRAM), phase change memory, magnetic random access memory, or any other type of memory technology that can allow the memory to act as far memory.
- Any host OS (e.g., host OS 114, 144, or 174) executed by any of the compute nodes (e.g., compute node 110, 140, or 170) may access at least a portion of the physical memory included as part of far memory system 180.
- A portion of memory from far memory system 180 may be allocated to the compute node when the compute node powers on or as part of allocation/deallocation operations.
- The assigned portion may include one or more “slices” of memory, where a slice refers to the smallest granularity of memory managed by the far memory controller (e.g., a memory page or any other block of memory aligned to a slice size).
- A slice of memory is allocated to at most one host at a time.
- The far memory controller may assign or revoke assignment of slices to compute nodes based on an assignment/revocation policy associated with far memory system 180. As explained earlier, the data/instructions associated with a host OS may be swapped in and out of the near memory from/to the far memory.
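The exclusivity rule for slices can be sketched as a small ownership table. This is a minimal Python model, assuming hypothetical names (`SliceAllocator`, host labels such as `"node-110"`); it captures only the at-most-one-host invariant, not the actual assignment/revocation policy of far memory system 180, which is not specified here.

```python
from typing import Optional


class SliceAllocator:
    """Hypothetical bookkeeping for far memory slices: each slice is
    assigned to at most one host at a time."""

    def __init__(self, num_slices: int) -> None:
        # owner[i] names the host that currently holds slice i, or None if free.
        self.owner: list[Optional[str]] = [None] * num_slices

    def assign(self, slice_id: int, host: str) -> bool:
        """Assign a free slice to a host; refuse if any host already holds it."""
        if self.owner[slice_id] is not None:
            return False
        self.owner[slice_id] = host
        return True

    def revoke(self, slice_id: int, host: str) -> bool:
        """Revoke a slice, but only from the host that currently holds it."""
        if self.owner[slice_id] != host:
            return False
        self.owner[slice_id] = None
        return True

    def slices_of(self, host: str) -> list[int]:
        return [i for i, h in enumerate(self.owner) if h == host]


alloc = SliceAllocator(num_slices=4)
alloc.assign(0, "node-110")   # True
alloc.assign(0, "node-140")   # False: slice 0 already belongs to node-110
alloc.revoke(0, "node-110")   # True
alloc.assign(0, "node-140")   # True
print(alloc.slices_of("node-140"))  # [0]
```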
- Compute nodes 110, 140, and 170 may be part of a data center.
- The term data center may include, but is not limited to, some or all of the data centers owned by a cloud service provider, some or all of the data centers owned and operated by a cloud service provider, some or all of the data centers owned by a cloud service provider that are operated by a customer of the service provider, any other combination of the data centers, a single data center, or even some clusters in a particular data center.
- Each cluster may include several identical compute nodes.
- A cluster may include compute nodes including a certain number of CPU cores and a certain amount of memory.
- Although FIG. 1 shows system 100 as having a certain number of components, including compute nodes and memory components, arranged in a certain manner, system 100 may include additional or fewer components, arranged differently.
- FIG. 2 shows a block diagram of an example far memory system 200 corresponding to far memory system 180 shown in FIG. 1.
- Far memory system 200 may include a switch 202 for coupling the far memory controllers to compute nodes (e.g., compute nodes 110, 140, and 170 of FIG. 1).
- Far memory system 200 may further include several far memory controllers and associated far memory modules.
- Far memory system 200 may include far memory controller (FMC) 210, FMC 220, FMC 230, FMC 240, FMC 250, and FMC 260 coupled to switch 202, as shown in FIG. 2.
- FMC 210, FMC 220, FMC 230, FMC 240, FMC 250, and FMC 260 may further be coupled to fabric manager 280.
- FMC 210 may further be coupled to memory modules 212, 214, 216, and 218.
- FMC 220 may further be coupled to memory modules 222, 224, 226, and 228.
- FMC 230 may further be coupled to memory modules 232, 234, 236, and 238.
- FMC 240 may further be coupled to memory modules 242, 244, 246, and 248.
- FMC 250 may further be coupled to memory modules 252, 254, 256, and 258.
- FMC 260 may further be coupled to memory modules 262, 264, 266, and 268.
- Each memory module may be a dual-in-line memory module (DIMM) or a single-in-line memory module (SIMM).
- Each of the far memory controllers may be implemented as a Compute Express Link (CXL) specification-compliant memory controller.
- Each of the memory modules associated with far memory system 200 may be configured as a Type 3 CXL device.
- Fabric manager 280 may communicate via bus 206 with data center control plane 290.
- Fabric manager 280 may be implemented as a CXL specification-compliant fabric manager.
- Control information received from data center control plane 290 may include control information specifying which slices of memory from the far memory are allocated to any particular compute node at a given time.
- Fabric manager 280 may allocate a slice of memory from within the far memory to a specific compute node in a time-division multiplexed fashion.
- The CXL.io protocol is a PCIe-based non-coherent I/O protocol.
- The CXL.io protocol may be used to configure the memory devices and the links between the CPUs and the memory modules included in far memory system 200.
- The CXL.io protocol may also be used by the CPUs associated with the various compute nodes for device discovery, enumeration, error reporting, and management.
- Alternatively, any other I/O protocol that supports such configuration transactions may be used.
- Memory accesses to the memory modules may be handled via transactions associated with the CXL.mem protocol, which is a memory access protocol that supports memory transactions.
- Load instructions and store instructions associated with any of the CPUs may be handled via the CXL.mem protocol.
- Alternatively, any other protocol that allows the translation of CPU load/store instructions into read/write transactions associated with the memory modules included in far memory system 200 may be used.
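The division of labor between the two protocols can be summarized as a routing table. The sketch below is illustrative only: the operation names are hypothetical labels, not CXL message types, and it models nothing about the actual CXL transaction formats.

```python
from enum import Enum


class Protocol(Enum):
    CXL_IO = "CXL.io"    # configuration, discovery, enumeration, error reporting, management
    CXL_MEM = "CXL.mem"  # memory transactions generated by CPU loads and stores


# Hypothetical operation labels, grouped as described in the text above.
IO_OPERATIONS = {"configure_device", "configure_link", "discover",
                 "enumerate", "report_error", "manage"}
MEM_OPERATIONS = {"load": "read", "store": "write"}  # CPU instruction -> memory transaction


def route(operation: str) -> tuple[Protocol, str]:
    """Return the protocol that carries an operation and the transaction it becomes."""
    if operation in MEM_OPERATIONS:
        return Protocol.CXL_MEM, MEM_OPERATIONS[operation]
    if operation in IO_OPERATIONS:
        return Protocol.CXL_IO, operation
    raise ValueError(f"unknown operation: {operation!r}")


print(route("load"))       # (<Protocol.CXL_MEM: 'CXL.mem'>, 'read')
print(route("enumerate"))  # (<Protocol.CXL_IO: 'CXL.io'>, 'enumerate')
```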
- Although FIG. 2 shows far memory system 200 as having a certain number of components, including far memory controllers and memory modules, arranged in a certain manner, far memory system 200 may include additional or fewer components, arranged differently.
- The far memory may also be implemented as memory modules that are coupled in the same manner as the near memory (e.g., the memory modules shown as part of system 100 in FIG. 1).
- The far memory modules may be implemented using cheaper or lower-speed versions of the memory.
- FIG. 3 shows an example system address map 300 for use with the system 100 of FIG. 1.
- In order to use direct swap caching in the context of system 100 of FIG. 1, the near memory must have a fixed ratio with the far memory.
- In this example, the near memory includes both a non-swappable range and a swappable range. This means that any access to memory within the non-swappable range is guaranteed to get a “hit” in the near memory (since this range is not being swapped). Any access to a location in memory within the swappable range operates in the direct swap cache manner; such accesses first perform a lookup within the memory designated as the near memory.
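The two access paths can be sketched as a simple address check. The layout below is an assumption for illustration: it borrows the 1 TB sizes used in the system address map 300 example and places the non-swappable range first, which the text does not specify.

```python
TB = 1 << 40

# Assumed layout: non-swappable range first, swappable range immediately after.
NON_SWAPPABLE = range(0, 1 * TB)
SWAPPABLE = range(1 * TB, 2 * TB)


def access_path(addr: int) -> str:
    """Describe how a physical address is served under direct swap caching."""
    if addr in NON_SWAPPABLE:
        return "near-memory hit"     # never swapped, so always resident in near memory
    if addr in SWAPPABLE:
        return "direct swap lookup"  # check near memory first; swap in from far memory on a miss
    raise ValueError("address outside the mapped ranges")


print(access_path(0x1000))       # near-memory hit
print(access_path(1 * TB + 64))  # direct swap lookup
```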
- Swapping operations may be performed at a granularity level of a cache line.
- Each cache line may include a combination of a data portion (e.g., 512 bits) and a metadata portion (e.g., 128 bits).
- The data portion may contain data representing user data or instructions executed by a compute node.
- The metadata portion may include data representing various attributes of the data in the data portion.
- The metadata portion can also include error checking and correction bits or other suitable types of information.
- The metadata portion may include a tag having an appropriate number of bit(s) to identify the location of a cache line.
- In this example, since the swappable memory region in the near memory has the same size as the swappable memory region in the far memory (a ratio of 1), a single bit may be used.
- A logical value of “1” may indicate that the cache line is in a location corresponding to the near memory, whereas a logical value of “0” may indicate that the cache line is in a location corresponding to the far memory.
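A 1:1 direct swap with a single tag bit can be sketched as follows. This is a minimal Python model under stated assumptions: one near-memory slot is paired with one conflicting far-memory slot, the tag bit is kept alongside the line held in the near slot, and the hypothetical `SwapPair` class exchanges the two lines on a miss. It illustrates the bookkeeping only, not the memory controller's transaction flow.

```python
from dataclasses import dataclass


@dataclass
class SwapPair:
    """Hypothetical model of one near-memory slot and its conflicting far-memory slot."""
    near_slot: bytes  # line currently held in near memory
    far_slot: bytes   # line currently held in far memory
    tag: int = 1      # 1: near slot holds the near-range line; 0: it holds the far-range line

    def read(self, want_near_line: bool) -> tuple[bytes, bool]:
        """Return (data, hit). On a miss, swap the two lines and flip the tag."""
        near_line_resident = self.tag == 1
        if want_near_line == near_line_resident:
            return self.near_slot, True
        # Miss: the requested line sits in far memory, so exchange the two lines.
        self.near_slot, self.far_slot = self.far_slot, self.near_slot
        self.tag ^= 1
        return self.near_slot, False


pair = SwapPair(near_slot=b"A", far_slot=b"B")
print(pair.read(want_near_line=True))   # (b'A', True)
print(pair.read(want_near_line=False))  # (b'B', False): swapped in, tag is now 0
print(pair.read(want_near_line=False))  # (b'B', True)
```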
- the present disclosure is not limited to the use of a fixed ratio of 1 : 1 between the near memory and the far memory.
- Alternatively, a ratio of 1:3 may be used. In such a case, additional tag bits may be required to encode which region of the memory holds the cache line.
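The tag-size arithmetic can be made concrete with a short Python sketch. It assumes (this is not stated in the source) that with a 1:N near-to-far ratio, one near-memory slot and its N far-memory counterparts form a group, so the tag must identify one of 1 + N locations and needs ceil(log2(1 + N)) bits. The helper names `tag_bits` and `location_1to1` are hypothetical.

```python
import math

def tag_bits(far_per_near: int) -> int:
    """Tag bits needed when one near slot is shared by 1 + far_per_near lines.

    Assumption: a 1:N ratio means each near slot has N far counterparts,
    so the tag must name one of 1 + N locations.
    """
    locations = 1 + far_per_near
    return max(1, math.ceil(math.log2(locations)))

def location_1to1(tag_bit: int) -> str:
    # Encoding described in the text for a 1:1 ratio: "1" = near, "0" = far.
    return "near" if tag_bit == 1 else "far"

for ratio in (1, 3):
    print(f"1:{ratio} ratio -> {tag_bits(ratio)} tag bit(s)")
```

Under this assumption the loop prints 1 tag bit for the 1:1 case and 2 tag bits for the 1:3 case, which matches the text's point that a 1:3 ratio needs additional tag bits.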
- one of the potential issues that can occur with respect to direct swap caching is that conflicting cache lines in near-memory may be allocated to separate tenants (e.g., VMs, containers, etc.) in a virtualized system.
- one tenant’s swapping of cache lines can impact the memory bandwidth and the memory capacity of another tenant.
- the present disclosure describes an example mechanism that allows one to build isolation between tenants such that one tenant cannot impact the memory bandwidth and the memory capacity of another tenant.
- the present disclosure describes an address mapping arrangement such that conflict sets map to the same tenant; that is, one tenant’s addresses do not conflict with another tenant’s addresses.
- System address map 300 includes both swappable range and non-swappable range.
- an address bit is used to carve up the swappable range into smaller granular regions.
- 1 TB is configured as a non-swappable range
- 1 TB is configured as a swappable range.
- a low order address bit is used to carve this memory range (swappable range) into smaller granular regions, each having a size of 512 MB.
- If a tenant (e.g., any of VM 1, VM 2, ... VM N) is allocated an address range equal to or greater than 1 GB (at least twice the size of the smaller granular regions), then the tenants’ addresses do not conflict with each other.
- the address range allocated to each tenant can be viewed as having a conflict set size (e.g., 1 GB), which in this example is selected to be of the same size as the page size associated with the system.
- The host OS may be, for example, a hypervisor.
- Each conflict set (having two conflicting 512 MB swappable regions) corresponds to a single 512 MB region in the physical memory accessible to a tenant (e.g., the DRAM).
- a single 1 GB page corresponds to a single 512 MB region in the physical memory.
- a low order address bit (e.g., address bit 29) can have a logical value of “0” or “1” to distinguish between the two 512 MB conflicting regions.
- When the logical value for address bit 29 is “0,” the cache line’s address corresponds to one of the 512 MB conflicting regions, and when the logical value for address bit 29 is “1,” the cache line’s address corresponds to the other 512 MB conflicting region.
- Other types of encodings may also be used as part of the addressing to distinguish between the two conflicting regions.
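A minimal Python sketch of the bit-29 decoding follows. It assumes addresses are offsets within the swappable range and that this range starts on a 1 GB boundary; the function names are hypothetical. Flipping bit 29 yields the address that competes for the same near-memory location, and both addresses always fall inside the same 1 GB-aligned conflict set.

```python
REGION_SHIFT = 29                     # 2**29 bytes = 512 MB per conflicting region
REGION_SIZE = 1 << REGION_SHIFT
CONFLICT_SET_SIZE = 2 * REGION_SIZE   # 1 GB, equal to the page size in the example

def conflicting_region(addr: int) -> int:
    """Return 0 or 1: which of the two 512 MB conflicting regions holds addr."""
    return (addr >> REGION_SHIFT) & 1

def conflict_partner(addr: int) -> int:
    """Address competing for the same near-memory location (bit 29 flipped)."""
    return addr ^ REGION_SIZE

def conflict_set_base(addr: int) -> int:
    """Start of the 1 GB-aligned conflict set that contains addr."""
    return addr & ~(CONFLICT_SET_SIZE - 1)

a = 0x4000_1000                       # 1 GB + 4 KB into the swappable range
print(conflicting_region(a), hex(conflict_partner(a)))
print(conflict_set_base(a) == conflict_set_base(conflict_partner(a)))
```

Because the partner differs only in bit 29, which lies below the 1 GB alignment boundary, any tenant that owns a whole 1 GB-aligned block also owns every address its lines can conflict with.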
- an interesting property of using the size of 512 MB is the following: if the first-level page tables (the tables that map the Guest Physical Address to the System Physical Address) use a 1 GB page size, then this method of carving up the address space may ensure perfect noisy-neighbor isolation even if the 1 GB pages are allocated in a dis-contiguous fashion across the system physical address (SPA) space.
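The isolation claim for dis-contiguous 1 GB pages can be checked with a small Python simulation. This is a sketch under stated assumptions: the swappable range is 1 TB and 1 GB-aligned, pages are handed out in a shuffled order, and `scattered_owner` and `cross_tenant_conflicts` are hypothetical helpers. For contrast, it also counts conflicts when two tenants receive adjacent 512 MB regions.

```python
import random

REGION_SIZE = 1 << 29        # 512 MB conflicting region (bit 29 selects the partner)
PAGE_SIZE = 1 << 30          # 1 GB page == conflict set size in the example
SWAPPABLE_SIZE = 1 << 40     # 1 TB swappable range, as in system address map 300

def scattered_owner(tenants, granules_per_tenant, granule, seed=0):
    """Assign granules to tenants in a shuffled, dis-contiguous order.

    Returns {granule index: tenant}.
    """
    rng = random.Random(seed)
    free = list(range(SWAPPABLE_SIZE // granule))
    rng.shuffle(free)
    return {free.pop(): t for t in tenants for _ in range(granules_per_tenant)}

def cross_tenant_conflicts(owner, granule):
    """Count 512 MB regions whose bit-29 partner belongs to a different tenant."""
    bad = 0
    for idx, tenant in owner.items():
        # Visit each 512 MB region inside this granule.
        for offset in range(0, granule, REGION_SIZE):
            partner = (idx * granule + offset) ^ REGION_SIZE
            if owner.get(partner // granule, tenant) != tenant:
                bad += 1
    return bad

pages = scattered_owner(["VM 1", "VM 2", "VM 3"], 50, PAGE_SIZE)
print(cross_tenant_conflicts(pages, PAGE_SIZE))                     # 0
print(cross_tenant_conflicts({0: "VM 1", 1: "VM 2"}, REGION_SIZE))  # 2
```

With 1 GB granules, the partner of every region stays in the same page, so the count is zero no matter how the pages are scattered. With 512 MB granules, two tenants can receive the two halves of one conflict set and interfere with each other.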
- system address map 300 also includes a non-swappable range. That range can be allocated to a set of high-priority tenants (e.g., VMs X, Y. . . Z) that use the non-swapped space and are therefore isolated from all the tenants using the conflict-prone swappable region.
- In this example, the compute node (e.g., the host server) is a two-socket server system that allows access to two non-uniform memory access (NUMA) sets: INTERLEAVED SET A (NUMA-0) and INTERLEAVED SET B (NUMA-1). These different sets can offer different NUMA characteristics to the tenants.
- the non-swappable range of system address map 300 can be mapped to the NUMA-0 set that allows for local access to memory that is faster relative to the NUMA-1 set.
- the swappable range and the non-swappable range can be advertised through the Advanced Configuration and Power Interface (ACPI) as two separate ranges.
- each range can be mapped to memory with different NUMA characteristics.
- each of the swappable range and the non-swappable range can have different attributes as provided via the respective Heterogeneous Memory Attribute Tables (HMATs).
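The two advertised ranges can be modeled as plain data to show how an address is classified. The following Python sketch is a hypothetical data model, not an ACPI table encoding. It assumes the 1 TB non-swappable range comes first and maps to NUMA-0, as the text describes, and it assigns the swappable range to NUMA-1 purely for illustration.

```python
from dataclasses import dataclass

TB = 1 << 40

@dataclass(frozen=True)
class AddressRange:
    name: str
    base: int
    size: int
    swappable: bool
    numa_node: int  # illustrative: NUMA set the range is advertised under

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size

# Hypothetical layout mirroring system address map 300. The range order and
# the NUMA node of the swappable range are assumptions.
ADDRESS_MAP = [
    AddressRange("non-swappable", 0, TB, swappable=False, numa_node=0),
    AddressRange("swappable", TB, TB, swappable=True, numa_node=1),
]

def lookup(addr: int) -> AddressRange:
    for r in ADDRESS_MAP:
        if r.contains(addr):
            return r
    raise ValueError(f"address {addr:#x} is not mapped")

def guaranteed_near_hit(addr: int) -> bool:
    # Accesses to the non-swappable range always hit in near memory.
    return not lookup(addr).swappable
```

For example, `guaranteed_near_hit(0x1000)` is `True`. `lookup(TB + 0x1000).name` is `"swappable"`, so that access goes through the direct swap cache path instead.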
- FIG. 4 is a diagram showing a transaction flow 400 related to a read operation and a write operation when the location of the data is in the near memory.
- the transactions associated with the read operation are shown in portion 410 of transaction flow 400 and the transactions associated with the write operation are shown in flow portion 420 of transaction flow 400.
- A CPU (e.g., any of CPUs 112, 142, or 172 of FIG. 1) issues the request to a memory controller (e.g., any of memory controllers 118, 148, and 178 of FIG. 1).
- address A1 is first decoded to the near memory (e.g., the local memory associated with the CPU).
- the read from the local memory location results in a retrieval of a cache line including both the data portion and the metadata portion (including the tag).
- the tag indicates that the data portion corresponds to the address being looked up and hence it is a hit.
- the data in the cache line is returned to the requesting CPU.
- As shown in portion 420 of transaction flow 400, when a cache line is being written to the memory, every write operation must be preceded by a read operation to ensure that the memory location contains the address being written. In this case, the data is being written to address A2, which is located within the near memory, and thus the write operation is also a hit.
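The hit path of transaction flow 400 can be sketched in Python. Several assumptions apply: the ratio is 1:1, each near-memory slot stores a `(tag, data)` pair whose tag is bit 29 of the resident address, and an unused slot holds a zeroed line with tag 0. `NearMemory`, `read`, and `write` are hypothetical names. A tag mismatch is reported as a miss instead of being serviced.

```python
REGION_SHIFT = 29   # bit 29 separates the two conflicting 512 MB regions
LINE_BYTES = 64     # 512-bit data portion of a cache line

class NearMemory:
    """Hypothetical 1:1 model: each slot stores (tag, data), where the tag is
    bit 29 of the address whose line is currently resident."""

    def __init__(self):
        self.slots = {}

    @staticmethod
    def decode(addr):
        slot = (addr & ((1 << REGION_SHIFT) - 1)) // LINE_BYTES
        tag = (addr >> REGION_SHIFT) & 1
        return slot, tag

    def load(self, slot):
        # Assumption: before first use, a slot holds a zeroed line with tag 0.
        return self.slots.get(slot, (0, bytes(LINE_BYTES)))

def read(near, addr):
    slot, tag = near.decode(addr)
    stored_tag, data = near.load(slot)   # one read returns data plus metadata (tag)
    if stored_tag != tag:
        raise LookupError(f"miss at {addr:#x}: line is in far memory")
    return data                          # hit: return data to the requesting CPU

def write(near, addr, data):
    slot, tag = near.decode(addr)
    stored_tag, _ = near.load(slot)      # read-before-write: check the tag first
    if stored_tag != tag:
        raise LookupError(f"miss at {addr:#x}: handled by the far-memory path")
    near.slots[slot] = (tag, data)       # hit: overwrite the line in place
```

Usage: after `near = NearMemory()` and `write(near, 0x2000, b"\xab" * 64)`, the call `read(near, 0x2000)` returns the written line. A read of `0x2000_2000` maps to the same slot with tag 1, so it raises the miss.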
- FIG. 5 is a diagram showing a transaction flow 500 relating to the transactions that occur when the data associated with a read operation is located in the far memory (e.g., the pooled memory).
- If the tag indicates that the near memory location does not contain the address of the data being requested, the access is a miss.
- a blocking entry may be set in the memory controller for the four entries that map to the memory location in the local memory.
- the tag may be used to decode which location in the far memory contains the data corresponding to the address being requested.
- the far memory may be implemented as CXL-compliant Type 3 devices. In such an implementation, the memory controller may issue a CXL.mem read request to the appropriate address.
- the data is sent to the original requester and thus completes the read operation.
- the data is also written to the near memory, and the original data read from the local memory is written to the same location in the far memory from which the read happened, thereby performing the cache line swap.
- FIG. 6 is a diagram showing a transaction flow 600 relating to the transactions that occur when the data associated with a write operation is located in the far memory.
- For a write (e.g., write (A3)) that misses the near memory (local memory), the data is written to the far memory.
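The miss paths of FIGs. 5 and 6 can be combined with the hit path in one small Python model. This is a sketch under assumptions that are not in the source: the ratio is 1:1, so each near slot has exactly one far partner location; the tag is bit 29 of the resident address; and an unused slot holds a zeroed line with tag 0. `DirectSwapCache` is a hypothetical name. The `blocked` set stands in for the blocking entry and matters only when requests can arrive concurrently.

```python
REGION_SHIFT = 29
LINE_BYTES = 64

class DirectSwapCache:
    """Hypothetical 1:1 direct swap cache: one near slot, one far partner."""

    def __init__(self):
        self.near = {}       # slot -> (tag of resident line, data)
        self.far = {}        # slot -> data of the non-resident line
        self.blocked = set()

    @staticmethod
    def decode(addr):
        slot = (addr & ((1 << REGION_SHIFT) - 1)) // LINE_BYTES
        return slot, (addr >> REGION_SHIFT) & 1

    def _near(self, slot):
        return self.near.get(slot, (0, bytes(LINE_BYTES)))

    def read(self, addr):
        slot, tag = self.decode(addr)
        stored_tag, near_data = self._near(slot)
        if stored_tag == tag:
            return near_data                           # hit (FIG. 4)
        self.blocked.add(slot)                         # blocking entry for this slot
        far_data = self.far.get(slot, bytes(LINE_BYTES))  # e.g. a CXL.mem read
        self.near[slot] = (tag, far_data)              # requested line moves to near
        self.far[slot] = near_data                     # evicted line fills the far slot
        self.blocked.discard(slot)
        return far_data                                # miss with swap (FIG. 5)

    def write(self, addr, data):
        slot, tag = self.decode(addr)
        stored_tag, _ = self._near(slot)               # read-before-write tag check
        if stored_tag == tag:
            self.near[slot] = (tag, data)              # write hit
        else:
            self.far[slot] = data                      # write miss (FIG. 6)
```

In this model, writing to `0x1000` (tag 0) hits near memory, and writing to `0x2000_1000` (tag 1, same slot) goes to far memory. A subsequent read of `0x2000_1000` returns the far data and swaps the two lines, so the slot's tag becomes 1.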
- FIG. 7 shows a block diagram of an example system 700 for implementing at least some of the methods for integrated memory pooling and direct swap caching.
- System 700 may include processor(s) 702, I/O component(s) 704, memory 706, presentation component(s) 708, sensors 710, database(s) 712, networking interfaces 714, and I/O port(s) 716, which may be interconnected via bus 720.
- Processor(s) 702 may execute instructions stored in memory 706.
- I/O component(s) 704 may include components such as a keyboard, a mouse, a voice recognition processor, or touch screens.
- Memory 706 may be any combination of non-volatile storage or volatile storage (e.g., flash memory, DRAM, SRAM, or other types of memories).
- Presentation component(s) 708 may include displays, holographic devices, or other presentation devices. Displays may be any type of display, such as LCD, LED, or other types of display.
- Sensor(s) 710 may include telemetry or other types of sensors configured to detect and/or receive information (e.g., collected data, such as memory usage by various compute entities being executed by various compute nodes in a data center).
- Sensor(s) 710 may include sensors configured to sense conditions associated with CPUs, memory or other storage components, FPGAs, motherboards, baseboard management controllers, or the like.
- Sensor(s) 710 may also include sensors configured to sense conditions associated with racks, chassis, fans, power supply units (PSUs), or the like. Sensor(s) 710 may also include sensors configured to sense conditions associated with Network Interface Controllers (NICs), Top-of-Rack (TOR) switches, Middle-of-Rack (MOR) switches, routers, power distribution units (PDUs), rack-level uninterruptible power supply (UPS) systems, or the like.
- database(s) 712 may be used to store any of the data collected or logged and as needed for the performance of methods described herein.
- Database(s) 712 may be implemented as a collection of distributed databases or as a single database.
- Network interface(s) 714 may include communication interfaces, such as Ethernet, cellular radio, Bluetooth radio, UWB radio, or other types of wireless or wired communication interfaces.
- I/O port(s) 716 may include Ethernet ports, Fiber-optic ports, wireless ports, or other communication or diagnostic ports.
- Although FIG. 7 shows system 700 as including a certain number of components arranged and coupled in a certain way, it may include fewer or additional components arranged and coupled differently. In addition, the functionality associated with system 700 may be distributed, as needed.
- FIG. 8 shows a data center 800 for implementing a system for direct swap caching with noisy neighbor mitigation and dynamic address range assignment in accordance with one example.
- data center 800 may include several clusters of racks including platform hardware, such as compute resources, storage resources, networking resources, or other types of resources.
- Compute resources may be offered via compute nodes provisioned via servers that may be connected to switches to form a network. The network may enable connections between each possible combination of switches.
- Data center 800 may include Server1 810 and ServerN 830.
- Data center 800 may further include data center related functionality 860, including deployment/monitoring 870, directory/identity services 872, load balancing 874, data center controllers 876 (e.g., software defined networking (SDN) controllers and other controllers), and routers/switches 878.
- Server1 810 may include CPU(s) 811, host hypervisor 812, near memory 813, storage interface controller(s) (SIC(s)) 814, far memory 815, network interface controller(s) (NIC(s)) 816, and storage disks 817 and 818.
- memory 815 may be implemented as a combination of near memory and far memory.
- ServerN 830 may include CPU(s) 831, host hypervisor 832, near memory 833, storage interface controller(s) (SIC(s)) 834, far memory 835, network interface controller(s) (NIC(s)) 836, and storage disks 837 and 838.
- memory 835 may be implemented as a combination of near memory and far memory.
- Server1 810 may be configured to support virtual machines, including VM1 819, VM2 820, and VMN 821.
- the virtual machines may further be configured to support applications, such as APP1 822, APP2 823, and APPN 824.
- ServerN 830 may be configured to support virtual machines, including VM1 839, VM2 840, and VMN 841.
- the virtual machines may further be configured to support applications, such as APP1 842, APP2 843, and APPN 844.
- data center 800 may be enabled for multiple tenants using the Virtual extensible Local Area Network (VXLAN) framework.
- Each virtual machine (VM) may be allowed to communicate with VMs in the same VXLAN segment.
- Each VXLAN segment may be identified by a VXLAN Network Identifier (VNI).
- Although FIG. 8 shows data center 800 as including a certain number of components arranged and coupled in a certain way, it may include fewer or additional components arranged and coupled differently.
- the functionality associated with data center 800 may be distributed or combined, as needed.
- FIG. 9 shows a flow chart 900 of an example method for direct swap caching with noisy neighbor mitigation.
- steps associated with this method may be executed by various components of the systems described earlier (e.g., system 100 of FIG. 1 and system 200 of FIG. 2).
- Step 910 may include provisioning a compute node with both near memory and far memory.
- Step 920 may include granting to a host operating system (OS), configured to support a first set of tenants associated with the compute node, access to: (1) a first swappable range of memory addresses associated with the near memory and (2) a second swappable range of memory addresses associated with the far memory to allow for swapping of cache lines between the near memory and the far memory.
- 1 TB is configured as a non-swappable range and 1 TB is configured as a swappable range.
- a low order address bit may be used to carve this swappable range into smaller granular regions, each having a size of 512 MB.
- Step 930 may include allocating memory in a granular fashion to any of the first set of tenants such that each allocation of memory to a tenant includes memory addresses corresponding to a conflict set having a conflict set size, where the conflict set comprises a first conflicting region associated with the first swappable range of memory addresses associated with the near memory and a second conflicting region associated with the second swappable range of memory addresses associated with the far memory, and where the first conflicting region and the second conflicting region have the same size, selected to be equal to or less than half of the conflict set size.
- If a tenant (e.g., any of VM 1, VM 2, . . . VM N) is allocated an address range equal to or greater than 1 GB (at least twice the size of the conflicting regions), then the tenants’ addresses do not conflict with each other.
- the address range allocated to each tenant can be viewed as having a conflict set size (e.g., 1 GB), which in this example is selected to be of the same size as the page size associated with the system.
- having the conflict set size being the same size as the page size associated with the system may result in the highest quality of service possible with respect to memory operations (e.g., read/write operations).
- Each conflict set (having two conflicting 512 MB swappable regions) corresponds to a single 512 MB region in the physical memory accessible to a tenant (e.g., the DRAM).
- a single 1 GB page corresponds to a single 512 MB region in the physical memory.
- a low order address bit (e.g., address bit 29) can have a logical value of “0” or “1” to distinguish between the two 512 MB conflicting regions. When the logical value for address bit 29 is “0,” the cache line is in one of the 512 MB conflicting regions, and when the logical value for address bit 29 is “1,” the cache line is in the other 512 MB conflicting region.
- the host OS can have initial access to a certain size of swappable range of memory addresses and a certain size of non-swappable range of memory addresses.
- Conventionally, any change to this initial allocation has required modifying hardware registers that may be programmed as part of the firmware associated with the boot sequence of the compute node.
- As an example, the basic input-output system (BIOS) associated with the system (e.g., a system including a compute node) may set up the hardware registers based on firmware settings.
- the host OS does not have access to the hardware registers. Accordingly, the host OS cannot change the system address map.
- any modifications to such hardware registers would require reprogramming of the firmware (e.g., the BIOS firmware).
- the present disclosure describes techniques to change the initial allocation of the size of the swappable region and the non-swappable region without requiring reprogramming of the hardware registers. In sum, this is accomplished by provisioning any number of different configurations and then switching between the configurations, as required, without having to reprogram the hardware registers.
- the switching between the configurations provides run-time flexibility with respect to the type of workloads that can be run using the system.
- the host OS for a system may have an equal amount of swappable and non-swappable range of addresses.
- the non-swappable range of addresses may be allocated to a set of high-priority tenants (e.g., VMs X, Y. . . Z) that use the non-swapped space and are thus isolated from all the tenants using the conflict-prone swappable region.
- If the host OS discovers a higher demand for memory from the high-priority tenants, it may make a runtime switch to a different configuration of a system address map that includes a larger amount of non-swappable address space.
- If the demand pattern is the reverse of this example, the host OS may make a runtime switch to yet another configuration of a system address map that includes a larger amount of swappable address space.
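The source says only that the host OS switches between pre-provisioned configurations when demand shifts. It does not give a selection rule. The following Python sketch shows one hypothetical policy: pick the configuration with the smallest non-swappable range that still covers the high-priority demand, and fall back to the largest non-swappable range if none fits. `MapConfig`, `pick_config`, and the GB figures are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MapConfig:
    name: str
    non_swappable_gb: int
    swappable_gb: int

def pick_config(configs, high_priority_demand_gb):
    """Hypothetical policy: choose the smallest non-swappable range that fits
    the high-priority demand, which leaves the most swappable space for others."""
    fitting = [c for c in configs if c.non_swappable_gb >= high_priority_demand_gb]
    if not fitting:
        return max(configs, key=lambda c: c.non_swappable_gb)
    return min(fitting, key=lambda c: c.non_swappable_gb)

# Illustrative configurations: N = 512 GB, M = 1024 GB, X = 256 GB converted.
A = MapConfig("A", non_swappable_gb=512, swappable_gb=1024)
B = MapConfig("B", non_swappable_gb=768, swappable_gb=768)
```

With these values, `pick_config([A, B], 600).name` is `"B"` and `pick_config([A, B], 400).name` is `"A"`. A real host OS would also weigh the cost of taking memory offline before switching.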
- FIG. 10 shows a configuration A of a system address map 1000 for use with system 100 of FIG. 1.
- the configuration A described with respect to system address map 1000 assumes a non-swappable range of N gigabytes (GB) and a swappable range of M GB.
- a low order address bit is used to carve the swappable range into smaller granular regions (e.g., each having a size of 512 MB). These granular regions can be allocated to the tenants (e.g., any of VM 1, VM 2, . . . VM N).
- the non-swappable range can be allocated to tenants having a higher priority (e.g., any of VM X, Y, and Z).
- In this example, the compute node (e.g., the host server) is a two-socket server system that allows access to two non-uniform memory access (NUMA) sets: INTERLEAVED SET A (NUMA-0) and INTERLEAVED SET B (NUMA-1).
- these different sets can offer different NUMA characteristics to the tenants.
- the non-swappable range of system address map 1000 can be mapped to the NUMA-0 set that allows for local access to memory that is faster relative to the NUMA-1 set.
- system address map 1000 is further used to reserve two M/2 GB non-swappable address ranges.
- One of the M/2 GB non-swappable address ranges is mapped to the near memory (e.g., DDR INTERLEAVED SET 3) and the other M/2 GB non-swappable address range is mapped to the far memory (e.g., CXL NON-INTERLEAVED SET 4).
- Hardware registers (e.g., hardware address decoders) associated with the compute node are set up such that each of the M/2 GB address ranges maps to the same near memory (e.g., DRAM) locations. These address ranges are reserved initially and are indicated to the host OS as unavailable, so at the start both ranges are marked as offline. The address ranges marked as reserved are not mapped to any physical memory. Accordingly, at the start, the host OS can access only the N GB non-swappable range and the M GB swappable range.
- In one example, system address map 1000 is switched from configuration A, shown in FIG. 10, to configuration B, shown in FIG. 11.
- the switch to configuration B is accomplished by the host OS without invoking the BIOS, including without any reprogramming of the hardware registers.
- the host OS takes X GB of the swappable range offline. Prior to taking this range offline, the host OS invalidates all page table mappings in the system physical address table. This effectively means that the host OS can no longer access the address range taken offline.
- the host OS brings two X/2 GB memory address ranges online from the previously reserved non-swappable range (e.g., M GB non-swappable range shown as part of system address map 1000 of FIG. 10).
- One of the X/2 GB non-swappable address ranges maps to the far memory (e.g., CXL NON-INTERLEAVED SET 4) and the other X/2 GB non-swappable address range maps to the near memory (e.g., DDR INTERLEAVED SET 3).
- the host OS has effectively converted X GB swappable address range into a non-swappable address range.
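The configuration A to configuration B switch can be expressed as bookkeeping on range sizes. The following Python sketch models only the GB accounting and includes no hardware register writes, since none are needed. It assumes a 1:1 ratio, under which X GB of swappable address space is backed by X/2 GB of near memory and X/2 GB of far memory; that is why bringing X/2 GB of each reserved range online adds exactly X GB of non-swappable space. The class and method names are hypothetical. Invalidating page-table mappings is specific to the host OS and appears only as a comment.

```python
from dataclasses import dataclass

@dataclass
class AddressMap:
    """Hypothetical size bookkeeping (GB) for system address maps 1000/1100."""
    non_swappable_gb: int        # N GB, online from boot
    swappable_gb: int            # M GB, online from boot
    reserved_near_gb: int        # M/2 GB reserved near range (e.g. DDR SET 3)
    reserved_far_gb: int         # M/2 GB reserved far range (e.g. CXL SET 4)
    online_near_gb: int = 0      # part of the reserved near range now online
    online_far_gb: int = 0       # part of the reserved far range now online

    @classmethod
    def config_a(cls, n_gb, m_gb):
        return cls(n_gb, m_gb, m_gb // 2, m_gb // 2)

    def convert(self, x_gb):
        """Convert X GB of swappable space to non-swappable space at run time."""
        if x_gb % 2 or not 0 < x_gb <= self.swappable_gb:
            raise ValueError("X must be even and within the swappable range")
        # 1. The host OS invalidates page-table mappings for the X GB (not modeled).
        # 2. Take X GB of the swappable range offline.
        self.swappable_gb -= x_gb
        # 3. Bring X/2 GB of each reserved range online.
        self.online_near_gb += x_gb // 2
        self.online_far_gb += x_gb // 2

    @property
    def usable_non_swappable_gb(self):
        return self.non_swappable_gb + self.online_near_gb + self.online_far_gb
```

For example, `m = AddressMap.config_a(512, 1024)` followed by `m.convert(256)` leaves 768 GB swappable and 768 GB usable non-swappable. The total of 1536 GB is unchanged, because the same physical memory is simply reached through different address ranges.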
- Although FIGs. 10 and 11 describe specific configurations, other configurations can also be deployed using techniques similar to those described with respect to these figures. These configurations allow for dynamic address range assignments that can be modified on the fly without having to reprogram the hardware registers used at boot time.
- FIG. 12 shows a flow chart 1200 of an example method for direct swap caching with noisy neighbor mitigation.
- steps associated with this method may be executed by various components of the systems described earlier (e.g., system 100 of FIG. 1 and system 200 of FIG. 2).
- Step 1210 may include provisioning a compute node with both near memory and far memory, where a host operating system (OS) associated with the compute node is granted access to a first system address map configuration and a second system address map configuration different from the first system address map configuration.
- Step 1220 may include granting to the host OS, configured to support a first set of tenants, access to a first non-swappable address range associated with the near memory.
- As an example, certain tenants having a higher priority (e.g., any of VM X, Y, and Z) may be granted access to N GB of non-swappable address range.
- Step 1230 may include granting to the host OS, configured to support a second set of tenants, different from the first set of tenants, access to: (1) a first swappable address range associated with the near memory and (2) a second swappable address range associated with the far memory to allow for swapping of cache lines between the near memory and the far memory.
- For a set of tenants (e.g., any of VM 1, VM 2, . . . VM N), a low order address bit is used to carve the swappable range into smaller granular regions (e.g., each having a size of 512 MB).
- Step 1240 may include increasing a size of the first non-swappable address range by switching from the first system address map configuration to the second system address map configuration.
- the host OS may increase the size of the non-swappable address range for the higher priority tenants by switching from system address map 1000 of FIG. 10 to system address map 1100 of FIG. 11.
- the switch is accomplished by the host OS without invoking the BIOS, including without any reprogramming of the hardware registers.
- the host OS may perform several actions in order to perform the switch. As an example, the host OS takes X GB of the swappable range offline. Prior to taking this range offline, the host OS invalidates all page table mappings in the system physical address table.
- the host OS can no longer access the address range taken offline.
- the host OS brings two X/2 GB memory address ranges online from the previously reserved non-swappable range (e.g., M GB non-swappable range shown as part of system address map 1000 of FIG. 10).
- the present disclosure relates to a system including a compute node providing access to both near memory and far memory.
- the system may further include a host operating system (OS), configured to support a first set of tenants associated with the compute node, where the host OS has access to: (1) a first swappable range of memory addresses associated with the near memory and (2) a second swappable range of memory addresses associated with the far memory to allow for swapping of cache lines between the near memory and the far memory.
- the system may further include the host OS configured to allocate memory in a granular fashion to any of the first set of tenants such that each allocation of memory to a tenant includes memory addresses corresponding to a conflict set having a conflict set size.
- the conflict set may include a first conflicting region associated with the first swappable range of memory addresses associated with the near memory and a second conflicting region associated with the second swappable range of memory addresses associated with the far memory, where the first conflicting region and the second conflicting region have the same size, selected to be equal to or less than half of the conflict set size.
- the host OS may have access to a first non-swappable range of memory addresses associated with the near memory and the host OS may further be configured to allocate memory addresses to a second set of tenants, having a higher priority than the first set of tenants, from within only the first non-swappable range of memory addresses associated with the near memory.
- the conflict set size may be selected to be equal to a size of a page of memory used by the host OS for page-based memory management.
- a ratio of a size of the first swappable range of memory addresses associated with the near memory and a size of the second swappable range of memory addresses associated with the far memory may be fixed.
- the host OS may further be configured to increase a size of the first non-swappable range of memory addresses without requiring reprogramming of hardware registers associated with the compute node.
- the system may further comprise a near memory controller for managing the near memory and a far memory controller, configured to communicate with the near memory controller, for managing the far memory.
- the near memory controller may further be configured to analyze a metadata portion associated with a cache line to determine whether the near memory contains the cache line or whether the far memory contains the cache line.
- the present disclosure relates to a method including provisioning a compute node with both near memory and far memory.
- the method may further include granting to a host operating system (OS), configured to support a first set of tenants associated with the compute node, access to: (1) a first swappable range of memory addresses associated with the near memory and (2) a second swappable range of memory addresses associated with the far memory to allow for swapping of cache lines between the near memory and the far memory.
- the method may further include allocating memory in a granular fashion to any of the first set of tenants such that each allocation of memory to a tenant includes memory addresses corresponding to a conflict set having a conflict set size.
- the conflict set may include a first conflicting region associated with the first swappable range of memory addresses associated with the near memory and a second conflicting region associated with the second swappable range of memory addresses associated with the far memory, where the first conflicting region and the second conflicting region have the same size, selected to be equal to or less than half of the conflict set size.
- the host OS may have access to a first non-swappable range of memory addresses associated with the near memory, and the host OS is further configured to allocate memory addresses to a second set of tenants, having a higher priority than the first set of tenants, from within only the first non-swappable range of memory addresses associated with the near memory.
- the conflict set size may be selected to be equal to a size of a page of memory used by the host OS for page-based memory management.
- a ratio of a size of the first swappable range of memory addresses associated with the near memory and a size of the second swappable range of memory addresses associated with the far memory may be fixed.
- the method may further include increasing a size of the first non-swappable range of memory addresses without requiring reprogramming of hardware registers associated with the compute node.
- the method may further include analyzing a metadata portion associated with a cache line to determine whether the near memory contains the cache line or whether the far memory contains the cache line.
- the present disclosure relates to a method including provisioning a compute node with both near memory and far memory, where a host operating system (OS) associated with the compute node is granted access to a first system address map configuration and a second system address map configuration different from the first system address map configuration.
- the method may further include granting to the host OS, configured to support a first set of tenants, access to a first non-swappable address range associated with the near memory.
- the method may further include granting to the host OS, configured to support a second set of tenants, different from the first set of tenants, access to: (1) a first swappable address range associated with the near memory and (2) a second swappable address range associated with the far memory to allow for swapping of cache lines between the near memory and the far memory.
- the method may further include increasing a size of the first non-swappable address range by switching from the first system address map configuration to the second system address map configuration.
- the increasing the size of the first non-swappable address range is accomplished without requiring a reprogramming of hardware registers associated with the compute node.
- the first system address map configuration may include a first reserved non-swappable address range mapped to the near memory and a second reserved non-swappable address range mapped to the far memory, where all addresses associated with both the first reserved non-swappable address range and the second reserved non-swappable address range are marked as offline.
- the second address map configuration may include a portion of the first reserved non-swappable address range marked as online and a portion of the second reserved non-swappable address range marked as online.
- the second address map configuration may further include a portion of the first swappable address range marked as offline, where the portion of the first swappable address range marked as offline has the same size as the combined size of the portion of the first reserved non-swappable address range marked as online and the portion of the second reserved non-swappable address range marked as online.
- the method may further include allocating memory in a granular fashion to any of the first set of tenants such that each allocation of memory includes memory addresses corresponding to a conflict set having a conflict set size.
- the conflict set may include a first conflicting region associated with the first swappable range of memory addresses associated with the near memory and a second conflicting region associated with the second swappable range of memory addresses associated with the far memory, where the first conflicting region and the second conflicting region have the same size, selected to be equal to or less than half of the conflict set size.
- the conflict set size may be selected to be equal to a size of a page of memory used by the host OS for page-based memory management.
- any arrangement of components to achieve the same functionality is effectively "associated" such that the desired functionality is achieved.
- any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components.
- any two components so associated can also be viewed as being “operably connected,” or “coupled,” to each other to achieve the desired functionality.
- a component which may be an apparatus, a structure, a system, or any other implementation of a functionality, is described herein as being coupled to another component does not mean that the components are necessarily separate components.
- a component A described as being coupled to another component B may be a subcomponent of the component B
- the component B may be a sub-component of the component A
- components A and B may be a combined sub-component of another component C.
- non-transitory media refers to any media storing data and/or instructions that cause a machine to operate in a specific manner.
- exemplary non-transitory media include non-volatile media and/or volatile media.
- Non-volatile media include, for example, a hard disk, a solid-state drive, a magnetic disk or tape, an optical disk or tape, a flash memory, an EPROM, NVRAM, PRAM, or other such media, or networked versions of such media.
- Volatile media include, for example, dynamic memory such as DRAM, SRAM, a cache, or other such media.
- Non-transitory media are distinct from, but can be used in conjunction with, transmission media.
- Transmission media are used for transferring data and/or instructions to or from a machine.
- Exemplary transmission media include coaxial cables, fiber-optic cables, copper wires, and wireless media, such as radio waves.
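The two system address map configurations described above can be illustrated with a small model. This is a hypothetical sketch, not the patented implementation: the names `Segment`, `first_config`, `second_config`, and `online_bytes`, the addresses and sizes, the assumption that the swappable ranges start out online, and the choice to take the *tail* of the near swappable range offline are all assumptions made for illustration. It shows one arithmetic consequence of the rule that the offlined swappable portion equals the combined size of the onlined reserved portions: total online capacity is unchanged between the two configurations.

```python
from dataclasses import dataclass

GiB = 1 << 30

@dataclass(frozen=True)
class Segment:
    region: str   # logical range this segment belongs to
    base: int     # start address (hypothetical)
    size: int     # length in bytes
    online: bool  # whether the host OS may use these addresses

def first_config():
    """Hypothetical first map: swappable ranges online, reserved ranges offline."""
    return [
        Segment("near_swappable", 0 * GiB, 32 * GiB, True),
        Segment("far_swappable", 32 * GiB, 32 * GiB, True),
        Segment("near_reserved", 64 * GiB, 4 * GiB, False),
        Segment("far_reserved", 68 * GiB, 4 * GiB, False),
    ]

def split(seg, head_size, head_online, tail_online):
    """Split seg into a head of head_size bytes and the remaining tail,
    dropping any zero-sized piece."""
    parts = [
        Segment(seg.region, seg.base, head_size, head_online),
        Segment(seg.region, seg.base + head_size, seg.size - head_size, tail_online),
    ]
    return [p for p in parts if p.size > 0]

def second_config(cfg, near_online, far_online):
    """Bring part of each reserved range online and take an equal total
    amount of the near swappable range offline (assumed: its tail)."""
    by_region = {s.region: s for s in cfg}
    near_swap = by_region["near_swappable"]
    if near_online > by_region["near_reserved"].size or far_online > by_region["far_reserved"].size:
        raise ValueError("cannot online more than a reserved range holds")
    offline = near_online + far_online
    if offline > near_swap.size:
        raise ValueError("near swappable range too small to offset")
    return (
        split(near_swap, near_swap.size - offline, True, False)
        + [by_region["far_swappable"]]
        + split(by_region["near_reserved"], near_online, True, False)
        + split(by_region["far_reserved"], far_online, True, False)
    )

def online_bytes(cfg):
    """Total bytes the host OS currently sees as online."""
    return sum(s.size for s in cfg if s.online)

cfg1 = first_config()
cfg2 = second_config(cfg1, near_online=2 * GiB, far_online=2 * GiB)
print(online_bytes(cfg1) // GiB, online_bytes(cfg2) // GiB)  # 64 64
```

Splitting segments instead of mutating flags keeps each configuration an immutable snapshot, so the first and second maps can be compared directly.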
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Storage Device Security (AREA)
Abstract
Systems and methods related to direct swap caching with noisy neighbor mitigation and dynamic address range assignment are described. A system includes a host operating system (OS) configured to support a first set of tenants associated with a compute node, where the host OS has access to: (1) a first swappable range of memory addresses associated with a near memory and (2) a second swappable range of memory addresses associated with a far memory. The host OS is configured to allocate memory in a granular fashion such that each allocation of memory to a tenant includes memory addresses corresponding to a conflict set having a conflict set size. The conflict set includes a first conflicting region associated with the first swappable range of memory addresses with the near memory and a second conflicting region associated with the second swappable range of memory addresses with the far memory.
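Granular, conflict-set-sized allocation can be sketched as follows. This is a hypothetical illustration, not the host OS code the abstract refers to: `ConflictSet`, `ConflictSetAllocator`, the linear mapping from conflict-set index to near and far region addresses, and the 4 KiB page size are assumptions. Each conflicting region is set to exactly half the conflict set size (the largest value the description allows), and memory is only ever handed out in whole conflict sets, so two tenants never share one.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed host OS page size; conflict set size = page size

@dataclass(frozen=True)
class ConflictSet:
    index: int        # conflict-set number
    near_base: int    # start of the conflicting region in the near swappable range
    far_base: int     # start of the conflicting region in the far swappable range
    region_size: int  # size of each conflicting region (half the set size here)

class ConflictSetAllocator:
    """Hypothetical allocator that grants memory only in whole conflict sets."""

    def __init__(self, near_base, far_base, num_sets, set_size):
        if set_size % 2:
            raise ValueError("set_size must be even")
        self.region_size = set_size // 2  # each region is exactly half the set
        self.near_base = near_base
        self.far_base = far_base
        self.free = list(range(num_sets))  # free conflict-set indices, ascending
        self.owner = {}                    # conflict-set index -> tenant id

    def allocate(self, tenant, nbytes):
        set_size = 2 * self.region_size
        needed = -(-nbytes // set_size)  # round up to whole conflict sets
        if needed > len(self.free):
            raise MemoryError("not enough free conflict sets")
        granted = []
        for _ in range(needed):
            i = self.free.pop(0)
            self.owner[i] = tenant
            granted.append(ConflictSet(
                i,
                self.near_base + i * self.region_size,  # assumed linear mapping
                self.far_base + i * self.region_size,
                self.region_size,
            ))
        return granted

    def release(self, tenant):
        """Return all of a tenant's conflict sets to the free list."""
        freed = sorted(i for i, t in self.owner.items() if t == tenant)
        for i in freed:
            del self.owner[i]
        self.free = sorted(self.free + freed)
        return len(freed)

alloc = ConflictSetAllocator(near_base=0, far_base=1 << 30, num_sets=8, set_size=PAGE_SIZE)
a = alloc.allocate("tenant-a", 5000)  # rounds up to 2 whole conflict sets
print([s.index for s in a])           # [0, 1]
```

Because a request is rounded up to whole conflict sets, a small request still consumes a full set; that trade-off buys the property that one tenant's near and far regions never fall in a conflict set shared with another tenant.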
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263319109P | 2022-03-11 | 2022-03-11 | |
US63/319,109 | 2022-03-11 | ||
US17/735,767 US11860783B2 (en) | 2022-03-11 | 2022-05-03 | Direct swap caching with noisy neighbor mitigation and dynamic address range assignment |
US17/735,767 | 2022-05-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023172319A1 (fr) | 2023-09-14 |
Family ID: 84901263
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2022/052607 WO2023172319A1 (fr) | Direct swap caching with noisy neighbor mitigation and dynamic address range assignment | 2022-03-11 | 2022-12-13 |
Country Status (2)
Country | Link |
---|---|
TW (1) | TW202340931A (fr) |
WO (1) | WO2023172319A1 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014143036A1 (fr) * | 2013-03-15 | 2014-09-18 | Intel Corporation | Method for pinning data in a large cache in a multi-level memory system |
US20180260323A1 (en) * | 2017-03-10 | 2018-09-13 | Oracle International Corporation | Methods to utilize heterogeneous memories with variable properties |
WO2020007813A1 (fr) * | 2018-07-04 | 2020-01-09 | Koninklijke Philips N.V. | Computing device with increased resistance against rowhammer attacks |
US20210089465A1 (en) * | 2019-09-25 | 2021-03-25 | Nvidia Corp. | Addressing cache slices in a last level cache |
WO2021057489A1 (fr) * | 2019-09-29 | 2021-04-01 | Huawei Technologies Co., Ltd. (华为技术有限公司) | Virtual machine memory management method and device |
- 2022-12-13: WO application PCT/US2022/052607 (WO2023172319A1, fr), status unknown
- 2023-02-09: TW application TW112104542A (TW202340931A, zh), status unknown
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014143036A1 (fr) * | 2013-03-15 | 2014-09-18 | Intel Corporation | Method for pinning data in a large cache in a multi-level memory system |
US20180260323A1 (en) * | 2017-03-10 | 2018-09-13 | Oracle International Corporation | Methods to utilize heterogeneous memories with variable properties |
WO2020007813A1 (fr) * | 2018-07-04 | 2020-01-09 | Koninklijke Philips N.V. | Computing device with increased resistance against rowhammer attacks |
US20210089465A1 (en) * | 2019-09-25 | 2021-03-25 | Nvidia Corp. | Addressing cache slices in a last level cache |
WO2021057489A1 (fr) * | 2019-09-29 | 2021-04-01 | Huawei Technologies Co., Ltd. (华为技术有限公司) | Virtual machine memory management method and device |
EP4030289A1 (fr) * | 2019-09-29 | 2022-07-20 | Huawei Technologies Co., Ltd. | Virtual machine memory management method and device |
Also Published As
Publication number | Publication date |
---|---|
TW202340931A (zh) | 2023-10-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22840497; Country of ref document: EP; Kind code of ref document: A1 |