CN117743253A - Interface for remote memory - Google Patents


Info

Publication number
CN117743253A
CN117743253A (application CN202310780755.7A)
Authority
CN
China
Prior art keywords
interface
memory
cxl
processing circuitry
circuit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310780755.7A
Other languages
Chinese (zh)
Inventor
M·加格
R·阿马里
P·里希纳穆尔蒂
崔昌皓
奇亮奭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 18/054,492 (published as US 2024/0095171 A1)
Application filed by Samsung Electronics Co Ltd
Publication of CN117743253A publication Critical patent/CN117743253A/en
Pending legal-status Critical Current


Abstract

A system having an interface for remote memory is disclosed. In some embodiments, the system comprises: an interface circuit, the interface circuit having: a first interface configured to be connected to a processing circuit; and a second interface configured to connect to a memory, the first interface comprising a cache coherence interface, and the second interface being different from the first interface.

Description

Interface for remote memory
Cross Reference to Related Applications
The present application claims priority to and the benefit of U.S. provisional application No. 63/408,725, entitled "REMOTE ACCESS SOLUTION FOR CXL MEMORY CLUSTERING ACROSS SERVERS," filed on September 21, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
One or more aspects in accordance with embodiments of the present disclosure relate to computing systems, and more particularly to interfaces for remote memory.
Background
In computing systems, a host Central Processing Unit (CPU) may be connected to host memory through, for example, an address bus and a data bus, or through a high-speed interconnect such as Compute Express Link (CXL). Some systems for forming connections to memory may limit the length of the conductors (e.g., cables) that may be used to form these connections.
Aspects of the present disclosure are directed to such a general technical environment.
Disclosure of Invention
According to an embodiment of the present disclosure, there is provided a system, including: an interface circuit, the interface circuit having: a first interface configured to be connected to a processing circuit; and a second interface configured to connect to a memory, the first interface comprising a cache coherence interface, and the second interface being different from the first interface.
In some embodiments, the system further comprises a memory server connected to the second interface.
In some embodiments, the second interface comprises a remote direct memory access interface.
In some embodiments, the second interface comprises a computer cluster interconnect interface.
In some embodiments, the computer cluster interconnect interface comprises an Ethernet interface.
In some embodiments, the memory server is connected to the second interface by a cable having a length greater than 6 feet.
In some embodiments, the cache coherence interface comprises a Compute Express Link (CXL) interface.
In some embodiments, the first interface is configured to: transmitting data to the processing circuitry in response to a load instruction executed by the processing circuitry; and receiving data from the processing circuitry in response to a store instruction executed by the processing circuitry.
In some embodiments, the system further comprises: a Compute Express Link (CXL) root complex coupled between the processing circuitry and the first interface.
According to an embodiment of the present disclosure, there is provided a system, including: an interface circuit, the interface circuit having: a first interface configured to be connected to a processing circuit; and a second interface configured to connect to a memory, the first interface comprising a Compute Express Link (CXL) interface, and the second interface being different from the first interface.
In some embodiments, the system further comprises a memory server connected to the second interface.
In some embodiments, the second interface comprises a remote direct memory access interface.
In some embodiments, the second interface comprises a computer cluster interconnect interface.
In some embodiments, the computer cluster interconnect interface comprises an Ethernet interface.
In some embodiments, the memory server is connected to the second interface by a cable having a length greater than 6 feet.
In some embodiments, the CXL interface includes a cache coherence interface.
In some embodiments, the first interface is configured to: transmitting data to the processing circuitry in response to a load instruction executed by the processing circuitry; and receiving data from the processing circuitry in response to a store instruction executed by the processing circuitry.
In some embodiments, the system further comprises: a CXL root complex, the CXL root complex coupled between the processing circuitry and the first interface.
According to an embodiment of the present disclosure, there is provided a method comprising: executing, by a central processing unit, a store instruction for storing a first value in a first memory location at a first address; and transmitting, by an interface circuit, a store command to a memory including the first memory location in response to the executing of the store instruction, the store command being a command to store the first value in the first memory location, wherein the interface circuit has: a first interface connected to the central processing unit; and a second interface, the second interface being connected to the memory, the first interface comprising a Compute Express Link (CXL) interface, and the second interface being different from the first interface.
In some embodiments, the method further comprises: executing, by the central processing unit, a load instruction for loading a value in a second memory location at a second address into a register of the central processing unit; a read command is sent by the interface circuit to the memory in response to the execution of the load instruction, the read command being a command to read a value in the second memory location.
Drawings
These and other features and advantages of the present disclosure will be appreciated and understood with reference to the specification, claims and appended drawings, wherein:
FIG. 1A is a block diagram of a single host computing system according to an embodiment of the present disclosure;
FIG. 1B is a flowchart of a startup process according to an embodiment of the present disclosure;
FIG. 1C is a block diagram of a single host computing system including a graphics processing unit, according to an embodiment of the present disclosure;
FIG. 1D is a block diagram of a single host computing system including a plurality of different memory pool servers, according to an embodiment of the present disclosure;
FIG. 2 is a block diagram of a multi-host computing system according to an embodiment of the present disclosure;
FIG. 3 is an operational block diagram of a single host computing system according to an embodiment of the present disclosure;
FIG. 4 is a block diagram of a multi-host computing system with a switch according to an embodiment of the present disclosure; and
Fig. 5 is a flow chart of a method according to an embodiment of the present disclosure.
Detailed Description
The detailed description set forth below in connection with the appended drawings is intended as a description of exemplary embodiments of an interface for remote memory provided in accordance with the present disclosure and is not intended to represent the only forms in which the present disclosure may be constructed or utilized. The description sets forth the features of the present disclosure in connection with the illustrated embodiments. However, it is to be understood that the same or equivalent functions and structures may be accomplished by different embodiments that are also intended to be encompassed within the scope of the disclosure. Like element numbers are intended to indicate like elements or features, as shown elsewhere herein.
In various computing applications, the host's demand for memory may change over time, for example, when a new application is started on the host, when a currently running application is shut down, or when user demand for an application changes. However, it may be costly to equip the host with enough main memory to handle the highest foreseeable demand. Further, cable length limitations that may exist for some interfaces between a host Central Processing Unit (CPU) and memory may limit the memory capacity available to the host (e.g., to the capacity available within the same rack as the host CPU).
Thus, some embodiments provide a mechanism that allows host applications to access memory pools outside of the local rack using ordinary load and store commands. Such embodiments may use a cache coherence interface (e.g., a Compute Express Link (CXL) interface) to the interface circuit as the front end of the memory pool, and a low-latency remote memory access protocol (e.g., RDMA) as the back end of the memory pool. The system may dynamically allocate resources through the interface circuitry and the shared memory pool. Such embodiments may enable the disaggregation of memory into physically separate memory resources (e.g., resources that need not be in the same rack as the processing circuitry that uses them), thereby avoiding the length limits of some memory interface links (e.g., Peripheral Component Interconnect Express (PCIe) based links, such as CXL, for which cable lengths may be limited to, for example, between 8 inches (PCIe generation 3) and 15 inches (PCIe generation 1)).
In some embodiments, the host may access a pool of memory using load and store semantics and avoid the need to provision physical resources (such as Dynamic Random Access Memory (DRAM)) on devices in the same rack as the host. The disaggregated memory (accessed via a low-latency remote memory access protocol (e.g., RDMA)) may be used to allocate resources at run time. In some embodiments, the interface circuitry may be reconfigured to any size and mapped to a remote memory pool, and the remote memory pool may provide a set of pooled memory resources that may be dynamically combined to meet the needs of servers in a composable disaggregated memory architecture. Some embodiments result in lower Total Cost of Ownership (TCO) and overcome the cable length limitations present in some interfaces by supporting long-range memory disaggregation by means of RDMA, which is compatible with cables having lengths greater than, for example, 6 feet.
Referring to FIG. 1A, in some embodiments, a computing system 100 includes a host 102, which includes a Central Processing Unit (CPU) 105 (which may be or include processing circuitry) and a local memory 110 (which may be a Double Data Rate (DDR) memory), and a memory system 115. The memory system may include interface circuitry 120 that includes a front-end interface 130 (e.g., CXL.mem) with a cache coherent memory access protocol and a back-end interface 135 (e.g., an RDMA-enabled network interface card) with low-latency remote memory access capabilities. The front-end interface 130 may be a CXL interface (e.g., CXL.mem); in this case, the interface circuit 120 may be referred to as a CXL device. The memory pool server 125 may include a back-end interface 135 and a memory pool 140. The memory pool 140 may include, for example, banks of dynamic random access memory configured, for example, as memory modules, each memory module including a plurality of memory chips on a printed circuit board. The memory may be directly connected to the back-end interface 135 of the memory pool server 125, or some or all of the memory may be implemented in one or more memory servers connected to the memory pool server 125 through a computer cluster interconnect interface.
The front-end interface 130 may be connected to an address bus and a data bus of the CPU 105. Thus, from the perspective of the CPU 105, the storage provided by the memory system 115 may be substantially the same as the local memory 110, and in operation, the memory system 115 may respond directly to load and store instructions executed by the CPU 105 when the addresses of the load and store instructions are within the physical address range allocated to the interface circuitry 120. This ability of the memory system 115 to respond directly to load and store instructions executed by the CPU 105 may eliminate the need for the CPU to invoke driver functions to store data in the memory system 115 or retrieve data from the memory system 115.
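For illustration only, the following minimal C sketch shows what this looks like from the application's point of view, assuming (hypothetically) that the operating system has exposed the CXL-backed address range to the application as an ordinary mapped region reachable through a pointer named cxl_base; the pointer name and the mapping mechanism are illustrative and not part of this disclosure. The point is that the data path consists of ordinary load and store instructions, with no driver call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: cxl_base points at memory backed by the memory
 * system 115; ordinary stores and loads through the pointer become
 * transactions handled by the interface circuit 120. */
void copy_into_remote_pool(volatile uint64_t *cxl_base,
                           const uint64_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        cxl_base[i] = src[i];   /* each assignment is a store instruction */
}

uint64_t read_from_remote_pool(const volatile uint64_t *cxl_base, size_t i)
{
    return cxl_base[i];         /* a load instruction serviced by the pool */
}
```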
The host 102 may treat the interface circuit 120 as a CXL device that advertises its memory resources to the host 102 through CXL discovery. For example, an appropriate value stored in a Base Address Register (BAR) in the CXL interface of the interface circuit 120 may determine the size of the memory address range allocated to the interface circuit 120. At start-up, to determine the size of memory available through the interface circuit 120, the CPU may write a word of all binary ones to the appropriate base address register and then read the same base address register back; in response, the interface circuit 120 may return a word indicating the size of the available memory region. The host 102 may use the memory resources by executing load instructions or store instructions of the instruction set of the CPU 105.
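A sketch of this sizing handshake is shown below, assuming hypothetical helper functions pci_cfg_read32() and pci_cfg_write32() for configuration-space access (the helper names are illustrative only); the procedure itself is the standard BAR-sizing sequence of saving the register, writing all ones, reading it back, and deriving the size of the requested window.

```c
#include <stdint.h>

/* Hypothetical config-space accessors; not part of this disclosure. */
extern uint32_t pci_cfg_read32(unsigned bdf, unsigned offset);
extern void     pci_cfg_write32(unsigned bdf, unsigned offset, uint32_t value);

/* Returns the size, in bytes, of the memory window requested by a 32-bit
 * memory BAR of the interface circuit. */
uint32_t bar_window_size(unsigned bdf, unsigned bar_offset)
{
    uint32_t saved = pci_cfg_read32(bdf, bar_offset);

    pci_cfg_write32(bdf, bar_offset, 0xFFFFFFFFu);   /* word of all ones */
    uint32_t probed = pci_cfg_read32(bdf, bar_offset);

    pci_cfg_write32(bdf, bar_offset, saved);          /* restore the BAR */

    probed &= ~0xFu;        /* mask off the memory-BAR attribute bits */
    return ~probed + 1u;    /* lowest writable address bit gives the size */
}
```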
FIG. 1B illustrates a method that may be employed at start-up. In some embodiments, at startup, the RDMA network interface controller 135 in the interface circuit 120 sends a discovery message to the memory pool server 125 at 155. The memory pool server 125 then responds 160 to the discovery message with a response reporting the capabilities of the memory pool server 125. The reported capabilities may include, for example, the capacity of the memory pool server 125, the bandwidth of the memory pool server 125, and the latency of the memory pool server 125. The RDMA network interface controller 135 may maintain this capability information and provide the capability information to the memory interface 130 at 165 for CXL startup. The memory interface 130 may then perform a CXL startup at 170 and provide the memory-related information to the host 102 using a Consistent Device Attribute Table (CDAT), which may be employed under the CXL standard to send the memory information to the host.
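Purely as an illustration, the exchange of FIG. 1B might carry messages along the lines of the following sketch; none of these field names or layouts appear in this disclosure, and an actual implementation may report additional or different capabilities.

```c
#include <stdint.h>

/* Hypothetical wire format for the discovery message sent at 155. */
struct discovery_request {
    uint32_t magic;          /* identifies the message as a discovery probe */
    uint32_t requester_id;   /* identifies the requesting interface circuit */
};

/* Hypothetical wire format for the response sent at 160; the fields mirror
 * the capabilities named above (capacity, bandwidth, latency). */
struct discovery_response {
    uint64_t capacity_bytes;    /* capacity of the memory pool server */
    uint32_t bandwidth_mbps;    /* sustained bandwidth */
    uint32_t latency_ns;        /* typical access latency */
};
```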
The connection between the interface circuit 120 and the memory pool server 125 may be a remote direct memory access connection, as shown. The remote direct memory access connection may include a cable 145 (e.g., an electrical cable (having multiple conductors) or an optical cable (including multiple optical fibers)). The cable 145 may form a connection between the back-end interface 135 in the interface circuit 120 and another back-end interface 135 in the memory pool server 125. The interface between the RDMA network interface controllers 135 may be Ethernet or any other suitable computer cluster interconnect interface, such as InfiniBand or Fibre Channel.
The back-end interface 135 of the interface circuit 120 may be configured with an Internet Protocol (IP) address of the memory pool server 125, which the back-end interface 135 uses to communicate with the memory pool server 125. The interface circuit 120 may negotiate with the memory pool server 125 for memory resources at start-up and establish a remote direct memory access connection with the RDMA server to perform read or write operations.
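A sketch of such a connection setup, using the standard librdmacm API, is shown below; the server address and port are the configured values mentioned above, error handling is abbreviated, and the sketch is illustrative rather than a description of the disclosed implementation.

```c
#include <rdma/rdma_cma.h>
#include <stddef.h>

/* Connect to the memory pool server at the configured IP address and port,
 * returning a connected rdma_cm_id (or NULL on failure).  The synchronous
 * librdmacm calls are used for brevity. */
struct rdma_cm_id *connect_to_pool_server(const char *server_ip,
                                          const char *port)
{
    struct rdma_addrinfo hints = { 0 };
    struct rdma_addrinfo *res = NULL;
    struct rdma_cm_id *id = NULL;

    hints.ai_port_space = RDMA_PS_TCP;
    if (rdma_getaddrinfo(server_ip, port, &hints, &res))
        return NULL;

    struct ibv_qp_init_attr attr = { 0 };
    attr.qp_type = IBV_QPT_RC;          /* reliable connection */
    attr.cap.max_send_wr = 64;
    attr.cap.max_recv_wr = 64;
    attr.cap.max_send_sge = 1;
    attr.cap.max_recv_sge = 1;

    /* Creates the cm_id and its queue pair in one step. */
    if (rdma_create_ep(&id, res, NULL, &attr)) {
        rdma_freeaddrinfo(res);
        return NULL;
    }
    rdma_freeaddrinfo(res);

    if (rdma_connect(id, NULL)) {       /* perform the RDMA CM handshake */
        rdma_destroy_ep(id);
        return NULL;
    }
    return id;
}
```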
From the perspective of the host 102, the interface circuit 120 may be a CXL Type 3 device. In operation, the remote direct memory access system may create one or more Queue Pairs (QPs) and register one or more Memory Regions (MRs). The queue pairs and memory regions may then be used to perform remote direct memory access read operations and remote direct memory access write operations in response to load and store operations performed by the CPU 105 of the host 102.
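As one illustration of the queue pair and memory region setup (using the standard libibverbs API rather than any implementation-specific interface of this disclosure), an RDMA back end might register a staging buffer and create a reliably connected queue pair roughly as follows; error handling is abbreviated.

```c
#include <infiniband/verbs.h>
#include <stddef.h>

struct rdma_resources {
    struct ibv_pd *pd;   /* protection domain */
    struct ibv_cq *cq;   /* completion queue */
    struct ibv_mr *mr;   /* registered memory region */
    struct ibv_qp *qp;   /* queue pair */
};

/* Register buf as a memory region and create a queue pair on the already
 * opened device context ctx.  Returns 0 on success, -1 on failure. */
int create_qp_and_mr(struct ibv_context *ctx, void *buf, size_t len,
                     struct rdma_resources *r)
{
    r->pd = ibv_alloc_pd(ctx);
    r->cq = ibv_create_cq(ctx, 256, NULL, NULL, 0);
    if (!r->pd || !r->cq)
        return -1;

    /* Allow the remote side to read and write the registered region. */
    r->mr = ibv_reg_mr(r->pd, buf, len,
                       IBV_ACCESS_LOCAL_WRITE |
                       IBV_ACCESS_REMOTE_READ |
                       IBV_ACCESS_REMOTE_WRITE);

    struct ibv_qp_init_attr attr = {
        .send_cq = r->cq,
        .recv_cq = r->cq,
        .qp_type = IBV_QPT_RC,   /* reliable connection */
        .cap = { .max_send_wr = 64, .max_recv_wr = 64,
                 .max_send_sge = 1, .max_recv_sge = 1 },
    };
    r->qp = ibv_create_qp(r->pd, &attr);

    return (r->mr && r->qp) ? 0 : -1;
}
```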
For example, when the CPU 105 of the host 102 executes a store instruction for storing a first value in a first memory location at a first address, the first address being mapped to the interface circuit 120, the interface circuit 120 may receive the store instruction (as a result of the first address being mapped to the interface circuit 120), and in response to the executing of the store instruction, the interface circuit 120 may send a store command to the memory pool server 125. The store command may be sent via remote direct memory access; for example, to store the first value in the memory pool 140 of the memory pool server 125, the interface circuit 120 may initiate a remote direct memory access write transfer that stores the first value in the memory pool 140.
As another example, if the CPU 105 of the host 102 executes a load instruction to load a value in a second memory location at a second address into a register of the CPU 105, the second address being mapped to the interface circuit 120, the interface circuit 120 may receive the load instruction (as a result of the second address being mapped to the interface circuit 120), and in response to the executing of the load instruction, the interface circuit 120 may send a read command to the memory pool server 125. The read command may be sent via remote direct memory access; for example, to read the value stored in the memory pool 140 of the memory pool server 125, the interface circuit 120 may initiate a remote direct memory access read transfer to read the value from the memory pool 140.
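As an illustration of the back-end side of these two cases, using the standard libibverbs API (and assuming the remote address and rkey for the target location in the memory pool were obtained when the memory region was negotiated), a store may be forwarded as a one-sided RDMA WRITE and a load as a one-sided RDMA READ, roughly as sketched below; the 64-byte transfer size and the function names are illustrative only.

```c
#include <infiniband/verbs.h>
#include <stdint.h>

/* Post a single one-sided RDMA operation (WRITE for a host store,
 * READ for a host load) against the memory pool server. */
static int post_one_sided(struct ibv_qp *qp, struct ibv_mr *mr, void *local,
                          uint64_t remote_addr, uint32_t rkey, uint32_t len,
                          enum ibv_wr_opcode op)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local,  /* local staging buffer */
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = { 0 }, *bad = NULL;
    wr.opcode              = op;   /* IBV_WR_RDMA_WRITE or IBV_WR_RDMA_READ */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;
    return ibv_post_send(qp, &wr, &bad);
}

/* A store whose address maps to the interface circuit becomes an RDMA WRITE
 * of the affected line into the memory pool. */
int forward_store(struct ibv_qp *qp, struct ibv_mr *mr, void *line,
                  uint64_t remote_addr, uint32_t rkey)
{
    return post_one_sided(qp, mr, line, remote_addr, rkey, 64,
                          IBV_WR_RDMA_WRITE);
}

/* A load becomes an RDMA READ of the line from the memory pool. */
int forward_load(struct ibv_qp *qp, struct ibv_mr *mr, void *line,
                 uint64_t remote_addr, uint32_t rkey)
{
    return post_one_sided(qp, mr, line, remote_addr, rkey, 64,
                          IBV_WR_RDMA_READ);
}
```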
CXL 2.0 can support hot-plug features. Thus, in embodiments in which the front-end interface 130 is a CXL interface, a new connection may be made with the interface circuit 120 while the host is operating, or the interface circuit 120 may be disconnected while the host is operating, without interfering with the operation of the host. In some embodiments, the latency of the memory system 115 is sufficiently low (e.g., as a result of using a low-latency protocol such as InfiniBand or RDMA) to enable support of the cache coherency features of CXL.cache.
Referring to FIG. 1C, in some embodiments, a Graphics Processing Unit (GPU) 148 is connected to the CPU 105 and the interface circuit 120. For example, the connection may be made through a CXL switch 150. In some embodiments, the graphics processing unit 148 is part of the host 102, as shown; in other embodiments, the graphics processing unit 148 may be the main processing circuitry of another host, or the graphics processing unit 148 may be a separate CXL device (e.g., a Type 2 CXL device) that is not part of the host 102 and is connected to the host, like the interface circuit 120, through a CXL link. In some embodiments, the graphics processing unit 148 is in a Type 2 CXL device.
Further, as shown in FIG. 1C, in some embodiments, a plurality of memory pool servers 125 are connected to the RDMA network interface controller 135 of the interface circuit 120. Each of the memory pool servers 125 may be connected to the interface circuit 120 via any suitable type of connection (e.g., Ethernet, InfiniBand, or Fibre Channel), as shown. The interface circuit 120 may include a single RDMA network interface controller 135 capable of supporting multiple respective connections to the multiple memory pool servers 125, as shown, or the interface circuit 120 may include multiple RDMA network interface controllers 135, or one NIC with multiple physical interfaces, each connected to the memory interface 130 of the interface circuit 120 and each (i) connected to a respective one of the memory pool servers 125 and (ii) configured to support the protocol (e.g., Ethernet, InfiniBand, or Fibre Channel) used in the connection between the interface circuit 120 and the respective memory pool server 125. For example, a connection formed using InfiniBand may have lower latency than a connection using Ethernet. Such a connection (e.g., a connection using InfiniBand) may meet the latency requirements of one or more of the three CXL protocols (CXL.io, CXL.mem, and CXL.cache).
Referring to FIG. 1D, in some embodiments, the memory pool servers 125 are differently configured; for example, one of the memory pool servers 125 may include a memory pool 140a optimized for high bandwidth, one of the memory pool servers 125 may include a memory pool 140b optimized for low latency, and one of the memory pool servers 125 may include a memory pool 140c optimized for high capacity. In some embodiments, the type of connection employed to connect each memory pool server 125 to the interface circuit 120 may be selected to provide a corresponding level of performance to the host 102. For example, an InfiniBand connection, which may have relatively low latency, may be used to connect the memory pool server 125 that includes the memory pool 140b optimized for low latency to the interface circuit 120, such that the overall latency experienced by the host 102 is reduced as a result of both (i) the type of memory pool 140 used and (ii) the type of connection used (for connecting the memory pool server 125 to the interface circuit 120).
In some embodiments, applications running on the host 102 may require memory with different characteristics; for example, for performance reasons, an application may require low-latency memory. In some embodiments, such an application may be aware of the different performance characteristics of the different memory pool servers 125 connected to the host 102 through the interface circuit 120. The application may have access to this information as a result of the boot process (described above), which may result in the information being stored in the host (e.g., by the operating system of the host 102). The application may then, when it requests memory from the operating system, request memory with performance characteristics that will result in acceptable performance for the application.
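Purely as an illustration of this selection (the structure and function names below are hypothetical and not part of this disclosure), an allocator could keep the capability information gathered at boot in a small table and pick, for example, the lowest-latency pool that can satisfy a request:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pool descriptor populated from the boot-time discovery
 * responses (capacity, bandwidth, latency) described above. */
struct pool_info {
    int      id;              /* which memory pool server */
    uint32_t latency_ns;      /* reported access latency */
    uint32_t bandwidth_mbps;  /* reported bandwidth */
    uint64_t free_bytes;      /* remaining capacity */
};

/* Return the index of the lowest-latency pool that can hold `bytes`,
 * or -1 if no pool has enough free capacity. */
int pick_low_latency_pool(const struct pool_info *pools, size_t n, size_t bytes)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (pools[i].free_bytes < bytes)
            continue;
        if (best < 0 || pools[i].latency_ns < pools[best].latency_ns)
            best = (int)i;
    }
    return best;
}
```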
Referring to FIG. 2, in some embodiments, multiple hosts may each connect to the memory pool server 125 and share the memory resources of the memory pool server 125. As in the embodiment of FIG. 1A, each interface circuit 120 presents to its respective host a memory device having a configured dynamic memory size, without physical memory resources (e.g., DRAM memory) being physically present in the interface circuit 120. The pool of disaggregated memory resources may be provided, for example, in one or more memory servers, and it may be placed at a remote location. The remote set of memory servers may be referred to as a memory farm. The memory resources may be accessed using a low-latency network protocol, such as remote direct memory access; in such embodiments, memory resources may be used efficiently and Total Cost of Ownership (TCO) may be reduced.
Referring to FIG. 3, as described above, the interface circuit 120 may occupy a portion of the physical address space of the CPU 105. The host 102 may also include (in addition to the CPU 105 and the local memory 110) a CXL root complex 305, which may form, on the host side, the interface to the CXL link of FIG. 1A. In some embodiments, in operation, when the host writes to the CXL DRAM mapped into its physical address space, a request is sent (via the address bus and data bus of the CPU 105) to the CXL root complex 305; the CXL root complex 305 generates a Transaction Layer Packet (TLP) and sends the TLP to the interface circuit 120, and the interface circuit 120 converts the transaction layer packet into a remote direct memory access request and sends the remote direct memory access request over the computer cluster interconnect interface.
Fig. 4 shows an embodiment that includes a CXL switch 405. Each of the plurality of hosts 102 is connected to a CXL switch 405 (which may be or may include, for example, a CXL 2.0 switch) that is connected to one or more interface circuits 120 (each labeled "IC" in fig. 4) and zero or more other CXL devices 410 (each labeled "D" in fig. 4). The interface circuits 120 may be connected to a single shared memory pool server 125 (as shown) through a remote direct memory access connection, or the interface circuits may be connected to multiple memory pool servers 125 (e.g., each interface circuit 120 may be connected to a respective memory pool server 125).
FIG. 5 is a flow chart of a method in some embodiments. The method comprises the following steps: at 505, executing, by the central processing unit, a store instruction for storing a first value in a first memory location at a first address; and at 510, sending, by the interface circuit, a store command to a memory including the first memory location in response to executing the store instruction, the store command being a command to store the first value in the first memory location. The method may further comprise: at 515, a load instruction is executed by the central processing unit to read a second value, which may be stored in a second memory location at a second address; and at 520, a read command is sent by the interface circuit to the memory including the second memory location in response to executing the load instruction, the read command being a command to read the second value from the second memory location.
As used herein, a computer cluster interconnect interface is any interface suitable for interconnecting computers, such as InfiniBand, Ethernet, or Fibre Channel.
As used herein, "a portion" of something means "at least some" of the thing, and as such may mean less than all of the thing or all of the thing. Also, "a part" of an object includes the whole object as a special case, that is, the whole object is an example of a part of the object. As used herein, when the second amount is "within" the first amount X, it is meant that the second amount is at least X-Y and the second amount is at most x+y. As used herein, when the second value is within "Y% of the first value, it is meant that the second value is at least (1-Y/100) times the first value, and the second value is at most (1+Y/100) times the first value. As used herein, the term "or" should be interpreted as "and/or" such that, for example, "a or B" means "a" or "B" or any one of "a and B".
The background provided in the background section of this disclosure is only included to set context and the contents of this section are not considered prior art. Any component or any combination of components described (e.g., in any system diagram included herein) may be used to perform one or more operations of any flowchart included herein. Furthermore, (i) the operations are exemplary operations and may include various additional steps not explicitly contemplated, and (ii) the temporal order of the operations may vary.
Each of the terms "processing circuitry" and "means for processing" is used herein to mean any combination of hardware, firmware, and software employed to process data or digital signals. Processing circuit hardware may include, for example, Application Specific Integrated Circuits (ASICs), general purpose or special purpose Central Processing Units (CPUs), Digital Signal Processors (DSPs), Graphics Processing Units (GPUs), and programmable logic devices such as Field Programmable Gate Arrays (FPGAs). In a processing circuit, as used herein, each function is performed either by hardware configured, i.e., hard-wired, to perform that function, or by more general-purpose hardware, such as a CPU, configured to execute instructions stored in a non-transitory storage medium. A processing circuit may be fabricated on a single Printed Circuit Board (PCB) or distributed over several interconnected PCBs. A processing circuit may contain other processing circuits; for example, a processing circuit may include two processing circuits, an FPGA and a CPU, interconnected on a PCB.
As used herein, when a method (e.g., an adjustment) or a first quantity (e.g., a first variable) is referred to as being "based on" a second quantity (e.g., a second variable), it means that the second quantity is an input to the method or influences the first quantity; e.g., the second quantity may be an input (e.g., the only input, or one of several inputs) to a function that calculates the first quantity, or the first quantity may be equal to the second quantity, or the first quantity may be the same as the second quantity (e.g., stored at the same location or locations in memory as the second quantity).
It will be understood that, although the terms "first," "second," "third," etc. may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Accordingly, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the spirit and scope of the present inventive concept.
Spatially relative terms, such as "beneath", "below", "lower", "under", "above", "upper", and the like, may be used herein for ease of description to describe one element's or feature's relationship to another element or feature as illustrated in the figures. It will be understood that such spatially relative terms are intended to encompass different orientations of the device in use or in operation, in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "below" or "beneath" other elements or features would then be oriented "above" the other elements or features. Thus, the example terms "below" and "beneath" can encompass both an orientation of above and below. The device may be otherwise oriented (e.g., rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein should be interpreted accordingly. In addition, it will also be understood that when a layer is referred to as being "between" two layers, it can be the only layer between the two layers, or one or more intervening layers may also be present.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the inventive concepts. As used herein, the terms "substantially," "about," and the like are used as approximate terms, rather than degree terms, and are intended to illustrate inherent variations of measured or calculated values that would be recognized by one of ordinary skill in the art.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. Expressions such as "at least one of", when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list. Further, the use of "may" when describing embodiments of the inventive concept refers to "one or more embodiments of the present disclosure". In addition, the term "exemplary" is intended to refer to an example or illustration. As used herein, the terms "use", "using", and "used" may be considered synonymous with the terms "utilize", "utilizing", and "utilized", respectively.
It will be understood that when an element or layer is referred to as being "on", "connected to", "coupled to", or "adjacent to" another element or layer, it may be directly on, connected to, coupled to, or adjacent to the other element or layer, or one or more intervening elements or layers may be present. In contrast, when an element or layer is referred to as being "directly on", "directly connected to", "directly coupled to", or "directly adjacent to" another element or layer, there are no intervening elements or layers present.
Any numerical range recited herein is intended to include all sub-ranges subsumed with the same numerical precision within the recited range. For example, a range of "1.0 to 10.0" or "between 1.0 and 10.0" is intended to include all subranges between (and including) the minimum value of 1.0 recited and the maximum value of 10.0 recited, i.e., having a minimum value equal to or greater than 1.0 and a maximum value equal to or less than 10.0, such as, for example, 2.4 to 7.6. Similarly, a range described as "within 35% of 10" is intended to include all subranges between (and including) the recited minimum value of 6.5 (i.e., (1-35/100) times 10) and the recited maximum value of 13.5 (i.e., (1+35/100) times 10), i.e., having a minimum value equal to or greater than 6.5 and a maximum value equal to or less than 13.5, such as, for example, 7.4 to 10.6. Any maximum numerical limitation recited herein is intended to include all lower numerical limitations subsumed therein, and any minimum numerical limitation recited in the present specification is intended to include all higher numerical limitations subsumed therein.
Some embodiments include features listed in the following numbered statements.
1. A system, comprising:
an interface circuit having:
a first interface configured to be connected to the processing circuit; and
a second interface configured to be connected to the memory,
the first interface includes a cache coherency interface, and
the second interface is different from the first interface.
2. The system of statement 1, further comprising a memory server connected to the second interface.
3. The system of statement 1 or statement 2, wherein the second interface comprises a remote direct memory access interface.
4. The system of any of the preceding statements, wherein the second interface comprises a computer cluster interconnect interface.
5. The system of statement 4, wherein the computer cluster interconnect interface comprises an Ethernet interface.
6. The system of any of statements 2-5, wherein the memory server is connected to the second interface by a cable having a length greater than 6 feet.
7. The system of any one of the preceding statements, wherein the cache coherence interface comprises a Compute Express Link (CXL) interface.
8. The system of any of the preceding statements, wherein the first interface is configured to:
transmitting data to the processing circuitry in response to a load instruction executed by the processing circuitry; and
receiving data from the processing circuitry in response to a store instruction executed by the processing circuitry.
9. The system of any of the preceding statements, further comprising: a Compute Express Link (CXL) root complex coupled between the processing circuitry and the first interface.
10. A system, comprising:
an interface circuit, the interface circuit having:
a first interface configured to be connected to a processing circuit; and
a second interface configured to connect to a memory,
the first interface includes a computing quick link (CXL) interface, an
The second interface is different from the first interface.
11. The system of statement 10, further comprising a memory server connected to the second interface.
12. The system of statement 10 or statement 11, wherein the second interface comprises a remote direct memory access interface.
13. The system of any of statements 10-12, wherein the second interface comprises a computer cluster interconnect interface.
14. The system of any of statements 10-13, wherein the computer cluster interconnect interface comprises an Ethernet interface.
15. The system of any of statements 10-14, wherein the memory server is connected to the second interface by a cable having a length greater than 6 feet.
16. The system of any one of statements 10-15, wherein the CXL interface comprises a cache coherence interface.
17. The system of any of statements 10-16, wherein the first interface is configured to:
transmitting data to the processing circuitry in response to a load instruction executed by the processing circuitry; and
receiving data from the processing circuitry in response to a store instruction executed by the processing circuitry.
18. The system of any of statements 10-17, further comprising: a CXL root complex, the CXL root complex coupled between the processing circuitry and the first interface.
19. A method, comprising:
a store instruction for storing a first value in a first memory location at a first address is executed by the central processing unit,
transmitting, by an interface circuit, a store command to a memory including the first memory location in response to executing the store instruction,
wherein the interface circuit has:
a first interface connected to the central processing unit; and
a second interface, said second interface being connected to said memory,
the first interface includes a computing quick link (CXL) interface, an
The second interface is different from the first interface.
20. The method of statement 19, further comprising:
a load instruction is executed by the central processing unit for loading a value in a second memory location at a second address into a register of the central processing unit,
a read command is sent by the interface circuit to the memory in response to the execution of the load instruction, the read command being a command to read a value in the second memory location.
Although exemplary embodiments of an interface for remote memory have been specifically described and illustrated herein, many modifications and variations will be apparent to those skilled in the art. Thus, it should be understood that interfaces for remote memory constructed in accordance with the principles of the present disclosure may be implemented differently than as specifically described herein. The invention is also defined in the appended claims and equivalents thereof.

Claims (20)

1. A system, comprising:
an interface circuit, the interface circuit having:
a first interface configured to be connected to a processing circuit; and
a second interface configured to connect to a memory,
the first interface includes a cache coherency interface, and
the second interface is different from the first interface.
2. The system of claim 1, further comprising a memory server connected to the second interface.
3. The system of claim 2, wherein the second interface comprises a remote direct memory access interface.
4. The system of claim 2, wherein the second interface comprises a computer cluster interconnect interface.
5. The system of claim 4, wherein the computer cluster interconnect interface comprises an Ethernet interface.
6. The system of claim 2, wherein the memory server is connected to the second interface by a cable having a length greater than 6 feet.
7. The system of claim 1, wherein the cache coherence interface comprises a Compute Express Link (CXL) interface.
8. The system of claim 1, wherein the first interface is configured to:
transmitting data to the processing circuitry in response to a load instruction executed by the processing circuitry; and
receiving data from the processing circuitry in response to a store instruction executed by the processing circuitry.
9. The system of claim 1, further comprising: a Compute Express Link (CXL) root complex coupled between the processing circuitry and the first interface.
10. A system, comprising:
an interface circuit, the interface circuit having:
a first interface configured to be connected to a processing circuit; and
a second interface configured to connect to a memory,
the first interface includes a computing quick link (CXL) interface, an
The second interface is different from the first interface.
11. The system of claim 10, further comprising a memory server connected to the second interface.
12. The system of claim 11, wherein the second interface comprises a remote direct memory access interface.
13. The system of claim 11, wherein the second interface comprises a computer cluster interconnect interface.
14. The system of claim 13, wherein the computer cluster interconnect interface comprises an Ethernet interface.
15. The system of claim 13, wherein the memory server is connected to the second interface by a cable having a length greater than 6 feet.
16. The system of claim 13, wherein the CXL interface comprises a cache coherence interface.
17. The system of claim 10, wherein the first interface is configured to:
transmitting data to the processing circuitry in response to a load instruction executed by the processing circuitry; and
receiving data from the processing circuitry in response to a store instruction executed by the processing circuitry.
18. The system of claim 10, further comprising: a CXL root complex, the CXL root complex coupled between the processing circuitry and the first interface.
19. A method, comprising:
a store instruction for storing a first value in a first memory location at a first address is executed by the central processing unit,
transmitting, by an interface circuit, a store command to a memory including the first memory location in response to executing the store instruction, the store command being a command to store the first value in the first memory location,
wherein the interface circuit has:
a first interface connected to the central processing unit; and
a second interface, said second interface being connected to said memory,
the first interface includes a computing quick link (CXL) interface, an
The second interface is different from the first interface.
20. The method of claim 19, further comprising:
a load instruction is executed by the central processing unit for loading a value in a second memory location at a second address into a register of the central processing unit,
a read command is sent by the interface circuit to the memory in response to executing the load instruction, the read command being a command to read a value in the second memory location.
CN202310780755.7A 2022-09-21 2023-06-28 Interface for remote memory Pending CN117743253A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US63/408,725 2022-09-21
US18/054,492 2022-11-10
US18/054,492 US20240095171A1 (en) 2022-09-21 2022-11-10 Interface for remote memory

Publications (1)

Publication Number Publication Date
CN117743253A (en) 2024-03-22

Family

ID=90281857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310780755.7A Pending CN117743253A (en) 2022-09-21 2023-06-28 Interface for remote memory

Country Status (1)

Country Link
CN (1) CN117743253A (en)

Similar Documents

Publication Publication Date Title
US9223579B2 (en) Handling atomic operations for a non-coherent device
US8438337B1 (en) System and method for conditionally sending a request for data to a home node
Woodacre et al. The SGI® AltixTM 3000 global shared-memory architecture
CN113742257A (en) Computing system and method for performing remote direct memory access therein
EP3896574A1 (en) System and method for computing
US20050265108A1 (en) Memory controller which increases bus bandwidth, data transmission method using the same, and computer system having the same
US9547610B2 (en) Hybrid memory blade
US20210075745A1 (en) Methods and apparatus for improved polling efficiency in network interface fabrics
TWI459211B (en) Computer system and method for sharing computer memory
US11573898B2 (en) System and method for facilitating hybrid hardware-managed and software-managed cache coherency for distributed computing
CN107209725A (en) Method, processor and the computer of processing write requests
US20220269433A1 (en) System, method and apparatus for peer-to-peer communication
US7096306B2 (en) Distributed system with cross-connect interconnect transaction aliasing
KR100807443B1 (en) Opportunistic read completion combining
CN115687193A (en) Memory module, system including the same, and method of operating memory module
KR20150136075A (en) Shared memory system
CN116225177B (en) Memory system, memory resource adjusting method and device, electronic equipment and medium
EP4343560A1 (en) Interface for remote memory
CN117743253A (en) Interface for remote memory
TW202414230A (en) System and method for remote access
US11693814B2 (en) Systems and methods for expanding memory access
Bai et al. An Analysis on Compute Express Link with Rich Protocols and Use Cases for Data Centers
JP7206485B2 (en) Information processing system, semiconductor integrated circuit and information processing method
US7194585B2 (en) Coherency controller management of transactions
CN116795742A (en) Storage device, information storage method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication