CN106688208B - Network communication using pooled storage in a rack scale architecture


Info

Publication number: CN106688208B
Application number: CN201480081459.7A
Authority: CN (China)
Other versions: CN106688208A (application); CN106688208B (grant)
Other languages: Chinese (zh)
Prior art keywords: pooled, memory, node, computing node, vNIC
Inventors: 胡潇, X·周
Assignee: Intel Corp
Legal status: Active (granted)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 - Digital computers in general; Data processing equipment in general
    • G06F15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 - Interprocessor communication
    • G06F15/167 - Interprocessor communication using a common memory, e.g. mailbox
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/0223 - User address space allocation, e.g. contiguous or non contiguous base addressing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 - Hypervisors; Virtual machine monitors
    • G06F9/45558 - Hypervisor-specific management and integration aspects
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 - Data switching networks
    • H04L12/28 - Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/46 - Interconnection of networks
    • H04L12/4641 - Virtual LANs, VLANs, e.g. virtual private networks [VPN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 - Hypervisors; Virtual machine monitors
    • G06F9/45558 - Hypervisor-specific management and integration aspects
    • G06F2009/45595 - Network integration; Enabling network access in virtual machine instances
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 - Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 - Providing a specific technical effect
    • G06F2212/1016 - Performance improvement
    • G06F2212/1024 - Latency reduction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 - Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/15 - Use in a specific computing environment
    • G06F2212/154 - Networked environment
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 - Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/16 - General purpose computing application
    • G06F2212/163 - Server or database system

Abstract

Technologies for network communication using pooled memory include a computing rack having a pooled memory enclosure with a pooled memory controller and a computing enclosure with two or more computing nodes. A first computing node determines a destination virtual network interface controller identifier (vNIC ID) associated with the destination network address of a network packet. The first computing node transmits a send message to the pooled memory controller, the send message comprising the destination vNIC ID and a sender physical address of the packet data within the pooled memory. A second computing node transmits a recipient physical address of a receive buffer within the pooled memory to the pooled memory controller. The pooled memory controller copies the packet data from the sender physical address to the recipient physical address. Other embodiments are also described and claimed.

Description

Network communication using pooled storage in a rack scale architecture
Background
Conventional computer data centers are generally based on servers as the basic unit of computing. Each server typically includes its own dedicated computing resources, including processors, memory, disk storage, and networking hardware and software. Individual servers may be stacked together in racks at high density, and multiple racks may be arranged in a data center.
Some current data center technologies aim to disaggregate computing resources. In particular, rack-scale architectures reorganize the computing rack into the basic unit of computing in large data centers. Each rack may include a set of pooled compute nodes, pooled memory, and pooled storage devices. By disaggregating and pooling computing resources, a rack-scale architecture may improve the flexibility and scalability of a data center, for example by allowing computing resources (e.g., compute nodes and/or memory) to be dynamically added and/or partitioned among workloads. In addition, rack-scale architectures may improve data center thermal management and power consumption, which in turn improves computational density, performance, and efficiency.
Drawings
The concepts described herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. For simplicity and clarity of illustration, elements illustrated in the figures have not necessarily been drawn to scale. Where considered appropriate, reference numerals have been repeated among the figures to indicate corresponding or analogous elements.
FIG. 1 is a simplified block diagram of at least one embodiment of a system for network communication utilizing pooled memory in a rack-scale computing architecture;
FIG. 2 is a simplified block diagram of at least one embodiment of several environments of the system of FIG. 1;
FIG. 3 is a simplified flow diagram of at least one embodiment of a method for network communication utilizing pooled memory that may be performed by the pooled memory controller of FIGS. 1 and 2;
FIG. 4 is a simplified flow diagram of at least one embodiment of a method for sending network data utilizing pooled memory that may be performed by the computing nodes of FIGS. 1 and 2; and
FIG. 5 is a simplified flow diagram of at least one embodiment of a method for receiving network data utilizing pooled memory that may be performed by the computing nodes of FIGS. 1 and 2.
Detailed Description
While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.
References in the specification to "one embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not include the particular feature, structure, or characteristic. Moreover, such phrases do not necessarily refer to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the purview of one skilled in the art to effect such a feature, structure, or characteristic in connection with other embodiments whether or not explicitly described. Additionally, it should be understood that items included in a list in the form of "at least one of A, B, and C" can mean (A); (B); (C); (A and B); (B and C); (A and C); or (A, B, and C). Similarly, items listed in the form of "at least one of A, B, or C" can mean (A); (B); (C); (A and B); (B and C); (A and C); or (A, B, and C).
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on a transitory or non-transitory machine-readable (e.g., computer-readable) storage medium, which may be read and executed by one or more processors. A machine-readable storage medium may be implemented as any storage device, mechanism or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a magnetic disk, or other media device).
In the drawings, some features of structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that this particular arrangement and/or order is not required. Rather, in some embodiments, such features may be arranged in a different manner and/or order than that shown in the figures. Moreover, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments and, in some embodiments, may not be included or may be combined with other features.
Referring now to FIG. 1, an exemplary system 100 for network communication utilizing pooled memory in a rack-scale computing architecture includes a computing rack 102, which includes a network switch 104, a pooled computing enclosure 106, and a pooled memory enclosure 108. The pooled computing enclosure 106 includes one or more computing nodes 110. Each computing node 110 can use the pooled memory of the pooled memory enclosure 108 as system memory. In use, network packets from a source computing node 110 destined for a destination computing node 110 within the same computing rack 102 are processed by one or more virtual network interface controllers (vNICs) of the computing nodes 110. Rather than transmitting packet data via the network switch 104, the source computing node 110 sends a message to the pooled memory enclosure 108, and the pooled memory enclosure 108 copies the packet data within the pooled memory without causing any network traffic on the network switch 104. Thus, the computing rack 102 may improve networking throughput and reduce latency by avoiding multiple copies of data between the pooled memory enclosure 108, the computing nodes 110, and/or the network switch 104.
The computing rack 102 may be implemented as a modular computing device capable of performing the functions described herein, alone or in combination with other computing racks 102. For example, the computing rack 102 may be implemented as a chassis or other enclosure for rack-mounted modular computing units such as compute trays, storage trays, network trays, or traditional rack-mounted components (e.g., servers or switches). As shown in FIG. 1, the computing rack 102 illustratively includes a network switch 104, a pooled computing enclosure 106, and a pooled memory enclosure 108. The computing rack 102 may also include additional pooled computing resources, such as pooled storage devices and pooled networking, as well as associated interconnects, peripherals, power supplies, thermal management systems, and other components. Additionally, while illustrated as including a single network switch 104, pooled computing enclosure 106, and pooled memory enclosure 108, it should be appreciated that in some embodiments the computing rack 102 may include more than one of each of those devices.
The pooled computing enclosure 106 may be implemented as any chassis, tray, module, or other enclosure capable of supporting the computing node 110 and any associated interconnects, power supplies, thermal management systems, or other associated components. Although illustrated as including two computing nodes 110, it is to be understood that in other embodiments, the pooled computing enclosure 106 may include three or more computing nodes 110, and in some embodiments, these computing nodes 110 may be hot-plugged or otherwise configured.
Each computing node 110 may be implemented as any type of device capable of performing the functions described herein. For example, the computing node 110 may be implemented as, without limitation, one or more server computing devices, computer motherboards, daughter cards or expansion cards, systems-on-a-chip, computer processors, consumer electronic devices, smart appliances, and/or any other computing device or collection of devices capable of processing network communications. As shown in FIG. 1, each illustrative computing node 110 includes a processor 120, an I/O subsystem 122, a communication subsystem 124, and may include a memory 128. Of course, in other embodiments the computing node 110 may include other or additional components, such as those commonly found in a server computer (e.g., various input/output devices). Additionally, in some embodiments, one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component. For example, in some embodiments, the memory 128, or portions thereof, may be incorporated in the processor 120.
The processor 120 may be implemented as any type of processor capable of performing the functions described herein. For example, the processor 120 may be implemented as a single- or multi-core processor, a digital signal processor, a microcontroller, or other processor or processing/control circuit. Although illustrated as a single processor 120, in some embodiments each computing node 110 may include multiple processors 120. Similarly, the I/O subsystem 122 may be implemented as circuitry and/or components to facilitate input/output operations with the processor 120, the communication subsystem 124, the memory 128, and other components of the computing node 110. For example, the I/O subsystem 122 may be implemented as, or otherwise include, memory controller hubs, input/output control hubs, firmware devices, communication links (i.e., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.), and/or other components and subsystems to facilitate the input/output operations. In some embodiments, the I/O subsystem 122 may form a portion of a system-on-a-chip (SoC) and be incorporated, along with the processor 120, the communication subsystem 124, the memory 128, and other components of the computing node 110, on a single integrated circuit chip.
The computing node 110 also includes a communication subsystem 124, which may be implemented as any communication circuit, device, or collection thereof, capable of enabling communication between the computing node 110, the network switch 104, and/or other remote devices. The communication subsystem 124 may be configured to use any one or more communication technologies (e.g., wired or wireless communication) and associated protocols (e.g., Ethernet, WiMAX, etc.) to enable such communication. In the exemplary embodiment, the communication subsystem 124 includes a Network Interface Controller (NIC) 126 to communicate network packets with the network switch 104.
The memory 128 may be implemented as any type of volatile or non-volatile memory or data storage device capable of performing the functions described herein. In operation, the memory 128 may store various data and software used during operation of the computing node 110 (e.g., operating systems, applications, programs, libraries, and drivers). In some embodiments, the memory 128 may temporarily cache or otherwise store data maintained by the pooled memory enclosure 108. As shown, in some embodiments the computing node 110 may not include any dedicated on-board memory 128.
The pooled memory enclosure 108 may be implemented as any chassis, tray, module, or other enclosure capable of housing memory modules to be pooled and accessed by the computing nodes 110. The pooled memory enclosure 108 includes a pooled memory controller 140 and pooled memory 142. The pooled memory controller 140 may be implemented as any computer, processor, microcontroller, or other computing device capable of providing the computing nodes 110 with access to the pooled memory 142 and otherwise performing the functions described herein. For example, the pooled memory controller 140 may be implemented as, but is not limited to, a server computing device, a computer motherboard or daughter card, a system-on-a-chip, a computer processor, a consumer electronic device, a smart appliance, and/or any other computing device capable of performing the functions described herein.
The pooled memory 142 may be implemented as any type of volatile or non-volatile memory or data storage device capable of performing the functions described herein. For example, the pooled memory 142 may be implemented as a large number of conventional RAM DIMMs. In operation, each computing node 110 may use the pooled memory 142 as main memory. Thus, the pooled memory 142 may store various data and software used during the operation of each computing node 110 (e.g., operating systems, applications, programs, libraries, and drivers). In some embodiments, the pooled memory controller 140 may partition or otherwise isolate portions of the pooled memory 142 among the computing nodes 110.
As shown in FIG. 1, the pooled memory enclosure 108 is coupled to each computing node 110 via an interconnect 144. The interconnect 144 may be implemented as any high-speed interconnect capable of transferring data while supporting memory semantics. The interconnect 144 may be implemented as, or otherwise include, memory controller hubs, input/output control hubs, firmware devices, communication links (i.e., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.), and/or other components and subsystems to facilitate the transfer of data between the computing nodes 110 and the pooled memory enclosure 108. For example, in some embodiments, the interconnect 144 may be implemented as or include a silicon photonics switch fabric and a number of optical interconnects. As another example, the interconnect 144 may be implemented as or include a copper backplane.
The network switch 104 may be implemented as any networking device capable of switching network packets between the computing nodes 110 and/or other remote computing devices (e.g., the computing nodes 110 of other computing racks 102). For example, the network switch 104 may be implemented as a top of rack switch or other ethernet switch.
The computing rack 102 may be configured to send and receive data with remote devices, such as other computing racks (not shown), over the network 112. The network 112 may be implemented as any number of various wired and/or wireless networks. For example, the network 112 may be implemented as, or otherwise include, a wired or wireless local area network (LAN), a wired or wireless wide area network (WAN), and/or a publicly accessible global network such as the Internet. As such, the network 112 may include any number of additional devices, such as additional computers, routers, and switches, to facilitate communication among the devices of the system 100.
Referring now to FIG. 2, in an exemplary embodiment, each computing node 110 establishes an environment 200 during operation. The exemplary environment 200 includes an application 202, a networking stack 206, and a virtual NIC (vNIC) module 210. The various modules of the environment 200 may be implemented as hardware, firmware, software, or a combination thereof. For example, each of the modules, logic, and other components of the environment 200 may form a portion of, or otherwise be established by, the processor 120 or other hardware components of the computing node 110.
The application 202 is configured to send and receive network packet data with one or more other computing nodes 110 of the computing rack 102. For example, application 202 may be implemented as any combination of a server application, a client application, or a web application. The application 202 may maintain one or more application buffers 204 for storing packet data. Each application buffer 204 may be established in a user-mode memory space and may be identified by a physical memory address of the compute node 110. Each application buffer 204 may be stored in pooled memory 142 by pooled memory controller 140.
The networking stack 206 is configured to provide networking services to the application 202. For example, the networking stack 206 may provide TCP/IP networking services to the application 202. The networking stack 206 may be implemented as one or more supporting applications, libraries, operating system components, device drivers, or other components for providing networking services. The networking stack 206 may include one or more networking stack buffers 208 for storing packet data. In some embodiments, each networking stack buffer 208 may be maintained in a kernel memory space and may be identified by a physical memory address of the compute node 110. Each networking stack buffer 208 may be stored in the pooled memory 142 by the pooled memory controller 140.
vNIC module 210 is configured to emulate a NIC and exchange control messages with pooled memory controller 140 to communicate packet data with another computing node 110. In some embodiments, vNIC module 210 may be implemented as a kernel module, a device driver, or other component capable of interfacing with networking stack 206. vNIC module 210 may include one or more vNIC buffers 212 for storing packet data. In some embodiments, each vNIC buffer 212 may be maintained in a kernel memory space and may be identified by a physical memory address of compute node 110. Each vNIC buffer 212 may be stored in pooled memory 142 by pooled memory controller 140.
The vNIC module 210 is also configured to generate, store, or otherwise determine a unique identifier, referred to as a vNIC ID. The vNIC module 210 is further configured to determine a destination vNIC ID associated with a destination network address of a network packet. The vNIC module 210 is configured to generate a "send" message that includes the source vNIC ID of the current vNIC module 210, the destination vNIC ID of the destination vNIC module 210, the size of the packet data, and the physical memory address of the packet data. The vNIC module 210 is also configured to transmit the send message to the pooled memory controller 140. Additionally, the vNIC module 210 is configured to receive a "receive" message from the pooled memory controller 140 and, in response to the receive message, identify the physical memory address of a buffer within the pooled memory to receive the packet data (e.g., a vNIC buffer 212, a networking stack buffer 208, or an application buffer 204) and transmit the physical memory address of the receive buffer to the pooled memory controller 140. The vNIC module 210 may also be configured to process one or more interrupts received by the computing node 110 from the pooled memory controller 140. In some embodiments, those functions may be performed by sub-modules, such as a vNIC ID module 214, a send module 216, and/or a receive module 218.
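For illustration only, the "send" and "receive" control messages described above might be represented as the following C structures; the field names, widths, and ordering are assumptions and are not specified by this disclosure.

    /* Hypothetical layout of the vNIC control messages; field names and
     * widths are assumptions, not taken from this disclosure. */
    #include <stdint.h>

    typedef uint16_t vnic_id_t;          /* unique vNIC identifier (vNIC ID) */

    /* "Send" message written by the source vNIC module 210 to its dedicated
     * I/O space on the pooled memory controller 140. */
    struct vnic_send_msg {
        vnic_id_t src_vnic_id;           /* vNIC ID of the source compute node      */
        vnic_id_t dst_vnic_id;           /* vNIC ID of the destination compute node */
        uint32_t  packet_size;           /* amount of packet data, in bytes         */
        uint64_t  sender_phys_addr;      /* start of the packet data in pooled memory */
    };

    /* "Receive" message written by the pooled memory controller 140 to the
     * destination vNIC module's dedicated I/O space. */
    struct vnic_recv_msg {
        vnic_id_t src_vnic_id;
        vnic_id_t dst_vnic_id;
        uint32_t  packet_size;           /* lets the receiver size its buffer */
    };
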
Still referring to FIG. 2, in an exemplary embodiment, the pooled memory controller 140 establishes an environment 240 during operation. The exemplary environment 240 includes a vNIC interface module 242, a vNIC ID module 244, a data copy module 246, and a pooled memory access module 248. The various modules of the environment 240 may be implemented as hardware, firmware, software, or a combination thereof. For example, each of the modules, logic, and other components of the environment 240 may form a portion of, or otherwise be established by, the pooled memory controller 140 or other hardware components of the pooled memory enclosure 108.
vNIC interface module 242 is configured to communicate send and receive messages with vNIC modules 210 of one or more computing nodes 110. vNIC interface module 242 may establish a dedicated I/O space for each vNIC module 210 (e.g., one or more control registers). The I/O space may be mapped to a physical memory space of an associated compute node 110. Thus, the pooled memory controller 140 and the compute node 110 may communicate messages by writing data at predefined memory addresses. vNIC interface module 242 may also be configured to generate one or more interrupts to computing nodes 110 to coordinate the transmission of messages.
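One possible register layout for such a dedicated per-vNIC I/O space is sketched below. The individual fields, the doorbell convention, and the status register are assumptions introduced for illustration, since the disclosure only requires that messages be exchanged by writing data at predefined memory addresses and signaled by interrupts.

    /* Illustrative per-vNIC I/O space, mapped into the physical memory space
     * of the associated compute node 110.  All field names are assumptions. */
    #include <stdint.h>

    struct vnic_io_space {
        /* Written by the compute node, read by the pooled memory controller. */
        volatile uint16_t send_src_vnic_id;
        volatile uint16_t send_dst_vnic_id;
        volatile uint32_t send_packet_size;
        volatile uint64_t send_sender_phys_addr;  /* packet data in pooled memory */
        volatile uint64_t recv_buffer_phys_addr;  /* recipient physical address   */
        volatile uint32_t doorbell;               /* node signals a new message   */

        /* Written by the pooled memory controller, read by the compute node. */
        volatile uint16_t recv_src_vnic_id;
        volatile uint16_t recv_dst_vnic_id;
        volatile uint32_t recv_packet_size;
        volatile uint32_t status;                 /* e.g., send/receive complete  */
    };
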
vNIC ID module 244 is configured to associate the vNIC ID of each vNIC module 210 with the network address of the associated computing node 110. The network address may include, for example, an IP address or other high-level network address associated with the computing node 110. vNIC ID module 244 may, for example, determine the destination computing node 110 associated with the destination vNIC ID specified in the "send" message received from computing node 110.
The data copy module 246 is configured to copy data within the pooled memory 142 from a sender physical address associated with the source vNIC module 210 to a recipient physical address associated with the destination vNIC module 210. The data copy module 246 copies the data in response to receiving the recipient physical address from the destination computing node 110. The data copy module 246 may copy the data directly between data buffers stored within the pooled memory 142 (e.g., between the vNIC buffers 212, the networking stack buffers 208, and/or the application buffers 204).
Pooled memory access module 248 is configured to allow computing node 110 to remotely access pooled memory 142. The application 202 and/or other modules of the environment 200 may access the pooled memory 142 as system memory. In particular, as described above, application buffer 204, networking stack buffer 208, and vNIC buffer 212 may be stored within pooled memory 142 and accessed via pooled memory access module 248. In some embodiments, some or all of the pooled memory 142 may be sequestered, partitioned, or otherwise dedicated to a particular compute node 110.
Referring now to FIG. 3, in use, the pooled memory controller 140 may execute a method 300 for network communication using pooled memory. The method 300 begins with block 302, in which the pooled memory controller 140 initializes the vNIC IDs associated with the computing nodes 110 of the computing rack 102. In particular, the pooled memory controller 140 associates the vNIC ID of each computing node 110 with the network address (e.g., IP address) of the corresponding computing node 110. After initialization, the pooled memory controller 140 and all of the computing nodes 110 may be capable of mapping between network addresses and vNIC IDs. Thus, the vNIC ID may be used similarly to a traditional MAC address to identify a computing node 110 at the MAC layer. The pooled memory controller 140 may use any technique to obtain vNIC IDs from the computing nodes 110 and to propagate the vNIC IDs to the other computing nodes 110. In some embodiments, in block 304, the pooled memory controller 140 may monitor for vNIC IDs received from the computing nodes 110. A computing node 110 may, for example, transmit its vNIC ID directly to the pooled memory controller 140, or may broadcast its vNIC ID to many attached devices. In some embodiments, in block 306, the pooled memory controller 140 may retransmit the received vNIC IDs to the other computing nodes 110.
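A minimal sketch of the vNIC ID table that the pooled memory controller 140 (and, symmetrically, each computing node 110) might build during this initialization is shown below; the table layout and helper names are assumptions, and IPv4 addresses are used only as one example of a network address.

    /* Hypothetical vNIC ID <-> network address table built in block 302. */
    #include <stdint.h>

    #define MAX_VNICS 64

    struct vnic_entry {
        uint16_t vnic_id;       /* vNIC ID announced by a compute node */
        uint32_t ip_addr;       /* associated network (IPv4) address   */
        int      valid;
    };

    static struct vnic_entry vnic_table[MAX_VNICS];

    /* Record a (vNIC ID, network address) pair received from a compute node. */
    static void vnic_table_add(uint16_t vnic_id, uint32_t ip_addr)
    {
        for (int i = 0; i < MAX_VNICS; i++) {
            if (!vnic_table[i].valid) {
                vnic_table[i] = (struct vnic_entry){ vnic_id, ip_addr, 1 };
                return;
            }
        }
    }

    /* Map a destination network address to a vNIC ID; returns -1 if the
     * address does not belong to a compute node within the rack. */
    static int vnic_table_lookup(uint32_t ip_addr)
    {
        for (int i = 0; i < MAX_VNICS; i++) {
            if (vnic_table[i].valid && vnic_table[i].ip_addr == ip_addr)
                return vnic_table[i].vnic_id;
        }
        return -1;
    }
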
In block 308, the pooled memory controller 140 monitors for "send" messages received from the computing nodes 110. The pooled memory controller 140 may use any technique to monitor for messages from the computing nodes 110. In some embodiments, in block 310, the pooled memory controller 140 may read from the I/O space dedicated to the vNIC of each computing node 110. The I/O space may be implemented as any memory region accessible to the computing node 110 that can be used to transfer information to the pooled memory controller 140. For example, the I/O space may include one or more control registers of the pooled memory controller 140. In block 312, the pooled memory controller 140 determines whether any send messages have been received. If not, the method 300 loops back to block 308 to continue monitoring for send messages. If a send message has been received, the method 300 advances to block 314.
In block 314, the pooled memory controller 140 reads the send message and extracts the values it contains. In particular, the pooled memory controller 140 reads the source vNIC ID, the destination vNIC ID, the packet size, and the sender physical memory address from the send message. The source vNIC ID and the destination vNIC ID identify the source computing node 110 and the destination computing node 110, respectively. The packet size indicates the amount of packet data (e.g., the number of bytes or octets) to be transferred. The sender physical memory address identifies the starting physical address of the packet data in the pooled memory 142. Thus, the sender physical memory address represents the memory address of the packet data as used by the source computing node 110.
In block 316, the pooled memory controller 140 transmits a "receive" message to the destination vNIC specified by the destination vNIC ID of the send message. In block 318, the pooled memory controller 140 includes the source vNIC ID, the destination vNIC ID, and the packet size in the receive message. In block 320, the pooled memory controller 140 writes the receive message to the I/O space dedicated to the destination vNIC. In block 322, the pooled memory controller 140 generates an interrupt to the computing node 110 associated with the destination vNIC ID. The pooled memory controller 140 may look up the network address associated with the destination vNIC ID or otherwise identify the destination computing node 110 in order to generate the interrupt. As further described below in connection with FIG. 5, the destination computing node 110 may read the receive message from its dedicated I/O space after receiving the interrupt.
In block 324, the pooled memory controller 140 reads the recipient physical memory address from the destination vNIC. The recipient physical memory address represents the starting physical address of a buffer within pooled memory 142 that may receive packet data. Thus, the receiver physical memory address represents the memory address of the buffer used by the destination computing node 110. The pooled memory controller 140 may use any technique to read the recipient physical memory address. In some embodiments, in block 326, pooled memory controller 140 may read the I/O space dedicated to the destination vNIC.
In block 328, the pooled memory controller 140 copies the packet data from the sender physical memory address to the recipient physical memory address within the pooled memory 142. In other words, the pooled memory controller 140 copies a packet-size amount of data from the packet data of the source computing node 110 into the receive buffer of the destination computing node 110. Because the packet data is copied within the pooled memory 142, the packet data is not copied over the interconnect 144 between the pooled memory enclosure 108 and the computing nodes 110, and the packet data is not copied over the network switch 104 and/or the network 112.
In block 330, the pooled memory controller 140 notifies the source computing node 110 and the destination computing node 110 that the packet data copy is complete. The pooled memory controller 140 may notify the computing nodes 110 using any technique. In some embodiments, in block 332, the pooled memory controller 140 may generate an interrupt to each of the computing nodes 110. After notifying the computing nodes 110, the method 300 loops back to block 308 to monitor for additional send messages.
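Putting blocks 308 through 332 together, the controller side of method 300 might look like the following sketch. Every helper here (poll_send_msg, write_recv_msg, raise_interrupt, read_receiver_pa, pooled_memcpy) is a hypothetical primitive standing in for hardware or firmware behavior that the disclosure does not spell out.

    /* Illustrative controller-side loop for method 300; helpers are assumed. */
    #include <stdint.h>
    #include <stddef.h>

    struct send_msg { uint16_t src_vnic, dst_vnic; uint32_t size; uint64_t sender_pa; };
    struct recv_msg { uint16_t src_vnic, dst_vnic; uint32_t size; };

    int  poll_send_msg(struct send_msg *msg);                          /* blocks 308-312 */
    void write_recv_msg(uint16_t dst_vnic, const struct recv_msg *msg); /* blocks 316-320 */
    void raise_interrupt(uint16_t vnic_id);                             /* blocks 322, 332 */
    uint64_t read_receiver_pa(uint16_t dst_vnic);                       /* blocks 324-326 */
    void pooled_memcpy(uint64_t dst_pa, uint64_t src_pa, size_t len);   /* block 328      */

    void pooled_memory_controller_loop(void)
    {
        struct send_msg s;

        for (;;) {
            if (!poll_send_msg(&s))          /* scan the per-vNIC I/O spaces */
                continue;

            /* Blocks 314-322: forward a "receive" message and interrupt the
             * destination compute node. */
            struct recv_msg r = { s.src_vnic, s.dst_vnic, s.size };
            write_recv_msg(s.dst_vnic, &r);
            raise_interrupt(s.dst_vnic);

            /* Blocks 324-328: obtain the recipient physical address, then copy
             * the packet data entirely within the pooled memory 142, without
             * touching the interconnect 144 or the network switch 104. */
            uint64_t receiver_pa = read_receiver_pa(s.dst_vnic);
            pooled_memcpy(receiver_pa, s.sender_pa, s.size);

            /* Blocks 330-332: notify both compute nodes that the copy is done. */
            raise_interrupt(s.src_vnic);
            raise_interrupt(s.dst_vnic);
        }
    }
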
Referring now to FIG. 4, in use, a computing node 110 may execute a method 400 for sending network data using pooled memory. The method 400 begins with block 402, in which the computing node 110 transmits its vNIC ID to the pooled memory controller 140 and/or to the other computing nodes 110 of the computing rack 102. The computing node 110 may, for example, transmit its vNIC ID directly to the pooled memory controller 140 or may broadcast its vNIC ID to many attached devices. In block 404, the computing node 110 receives vNIC IDs and associated network addresses from the pooled memory controller 140 and/or the other computing nodes 110. For example, the computing node 110 may receive the vNIC IDs directly from the pooled memory controller 140 or may listen for broadcast messages from the other devices. After receiving the vNIC IDs and network addresses, the computing node 110 may be capable of determining the vNIC ID associated with the network address of any computing node 110 of the computing rack 102.
In block 406, the computing node 110 monitors for network packets to be sent to another computing node 110 of the computing rack 102. The computing node 110 may monitor network packets using any technique. For example, the computing node 110 may determine whether the application 202 has submitted a network packet for transmission to the networking stack 206. The computing nodes 110 may also determine whether a network packet is addressed to a computing node 110 within the computing rack 102. Although illustrated as the computing node 110 monitoring network packets, it should be understood that in some embodiments, the application 202 may initiate the network transfer operation directly without the computing node 110 performing any polling of the network packets. In some embodiments, application 202 may bypass networking stack 206 and submit network packets directly to vNIC module 210 for transmission. In block 408, the computing node 110 determines whether the network packet is ready to be transmitted. If not, the method 400 loops back to block 406 to continue monitoring packets. If the network packet is ready to be transmitted, the method 400 proceeds to block 410.
In block 410, the computing node 110 determines the destination vNIC ID associated with the destination network address of the network packet. For example, the computing node 110 may determine the destination vNIC ID based on the destination IP address of the network packet. Of course, if the destination IP address does not resolve to a computing node 110 within the computing rack 102, there is no associated vNIC ID, and the computing node 110 may process the network packet using the networking stack 206 and the NIC 126.
In block 412, the computing node 110 determines the physical address of the packet data within the pooled memory 142. The physical address refers to the source location, in the pooled memory 142, of the packet data to be transferred to the destination vNIC. For example, the physical address may refer to one of an application buffer 204, a networking stack buffer 208, or a vNIC buffer 212, maintained by the application 202, the networking stack 206, and the vNIC module 210, respectively. In some embodiments, in block 414, the computing node 110 may copy the packet data from a networking stack buffer 208 managed by the networking stack 206 into a vNIC buffer 212 managed by the vNIC module 210. In those embodiments, the physical address of the packet data may be located within the vNIC buffer 212. For example, in many embodiments, the packet data may be copied from an application buffer 204 to a networking stack buffer 208 and then copied into a vNIC buffer 212. Thus, because the buffers 204, 208, 212 are stored in the pooled memory 142, in those embodiments the same packet data may be copied multiple times over the interconnect 144 between the computing node 110 and the pooled memory enclosure 108.
In some embodiments, in block 416, the computing node 110 may copy the packet data directly from the application buffer 204 into the vNIC buffer 212. In those embodiments, the physical address of the packet data may also be located within the vNIC buffer 212. Thus, compared to copying from the application buffer 204 to the networking stack buffer 208 and then into the vNIC buffer 212, one or more copies of the packet data between the computing node 110 and the pooled memory enclosure 108 may be avoided. In some embodiments, in block 418, the computing node 110 may identify the physical address of the packet data within the application buffer 204 itself. In those embodiments, no additional copies of the packet data are transmitted over the interconnect 144 between the computing node 110 and the pooled memory enclosure 108.
In block 420, computing node 110 generates a "send" message that includes the destination vNIC ID, the source vNIC ID, the physical memory address of the packet data, and the packet size. In block 422, compute node 110 transmits a send message to pooled memory controller 140. The computing node 110 may use any technique to transmit the send message. In some embodiments, in block 424, computing node 110 may write the send message to an I/O space dedicated to vNIC module 210. As described above in connection with block 310 of fig. 3, pooled memory controller 140 may monitor the dedicated I/O space used to send messages from vNIC module 210.
In block 426, the computing node 110 waits for a notification from the pooled memory controller 140 that the network packet transmission is complete. As described above in connection with FIG. 3, in response to receiving the send message, the pooled memory controller 140 copies the packet data within the pooled memory 142 from the physical memory location associated with the source computing node 110 to another physical location associated with the destination computing node 110. The computing node 110 may use any technique to wait for the notification from the pooled memory controller 140. In some embodiments, in block 428, the computing node 110 waits for an interrupt generated by the pooled memory controller 140. After receiving the notification that the send operation is complete, the method 400 loops back to block 406 to monitor for additional network packets.
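From the computing node's side, method 400 reduces to a short send routine like the sketch below. The helpers (vnic_table_lookup, virt_to_pooled_phys, write_send_msg, wait_for_send_irq) and the my_vnic_id variable are assumptions used only to show the sequencing of blocks 410 through 428.

    /* Sketch of the node-side send path (method 400); helpers are assumed. */
    #include <stdint.h>

    struct send_msg { uint16_t src_vnic, dst_vnic; uint32_t size; uint64_t sender_pa; };

    extern uint16_t my_vnic_id;                            /* announced during init */
    extern int      vnic_table_lookup(uint32_t dest_ip);   /* dest IP -> vNIC ID    */
    extern uint64_t virt_to_pooled_phys(const void *buf);  /* buffer -> pooled-memory
                                                              physical address      */
    extern void     write_send_msg(const struct send_msg *m); /* dedicated I/O space */
    extern void     wait_for_send_irq(void);               /* completion interrupt  */

    /* Returns 0 on success, -1 if the destination is outside the rack (the
     * packet then falls back to the networking stack 206 and the NIC 126). */
    int vnic_send(uint32_t dest_ip, const void *packet, uint32_t len)
    {
        int dst = vnic_table_lookup(dest_ip);              /* block 410 */
        if (dst < 0)
            return -1;

        struct send_msg m = {
            .src_vnic  = my_vnic_id,
            .dst_vnic  = (uint16_t)dst,
            .size      = len,
            .sender_pa = virt_to_pooled_phys(packet),      /* block 412 */
        };
        write_send_msg(&m);                                /* blocks 420-424 */
        wait_for_send_irq();                               /* blocks 426-428 */
        return 0;
    }
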
Referring now to FIG. 5, in use, a computing node 110 may execute a method 500 for receiving network packets using pooled memory. The method 500 begins with block 502, in which the computing node 110 monitors for receive messages from the pooled memory controller 140. As described above, the pooled memory controller 140 may transmit a "receive" message to the destination computing node 110 in response to receiving a "send" message from the source computing node 110. The computing node 110 may monitor for messages from the pooled memory controller 140 using any technique. In some embodiments, in block 504, the computing node 110 may wait for an interrupt from the pooled memory controller 140. In block 506, the computing node 110 determines whether a receive message has been received. If not, the method 500 loops back to block 502 to continue monitoring for receive messages. If a receive message has been received, the method 500 advances to block 508.
In block 508, the computing node 110 reads the receive message and extracts the values it contains. In particular, the computing node 110 reads the source vNIC ID, the destination vNIC ID, and the packet size from the receive message. The computing node 110 may read the receive message from the I/O space dedicated to its vNIC module 210.
In block 510, the computing node 110 identifies a receive buffer for the packet data in the pooled memory 142. The receive buffer may be identified by the starting physical memory address of the receive buffer in the pooled memory 142. The receive buffer is large enough to hold all of the packet data; that is, the receive buffer may be as large as or larger than the packet size specified in the receive message. The computing node 110 may use any technique to identify the receive buffer. For example, the computing node 110 may allocate a new buffer, identify a previously allocated buffer, or reuse an existing buffer. In some embodiments, in block 512, the computing node 110 may identify a vNIC buffer 212 managed by the vNIC module 210. The vNIC buffer 212 may be located in kernel space or otherwise managed by the vNIC module 210, and thus the packet data may be copied into user space for use by the application 202 after being received. In some embodiments, in block 514, the computing node 110 may identify a networking stack buffer 208 managed by the networking stack 206. The networking stack buffer 208 may be located in kernel space or otherwise managed by the operating system of the computing node 110, and thus the packet data may be copied into user space for use by the application 202 after being received. However, by receiving the packet data directly in the networking stack buffer 208, one or more copies of the packet data may be avoided (e.g., a copy from the vNIC buffer 212 to the networking stack buffer 208 may be avoided).
In some embodiments, in block 516, the computing node 110 may identify the application buffer 204 maintained by the application 202. Application buffer 204 may be located in user space and managed directly by application 202. Thus, receiving packet data directly in application buffer 204 may also reduce the number of copies of the packet data (e.g., by eliminating copies from vNIC buffer 212 and/or networking stack buffer 208 to application buffer 204).
In block 518, the compute node 110 transmits the physical address of the receive buffer to the pooled memory controller 140. As described above, in response to receiving the physical address, the pooled memory controller 140 copies the packet data in the pooled memory 142 from a memory accessible to the source computing node 110 into a receive buffer accessible to the destination computing node 110. The computing node 110 may use any technique to transmit the physical address. In some embodiments, in block 520, compute node 110 may write the physical address to an I/O space dedicated to vNIC module 210.
In block 522, the computing node 110 waits for a notification from the pooled memory controller 140 that network packet reception is complete. The computing node 110 may use any technique to wait for the notification from the pooled memory controller 140. In some embodiments, in block 524, the computing node 110 waits for an interrupt generated by the pooled memory controller 140. After the packet reception is complete, the computing node 110 may convey the received packet data to the application 202, for example by copying the packet data to the networking stack buffer 208 or the application buffer 204, or by otherwise notifying the application 202. After receiving the notification that the receive operation is complete, the method 500 loops back to block 502 to monitor for additional receive messages.
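The receive side of method 500 might be organized as the following sketch; again, the helper functions and the exact buffer-allocation policy (vNIC buffer 212, networking stack buffer 208, or application buffer 204) are assumptions made for illustration.

    /* Sketch of the node-side receive path (method 500); helpers are assumed. */
    #include <stdint.h>

    struct recv_msg { uint16_t src_vnic, dst_vnic; uint32_t size; };

    extern void     wait_for_recv_irq(void);               /* blocks 502-504 */
    extern void     read_recv_msg(struct recv_msg *m);     /* block 508      */
    extern void    *alloc_receive_buffer(uint32_t size);   /* block 510: vNIC,
                                                              stack, or app buffer */
    extern uint64_t virt_to_pooled_phys(const void *buf);
    extern void     write_receiver_pa(uint64_t pa);        /* blocks 518-520 */
    extern void     deliver_to_application(void *buf, uint32_t size);

    void vnic_receive_one(void)
    {
        struct recv_msg m;

        wait_for_recv_irq();                  /* a "receive" message is pending */
        read_recv_msg(&m);

        /* The buffer must hold at least m.size bytes of packet data. */
        void *buf = alloc_receive_buffer(m.size);
        write_receiver_pa(virt_to_pooled_phys(buf));

        wait_for_recv_irq();                  /* blocks 522-524: copy complete  */
        deliver_to_application(buf, m.size);  /* hand the packet data to app 202 */
    }
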
Examples of the present invention
Illustrative examples of the devices, systems, and methods disclosed herein are provided below. Embodiments of the apparatus, system, and method may include any one or more of the following examples, and any combination thereof.
Example 1 includes a pooled memory controller for inter-node communication in a pooled memory architecture, the pooled memory controller comprising: a pooled memory access module to manage remote access to the pooled memory by a first computing node and a second computing node; a virtual network interface controller (vNIC) interface module to: receive a send message from the first computing node, wherein the send message comprises a source vNIC identifier (vNIC ID) associated with the first computing node, a destination vNIC ID associated with the second computing node, and a sender physical memory address, and wherein the sender physical memory address identifies packet data within the pooled memory; transmit a receive message to the second computing node, wherein the receive message includes the source vNIC ID and the destination vNIC ID; and receive a recipient physical memory address from the second computing node in response to transmitting the receive message, wherein the recipient physical memory address identifies a memory location within the pooled memory; and a data copy module to copy the packet data within the pooled memory from the sender physical memory address to the recipient physical memory address in response to receiving the recipient physical memory address.
Example 2 includes the subject matter of example 1, and wherein receiving the send message comprises: reading a first memory location within a first input/output (I/O) space of the pooled memory controller dedicated to the first compute node.
Example 3 includes the subject matter of any of examples 1 and 2, and wherein transmitting the receive message comprises: writing to a second memory location within a second I/O space of the pooled memory controller dedicated to the second computing node; and generating an interrupt to the second computing node in response to writing to the second memory location.
Example 4 includes the subject matter of any of examples 1-3, and wherein receiving the recipient physical memory address comprises reading a second memory location within the second I/O space.
Example 5 includes the subject matter of any of examples 1-4, and wherein the send message further includes a packet size of the packet data; the receive message further includes the packet size of the packet data; and copying the packet data comprises copying an amount of data equal to the packet size of the packet data.
Example 6 includes the subject matter of any of examples 1-5, and wherein the vNIC interface module is further to notify the first computing node and the second computing node that the copying is complete in response to copying the packet data.
Example 7 includes the subject matter of any of examples 1-6, and wherein notifying the first computing node and the second computing node that the copying is complete comprises: generating a first interrupt to the first computing node; and generating a second interrupt to the second computing node.
Example 8 includes a computing node for inter-node communication in a pooled memory architecture, the computing node comprising: a virtual network interface controller identifier (vNIC ID) module to determine a destination vNIC ID associated with a destination network address of a network packet; and a send module to: determine a physical address of packet data of the network packet, wherein the physical address identifies a memory location within a pooled memory accessible to the computing node; generate a send message, wherein the send message comprises a source vNIC ID associated with the computing node, the destination vNIC ID, and the physical address; and transmit the send message to a pooled memory controller, wherein the computing node is to remotely access the pooled memory via the pooled memory controller.
Example 9 includes the subject matter of example 8, and wherein determining the physical address comprises: copying packet data to a kernel mode driver buffer located at the physical address.
Example 10 includes the subject matter of any of examples 8 and 9, and wherein copying the packet data comprises copying the packet data from a kernel mode networking stack buffer located within the pooled memory.
Example 11 includes the subject matter of any of examples 8-10, and wherein copying the packet data comprises copying the packet data from a user-mode application buffer located within the pooled memory.
Example 12 includes the subject matter of any of examples 8-11, and wherein determining the physical address comprises: determining a physical address of a user-mode application buffer located within the pooled memory.
Example 13 includes the subject matter of any of examples 8-12, and wherein transmitting the send message comprises writing to a first memory location within an input/output (I/O) space of the pooled memory controller dedicated to the computing node.
Example 14 includes the subject matter of any of examples 8-13, and wherein the send message further includes a packet size of the packet data.
Example 15 includes the subject matter of any of examples 8-14, and wherein the send module is further to wait for a notification of send completion from the pooled memory controller in response to transmitting the send message.
Example 16 includes the subject matter of any of examples 8-15, and wherein waiting for notification comprises: waiting for an interrupt from the pooled memory controller.
Example 17 includes a computing node for inter-node communication in a pooled memory architecture, the computing node comprising a receive module to: receive a receive message from a pooled memory controller, wherein the computing node remotely accesses packet data stored in a pooled memory via the pooled memory controller; identify a physical memory address of a receive buffer within the pooled memory accessible to the computing node, wherein the receive buffer is capable of storing the packet data; and transmit the physical memory address of the receive buffer to the pooled memory controller in response to receiving the receive message.
Example 18 includes the subject matter of example 17, and wherein the receive buffer comprises a kernel mode driver buffer located within the pooled memory.
Example 19 includes the subject matter of any of examples 17 and 18, and wherein the receive buffer comprises a kernel mode networking stack buffer located within the pooled memory.
Example 20 includes the subject matter of any of examples 17-19, and wherein the receive buffer comprises a user mode application buffer located within the pooled memory.
Example 21 includes the subject matter of any of examples 17-20, and wherein receiving the receive message comprises: receiving an interrupt from the pooled memory controller; and in response to receiving the interrupt, reading from a first memory location within an input/output (I/O) space of the pooled memory controller dedicated to the computing node.
Example 22 includes the subject matter of any of examples 17-21, and wherein transmitting the physical memory address comprises writing to a first memory location within the I/O space.
Example 23 includes the subject matter of any of examples 17-22, and wherein the receive message further includes a packet size of the packet data; and the receive buffer is capable of storing the packet-size amount of data.
Example 24 includes the subject matter of any of examples 17-23, and wherein the receive module is further to wait for a notification of receive completion from the pooled memory controller in response to transmitting the physical memory address.
Example 25 includes the subject matter of any of examples 17-24, and wherein waiting for the notification comprises: waiting for an interrupt from the pooled memory controller.
Example 26 includes a system for inter-node communication in a pooled memory architecture, the system comprising a pooled memory controller, a first computing node, and a second computing node, wherein the pooled memory controller comprises: a pooled memory access module to manage remote access to the pooled memory by the first computing node and the second computing node; a virtual network interface controller (vNIC) interface module to: receive a send message from the first computing node, wherein the send message comprises a source vNIC identifier (vNIC ID) associated with the first computing node, a destination vNIC ID associated with the second computing node, and a sender physical memory address, and wherein the sender physical memory address identifies packet data within the pooled memory; transmit a receive message to the second computing node, wherein the receive message includes the source vNIC ID and the destination vNIC ID; and receive a recipient physical memory address from the second computing node in response to transmitting the receive message, wherein the recipient physical memory address identifies a memory location within the pooled memory; and a data copy module to copy the packet data within the pooled memory from the sender physical memory address to the recipient physical memory address in response to receiving the recipient physical memory address.
Example 27 includes the subject matter of example 26, and wherein the first computing node comprises: a virtual network interface controller identifier (vNIC ID) module to determine the destination vNIC ID associated with a destination network address of a network packet; and a send module to: determine the sender physical address of the packet data of the network packet, wherein the sender physical address identifies a memory location within the pooled memory accessible to the first computing node; generate the send message, wherein the send message comprises the source vNIC ID, the destination vNIC ID, and the sender physical address; and transmit the send message to the pooled memory controller, wherein the first computing node is to remotely access the pooled memory via the pooled memory controller.
Example 28 includes the subject matter of any of examples 26 and 27, and wherein the second computing node includes a receive module to: receive the receive message from the pooled memory controller, wherein the second computing node remotely accesses the packet data stored in the pooled memory via the pooled memory controller; identify the recipient physical memory address of a receive buffer within the pooled memory, wherein the receive buffer is capable of storing the packet data; and transmit the recipient physical memory address of the receive buffer to the pooled memory controller in response to receiving the receive message.
Example 29 includes a method for inter-node communication in a pooled memory architecture, the method comprising: receiving, by a pooled memory controller, a send message from a first computing node, wherein the send message comprises a source virtual network interface controller identifier (vNIC ID) associated with the first computing node, a destination vNIC ID associated with a second computing node, and a sender physical memory address, and wherein the sender physical memory address identifies packet data within a pooled memory accessible to the first computing node; transmitting, by the pooled memory controller, a receive message to the second computing node, wherein the receive message comprises the source vNIC ID and the destination vNIC ID; receiving, by the pooled memory controller, a recipient physical memory address from the second computing node in response to transmitting the receive message, wherein the recipient physical memory address identifies a memory location within the pooled memory accessible to the second computing node; and copying, by the pooled memory controller, the packet data within the pooled memory from the sender physical memory address to the recipient physical memory address.
Example 30 includes the subject matter of example 29, and wherein receiving the send message comprises reading from a first memory location within a first input/output (I/O) space of the pooled memory controller dedicated to the first computing node.
Example 31 includes the subject matter of any of examples 29 and 30, and wherein transmitting the received message comprises: writing to a second memory location within a second I/O space of the pooled memory controller dedicated to the second compute node; and generating an interrupt to the second compute node in response to writing to the second memory location.
Example 32 includes the subject matter of any of examples 29-31, and wherein receiving the recipient physical memory address comprises: reading from the second memory location within the second I/O space.
Example 33 includes the subject matter of any of examples 29-32, and wherein the send message further includes a packet size of the packet data; the receive message further includes the packet size of the packet data; and copying the packet data includes copying an amount of data equal to the packet size of the packet data.
Example 34 includes the subject matter of any one of examples 29-33, and further includes: notifying, by the pooled memory controller, the first computing node and the second computing node that the copying has completed in response to copying the packet-size amount of data.
Example 35 includes the subject matter of any of examples 29-34, and wherein notifying the first computing node and the second computing node that the copying is complete includes: generating a first interrupt to the first computing node; and generating a second interrupt to the second computing node.
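Examples 29-35 together describe the controller-side sequence: read the send message from the sender's dedicated I/O space, write a receive message into the receiver's dedicated I/O space and interrupt it, read back the receiver's buffer address, copy the packet-size amount of data entirely within the pooled memory, and then interrupt both nodes to signal completion. The following C sketch models that sequence against a simulated pooled memory; the helper names, the byte-array model of the pooled memory, and the fixed addresses are assumptions made only for illustration.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define POOL_SIZE 4096
    static uint8_t pooled_memory[POOL_SIZE];   /* simulated pooled memory */

    struct send_msg { uint32_t src, dst; uint64_t sender_addr; uint32_t size; };
    struct recv_msg { uint32_t src, dst; uint32_t size; };

    /* Stand-ins for the per-node dedicated I/O space and interrupt lines. */
    static void write_recv_msg(uint32_t node, const struct recv_msg *m) { (void)node; (void)m; }
    static uint64_t read_recipient_reply(uint32_t node) { (void)node; return 2048; }
    static void raise_interrupt(uint32_t node) { printf("interrupt -> node %u\n", node); }

    /* Controller-side handling of one send message (Examples 29-35). */
    static void handle_send(const struct send_msg *s)
    {
        struct recv_msg r = { s->src, s->dst, s->size };

        write_recv_msg(s->dst, &r);   /* write into the receiver's I/O space   */
        raise_interrupt(s->dst);      /* Example 31: interrupt the second node */

        uint64_t dst_addr = read_recipient_reply(s->dst);   /* Example 32 */

        /* Example 33: copy exactly the packet-size amount of data, entirely
         * within the pooled memory, with no transfer over the network fabric. */
        memcpy(&pooled_memory[dst_addr], &pooled_memory[s->sender_addr], s->size);

        raise_interrupt(s->src);      /* Examples 34-35: notify both nodes */
        raise_interrupt(s->dst);
    }

    int main(void)
    {
        memcpy(&pooled_memory[128], "hello", 6);
        struct send_msg s = { .src = 1, .dst = 2, .sender_addr = 128, .size = 6 };
        handle_send(&s);
        printf("copied: %s\n", (char *)&pooled_memory[2048]);
        return 0;
    }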
Example 36 includes a method for inter-node communication in a pooled memory architecture, the method comprising: determining, by a computing node, a destination virtual network interface controller identifier (vNIC ID) associated with a destination network address of a network packet; determining, by the computing node, a physical address of packet data of the network packet, wherein the physical address identifies a memory location of a pooled memory accessible to the computing node; generating, by the computing node, a send message, wherein the send message comprises a source vNIC ID associated with the computing node, the destination vNIC ID, and the physical address; and transmitting, by the computing node, the send message to a pooled memory controller, wherein the computing node remotely accesses the pooled memory via the pooled memory controller.
Example 37 includes the subject matter of example 36, and wherein determining the physical address comprises: copying packet data to a kernel mode driver buffer located at the physical address.
Example 38 includes the subject matter of any of examples 36 and 37, and wherein copying the packet data comprises: copying the packet data from a kernel mode networking stack buffer located within the pooled memory.
Example 39 includes the subject matter of any of examples 36-38, and wherein copying the packet data comprises: copying the packet data from a user-mode application buffer located within the pooled memory.
Example 40 includes the subject matter of any one of examples 36-39, and wherein determining the physical address comprises: determining a physical address of a user-mode application buffer located within the pooled memory.
Example 41 includes the subject matter of any of examples 36-40, and wherein transmitting the send message comprises: writing to a first memory location within an input/output (I/O) space of the pooled memory controller dedicated to the computing node.
Example 42 includes the subject matter of any of examples 36-41, and wherein the send message further includes a packet size of the packet data.
Example 43 includes the subject matter of any one of examples 36-42, and further comprising: waiting, by the computing node, for a notification of transmission completion from the pooled memory controller in response to transmitting the send message.
Example 44 includes the subject matter of any of examples 36-43, and wherein waiting for the notification comprises: waiting for an interrupt from the pooled memory controller.
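Examples 36-44 describe the sending node's half of the exchange: resolve the destination network address to a vNIC ID, place the packet data at a known physical address in the pooled memory (for instance by copying it into a kernel mode driver buffer), write a send message into the node's dedicated I/O space on the controller, and wait for a completion interrupt. The C sketch below follows that order; the vNIC ID lookup, the driver-buffer helper, and the I/O-space and interrupt helpers are all hypothetical stand-ins, not functions defined by the disclosure.

    #include <stdint.h>
    #include <stdio.h>

    struct send_msg { uint32_t src, dst; uint64_t phys_addr; uint32_t size; };

    /* Hypothetical helpers standing in for the driver's real mechanisms. */
    static uint32_t lookup_dst_vnic_id(uint32_t dst_ip) { return dst_ip % 16; } /* Example 36 */
    static uint64_t copy_to_driver_buffer(const void *p, uint32_t n)
    {
        (void)p; (void)n;
        return 0x1000;   /* physical address inside pooled memory (Example 37) */
    }
    static void write_send_msg_io_space(const struct send_msg *m) { (void)m; } /* Example 41 */
    static void wait_for_completion_interrupt(void) { }  /* Examples 43-44 */

    /* Send path on the computing node (Examples 36-44). */
    static void vnic_send(uint32_t my_vnic_id, uint32_t dst_ip,
                          const void *packet, uint32_t size)
    {
        struct send_msg m;
        m.src       = my_vnic_id;
        m.dst       = lookup_dst_vnic_id(dst_ip);
        m.phys_addr = copy_to_driver_buffer(packet, size);
        m.size      = size;                               /* Example 42 */

        write_send_msg_io_space(&m);      /* post the send message to the controller */
        wait_for_completion_interrupt();  /* block until the controller finishes the copy */
    }

    int main(void)
    {
        const char payload[] = "ping";
        vnic_send(1, 0x0a000002u, payload, sizeof payload);
        puts("send message posted and copy completed");
        return 0;
    }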
Example 45 includes a method for inter-node communication in a pooled memory architecture, the method comprising: receiving, by a compute node, a receive message from a pooled memory controller, wherein the compute node remotely accesses packet data stored in a pooled memory via the pooled memory controller; identifying, by the compute node, a physical memory address of a receive buffer within the pooled memory accessible to the compute node, wherein the receive buffer is capable of storing the packet data; and transmitting, by the compute node, the physical memory address of the receive buffer to the pooled memory controller in response to receiving the receive message.
Example 46 includes the subject matter of example 45, and wherein identifying the receive buffer comprises: identifying a kernel mode driver buffer located within the pooled memory.
Example 47 includes the subject matter of any one of examples 45 and 46, and wherein identifying the receive buffer comprises: identifying a kernel mode networking stack buffer located within the pooled memory.
Example 48 includes the subject matter of any one of examples 45-47, and wherein identifying the receive buffer comprises: identifying a user-mode application buffer located within the pooled memory.
Example 49 includes the subject matter of any of examples 45-48, and wherein receiving the receive message comprises: receiving an interrupt from the pooled memory controller; and in response to receiving the interrupt, reading from a first memory location within an input/output (I/O) space of the pooled memory controller dedicated to the compute node.
Example 50 includes the subject matter of any of examples 45-49, and wherein transmitting the physical memory address comprises: writing to the first memory location within the I/O space.
Example 51 includes the subject matter of any of examples 45-50, and wherein the receive message further includes a packet size of the packet data; and the receive buffer is capable of storing the packet-size amount of data.
Example 52 includes the subject matter of any one of examples 45-51, and further comprising: waiting, by the compute node, for a notification of receive completion from the pooled memory controller in response to transmitting the physical memory address.
Example 53 includes the subject matter of any of examples 45-52, and wherein waiting for the notification comprises: waiting for an interrupt from the pooled memory controller.
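Examples 45-53 describe the receiving node's half: upon an interrupt, read the receive message from the node's dedicated I/O space, pick a receive buffer in the pooled memory large enough for the packet-size amount of data, write that buffer's physical address back to the controller, and wait for the completion notification. The C sketch below mirrors those steps; the buffer selection and the I/O-space helpers are hypothetical placeholders.

    #include <stdint.h>
    #include <stdio.h>

    struct recv_msg { uint32_t src, dst; uint32_t size; };

    /* Hypothetical helpers; a real driver would touch memory-mapped I/O space. */
    static void read_recv_msg_io_space(struct recv_msg *m)
    {
        m->src = 1; m->dst = 2; m->size = 6;   /* Example 49: read after interrupt */
    }
    static uint64_t pick_receive_buffer(uint32_t size)
    {
        (void)size;
        return 0x2000;   /* pooled-memory buffer large enough for the packet (Examples 46-48, 51) */
    }
    static void write_recipient_reply(uint64_t phys_addr) { (void)phys_addr; } /* Example 50 */
    static void wait_for_completion_interrupt(void) { }   /* Examples 52-53 */

    /* Handler run by the compute node when the controller signals a pending receive. */
    static void vnic_receive_irq(void)
    {
        struct recv_msg m;
        read_recv_msg_io_space(&m);

        uint64_t buf = pick_receive_buffer(m.size);  /* must hold the packet-size amount */
        write_recipient_reply(buf);                  /* tell the controller where to copy */
        wait_for_completion_interrupt();             /* packet data is then in the buffer */

        printf("packet of %u bytes from vNIC %u available at 0x%llx\n",
               m.size, m.src, (unsigned long long)buf);
    }

    int main(void)
    {
        vnic_receive_irq();
        return 0;
    }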
Example 54 includes a computing device, comprising: a processor; and a memory storing a plurality of instructions that, when executed by the processor, cause the computing device to perform the method of any of examples 29-53.
Example 55 includes one or more machine-readable storage media comprising a plurality of instructions stored thereon that in response to being executed result in a computing device performing the method of any of examples 29-53.
Example 56 includes a computing device that includes means for performing the method of any of examples 29-53.
Example 57 includes a pooled memory controller for inter-node communication in a pooled memory architecture, the pooled memory controller comprising: means for receiving a send message from a first computing node, wherein the send message comprises a source virtual network interface controller identifier (vNIC ID) associated with the first computing node, a destination vNIC ID associated with a second computing node, and a sender physical memory address, and wherein the sender physical memory address identifies packet data within a pooled memory accessible to the first computing node; means for transmitting a receive message to the second computing node, wherein the receive message includes the source vNIC ID and the destination vNIC ID; means for receiving a recipient physical memory address from the second computing node in response to transmitting the receive message, wherein the recipient physical memory address identifies a memory location within the pooled memory accessible to the second computing node; and means for copying the packet data from the sender physical memory address to the recipient physical memory address within the pooled memory.
Example 58 includes the subject matter of example 57, and wherein means for receiving the send message comprises: means for reading a first memory location within a first input/output (I/O) space of the pooled memory controller dedicated to the first computing node.
Example 59 includes the subject matter of any of examples 57 and 58, and wherein means for transmitting the receive message comprises: means for writing to a second memory location within a second I/O space of the pooled memory controller dedicated to the second computing node; and means for generating an interrupt to the second computing node in response to writing to the second memory location.
Example 60 includes the subject matter of any of examples 57-59, and wherein means for receiving the recipient physical memory address comprises: means for reading a second memory location within a second I/O space.
Example 61 includes the subject matter of any of examples 57-60, and wherein the send message further includes a packet size of the packet data; the receive message further includes the packet size of the packet data; and the means for copying the packet data includes means for copying an amount of data equal to the packet size of the packet data.
Example 62 includes the subject matter of any one of examples 57-61, and further includes: means for notifying the first computing node and the second computing node that the copying has completed in response to copying the packet-size amount of data.
Example 63 includes the subject matter of any of examples 57-62, and wherein means for notifying the first computing node and the second computing node that the copying is complete comprises: means for generating a first interrupt to the first computing node; and means for generating a second interrupt to the second computing node.
Example 64 includes a computing node for inter-node communication in a pooled memory architecture, the computing node comprising: means for determining a destination virtual network interface controller identifier (vNIC ID) associated with a destination network address of a network packet; means for determining a physical address of packet data of the network packet, wherein the physical address identifies a memory location within a pooled memory accessible to the computing node; means for generating a send message, wherein the send message comprises a source vNIC ID associated with the computing node, the destination vNIC ID, and the physical address; and means for transmitting the send message to a pooled memory controller, wherein the computing node is to remotely access the pooled memory via the pooled memory controller.
Example 65 includes the subject matter of example 64, and wherein means for determining a physical address comprises: means for copying packet data to a kernel mode driver buffer located at the physical address.
Example 66 includes the subject matter of any one of examples 64 and 65, and wherein the means for copying the packet data comprises: means for copying the packet data from a kernel mode networking stack buffer located within the pooled memory.
Example 67 includes the subject matter of any of examples 64-66, and wherein the means for copying the packet data comprises: means for copying the packet data from a user-mode application buffer located within the pooled memory.
Example 68 includes the subject matter of any of examples 64-67, and wherein means for determining a physical address comprises: means for determining a physical address of a user-mode application buffer located within the pooled memory.
Example 69 includes the subject matter of any of examples 64-68, and wherein means for transmitting the send message comprises: means for writing to a first memory location within an input/output (I/O) space of the pooled memory controller dedicated to the computing node.
Example 70 includes the subject matter of any of examples 64-69, and wherein the send message further includes a packet size of the packet data.
Example 71 includes the subject matter of any one of examples 64-70, and further comprising: means for waiting for a notification of transmission completion from the pooled memory controller in response to transmitting the send message.
Example 72 includes the subject matter of any of examples 64-71, and wherein means for waiting for notification comprises: means for waiting for an interrupt from the pooled memory controller.
Example 73 includes a computing node for inter-node communication in a pooled memory architecture, the computing node comprising: means for receiving a receive message from a pooled memory controller, wherein the compute node remotely accesses packet data stored in a pooled memory via the pooled memory controller; means for identifying a physical memory address of a receive buffer within a pooled memory accessible to a compute node, wherein the receive buffer is capable of storing packet data; and means for transmitting a physical memory address of the receive buffer to a pooled memory controller in response to receiving the receive message.
Example 74 includes the subject matter of example 73, and wherein means for identifying a receive buffer comprises means for identifying a kernel mode driver buffer located within the pooled memory.
Example 75 includes the subject matter of any of examples 73 and 74, and wherein means for identifying a receive buffer comprises means for identifying a kernel mode networking stack buffer located within the pooled memory.
Example 76 includes the subject matter of any of examples 73-75, and wherein means for identifying a receive buffer comprises means for identifying a user mode application buffer located within the pooled memory.
Example 77 includes the subject matter of any of examples 73-76, and wherein means for receiving the received message comprises: means for receiving an interrupt from a pooled memory controller; and means for reading, in response to receiving the interrupt, from a first memory location within an input/output (I/O) space of the pooled memory controller dedicated to the compute node.
Example 78 includes the subject matter of any one of examples 73-77, and wherein the means for transmitting the physical memory address comprises: means for writing to the first memory location within the I/O space.
Example 79 includes the subject matter of any of examples 73-78, and wherein the receive message further includes a packet size of the packet data; and the receive buffer is capable of storing the packet-size amount of data.
Example 80 includes the subject matter of any of examples 73-79, and further includes means for waiting for a notification of receive completion from the pooled memory controller in response to transmitting the physical memory address.
Example 81 includes the subject matter of any of examples 73-80, and wherein means for waiting for notification comprises: means for waiting for an interrupt from the pooled memory controller.

Claims (22)

1. A pooled memory controller for inter-node communication in a pooled memory architecture, the pooled memory controller comprising:
a pooled memory access module to manage remote access to the pooled memory by a first compute node and a second compute node;
a virtual network interface controller (vNIC) interface module to:
receive a send message from the first compute node, wherein the send message comprises a source vNIC identifier (vNIC ID) associated with the first compute node, a destination vNIC ID associated with the second compute node, and a sender physical memory address, and wherein the sender physical memory address identifies packet data within the pooled memory;
transmit a receive message to the second compute node, wherein the receive message includes the source vNIC ID and the destination vNIC ID; and
receive a recipient physical memory address from the second compute node in response to transmitting the receive message, wherein the recipient physical memory address identifies a memory location within the pooled memory; and
a data copy module to copy the packet data from the sender physical memory address to the recipient physical memory address within the pooled memory in response to receiving the recipient physical memory address.
2. The pooled memory controller of claim 1, wherein receiving the send message comprises: reading a first memory location within a first input/output (I/O) space of the pooled memory controller dedicated to the first compute node.
3. The pooled memory controller of claim 2, wherein transmitting the receive message comprises:
writing to a second memory location within a second I/O space of the pooled memory controller dedicated to the second compute node; and
generating an interrupt to the second compute node in response to writing to the second memory location.
4. The pooled memory controller of claim 1, wherein the vNIC interface module is further to notify the first compute node and the second compute node that the copying is complete in response to copying the packet data.
5. The pooled memory controller of claim 4, wherein notifying the first compute node and the second compute node that the replication is complete comprises:
generating a first interrupt to the first compute node; and
generating a second interrupt to the second compute node.
6. A pooled memory controller for inter-node communication in a pooled memory architecture, the pooled memory controller comprising:
means for receiving a send message from a first computing node, wherein the send message comprises a source virtual network interface controller identifier (vNIC ID) associated with the first computing node, a destination vNIC ID associated with a second computing node, and a sender physical memory address, and wherein the sender physical memory address identifies packet data within a pooled memory accessible to the first computing node;
means for transmitting a receive message to the second computing node, wherein the receive message comprises the source vNIC ID and the destination vNIC ID;
means for receiving a recipient physical memory address from the second computing node in response to transmitting the receive message, wherein the recipient physical memory address identifies a memory location within the pooled memory accessible to the second computing node; and
means for copying the packet data from the sender physical memory address to the recipient physical memory address within the pooled memory.
7. The pooled memory controller of claim 6, wherein means for receiving the send message comprises: means for reading a first memory location within a first input/output (I/O) space of the pooled memory controller dedicated to the first compute node.
8. The pooled memory controller of claim 7, wherein means for transmitting the receive message comprises:
means for writing to a second memory location within a second I/O space of the pooled memory controller dedicated to the second compute node; and
means for generating an interrupt to the second compute node in response to writing to the second memory location.
9. The pooled memory controller of claim 6, further comprising means for notifying the first compute node and the second compute node that the copying is complete in response to copying the packet-size amount of data.
10. The pooled memory controller of claim 9, wherein means for notifying the first compute node and the second compute node that the copying is complete comprises:
means for generating a first interrupt to the first compute node; and
means for generating a second interrupt to the second compute node.
11. A computing node for inter-node communication in a pooled memory architecture, the computing node comprising:
a virtual network interface controller identifier (vNIC ID) module to determine a destination vNIC ID associated with a destination network address of a network packet; and
a sending module configured to:
determine a physical address of packet data of the network packet, wherein the physical address identifies a memory location within a pooled memory accessible to the computing node;
generate a send message, wherein the send message comprises a source vNIC ID associated with the computing node, the destination vNIC ID, and the physical address; and
transmit the send message to a pooled memory controller, wherein the computing node is to remotely access the pooled memory via the pooled memory controller.
12. The computing node of claim 11, wherein to determine the physical address comprises to copy the packet data to a kernel mode driver buffer located at the physical address.
13. The computing node of claim 12, wherein to copy the packet data comprises to copy the packet data from a kernel mode networking stack buffer located within the pooled memory.
14. The computing node of claim 12, wherein to copy the packet data comprises to copy the packet data from a user mode application buffer located within the pooled memory.
15. The computing node of claim 11, wherein to determine the physical address comprises to determine a physical address of a user-mode application buffer located within the pooled memory.
16. The computing node of claim 11, wherein to transmit the send message comprises to write to a first memory location within an input/output (I/O) space of the pooled memory controller that is dedicated to the computing node.
17. A computing node for inter-node communication in a pooled memory architecture, the computing node comprising:
means for determining a destination virtual network interface controller identifier (vNIC ID) associated with a destination network address of a network packet;
means for determining a physical address of packet data of the network packet, wherein the physical address identifies a memory location within a pooled memory accessible to the computing node;
means for generating a send message, wherein the send message comprises a source vNIC ID associated with the computing node, the destination vNIC ID, and the physical address; and
means for transmitting the send message to a pooled memory controller, wherein the computing node remotely accesses the pooled memory via the pooled memory controller.
18. The computing node of claim 17, wherein means for determining the physical address comprises: means for copying the packet data to a kernel mode driver buffer located at the physical address.
19. The computing node of claim 18, wherein means for copying the packet data comprises means for copying the packet data from a kernel mode networking stack buffer located within the pooled memory.
20. The computing node of claim 18, wherein means for copying the packet data comprises means for copying the packet data from a user-mode application buffer located within the pooled memory.
21. The computing node of claim 17, wherein means for determining the physical address comprises: means for determining a physical address of a user-mode application buffer located within the pooled memory.
22. The computing node of claim 17, wherein means for transmitting the send message comprises: means for writing to a first memory location within an input/output (I/O) space of the pooled memory controller that is dedicated to the computing node.
CN201480081459.7A 2014-09-25 2014-09-25 Network communication using pooled storage in a rack scale architecture Active CN106688208B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/087438 WO2016045055A1 (en) 2014-09-25 2014-09-25 Network communications using pooled memory in rack-scale architecture

Publications (2)

Publication Number Publication Date
CN106688208A CN106688208A (en) 2017-05-17
CN106688208B true CN106688208B (en) 2020-06-30

Family

ID=55580103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480081459.7A Active CN106688208B (en) 2014-09-25 2014-09-25 Network communication using pooled storage in a rack scale architecture

Country Status (4)

Country Link
US (1) US10621138B2 (en)
EP (1) EP3198806B1 (en)
CN (1) CN106688208B (en)
WO (1) WO2016045055A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10348621B2 (en) * 2014-10-30 2019-07-09 AT&T Intellectual Property I, L.P. Universal customer premise equipment
US20170351639A1 (en) * 2016-06-06 2017-12-07 Cisco Technology, Inc. Remote memory access using memory mapped addressing among multiple compute nodes
US10637738B1 (en) * 2017-05-26 2020-04-28 Amazon Technologies, Inc. Network traffic logs with integrated product identifiers
US10862850B2 (en) * 2017-06-15 2020-12-08 Nicira, Inc. Network-address-to-identifier translation in virtualized computing environments
US20190044809A1 (en) * 2017-08-30 2019-02-07 Intel Corporation Technologies for managing a flexible host interface of a network interface controller
US10915791B2 (en) * 2017-12-27 2021-02-09 Intel Corporation Storing and retrieving training data for models in a data center
US10725941B2 (en) * 2018-06-30 2020-07-28 Western Digital Technologies, Inc. Multi-device storage system with hosted services on peer storage devices
KR102513919B1 (en) 2018-11-05 2023-03-27 에스케이하이닉스 주식회사 Memory system
KR102516584B1 (en) 2018-11-21 2023-04-03 에스케이하이닉스 주식회사 Memory system
US10915470B2 (en) * 2018-07-23 2021-02-09 SK Hynix Inc. Memory system
US10932105B2 (en) * 2018-09-26 2021-02-23 Micron Technology, Inc. Memory pooling between selected memory resources on vehicles or base stations
US11809908B2 (en) 2020-07-07 2023-11-07 SambaNova Systems, Inc. Runtime virtualization of reconfigurable data flow resources
KR20220061771A (en) 2020-11-06 2022-05-13 삼성전자주식회사 Memory device including direct memory access engine, System having the same and Operating method of memory device
US11392740B2 (en) 2020-12-18 2022-07-19 SambaNova Systems, Inc. Dataflow function offload to reconfigurable processors
US11182221B1 (en) * 2020-12-18 2021-11-23 SambaNova Systems, Inc. Inter-node buffer-based streaming for reconfigurable processor-as-a-service (RPaaS)
US11237880B1 (en) 2020-12-18 2022-02-01 SambaNova Systems, Inc. Dataflow all-reduce for reconfigurable processor systems
US20210149803A1 (en) * 2020-12-23 2021-05-20 Francesc Guim Bernat Methods and apparatus to enable secure multi-coherent and pooled memory in an edge network
US11782760B2 (en) 2021-02-25 2023-10-10 SambaNova Systems, Inc. Time-multiplexed use of reconfigurable hardware
US11200096B1 (en) 2021-03-26 2021-12-14 SambaNova Systems, Inc. Resource allocation for reconfigurable processors
CN116016437A (en) * 2021-10-21 2023-04-25 华为技术有限公司 Computing system, addressing method, computing node, storage medium and program product
CN115550281B (en) * 2022-11-30 2023-04-28 广州地铁设计研究院股份有限公司 Resource scheduling method and architecture for AWGR (optical fiber reinforced plastic) optical switching data center network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6741589B1 (en) * 2000-01-24 2004-05-25 Advanced Micro Devices, Inc. Apparatus and method for storing data segments in a multiple network switch system using a memory pool
CN1926616A (en) * 2004-01-19 2007-03-07 特科2000国际有限公司 Portable data storing device using storage address mapping table
CN102326147A (en) * 2009-03-02 2012-01-18 国际商业机器公司 Copy circumvention in virtual network environment
WO2013173181A1 (en) * 2012-05-14 2013-11-21 Advanced Micro Devices, Inc. Server node interconnect devices and methods
CN103905309A (en) * 2012-12-28 2014-07-02 中国电信股份有限公司 Method and system of data exchange between virtual machines

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8381209B2 (en) * 2007-01-03 2013-02-19 International Business Machines Corporation Moveable access control list (ACL) mechanisms for hypervisors and virtual machines and virtual port firewalls
US8886838B2 (en) * 2008-02-29 2014-11-11 Oracle America, Inc. Method and system for transferring packets to a guest operating system
WO2011060366A2 (en) * 2009-11-13 2011-05-19 Anderson Richard S Distributed symmetric multiprocessing computing architecture
US9396127B2 (en) * 2014-02-27 2016-07-19 International Business Machines Corporation Synchronizing access to data in shared memory

Also Published As

Publication number Publication date
EP3198806A1 (en) 2017-08-02
CN106688208A (en) 2017-05-17
EP3198806B1 (en) 2019-09-25
EP3198806A4 (en) 2018-05-02
US20180225254A1 (en) 2018-08-09
WO2016045055A1 (en) 2016-03-31
US10621138B2 (en) 2020-04-14

Similar Documents

Publication Publication Date Title
CN106688208B (en) Network communication using pooled storage in a rack scale architecture
US10732879B2 (en) Technologies for processing network packets by an intelligent network interface controller
US11843691B2 (en) Technologies for managing a flexible host interface of a network interface controller
US11509606B2 (en) Offload of storage node scale-out management to a smart network interface controller
TWI543073B (en) Method and system for work scheduling in a multi-chip system
US20200257566A1 (en) Technologies for managing disaggregated resources in a data center
TWI519958B (en) Method and apparatus for memory allocation in a multi-node system
US7668984B2 (en) Low latency send queues in I/O adapter hardware
TWI547870B (en) Method and system for ordering i/o access in a multi-node environment
US7941569B2 (en) Input/output tracing in a protocol offload system
JP6336988B2 (en) System and method for small batch processing of usage requests
US9253287B2 (en) Speculation based approach for reliable message communications
TWI541649B (en) System and method of inter-chip interconnect protocol for a multi-chip system
TW201543218A (en) Chip device and method for multi-core network processor interconnect with multi-node connection
JP2017537404A (en) Memory access method, switch, and multiprocessor system
US20150207731A1 (en) System and method of forwarding ipmi message packets based on logical unit number (lun)
US7710990B2 (en) Adaptive low latency receive queues
EP3716088B1 (en) Technologies for flexible protocol acceleration
US10680976B2 (en) Technologies for performing switch-based collective operations in distributed architectures
KR20050080704A (en) Apparatus and method of inter processor communication
US10284501B2 (en) Technologies for multi-core wireless network data transmission
US20060129709A1 (en) Multipurpose scalable server communication link
CN111240845A (en) Data processing method, device and storage medium
JP2009276828A (en) Usb connector and usb connection method
US10038767B2 (en) Technologies for fabric supported sequencers in distributed architectures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant