WO2017151138A1

WO2017151138A1 - Atomic memory operation

Info

Publication number: WO2017151138A1
Application number: PCT/US2016/020719
Authority: WO
Inventors: Jean Tourrilhes; Michael Schlansker
Original assignee: Hewlett Packard Enterprise Development Lp
Priority date: 2016-03-03
Filing date: 2016-03-03
Publication date: 2017-09-08

Abstract

An atomic memory operation system comprises a memory fabric coupling a receiving node and a sending node. The memory fabric executes an atomic memory operation by identifying a control word of a ring buffer using a memory address of the memory operation. The memory fabric then retrieves a tail index from the control word, wherein the tail index indicates a position in a ring buffer, and inserts a memory word in the ring buffer at the position indicated by the tail index, wherein the memory word corresponds to the memory operation. Finally, the memory fabric updates the tail index in the control word to point to a next slot in the ring buffer.

Description

ATOMIC EMORY OPERATION

Background

[0001] Messaging allows two or more entities to exchange information across a network and represents a foundationai aspect of networking. As networks become larger and more complex, challenges may arise with respect to successfully messaging across networks.

Brief Description of the Drawings

[0002] Figure 1 is a block diagram of an example system for an atomic memory operation consistent with the present disclosure

[0003] Figure 2 is a block diagram of an example system for an atomic memory operation consistent with the present disclosure.

[0004] Figure 3 is a block diagram of an example system for an atomic memory operation consistent with the present disclosure.

[0005] Figure 4 is a block diagram of an example system for an atomic memory operation consistent with the present disclosure.

[0006] Figure 5 is a block diagram of an example method for an atomic memory operation consistent with the present disclosure.

[0007] Messaging may allow two or more entities to exchange information across a network. Messaging may be accomplished using either hardware or processor instructions. Hardware may be used for messaging between multiple computers. In some cases, the message may be sent via a cable, such as a copper or optical cable, and may be received by a Network Interface Card (NIC). The NIC may define a hardware receiving queue to receive the message and notify the recipient that the message has been received.

[0008] When messaging is done within a single computer, processor instructions may be used to control and implement the messaging, in some cases, the operations for queueing a received message may be encoded as memory write and read operations. The memory write and read operations may be performed as a single transaction, known as an atomic operation. This may allow multiple senders to send messages within the computer.

[0009] Figure 1 is a block diagram of an example system 100 for atomic queueing consistent with the present disclosure. System 100 may include multiple components, as illustrated in Figure 1. For example, system 100 may include a memory fabric 102. As used herein, a memory fabric refers to a framework that connects a plurality of computing nodes. Memory fabric 102 may consist of connected storage, a connected network, and/or connected processing. In some instances, memory fabric 102 may connect a plurality of nodes to a pool of global, shared memory 103. Memory fabric 102 may execute an atomic operation consistent with the present disclosure.

[0010] Further, as illustrated in Figure 1 , system 100 may include a sending node 104. As used herein, a sending node refers to a node that transmits a message to a separate node. As used herein, a message refers to a piece of data with a sender and a receiver. Sending node 104 may be composed of a plurality of processor cores and local memory. Sending node 104 may act like a computer. For instance, sending node 104 may have its own operating system and/or a local memory domain. Sending node 104 may further have its own power supply and/or its own fault domain. As shown in Figure 1 , sending node 104 may be coupled to the memory fabric 102. Although a single sending node 104 is shown in Figure 1 , it is contemplated that a plurality of sending nodes may be included within system 100, and a plurality of sending nodes 104 may be coupled to memory fabric 102. [0011] As further illustrated in Figure 1 , system 00 may include a receiving node 106. As used herein, a receiving node refers to a node that receives a message from another node, such as sending node 104. Receiving node 106 may be composed of a plurality of processor cores and local memory. Receiving node 106 may act like a computer. For instance, receiving node 106 may have its own operating system and/or a local memory domain. Receiving node 106 may further have its own power supply and/or its own fault domain. As shown in Figure 1 , receiving node 106 may be coupled to the memory fabric 102. Although a single receiving node 106 is shown in Figure 1 , it is contemplated that a plurality of receiving nodes 106 may be included within system 100, and a plurality of receiving nodes 106 may be coupled to memory fabric 102. in some examples, a single node may behave as both a receiving node 106 and a sending node 104. in such examples, the single node may act as a receiving node 106 for a first message and as a sending node 104 for a second message.

[0012] As shown in Figure 1 , receiving node 106 may include a memory fabric interface 108. As used herein, a memory fabric interface refers to a hardware interface that couples a memory fabric to other components of a system. Receiving node 106 may also include local memory 1 14. As used herein, local memory refers to the memory specific to a particular node. For instance, in Figure 1 , local memory 1 14 corresponds to the memory specific to receiving node 106.

[0013] Figure 2 is a block diagram of an example system 200 for atomic queueing consistent with the present disclosure. System 200 may include multiple components, as illustrated in Figure 2.

[0014] For example, system 200 may include a memory fabric 202. Memory fabric 202 is analogous to memory fabric 102 shown in Figure 1. Memory fabric 202 may consist of connected storage, a connected network, and/or connected processing. In some instances, memory fabric 202 may connect a plurality of nodes to a pool of global, shared memory. Memory fabric 202 may execute an atomic operation consistent with the present disclosure.

[0015] System 200 may further include a sending node 204. Sending node 204 is analogous to sending node 104 shown in Figure 1. Sending node 204 may be composed of a plurality of processor cores and local memory. Sending node 204 may act like a computer. For instance, sending node 204 may have its own operating system and/or a local memory domain. Sending node 204 may further have its own power supply and/or its own fault domain. As shown in Figure 2, sending node 204 may be coupled to the memory fabric 202. Although a single sending node 204 is shown in Figure 2, it is contemplated that a plurality of sending nodes 204 may be included within system 200, and a plurality of sending nodes 204 may be coupled to memory fabric 202.

[0016] System 200 may further include a receiving node 206. Receiving node 206 is analogous to receiving node 106 shown in Figure 1. Although a single receiving node 206 is shown, it is contemplated that a plurality of receiving nodes 206 may be included within system 200 and that a plurality of receiving nodes 206 may be coupled to memory fabric 202.

[0017] As shown in Figure 2, receiving node 206 may include a memory fabric interface 208. As used herein, a memory fabric interface refers to a hardware interface that couples a memory fabric to other components of a system. For instance, as shown in Figure 2, memory fabric interface 208 may couple the memory fabric 202 to other components of receiving node 206, Memory fabric interface 208 may include an enqueue atomic handier 210. As used herein, an enqueue atomic handier refers to the portion of the memory fabric interface responsible for executing atomic queueing instructions. Enqueue atomic handier 210 may be entirely contained within memory fabric interface 208.

[0018] Receiving node 206 may further contain a ring buffer 212. As used herein, a ring buffer refers to a fixed-size memory structure used to temporarily store data. Ring buffer 212 may be contained within local memory 214. As used herein, local memory refers to the memory specific to a particular node. For example, in Figure 2, local memory 214 corresponds to the memory specific to receiving node 206. Although a single ring buffer 212 is shown, it is contemplated that multiple ring buffers may be present. Additional ring buffers may be ring buffers similar to ring buffer 212, or they may be reserve ring buffers, described further herein. [0019] Ring buffer 212 may be composed of a single control word followed by an array of message slots. As used herein, a control word refers to metadata that allows an atomic queueing handler, such as enqueue atomic handler 210, to identify and operate on a particular ring buffer. As used herein, a message slot refers to a position within a ring buffer where a message handle may be written. In some examples, the control word may be a 64-bit control word and the array may be composed of 64-bit message slots. The size of the array, N, may be a power of two and in some implementations may be set within the control word. The overall size of the ring buffer 212 would be N+1 words, with each word being 64 bits, in some embodiments, the ring buffer may be composed of a 128 bit control word, with an array composed of 128 bit message slots.

[0020] In some embodiments, an individual message slot may have an index corresponding to its position within the ring buffer 212. For instance, as shown in Figure 2, a message slot may have an index of Slot #0 or Slot #1. A message slot may have a maximum index of Slot # (N- ), where N represents the size of the array.

[0021] When the sending node 204 wants to insert a message handle into the ring buffer 212, it generates an enqueue atomic memory operation, containing the memory address of the ring buffer 212 and the message handle, over the memory fabric 202. The enqueue memory operation is received by the enqueue atomic handier 210. When executing the enqueue atomic memory operation, the enqueue atomic handier 210 may identify a control word of a ring buffer 212 using a memory address of the memory operation. Enqueue atomic handler 210 may then retrieve a tail index from the control word. As used herein, a tail index refers to a position of a slot within a ring buffer. In some embodiments, the tail index may indicate a next slot to be filled in a ring buffer. Once the tail index has been retrieved, enqueue atomic handier 210 may insert a memory word in ring buffer 212 at the position indicated by the tail index, in some instances, the memory word may be a message handle. In other examples, the memory word may be a queue handle which identifies a specific message queue. In still other examples, the memory word may be a sender identifier. Enqueue atomic queueing handier 210 may complete the memory operation by updating the tail index in the control word. In some

embodiments, updating the tail index may include advancing a pointer to point to a next slot in ring buffer 212.

[0022] Receiving node 206 may further include an interrupt controller 218. As used herein, an interrupt controller refers to a hardware component which collects interrupts form various sources and interrupts the processor, interrupt controller 218 may be coupled to memory fabric interface 208 as well as to interrupt handier 220. In some embodiments, interrupt controller 218 may generate an interrupt. As used herein, an interrupt refers to a signal indicating an event that requires attention. An interrupt may be generated by hardware. For example, the enqueue atomic handier 210 may generate an interrupt to an interrupt handier 220. In some instances, enqueue atomic handier 210 may generate an interrupt in response to an updating of the tail index,

[0023] Figure 3 is a block diagram of an example system 300 for atomic queueing consistent with the present disclosure. System 300 may be used for interrupt virtualization and management, and may include multiple components, as illustrated in Figure 3.

[0024] For example, system 300 may include a memory fabric 302. Memory fabric 302 is analogous to memory fabric 102 shown in Figure 1 and memory fabric 202 shown in Figure 2. Memory fabric 302 may consist of connected storage, a connected network, and/or connected processing. In some instances, memory fabric 302 may connect a plurality of nodes to a pool of global, shared memory. Memory fabric 302 may execute an atomic operation consistent with the present disclosure.

[0025] System 300 may further include a sending node 304. Sending node 304 is analogous to sending node 104 shown in Figure 1 and sending node 204 shown in Figure 2. Sending node 304 may be composed of a plurality of processor cores and local memory. Sending node 304 may act like a computer. For instance, sending node 304 may have its own operating system and/or a local memory domain. Sending node 304 may further have its own power supply and/or its own fault domain. As shown in Figure 3, sending node 304 may be coupled to the memory fabric 302. Although a single sending node 304 is shown in Figure 3, it is contemplated that a plurality of sending nodes may be included within system 300, and a plurality of sending nodes 304 may be coupled to memory fabric 302.

[0026] System 300 may further include a receiving node 306. Receiving node 306 is analogous to receiving node 106 shown in Figure 1 and receiving node 206 shown in Figure 2. As shown in Figure 3, receiving node 306 may include multiple components. For instance, receiving node 306 may include a memory fabric interface 308. Memory fabric interface 308 is analogous to memory fabric interface 208 shown in Figure 2. Memory fabric interface 308 may include an enqueue atomic handier 310. As used herein, an enqueue atomic handier refers to the portion of the memory fabric interface responsible for executing atomic queueing instructions. Enqueue atomic handier 310 may be entirely contained within memory fabric interface 308, Receiving node 306 may further contain a local memory 314 and a processor 316, Local memory 314 and processor 316 are analogous to local memory 214 and processor 216, respectively, as shown in Figure 2.

[0027] Receiving node 306 may further include an interrupt controller 318.

Interrupt controller 318 is analogous to interrupt controller 218, shown in Figure 2. Interrupt controller 318 may be coupled to memory fabric interface 308 as well as to interrupt handler 320. System 300 may further contain an interrupt handler 320.

Interrupt handler 320 is analogous to interrupt handier 220, shown in Figure 2. As used herein, an interrupt handier refers to a set of instructions executable to prioritize and respond to interrupts occurring on a system. Interrupt handler 320 may be located on a processor, such as processor 316. Processor 316 is analogous to processor 216 shown in Figure 2.

[0028] Interrupt handler 320 may be coupled to interrupt controller 318 such that interrupt handler 320 is activated upon assertion of an interrupt on processor 316 by interrupt controller 318. Interrupt handler 320 may further be coupled to virtual interrupt queue 324. As used herein, a virtual interrupt queue refers to a stored series of virtual interrupts. Sending node 204 may use the enqueue atomic memory operation over the memory fabric 302 to insert a handle into a message queue, such as message queue 326. Message queue 326 may correspond to a ring buffer, such as ring buffer 212 shown in Figure 2. After inserting a handle into message queue 326, the sending node 304 may use the enqueue atomic memory operation over the memory fabric 302 to insert the queue handle corresponding to the message queue into the virtual interrupt queue 324. The handle inserted into the virtual interrupt queue 324 may correspond to a message queue 326 and may serve to identify the message queue 326. The handle may be a memory address of the message queue, an index corresponding to the message queue, or another identifier that serves to specify the specific message queue. Message queue 326 may in turn correspond to ring buffer 212, shown in Figure 2.

[0029] Once enqueue atomic handier 310 has inserted a handle into interrupt queue 324, enqueue atomic handler 310 may generate an interrupt to interrupt controller 318. Interrupt handler 320 may then activate in response to the interrupt generated to interrupt controller 318. Interrupt handler 320 may then consult interrupt queue 324 to locate the handle stored within the interrupt queue 324. in some instances, the handle stored in interrupt queue 324 may be the address of a message queue, such as message queue 326. Message queue 326 may correspond to a ring buffer, such as ring buffer 212 as shown in Figure 2. In such instances, interrupt handler 320 may then service the message queue 326. In some instances, servicing the message queue may include resolving the event that caused generation of the original interrupt.

[0030] Figure 4 is a block diagram of an example system 400 for atomic queueing consistent with the present disclosure. System 400 may be used for buffer management and may include multiple components, as illustrated in Figure 4.

[0031] For example, system 400 may include a memory fabric 402. Memory fabric 402 is analogous to memory fabric 102, 202, and 302, shown in Figures 1 , 2, and 3, respectively. Memory fabric 402 may consist of connected storage, a connected network, and/or connected processing. In some instances, memory fabric 402 may connect a plurality of nodes to a pool of global, shared memory. Memory fabric 402 may execute an atomic operation consistent with the present disclosure. [0032] System 400 may further include a sending node 404. Sending node 404 is analogous to sending nodes 104, 204, and 304, shown in Figures 1 , 2, and 3, respectively. Sending node 404 may be composed of a plurality of processor cores and local memory. Sending node 404 may act like a computer. For instance, sending node 404 may have its own operating system and/or a local memory domain. Sending node 404 may further have its own power supply and/or its own fault domain. As shown in Figure 4, sending node 404 may be coupled to the memory fabric 402. Although a single sending node 404 is shown in Figure 4, if is contemplated that a plurality of sending nodes may be included within system 400, and a plurality of sending nodes 404 may be coupled to memory fabric 402.

[0033] System 400 may further include a receiving node 406. Receiving node 406 is analogous to receiving nodes 106, 206, and 306, shown in Figures 1 , 2 and 3, respectively. As shown in Figure 4, receiving node 406 may include multiple components. For instance, receiving node 406 may include a memory fabric interface 408, Memory fabric interface 408 is analogous to memory fabric interfaces 208 and 308, shown in Figures 2 and 3, respectively. Memory fabric interface 408 may include an enqueue atomic handler 410. Enqueue atomic handler 410 is analogous to enqueue atomic handler 310, shown in Figure 3.

[0034] Receiving node 406 may further contain a local memory 414 and a processor 416. Local memory 414 and processor 416 are analogous to local memories 214 and 314, and processors 216 and 316, respectively, as shown in Figures 2 and 3. As shown in Figure 4, local memory 414 may contain a ring buffer 412. Ring buffer 412 is analogous to ring buffers 212 and 312, shown in Figures 2 and 3, respectively. Local memory 414 may further contain a reserve ring buffer 428 and a buffer array 430. The reserve ring buffer 428 is analogous to ring buffer 412 and may be the same size as ring buffer 412 or it may be a different size. However, reserve ring buffer 428 and buffer array 430 are to have the same size. Buffer array 430 includes a plurality of buffers 432-1 , 432-2...432-N.

[0035] To reserve a buffer using system 400, receiving node 406 may set the reserve ring buffer 428 up to allow the atomic memory operation to proceed on the reserve ring buffer 428. Receiving node 406 may then pre-allocate the plurality of buffers 432-1 through 432-N and populate the buffer array 430 with pointers to the pre- allocated buffers 432-1 through 432-N. Pre-allocation may be done using a malloc library, allocating on a stack, or splitting a large memory region into smaller, equally- sized pieces.

[0036] Once the buffer array 430 has been populated, the sending node 404 may use the enqueue atomic memory operation over the memory fabric 402 to insert a sender identification into the reserve buffer ring 428. Upon insertion, a unique index may be returned by the atomic memory operation to the sending node 404. The index may be used to point to a slot position in the buffer array 430. The slot position in the buffer array 430 indicated by the index may further contain a pointer that points to the reserved buffer.

[0037] Figure 5 is a block diagram of an example method 540 for atomic queueing consisting with the present disclosure. At 542, method 540 may include receiving an enqueue atomic memory operation. The memory operation may be received on the receiving node shown in Figures 1 -4 and may be sent by the sending node shown in Figures 1 -4,

[0038] At 544, method 540 may include identifying a memory address. In some embodiments, identifying a memory address may include identifying the memory address of the atomic memory operation. At 546, method 540 may include identifying a control word, in some embodiments, the control word may be identified using the memory address identified at 544. in such embodiments, the control word may be thought of as a memory word. In some embodiments, identifying a control word may include identifying a ring buffer among a plurality of ring buffers, wherein the ring buffer will be used for storing a message.

[0039] At 548, method 540 may include retrieving a tail index. In some embodiments, the tail index may be retrieved from the control word identified at 546.

[0040] At 550, method 540 may include inserting a memory word of the atomic memory operation into a slot within the ring buffer. In some embodiments, inserting the memory word into a slot within the ring buffer may include storing a message handle in the slot. [0041] At 552, method 540 may include updating the tail index. In some instances, the tail index may be updated by advancing the ring buffer. Advancing the ring buffer may include advancing a pointer to point to the next open and available slot in the ring buffer. As the ring buffer has a fixed size, advancing the ring buffer may further include increasing the tail index by one. if the increased tail index exceeds the size of the ring buffer, advancing the ring buffer may include resetting the tail index to zero or one. The size of the ring buffer may be extracted from a control word.

[0042] Method 540 may further include returning the result of the insertion into the ring buffer to the sending node. In some instances, returning the result of the insertion may include returning an index showing the location within the ring buffer of the inserted message. In other embodiments, returning the results of the insertion into the ring buffer may include returning an error message. The error message may indicate that the ring buffer is full and that insertion is therefore unable to proceed. In such instances, method 540 may include checking whether the ring buffer is full, so as to know if an error message is to be returned. Checking whether the ring buffer is full may include extracting a head index from the control word associated with the ring buffer and comparing it with the tail index of the ring buffer.

[0043] in the foregoing detailed description of the present disclosure, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration how examples of the disclosure may be practiced. These examples are described in sufficient detail to enable those of ordinary skill in the art to practice the examples of this disclosure, and it is to be understood that other examples may be utilized and that process, electrical, and/or structural changes may be made without departing from the scope of the present disclosure.

[0044] The figures herein follow a numbering convention in which the first digit corresponds to the drawing figure number and the remaining digits identify an element or component in the drawing. Elements shown in the various figures herein may be added, exchanged, and/or eliminated so as to provide a number of additional examples of the present disclosure. In addition, the proportion and the relative scale of the elements provided in the figures are intended to illustrate the examples of the present disclosure, and should not be taken in a limiting sense. Further, as used herein, "a number of an element and/or feature can refer to one or more of such elements and/or features.

[0045] As used herein, logic" is an alternative or additional processing resource to perform a particular action and/or function, etc., described herein, which includes hardware, e.g., various forms of transistor logic, application specific integrated circuits (ASICs), etc., as opposed to computer executable instructions, e.g., software firmware, etc., stored in memory and executable by a processor.

Claims

What is claimed:

1. An atomic memory operation system comprising:

a memory fabric to couple a receiving node and a sending node, wherein the memory fabric is to execute an atomic memory operation by:

identifying a control word of a ring buffer using a memory address of the memory operation;

retrieving a tail index from the control word, wherein the tail index indicates a position in a ring buffer;

inserting a memory word in the ring buffer at the position indicated by the tail index, wherein the memory word corresponds to the memory operation; and

updating the tail index in the control word to point to a next slot in the ring buffer.

2. The system of claim 1 , wherein updating the tail index in the control word to point to a next slot in the ring buffer is based on a ring size in the control word.

3. The system of claim 1 , wherein the memory fabric is to execute the atomic memory operation by:

returning the results of the insertion into the ring buffer, wherein returning the results of the insertion includes returning an index in the ring buffer showing where the message was inserted.

4. The system of claim 3, wherein returning the results of the insertion into the ring buffer includes returning an error message indicating that the ring buffer is full and the insertion may not proceed.

5. The system of claim 1 , wherein a plurality of sending nodes insert a plurality of memory words into the ring buffer.

6. The system of claim , further comprising the memory fabric to execute the atomic memory operation by: generating an interrupt via an interrupt controller on the receiving node; and storing a handle corresponding to the memory address, wherein the sending node inserts, via the memory fabric, a message queue identifier corresponding to the ring buffer into an interrupt queue.

7. The system of claim 1 , further comprising:

an interrupt handier, wherein the interrupt handler:

activates responsive to insertion of a handle corresponding to a memory address into an interrupt queue;

alerts the system upon receipt of an interrupt;

consults the interrupt queue,

locates the address of a message queue corresponding to the ring buffer stored in the interrupt queue; and

services the message queue, wherein servicing the message queue includes resolving an event causing an interrupt to be generated.

8. The system of claim 1 , further comprising the memory fabric to execute the atomic memory operation by:

generating an interrupt, wherein:

the interrupt is generated to the receiving node; and

the interrupt is generated in response to the updating of the tail index.

9. A system comprising:

a memory fabric coupling a receiving node and a sending node, wherein the receiving node, via the memory fabric, is to execute an atomic memory operation by:

defining a ring buffer in a memory fabric interface to receive a message from the sending node;

defining a reserve buffer ring and a buffer array;

configuring the reserve buffer ring to allow the atomic memory operation to proceed on the reserve buffer ring;

pre-ailocating a plurality of buffers; and populating the buffer array with pointers to the pre-allocated buffers.

10. The system of claim 9, further comprising the memory fabric to execute the atomic memory operation by:

inserting a sender identification into the reserve ring; and

returning a unique index to the sender, wherein:

the unique index reserves a buffer within the buffer array; and the unique index points to the reserved buffer within the buffer array.

1 1. A method for atomic memory operation, comprising:

receiving, by a memory fabric interface on a receiving node, a memory operation; identifying a memory address of the memory operation;

using the identified memory address to identify a control word;

retrieving a tail index from the control word;

inserting the control word of the memory operation into a slot within a ring buffer; and

updating the tail index of the control word by advancing the ring buffer.

12. The method of claim 1 1 , further comprising:

returning, via the memory fabric interface, the result of the insertion into the ring buffer to a sending node.

13. The method of claim 1 1 , wherein inserting the control word of the memory operation into the ring buffer includes:

identifying a ring buffer among a plurality of ring buffers on the receiving node using the address of the control word.

14. The method of claim 12, wherein returning the results of the insertion into the ring buffer includes returning an index in the ring buffer showing where the message was inserted.

15. The method of claim 12, wherein returning the results of the insertion into the ring buffer includes returning an error message indicating that the ring buffer is full and the insertion may not proceed.