WO2022225576A1 - RDMA append verb - Google Patents

RDMA append verb

Info

Publication number
WO2022225576A1
WO2022225576A1 (PCT/US2021/070908)
Authority
WO
WIPO (PCT)
Prior art keywords
append
memory
nic
rdma
pointer
Prior art date
Application number
PCT/US2021/070908
Other languages
English (en)
Inventor
Chun Liu
Chaohong Hu
Tony Mak
Hei Tao Fung
Mike Sheng Con Hsu
Xin LIAO
Original Assignee
Futurewei Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Futurewei Technologies, Inc.
Priority to PCT/US2021/070908
Publication of WO2022225576A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F 13/1605 Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F 13/1652 Handling requests for interconnection or transfer for access to memory bus based on arbitration in a multiprocessor architecture
    • G06F 13/1657 Access to multiple memories
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/10 Program control for peripheral devices
    • G06F 13/12 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor
    • G06F 13/124 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor where hardware is a sequential transfer control unit, e.g. microprocessor, peripheral processor or state-machine
    • G06F 13/128 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor where hardware is a sequential transfer control unit, e.g. microprocessor, peripheral processor or state-machine for dedicated transfers to a network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F 13/1605 Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F 13/1652 Handling requests for interconnection or transfer for access to memory bus based on arbitration in a multiprocessor architecture
    • G06F 13/1663 Access to shared memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F 13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163 Interprocessor communication
    • G06F 15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F 15/17306 Intercommunication techniques
    • G06F 15/17331 Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 Network services
    • H04L 67/55 Push-based network services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/90 Buffering arrangements
    • H04L 49/901 Buffering arrangements using storage descriptor, e.g. read or write pointers

Definitions

  • Distributed computing systems can include multiple computing nodes that are remote from each other either physically or logically.
  • a computing node can include a host device that includes a processor, memory, and one or more applications executing on the processor.
  • Remote Direct Memory Access (RDMA) allows an application of one computing node to access the memory of another computing node without the processor of the other computing node needing to process the memory request. This reduces processor overhead and latency in the system.
  • An RDMA single-sided verb is one type of RDMA operation that operates directly on memory of a remote computing node.
  • RDMA single-sided verb operations include RDMA reads, RDMA writes, and atomic memory operations.
  • an RDMA single-sided verb operation requires two rounds of message exchange to write data without conflict: a first message round to allocate the address for the written data using atomic verbs, and a second message round to write the data using the allocated address.
  • a computer implemented method of sending data from a processing resource to a memory resource remote from the processing resource is implemented by a computer system that includes one or more processing resources, one or more memory resources, and at least one network interface card (NIC).
  • the computer implemented method includes initiating a remote direct memory access (RDMA) append command from the processing resource to the memory resource remote from the processing resource.
  • the RDMA append command includes append data and an identifier of a memory region of the remote memory resource.
  • the method further includes writing, by the NIC, the append data to a memory location in the identified memory region indicated by an append pointer associated with the identified memory region; returning a value of the append pointer corresponding to the memory location to the processing resource; and updating the append pointer to a next memory location of the memory region for writing next append data.
  • another implementation of the aspect provides sending an RDMA append command that includes a size of the append data, and updating the value of the append pointer for the memory region using the size of the append data.
  • another implementation of the aspects provides registering the memory region with the NIC using another processing resource.
  • another implementation of the aspects provides a remote memory resource that includes multiple registered memory regions registered using multiple processing resources.
  • the method further provides receiving multiple RDMA append commands from multiple processing resources for the multiple registered memory regions, updating an append pointer for each of the registered memory regions using the NIC when there is an RDMA append command to the registered memory region, and returning the updated append pointer in a completion status sent to a processing resource originating the RDMA append command.
  • another implementation of the aspects provides storing a table in memory of the NIC that includes memory region identifiers and append pointers for the multiple registered memory regions.
  • another implementation of the aspects provides writing append data of the multiple RDMA append commands to sequential memory locations indicated by the append pointer.
  • another implementation of the aspects provides returning the value of the append pointer in a completion status returned to the processing resource by the NIC.
  • another implementation of the aspects provides notifying a processing resource associated with the remote memory resource of the writing of the append data by the NIC placing a completion entry in a completion queue of the NIC.
  • a network interface card (NIC) of a distributed computer system comprises processing circuitry configured to decode a remote direct memory access (RDMA) append command received from a processing resource of the distributed computer system remote from the NIC, wherein the RDMA append command includes append data and an identifier of a memory region of a memory resource local to the NIC, write the append data to a location in the memory region indicated by an append pointer associated with the identified memory region, encode a message for sending to the processing resource that includes a value of the append pointer associated with the memory location of the append data, and update the append pointer to a next location of the memory region for writing next append data.
  • another implementation of the aspect provides processing circuitry configured to decode a size field in the RDMA append command that indicates a size of the append data, and update the value of the append pointer for the memory region using the size of the append data.
  • another implementation of the aspect provides processing circuitry configured to encode a completion status message for sending to the processing resource that includes the value of the append pointer associated with the memory location of the append data.
  • another implementation of the aspect provides processing circuitry configured to store a table in the memory that includes memory region identifiers for multiple registered memory regions and append pointers for the multiple registered memory regions.
  • another implementation of the aspect provides processing circuitry configured to decode multiple RDMA append commands received from multiple processing resources remote from the NIC, and write append data received in the multiple RDMA commands to multiple registered memory regions identified in the multiple RDMA commands.
  • another implementation of the aspect provides a completion queue and processing circuitry configured to notify a processing resource local to the memory resource of the writing of the append data by placing a completion entry in the completion queue.
  • a distributed computer system provides multiple computing nodes wherein each computing node is remote from other computing nodes and includes a processing resource, a memory resource, and a network interface card (NIC).
  • the processing resource of a first computing node is configured to receive information of a registered memory region of a memory resource of a second computing node from a NIC of the second computing node, and send a remote direct memory access (RDMA) append command to the registered memory region, wherein the RDMA append command includes append data and an identifier of the registered memory region.
  • the NIC of the second computing node is configured to write the append data to a location in the registered memory region indicated by an append pointer for the identified memory region, return a value of the append pointer associated with the memory location of the append data to the processing resource of the first computing node, and update the append pointer to a next location of the registered memory region for writing next append data.
  • another implementation of the aspect provides the NIC of the second computing node configured to return the value of the append pointer in a completion status returned to the processing resource of the first computing node.
  • another implementation of the aspect provides the processing resource of the first computing node configured to send an RDMA append command that includes a size of the append data and the NIC of the second computing node is configured to update the value of the append pointer for the registered memory region using the size of the append data.
  • another implementation of the aspect provides the NIC of the second computing node configured to receive multiple RDMA append commands from multiple processing resources for multiple registered memory regions of the memory resource of the second computing node, write append data and update an append pointer for each of the registered memory regions when there is an RDMA append command to the registered memory region, and return the updated append pointer in a completion status sent to a processing resource originating the RDMA append command.
  • another implementation of the aspect provides the NIC of the second computing node configured to store a table that includes memory region identifiers and append pointers for the multiple registered memory regions.
  • the examples can be implemented in hardware, software, or any combination thereof.
  • the explanations provided for each of the first through third aspects and their implementation forms apply equally to other ones of the first through third aspects and the corresponding implementation forms. These aspects and implementation forms may be used in combination with one another.
  • Fig. 1 is a block diagram of two computing nodes of a distributed computing system to implement one or more example embodiments.
  • Fig. 2 is an illustration of a communication protocol among computing nodes of a distributed computer system to implement one or more example embodiments.
  • Fig. 3 is an illustration of another example of a communication protocol among computing nodes of a distributed computer system to implement one or more example embodiments.
  • Fig. 4 is a flow diagram of an example of a method of a communication protocol to implement one or more example embodiments.
  • Fig. 5 is a flow diagram of a method of processing a Remote Direct Memory Access (RDMA) Append Verb to implement one or more example embodiments.
  • Fig. 6 is a block schematic diagram of a computer system to implement one or more example embodiments.
  • the functions or algorithms described herein may be implemented in software in one embodiment.
  • the software may consist of computer executable instructions stored on computer-readable media or a computer-readable storage device, such as one or more non-transitory memories or other types of hardware-based storage devices, either local or networked.
  • the functions may correspond to modules, which may be software, hardware, firmware, or any combination thereof. Multiple functions may be performed in one or more modules as desired, and the embodiments described are merely examples.
  • the software may be executed on a digital signal processor, application specific integrated circuit (ASIC), microprocessor, or other type of processor operating on a computer system, such as a personal computer, server, or other computer system, turning such computer system into a specifically programmed machine.
  • the functionality can be configured to perform an operation using, for instance, software, hardware, firmware, or the like.
  • the phrase “configured to” can refer to a logic circuit structure of a hardware element that is to implement the associated functionality.
  • the phrase “configured to” can also refer to a logic circuit structure of a hardware element that is to implement the coding design of associated functionality of firmware or software.
  • the term “module” refers to a structural element that can be implemented using any suitable hardware (e.g., a processor, among others), software (e.g., an application, among others), firmware, or any combination of hardware, software, and firmware.
  • logic encompasses any functionality for performing a task.
  • each operation illustrated in the flowcharts corresponds to logic for performing that operation.
  • An operation can be performed using software, hardware, firmware, or the like.
  • the terms “component,” “system,” and the like may refer to computer-related entities: hardware, software in execution, firmware, or a combination thereof.
  • a component may be a process running on a processor, an object, an executable, a program, a function, a subroutine, a computer, or a combination of software and hardware.
  • processor may refer to a hardware component, such as a processing unit of a computer system.
  • the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computing device to implement the disclosed subject matter.
  • article of manufacture is intended to encompass a computer program accessible from any computer-readable storage device or media.
  • Computer-readable storage media can include, but are not limited to, magnetic storage devices, e.g., hard disk, floppy disk, magnetic strips, optical disk, compact disk (CD), digital versatile disk (DVD), smart cards, flash memory devices, among others.
  • computer-readable media (i.e., not limited to storage media) may additionally include communication media, such as transmission media for wireless signals and the like.
  • Computing systems include processing resources (e.g., processors or processing units) and memory resources (e.g., various forms of data storage devices).
  • Distributed computing systems can include multiple computing nodes that are remote from each other. This means processing resources of the system can be located physically or logically remote from some data storage resources, such as in a distributed database system, cloud computing system, etc.
  • Fig. 1 is a block diagram of an example of two computing nodes of a distributed computing system (e.g., a distributed database system, cloud computing system, etc.).
  • An actual system can include many computing nodes.
  • the system 100 in the example of Fig. 1 shows a host side node 102 and a remote side node 104 physically or logically separate from the host side node.
  • the host side node 102 includes a memory resource 106 and a host application 108 that executes on a processing resource of the host side node 102.
  • the remote side node 104 includes a memory resource 110 and a remote application 112 that executes on a processing resource of the remote side node 104.
  • each computing node includes an RDMA network interface card (NIC) 116, 118.
  • NIC RDMA network interface card
  • the remote application 112 registers the memory resource 110, or a memory region of the memory resource 110, with NIC 118. Registering the memory alerts NIC 118 that the memory or memory region is available for RDMA communication.
  • the NIC 118 creates a registered memory region in the memory resource 110. This allows other computing nodes (e.g., the host side node) to directly access the memory resource 110 via communication network 114 without involvement of the processing resource of the remote side node 104. This reduces the processing load on the remote side node and reduces latency of memory access operations.
  • the host application may also register a memory region of the memory resource 106 of the host side node 102 with NIC 116; a minimal registration sketch follows below.
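The patent does not tie registration to a particular software interface; the following is a minimal sketch using the widely deployed libibverbs API, where the protection domain pd and the buffer arguments are illustrative assumptions rather than details from the disclosure.

```c
#include <infiniband/verbs.h>
#include <stdio.h>

/* Sketch: register a memory region with the RDMA NIC so remote nodes can
 * access it directly, as the remote application 112 does with NIC 118.
 * `pd` is a protection domain obtained earlier via ibv_alloc_pd();
 * buf/len describe the region being exposed. */
struct ibv_mr *register_region(struct ibv_pd *pd, void *buf, size_t len)
{
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_REMOTE_ATOMIC);
    if (!mr)
        perror("ibv_reg_mr");
    /* mr->rkey is the remote key a peer presents to address this region. */
    return mr;
}
```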
  • the host application 108 creates a channel from NIC 116 to NIC 118.
  • the host application 108 can then use an RDMA Write operation or an RDMA Read operation by sending a request for the operation to the Send Queue (SQ) of NIC 116 and monitoring for the return of the operation in the Complete Queue (CQ).
  • the host application 108 can also use “Atomic Fetch and Add” operations and “Atomic Compare and Swap” operations to facilitate synchronization between multiple client applications of the computing nodes of the system 100.
  • Fig. 2 is an illustration of an example of a communication protocol among RDMA NICs of computing nodes of a distributed computer system.
  • the example shows three Client applications (222, 224, 226) and three RDMA NICs (228, 230, 232) of three computing nodes, and shows portions of a registered memory resource 210 and an RDMA NIC 234 of a computing node remote from the Client applications.
  • the Client applications (222, 224, 226) run on processing resources of their computing nodes.
  • Client application 222 and Client application 224 want to append data (labeled [1] and [2] respectively) in the registered memory resource 210.
  • RDMA NIC 228 (associated with Client application 222) sends an RDMA “Fetch and Add” verb 236 command to destination RDMA NIC 234 (associated with the registered memory resource 210).
  • the “Fetch and Add” verb 236 includes the address of the write pointer 245 (&offset) and the size (size1) of data [1].
  • RDMA NIC 230 (associated with Client application 224) sends “Fetch and Add” verb 238 to RDMA NIC 234 with the address of the write pointer 245 (&offset) and the size (size2) of data [2].
  • NIC 234 returns the current value (offset1) of the write pointer 245 to RDMA NIC 228 to indicate where RDMA NIC 228 should write data [1] in memory resource 210.
  • RDMA NIC 234 also adds size1 to the write pointer 245 to update the value of the write pointer 245 (to offset2).
  • RDMA NIC 234 returns the current value of the write pointer 245 (now offset2) to RDMA NIC 230.
  • RDMA NIC 234 adds size2 to the write pointer 245 to update the value of the write pointer 245 (e.g., to offset3).
  • RDMA NIC 228 sends RDMA “Write” verb 244 that includes offset1, data1 (i.e., data [1]), and size1.
  • destination RDMA NIC 234 writes the append data [1] starting at the memory location indicated by offset1.
  • RDMA NIC 230 sends RDMA “Write” verb 246 that includes offset2, data2 (i.e., data [2]), and size2.
  • destination RDMA NIC 234 writes data [2] starting at the memory location indicated by offset2.
  • RDMA NIC 234 sends acknowledgment (ACK) messages 248, 250 to acknowledge that the data is written; a libibverbs sketch of the first (“Fetch and Add”) message round follows below.
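Round one of this conventional two-round pattern can be pictured with standard libibverbs calls; qp, lkey, remote_ptr_addr, and rkey below are assumptions standing in for connection state established elsewhere, and round two would be an ordinary IBV_WR_RDMA_WRITE at the returned offset.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Round one of the two-round append: an atomic Fetch-and-Add on the remote
 * write pointer 245 reserves `size1` bytes; the pointer's old value (the
 * offset where data [1] may be written) is delivered into *old_offset when
 * the work request completes. */
int reserve_space(struct ibv_qp *qp, uint64_t *old_offset, uint32_t lkey,
                  uint64_t remote_ptr_addr, uint32_t rkey, uint64_t size1)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)old_offset,   /* old pointer value lands here */
        .length = sizeof(*old_offset),
        .lkey   = lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.opcode                = IBV_WR_ATOMIC_FETCH_AND_ADD;
    wr.sg_list               = &sge;
    wr.num_sge               = 1;
    wr.send_flags            = IBV_SEND_SIGNALED;
    wr.wr.atomic.remote_addr = remote_ptr_addr;   /* &offset on the target */
    wr.wr.atomic.rkey        = rkey;
    wr.wr.atomic.compare_add = size1;             /* amount to add */

    /* Round two (not shown) posts IBV_WR_RDMA_WRITE of data [1] at the
     * offset returned by this operation. */
    return ibv_post_send(qp, &wr, &bad_wr);
}
```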
  • FIG. 3 is an illustration of another example of a communication protocol implemented with a distributed computer system that has multiple computing nodes that include RDMA NICs.
  • the example of Fig. 3 again includes three Client applications (322, 324, 326) and three RDMA NICs (328, 330, 332) of three computing nodes, and portions of a registered memory resource 310 and an RDMA NIC 334 of a fourth computing node remote from the three computing nodes.
  • the Client applications (322, 324, 326) want to send append data (labeled [1], [2], and [3], respectively) to the registered memory resource 310.
  • the RDMA NICs associated with the Client Applications send an RDMA append command (an RDMA Append Verb) to the RDMA NIC 334 of the remote node.
  • the Client applications may place a send queue entry (SQE) in the send queue (SQ) of the source RDMA NIC as shown in Fig. 1.
  • the opcode of the SQE is for the RDMA Append Verb instead of an RDMA Write command.
  • the registered memory resource 310 includes multiple memory regions that may each be registered, and each have an identifier (IDx) for the memory region.
  • the RDMA NIC 334 manages an Append Pointer for each of the memory regions.
  • the RDMA NIC 334 may store a table 352 that includes an Append Pointer stored in association with the memory region ID; an illustrative layout for such a table is sketched below.
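One plausible in-memory layout for table 352 is the following; the struct and field names are illustrative assumptions, not taken from the patent.

```c
#include <stdint.h>

/* Illustrative entry for the NIC-resident table 352: each registered memory
 * region identifier is paired with its current Append Pointer.  Field and
 * type names are assumptions for illustration. */
struct append_table_entry {
    uint32_t region_id;    /* memory region identifier (IDx, IDy, ...) */
    uint64_t append_ptr;   /* next free offset to write append data    */
    uint64_t region_base;  /* base address of the registered region    */
    uint64_t region_len;   /* region size, for bounds checking         */
};
```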
  • the Client application 322 of the first computing node sends append data [1] to RDMA NIC 328.
  • Source RDMA NIC 328 sends an RDMA Append Verb 354 (Append1) to the destination RDMA NIC 334 associated with the registered memory resource 310.
  • the Append1 operation includes the identifier for the memory region (IDx) and the append data [1], and may also include the data length (len([1])) or size of the append data.
  • Destination RDMA NIC 334 writes the append data [1] to a memory location indicated by the Append Pointer 356 for the identified memory region.
  • Destination RDMA NIC 334 returns a value of the Append Pointer 356 (AP_1) associated with the written memory location in a Response 358 back to the source RDMA NIC 328.
  • the value AP_1 of the Append Pointer 356 indicates where in the memory region append data [1] was written.
  • the source RDMA NIC 328 may send the value AP_1 of the Append Pointer 356 to the Client application 322.
  • the Response 358 from the destination RDMA NIC 334 may be a completion status message returned by the RDMA NIC 334.
  • the destination RDMA NIC 334 updates the value of the Append Pointer 356 in the table 352 to the next location of the identified memory region for writing the next append data by adding len([1]) to the value AP_1 to get a new value AP_2 of the Append Pointer 356.
  • Client Application 324 initiates sending a second RDMA Append Verb 360 (Append2) by sending append data [2] to RDMA NIC 330.
  • the Response 358 to the first Append Verb Append1 is followed by Append Verb Append2, but Append2 may be sent before the Response 358 is returned.
  • Source RDMA NIC 330 sends Append Verb Append2 to the destination RDMA NIC 334, and Append2 includes the identifier for the memory region (IDx), the append data [2], and the data length (len([2])).
  • Destination RDMA NIC 334 writes the append data [2] to the memory location indicated by the updated Append Pointer 356, now AP_2, for memory region IDx.
  • the destination RDMA NIC 334 returns the Append Pointer value (AP_2) associated with the written memory location back in a Response 362 to the source RDMA NIC 330 and updates the value of the Append Pointer 356 to the next location of the identified memory region for writing the next append data by adding len([2]) to AP_2 to get AP_3.
  • Source RDMA NIC 330 may return the completion status and the append pointer value AP_2 to the Client application 324.
  • Client Application 326 initiates sending a third RDMA Append Verb 364 (Append3) by sending append data [3] to RDMA NIC 332.
  • source RDMA NIC 332 sends Append Verb Append3 to the destination RDMA NIC 334, and Append3 includes the identifier for the memory region (IDx), the append data [3], and the data length (len([3])).
  • Destination RDMA NIC 334 writes the append data [3] to the memory location indicated by the updated Append Pointer 356, now AP_3, for memory region IDx.
  • the destination RDMA NIC 334 returns the Append Pointer value (AP_3) associated with the written memory location back in a Response 366 to the source RDMA NIC 332 and updates the value of the Append Pointer 356 to the next location of the identified memory region for writing the next append data by adding len([3]) to AP_3 to get the value AP_4.
  • RDMA NIC 332 may return the completion status and the append pointer value AP_3 to the Client application 326.
  • the append data from the Client Applications is written to sequential memory locations in the designated memory region based on the updated values of the Append Pointer for the memory region. The process may continue with the next data (e.g., fourth append data [4]) from any Client application.
  • Registered memory 310 can include multiple memory regions (e.g., IDx, IDy, etc.), and any Client application can send an RDMA Append Verb to any of the memory regions to write append data in the memory region identified in the RDMA Append Verb.
  • the destination RDMA NIC 334 stores table 352 to manage Append Pointers for the multiple memory regions.
  • the RDMA NIC of any of the computing nodes of the system can be a destination RDMA NIC if its memory is registered for RDMA operations; the fields an Append request carries are sketched below.
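Since the RDMA Append Verb is the extension proposed here rather than part of the standard verbs API, the sketch below only illustrates, with hypothetical names and an arbitrary opcode value, the information an Append work request would carry per Fig. 3.

```c
#include <stdint.h>

/* Hypothetical work-request content for the proposed Append Verb of Fig. 3.
 * The opcode value, structure, and field names are assumptions; they are
 * not part of any standard RDMA API. */
enum { RDMA_OP_APPEND = 0x40 };   /* hypothetical opcode value */

struct rdma_append_wr {
    uint32_t    opcode;     /* RDMA_OP_APPEND                               */
    uint32_t    region_id;  /* IDx: registered region to append into       */
    const void *data;       /* append data, e.g. data [1]                   */
    uint32_t    len;        /* len([1]), so the NIC can advance the pointer */
    uint8_t     notify;     /* optional: request a CQE on the destination   */
};
/* The single request/response round trip replaces the Fetch-and-Add plus
 * Write pair; the Response carries the Append Pointer value (e.g., AP_1)
 * identifying where the data landed. */
```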
  • Fig. 4 is a flow diagram of an example of a method 400 of a communication protocol implemented by a distributed computer system that includes one or more processing resources, one or more memory resources, and at least one NIC.
  • the communication protocol may be performed using the distributed computer system shown in Fig. 1.
  • an RDMA append command is initiated from a processing resource to a memory resource remote from the processing resource.
  • the append command may be initiated by a Client application executing on the processing resource.
  • the RDMA Append command includes sending an RDMA Append Verb that includes append data and an identifier of a memory region of the remote memory resource, such as RDMA Append Verb 354 in Fig. 3.
  • the Client application 108 may initiate the Append operation by writing an entry in the send queue (SQ) of the NIC 116 of the source computing node.
  • the append data is sent from the source computing node NIC 116 to the destination computing node NIC 118.
  • the NIC 118 of the destination computing node writes append data to a memory location in the identified memory region indicated by an append pointer associated with the identified memory region.
  • the NIC 118 returns a value of the append pointer corresponding to the memory location to the processing resource of the source computing node. This notifies the Client application 108 of the value of the append pointer associated with the memory location of the append data.
  • the append pointer may be returned in a completion queue entry (CQE) of the completion queue (CQ) of the NIC 116 of the source computing node; a minimal polling sketch appears below.
  • the NIC 118 of the destination node updates the append pointer to point to the next memory location of the memory region for writing the next append data received for that memory region by adding the size of the written data to the value of the append pointer.
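On the source side, waiting for the Append completion can be pictured with a standard libibverbs polling loop; note that the stock ibv_wc structure has no append-pointer field, so exactly where the returned pointer value surfaces (e.g., a vendor-specific CQE field) is an assumption of this sketch.

```c
#include <infiniband/verbs.h>
#include <stdio.h>

/* Sketch: the source node busy-polls its completion queue for the Append
 * completion.  The standard ibv_wc carries only status and metadata; where
 * the returned Append Pointer value would surface is implementation-defined
 * and assumed here. */
int wait_for_append_completion(struct ibv_cq *cq)
{
    struct ibv_wc wc;
    int n;

    while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
        ;   /* spin until one completion entry is available */
    if (n < 0) {
        fprintf(stderr, "ibv_poll_cq failed\n");
        return -1;
    }
    if (wc.status != IBV_WC_SUCCESS) {
        fprintf(stderr, "append failed: %s\n", ibv_wc_status_str(wc.status));
        return -1;
    }
    /* On success, the completion would convey the Append Pointer value
     * (e.g., AP_1) telling the client where its append data was written. */
    return 0;
}
```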
  • the RDMA Append Verb does not consume any of the processing resources of the destination computing node. However, the processing resources of the destination computing node can optionally be notified of the change to the memory resource of the destination computing node.
  • NIC 118 can include a complete queue (CQ).
  • CQ complete queue
  • the destination RDMA NIC 334 in Fig. 3 may place a complete queue entry (CQE) in its complete queue to notify the processing resource associated with the destination RDMA NIC 334 that an RDMA Append operation was processed and there is new append data in the registered memory.
  • the processing resource will process the new complete queue entry in an asynchronous fashion as part of the normal queue polling mechanism and thereby become aware of the new append data.
  • the send queue entry (SQE) corresponding to the RDMA Append operation may include a notification field to indicate that notification to the destination processing resource of the Append operation is requested.
  • Fig. 5 is a flow diagram of an example of a method 500 of processing an RDMA Append Verb by the destination computing node.
  • the method 500 may be implemented by the destination RDMA NIC 334 in Fig. 3.
  • the RDMA NIC 334 waits for a new entry to be received. If the RDMA NIC 334 determines at block 510 that the opcode of the entry is an Append Request, at block 515 the entry is decoded to retrieve the identifier (IDx) of the destination memory region, the append data (D), the length of the data (L), and optionally a notification indicator (N) showing that notification to the destination processing resource or resources is enabled.
  • the RDMA NIC 334 locates the Append Pointer (AP) corresponding to IDx and writes Data (D) with Length (L) to the location specified by the AP.
  • the RDMA NIC 334 determines whether Notification is indicated in the Append Request. If Notification is enabled, at block 525, the RDMA NIC 334 places a complete queue entry (CQE) in its complete queue (CQ).
  • the CQE can include the identifier IDx, the Append Pointer (AP), and the Notification indicator (N). If Notification is not indicated, the process branches around the CQE block.
  • the Append Pointer for the IDx memory region is updated from AP to AP' using the current value of the Append Pointer (AP) and the Data Length (L).
  • the destination processing resource may process the newly added CQE in an asynchronous fashion as part of a normal polling mechanism. A plain-C sketch of this destination-side handling follows below.
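The destination-side handling of Fig. 5 (decode, write at AP, optional CQE, pointer update) can be summarized in plain C; the request and region structures below are hypothetical stand-ins for NIC firmware internals.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for NIC firmware state: one region entry per
 * registered memory region (cf. table 352) and a decoded Append request
 * (cf. block 515). */
struct append_req { uint32_t region_id; const void *data; uint32_t len; int notify; };
struct region     { uint32_t id; uint8_t *base; uint64_t ap; uint64_t cap; };

int handle_append(struct region *tbl, size_t n, const struct append_req *rq,
                  uint64_t *ap_out)
{
    for (size_t i = 0; i < n; i++) {
        struct region *r = &tbl[i];
        if (r->id != rq->region_id)
            continue;                                /* locate the IDx entry */
        if (r->ap + rq->len > r->cap)
            return -1;                               /* region would overflow */
        memcpy(r->base + r->ap, rq->data, rq->len);  /* write D (length L) at AP */
        *ap_out = r->ap;                             /* value returned in the Response */
        /* if (rq->notify): place a CQE in the CQ (block 525, omitted here) */
        r->ap += rq->len;                            /* update AP to AP' = AP + L */
        return 0;
    }
    return -1;                                       /* unknown region id */
}
```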
  • Fig. 6 is a block schematic diagram of a computer system 600 for performing methods and algorithms according to example embodiments. Not all components need be used in various embodiments.
  • One example is a computing device that may include a processing unit 602, memory 603, removable storage 610, and non-removable storage 612.
  • although the example computing device is illustrated and described as computer 600, the computing device may take different forms in different embodiments.
  • the computing device may be a server, a router, or a virtual router.
  • although the various data storage elements are illustrated as part of the computer 600, the storage may also or alternatively include cloud-based storage accessible via a network, such as the Internet, or server-based storage.
  • an SSD may include a processor on which the parser may be run, allowing transfer of parsed, filtered data through I/O channels between the SSD and main memory.
  • Memory 603 may include volatile memory 614 and non-volatile memory 608.
  • Computer 600 may include — or have access to a computing environment that includes — a variety of computer-readable media, such as volatile memory 614 and non-volatile memory 608, removable storage 610 and non-removable storage 612.
  • Computer storage includes random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD ROM), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.
  • Computer 600 may include or have access to a computing environment that includes input interface 606, output interface 604, and a communication interface 616.
  • Output interface 604 may include a display device, such as a touchscreen, that also may serve as an input device.
  • the input interface 606 may include one or more of a touchscreen, touchpad, mouse, keyboard, camera, one or more device-specific buttons, one or more sensors integrated within or coupled via wired or wireless data connections to the computer 600, and other input devices.
  • the computer 600 may operate in a networked environment using a communication connection to connect to one or more remote computers, such as database servers.
  • the remote computer may include a personal computer (PC), server, router, network PC, a peer device or other common data flow network switch, or the like.
  • the communication connection may include a Local Area Network (LAN), a Wide Area Network (WAN), cellular, Wi-Fi, Bluetooth, or other networks.
  • the various components of computer 600 are connected with a system bus 620.
  • Computer-readable instructions stored on a computer-readable medium are executable by the processing unit 602 of the computer 600, such as a program 618.
  • the program 618 in some embodiments comprises software to implement one or more methods or algorithms described herein.
  • a hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium, such as a storage device.
  • the terms computer- readable medium and storage device do not include carrier waves to the extent carrier waves are deemed too transitory.
  • Storage can also include networked storage, such as a storage area network (SAN).
  • Computer program 618 along with the workspace manager 622 may be used to cause processing unit 602 to perform one or more methods or algorithms described herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer And Data Communications (AREA)

Abstract

A computer implemented method includes initiating a remote direct memory access (RDMA) append command from a processing resource to a memory resource remote from the processing resource (405), the RDMA append command including append data and an identifier of a memory region of the remote memory resource; writing, by the network interface card (NIC), the append data to a memory location in the identified memory region indicated by an append pointer associated with the identified memory region (410); returning a value of the append pointer corresponding to the memory location to the processing resource (415); and updating the append pointer to a next memory location of the memory region for writing next append data (420).
PCT/US2021/070908 2021-07-20 2021-07-20 RDMA append verb WO2022225576A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2021/070908 WO2022225576A1 (fr) 2021-07-20 2021-07-20 RDMA append verb

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2021/070908 WO2022225576A1 (fr) 2021-07-20 2021-07-20 RDMA append verb

Publications (1)

Publication Number Publication Date
WO2022225576A1 (fr)

Family

ID=77367493

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/070908 WO2022225576A1 (fr) 2021-07-20 2021-07-20 Verbe d'ajout à rdma

Country Status (1)

Country Link
WO (1) WO2022225576A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090222596A1 (en) * 2007-12-06 2009-09-03 David Flynn Apparatus, system, and method for coordinating storage requests in a multi-processor/multi-thread environment
WO2017156549A1 (fr) * 2016-03-11 2017-09-14 Purdue Research Foundation Système d'accès indirect à distance à une mémoire d'ordinateur
US20190102087A1 (en) * 2017-09-29 2019-04-04 Oracle International Corporation Remote one-sided persistent writes


Similar Documents

Publication Publication Date Title
US12050623B2 (en) Synchronization cache seeding
CN110392084B (zh) Method, apparatus and computer program product for managing addresses in a distributed system
US9706002B2 (en) Push notification via file sharing service synchronization
JP6325001B2 (ja) Method and system using recursive event listeners on nodes in a hierarchical data structure
US10754588B2 (en) Performing data operations in a storage area network
US8874638B2 (en) Interactive analytics processing
CN110119304B (zh) Interrupt processing method and apparatus, and server
TWI773959B (zh) Data processing system, method and computer program product for handling input/output store instructions
US20210132860A1 (en) Management of multiple physical function non-volatile memory devices
US20220214901A1 (en) Migration speed-up for multiple virtual machines
WO2022017475A1 (fr) Data access method and related device
KR102210289B1 (ko) Hardware management communication protocol
CN110870286B (zh) Fault tolerance processing method, apparatus, and server
WO2018119116A1 (fr) Data stream processor with both in-memory and persistent messaging
US20240275740A1 (en) RDMA Data Transmission System, RDMA Data Transmission Method, and Network Device
CN109325002B (zh) Text file processing method, apparatus, system, electronic device, and storage medium
US20160197849A1 (en) Method and Apparatus for Implementing a Messaging Interface
US10581997B2 (en) Techniques for storing or accessing a key-value item
US20180060273A1 (en) Disk access operation recovery techniques
US10785295B2 (en) Fabric encapsulated resilient storage
CN109445966B (zh) Event processing method, apparatus, medium, and computing device
WO2022225576A1 (fr) RDMA append verb
US20190387052A1 (en) Method, device and computer program product for transaction negotiation
WO2023029485A1 (fr) Data processing method and apparatus, computer device, and computer-readable storage medium
US20170118146A1 (en) Using send buffers and receive buffers for sending messages among nodes in a network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21755880

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21755880

Country of ref document: EP

Kind code of ref document: A1