CN114531907B - System and method for handling out-of-order delivery transactions - Google Patents


Info

Publication number
CN114531907B
Authority
CN
China
Prior art keywords
packet
transaction
index indication
packets
transactions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202080066674.5A
Other languages
Chinese (zh)
Other versions
CN114531907A (en)
Inventor
本-沙哈尔.贝尔彻
鲁文.科恩
利奥.赫尔莫什
盖伊.沙塔
曲会春
卢胜文
塔尔.米兹拉希
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN114531907A
Application granted
Publication of CN114531907B
Legal status: Active

Classifications

    • G06F13/1657 Access to multiple memories
    • G06F13/1621 Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement by maintaining request order
    • H04L47/34 Flow control; Congestion control ensuring sequence integrity, e.g. using sequence numbers
    • H04L49/9057 Arrangements for supporting packet reassembly or resequencing
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

An apparatus and a method for transmitting a plurality of transactions are disclosed. The apparatus handles each of a plurality of packet-based transactions of operations encoded according to a network transport protocol. For each such transaction, it is configured to identify a delivery index indication contained in host data (601), incorporate that delivery index indication into each of the transaction's packets (602), and direct the packets incorporating the delivery index indication to be sent (603).

Description

System and method for handling out-of-order delivery transactions
Technical Field
The present disclosure relates in some embodiments to communication systems, and more particularly, but not exclusively, to systems and methods for handling out-of-order delivery transactions.
Background
Remote direct memory access (RDMA) is direct memory access from the memory of one computer to the memory of another computer without involving the operating system of either computer. RDMA enables high-throughput, low-latency networking, which is particularly useful in massively parallel computer clusters.
RDMA over Converged Ethernet (RoCE) is a standard protocol that enables efficient RDMA transfer over Ethernet. It allows transport offload to hardware RDMA engine implementations and achieves excellent performance. RoCE is defined in standards published by the InfiniBand Trade Association (IBTA). The User Datagram Protocol (UDP) encapsulation used by routable RoCE allows it to traverse Layer 3 networks. RDMA is a key capability natively used by the InfiniBand interconnect technology. InfiniBand and RoCE share a common user application programming interface (API) but have different physical and link layers.
Disclosure of Invention
It is an object of the present disclosure to provide an apparatus for transmitting a plurality of transactions, an apparatus for receiving a plurality of out-of-order transactions, a method for transmitting a plurality of transactions, and a method for receiving a plurality of out-of-order transactions.
The above object and other objects are achieved by the features of the independent claims. Further embodiments are evident in the dependent claims, the description and the drawings.
According to a first aspect of the present disclosure, an apparatus for transmitting a plurality of transactions is disclosed. The apparatus is configured to:
For each of a plurality of packet-based transactions of operations encoded according to a network transport protocol: identify a delivery index indication contained in the host data; incorporate the delivery index indication into each of a plurality of data packets of the respective packet-based transaction; and direct the plurality of data packets incorporating the delivery index indication to be sent.
According to a second aspect of the present disclosure, an apparatus for receiving a plurality of transactions is disclosed. The apparatus is configured to receive a plurality of packet-based transactions of operations encoded according to a network transport protocol, where a delivery index indication is incorporated in each of a plurality of packets of the respective packet-based transaction. It is further configured to manage the processing of out-of-order packets in the plurality of packet-based transactions according to the index indication.
According to a third aspect of the present disclosure, a method for sending transactions is disclosed. The method comprises the following steps:
For each of a plurality of packet-based transactions of operations encoded according to a network transport protocol: identifying a delivery index indication contained in the host data; incorporating the delivery index indication into each of a plurality of data packets of the respective packet-based transaction; and directing the plurality of data packets incorporating the delivery index indication to be sent.
According to a fourth aspect of the present disclosure, a method for receiving transmitted transactions is disclosed. The method comprises the following steps: receiving a plurality of packet-based transactions of operations encoded according to a network transport protocol, where a delivery index indication is incorporated in each of a plurality of packets of the respective packet-based transaction; and managing the processing of out-of-order packets in the plurality of packet-based transactions according to the index indication.
In another implementation of the first aspect, the index indication is based on at least one of the following: a host-internal queue index, a transaction-type-specific incrementing counter, an indication specific to the packet-based transaction, an absolute number, and a free-running (unbounded) index indication.
In another implementation of the first aspect, the index indication is incorporated into one of the following: the payloads of the plurality of data packets of the corresponding packet-based transaction; a packet of the corresponding packet-based transaction; or at least one header of the plurality of data packets of the corresponding packet-based transaction. In this way, the index indication may be incorporated into any available location in the data packet.
In another implementation of the first aspect, in each of the plurality of packet-based transactions, the apparatus is configured to incorporate the index indication by overwriting unused fields in at least one header of the plurality of packets of the corresponding packet-based transaction.
In another implementation of the first aspect, in each of the plurality of packet-based transactions, the apparatus is configured to incorporate the index indication into a dedicated field of at least one header of each of the plurality of packets of the respective packet-based transaction. That dedicated field is intended for transaction information.
In another implementation of the first aspect, in each of the plurality of packet-based transactions, the apparatus is further configured to receive an acknowledgement (ACK) indicating that a first packet of the corresponding packet-based transaction of the operation was successfully transmitted. In response to receiving the ACK, it stops incorporating the index indication into the packets of that transaction. This saves the processing cost of incorporating the index indication into every data packet.
In another implementation of the first aspect, in each of the plurality of packet-based transactions, the apparatus is further configured to receive an acknowledgement (ACK) indicating that a first packet of the corresponding packet-based transaction of the operation was successfully transmitted. In response to receiving the ACK, it continues incorporating the index indication into the packets of that transaction.
In another implementation of the first aspect, the index indication is relative to the last ACK received for the index indication. In this implementation, the index indication restarts counting transactions from the most recently acknowledged transaction. This keeps the number representing the index indication as small as possible. As a result, fewer bits of the header or payload carrying the index indication are consumed when a new transaction is sent.
In another implementation of the second aspect, the apparatus is further configured to do one of the following:
Process at least one of the plurality of data packets of the packet-based transaction according to the index indication.
Alternatively, create a queue corresponding to the delivery order of the packet-based transactions of the operation. It generates an element for each packet-based transaction when a data packet of that transaction is first received, and removes that element from the queue when the transaction is completed.
Each transaction is thus identified by its index indication, so the apparatus knows which transaction each received packet belongs to. In this way, the apparatus receiving the data packets can perform out-of-order processing.
In another implementation of the fourth aspect, the step of managing comprises:
Processing at least one of the plurality of data packets of the packet-based transaction according to the index indication; or
Creating a queue corresponding to the delivery order of the packet-based transactions of the operation by generating an element for each packet-based transaction of the operation when a data packet of that transaction is first received; and
Removing the element of each packet-based transaction of the operation from the queue when that transaction is completed.
In a fifth aspect, the present disclosure relates to a computer program product comprising computer readable code instructions which, when run in a computer, cause the computer to perform the method according to any one of the third and fourth aspects of the present disclosure and embodiments thereof.
Other systems, methods, features, and advantages of the disclosure will be or become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description and be within the scope of the present disclosure, and be protected by the accompanying claims.
Unless defined otherwise, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the examples pertain. Methods and materials similar or equivalent to those described herein can be used in the practice or testing of the embodiments; exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be limiting.
Drawings
Some embodiments are described herein, by way of example only, in connection with the accompanying drawings. With specific reference now to the drawings, it is emphasized that the details shown are by way of example and are for illustrative discussion of the embodiments. In this regard, it will be apparent to those skilled in the art how to practice the embodiments from the specification and drawings.
In the drawings:
FIG. 1 schematically illustrates an example of the problem that arises when transactions arrive at a receiver out of order;
FIG. 2 schematically illustrates an apparatus for transmitting and receiving a plurality of packet-based transactions, some of which are received out of transaction order, according to some embodiments of the present disclosure;
FIG. 3 schematically illustrates an example of receiving packet-based transactions out of order according to some embodiments of the present disclosure;
FIG. 4 schematically illustrates another example of receiving packet-based transactions out of order according to some embodiments of the present disclosure;
FIG. 5 schematically illustrates an example of receiving the packet-based transactions of three RDMA send operations out of order according to some embodiments of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a method for handling the transmission of a plurality of packet-based transactions from a transmitter to a receiver, where the transactions are received out of order at the receiver, according to some embodiments of the present disclosure; and
FIG. 7 schematically illustrates a flow chart of a method for handling the receipt of a plurality of packet-based transactions received out of order at a receiver according to some embodiments of the present disclosure.
Detailed Description
Some embodiments described in this disclosure relate to communication systems, and more particularly, but not exclusively, to systems and methods for handling out-of-order delivery transactions.
Remote direct memory access (RDMA) requires a reliable underlying transport. Data packets are sent from the sender (transmitter) to the receiver. They are retransmitted if necessary, for example if a data packet is lost and does not reach its destination.
RDMA over Converged Ethernet (RoCE) is a network protocol that allows remote direct memory access (RDMA) over Ethernet. The RoCE protocol has two versions:
The first, RoCEv1, is also called non-routable RoCE. It allows communication between two hosts in the same Ethernet broadcast domain.
The second, RoCEv2, transmits data packets over the User Datagram Protocol (UDP), which means the data packets can be routed. RoCEv2 is therefore also known as routable RoCE.
Reliability is provided at the RDMA layer, meaning that packets that are not received, or that are received in error, are retransmitted. Data traffic in RDMA consists of memory transactions, and correct behavior of the RoCE protocol requires the receiver to know the correct order of those transactions. The receiver can process received data packets even if they arrive out of order within a transaction. However, when the transactions themselves arrive out of order, the receiver cannot process the packets, because it does not know which transaction each packet belongs to. For example, when a receiver receives a transaction of an inbound RDMA send operation, it places the transaction in a work queue element (WQE), at a given WQE index, of a receive queue (RQ). When transactions of inbound RDMA send operations are received out of order, the receiver cannot know at which WQE index of the receive queue a given transaction should be placed.
FIG. 1 schematically shows an example of the problem that arises when transactions arrive at a receiver out of order. In this example, the sender 101 sends three transactions to the receiver 102 in the following order: transaction A, transaction B, and transaction C.
Transaction A includes data packets 1, 2, and 3.
Transaction B includes data packets 4, 5, and 6.
Transaction C includes data packets 7 and 8.
The order in which the transactions are received differs from the order in which they were sent, and the packets within a transaction also arrive out of order.
Transaction A is received first, with its packets in the order 1, 3, 2. Since packets 1, 2, and 3 all belong to transaction A, the receiver can process all of them.
Transaction B is received second, with packets 4 and 5 arriving first. However, before packet 6 arrives, the receiver receives packets 7 and 8 of transaction C. These packets cannot be processed, because the transactions are out of order. Only after transaction B is completed by the arrival of packet 6 can the receiver process packets 7 and 8 of the third transaction.
In many cases, when the receiver cannot process a data packet, the packet is discarded and the transmitter must send it again.
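The FIG. 1 scenario can be reproduced with a small toy simulation. The following is a hypothetical Python sketch, not part of the disclosure. It assumes the receiver can attribute a packet only to the oldest incomplete transaction. It reports which packets could be processed on arrival and which had to wait (or, in practice, be dropped).

```python
# Toy model of FIG. 1 (hypothetical): no per-packet transaction index on the wire.
SENT = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8]}  # delivery order A, B, C
ARRIVALS = [1, 3, 2, 4, 5, 7, 8, 6]                   # 7 and 8 overtake 6


def baseline_receiver(sent, arrivals):
    """Return (processed_on_arrival, deferred) packet numbers.

    The receiver can only attribute a packet to the oldest incomplete
    transaction. The `owner` map is simulation ground truth used to decide
    whether a packet belongs to that transaction; the receiver itself has no
    field telling it which later transaction a packet belongs to.
    """
    order = list(sent)  # dicts preserve insertion order (Python 3.7+)
    owner = {psn: txn for txn, psns in sent.items() for psn in psns}
    remaining = {txn: set(psns) for txn, psns in sent.items()}
    current = 0  # position of the oldest incomplete transaction in `order`
    processed, deferred = [], []
    for psn in arrivals:
        if current < len(order) and owner[psn] == order[current]:
            processed.append(psn)
            remaining[order[current]].discard(psn)
            # Move past every transaction that is now complete.
            while current < len(order) and not remaining[order[current]]:
                current += 1
        else:
            deferred.append(psn)  # belongs to a later transaction: must wait
    return processed, deferred


print(baseline_receiver(SENT, ARRIVALS))  # ([1, 3, 2, 4, 5, 6], [7, 8])
```

Packets 7 and 8 end up deferred even though they arrived intact. This is the situation the index indication described next is meant to avoid.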
According to some embodiments of the present disclosure, apparatus and methods are provided for ensuring that a receiver knows the correct transaction delivery order, in particular for RDMA transactions over RoCE in all RoCE versions. According to some embodiments, the apparatus and methods identify, in each transaction, a delivery index indication based on the host-internal queue index contained in the host software (hereinafter simply referred to as the index indication). They then incorporate the index indication into all data packets of the respective transaction. As a result, when the receiver receives data packets in the wrong order, it still knows the correct delivery order of the data packets and transactions. It can therefore proceed with out-of-order processing of the received data packets.
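A minimal sketch of the idea, assuming the simplest possible packet representation of `(index, packet)` pairs: the sender stamps every packet with its transaction's index, and the receiver groups packets by that index. The function names are hypithetical and illustrative only. The position in the transaction list stands in for the host-internal queue index.

```python
from collections import defaultdict


def tag_packets(transactions):
    """Sender side: attach the transaction's index to every one of its packets.

    `transactions` lists each transaction's packets in delivery order; the
    list position stands in for the host-internal queue index (e.g. a WQE
    index). Returns (index, packet) pairs.
    """
    return [(index, packet)
            for index, packets in enumerate(transactions)
            for packet in packets]


def place_packets(arrivals):
    """Receiver side: each packet names its transaction, so it can be placed
    as soon as it arrives, whatever the arrival order."""
    placed = defaultdict(list)
    for index, packet in arrivals:
        placed[index].append(packet)
    return dict(placed)


# FIG. 1 again: A=[1, 2, 3], B=[4, 5, 6], C=[7, 8]; packets 7 and 8 overtake 6.
by_packet = {packet: (index, packet)
             for index, packet in tag_packets([[1, 2, 3], [4, 5, 6], [7, 8]])}
arrivals = [by_packet[p] for p in [1, 3, 2, 4, 5, 7, 8, 6]]
print(place_packets(arrivals))  # {0: [1, 3, 2], 1: [4, 5, 6], 2: [7, 8]}
```

Unlike the FIG. 1 case, packets 7 and 8 are attributed to the third transaction the moment they arrive, before transaction B completes.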
Before explaining at least one embodiment in detail, it is to be understood that the embodiment is not necessarily limited in its application to the details of construction and the arrangement of components and/or methods set forth in the following description and/or illustrated in the drawings and/or examples. Implementations can have other embodiments, or can be practiced or carried out in various ways.
Embodiments may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions thereon for causing a processor to perform aspects of the embodiments.
A computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following:
a portable computer diskette;
a hard disk;
a random access memory (RAM);
a read-only memory (ROM);
an erasable programmable read-only memory (EPROM or flash memory);
a static random access memory (SRAM);
a portable compact disc read-only memory (CD-ROM);
a digital versatile disk (DVD);
a memory stick;
a floppy disk; and
any suitable combination of the foregoing.
Computer-readable storage media, as used herein, should not be construed as being transitory signals per se. Such signals include radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a corresponding computing/processing device or to an external computer or external storage device over a network such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical transmission fibers, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
The computer readable program instructions for performing the operations of the embodiments may take several forms:
assembler instructions;
instruction-set-architecture (ISA) instructions;
machine-dependent instructions;
microcode;
firmware instructions;
state-setting data; or
source code or object code written in any combination of one or more programming languages. These include object-oriented programming languages such as Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar languages.
The computer readable program instructions may execute entirely on the user's computer, or partly on the user's computer as a stand-alone software package. They may also execute partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, the connection may be made to an external computer, for example through the Internet using an Internet service provider. In some embodiments, electronic circuitry may execute the computer readable program instructions to perform aspects of the embodiments. Such circuitry includes, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA). It does so by using state information of the computer readable program instructions to personalize the electronic circuitry.
Aspects of the embodiments are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the embodiments. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine. The instructions, executing via that processor, then create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner. The computer readable storage medium having the instructions stored therein then comprises an article of manufacture. That article includes instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus, or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Referring now to FIG. 2, FIG. 2 schematically illustrates an apparatus for transmitting and receiving a plurality of packet-based transactions, some of which are received out of transaction order, according to some embodiments of the present disclosure. The apparatus 200 includes a transmitter 210 and a receiver 220.
The transmitter 210 includes a processor 211, a memory 212, and host software 213.
The receiver 220 includes a processor 221, a memory 222, and host software 223.
The transmitter's processor 211 identifies, in each transaction of an operation, a delivery index indication contained in the host software 213. A transaction is a packet-based transaction and includes a plurality of packets. The operation may be, for example, a read operation, a write operation, or a send operation. The processor 211 incorporates the index indication into each data packet of the transaction. It then directs the packet-based transactions of the operation, with the index indication incorporated, to be sent to the receiver 220 according to the network transport protocol. The processor 211 may be located in the transmitter, may be a dedicated hardware block or processing unit on a network device, or may be a processor running the host software.
According to some embodiments of the present disclosure, the index indication based on the host-internal queue index is a work queue element (WQE) index. According to some other embodiments of the disclosure, the index indication is a transaction-type-specific counter. Such a counter may be implemented in firmware, hardware, or software, for example as an opcode counter internal to the network device.
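A transaction-type-specific counter numbers each transaction only among transactions of the same type. The class below is a hypothetical sketch of such an opcode counter in Python, with opcode names chosen purely for illustration. One plausible motivation for counting per type is that in RDMA, inbound SEND operations consume receive-queue WQEs at the receiver while plain RDMA WRITE operations do not. A SEND-only count can therefore line up with the receiver's RQ positions.

```python
from collections import Counter


class OpcodeCounter:
    """Hypothetical per-transaction-type counter (one running count per opcode).

    Each transaction is numbered among transactions of the same type only,
    so e.g. SEND transactions are numbered 0, 1, 2, ... independently of any
    interleaved RDMA WRITE transactions.
    """

    def __init__(self):
        self._next = Counter()  # missing opcodes start at 0

    def next_index(self, opcode):
        index = self._next[opcode]
        self._next[opcode] += 1
        return index


counter = OpcodeCounter()
ops = ["SEND", "WRITE", "SEND", "SEND", "WRITE"]
print([counter.next_index(op) for op in ops])  # [0, 0, 1, 2, 1]
```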
According to some embodiments of the present disclosure, the index indication is incorporated into the payloads of the plurality of data packets of the corresponding packet-based transaction. Alternatively, it is incorporated into at least one header of each of those data packets.
According to some embodiments, the index indication may also be incorporated into the header of each data packet. In some embodiments, in each of the plurality of packet-based transactions, the transmitter's processor 211 incorporates the index indication by overwriting unused fields in the header of the packets of the corresponding transaction. According to some embodiments, the transmitter instead incorporates the index indication into a dedicated field in at least one header of each packet of the respective transaction. That field is intended for transaction information.
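The placement options above can be illustrated with Python's standard `struct` module. The 8-byte header layout here is entirely hypothetical and is not the RoCE or InfiniBand header format. It reserves a 16-bit field that the toy protocol leaves unused, so the sender can overwrite that field with the index indication. A payload-prefix variant is shown as the alternative placement.

```python
import struct

# Hypothetical 8-byte header, for illustration only (NOT the RoCE/InfiniBand
# layout): opcode (1 byte), flags (1 byte), a 16-bit field this toy protocol
# leaves unused, and a 32-bit packet sequence number, all big-endian.
HEADER = struct.Struct("!BBHI")


def stamp_index(header, index):
    """Overwrite the unused 16-bit header field with the index indication."""
    if not 0 <= index <= 0xFFFF:
        raise ValueError("index indication does not fit in the 16-bit field")
    opcode, flags, _unused, psn = HEADER.unpack(header)
    return HEADER.pack(opcode, flags, index, psn)


def read_index(header):
    """Receiver side: recover the index indication from the header."""
    return HEADER.unpack(header)[2]


def stamp_payload(payload, index):
    """Alternative placement: prefix the payload with a 16-bit index."""
    return struct.pack("!H", index) + payload


header = HEADER.pack(0x04, 0, 0, 1000)  # index field initially zero
stamped = stamp_index(header, 7)
print(read_index(stamped), HEADER.unpack(stamped)[3])  # 7 1000
```

Overwriting an existing field keeps the packet size unchanged. The payload prefix costs two extra bytes per packet but needs no spare header bits.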
The receiver 220 receives the transmitted packet-based transactions of the operation, encoded according to the network transport protocol, with the delivery index indication incorporated in each packet of the corresponding transaction. The receiver then manages the processing of out-of-order packets in the plurality of packet-based transactions according to the index indication. In some embodiments of the present disclosure, the processor 221 of the receiver 220 manages this processing. The processor 221 may be located in the receiver, may be a dedicated hardware block or processing unit on a network device, or may be a processor running the host software.
When the data packets conform to the transaction order, the processor 221 processes at least one of the data packets of the packet-based transactions according to the index indication.
Alternatively, when the received data packets do not conform to the transaction order, the processor 221 maintains a special receive queue corresponding to the delivery order of the transactions of the operation. When the processor first receives a data packet of a packet-based transaction of the operation, it generates an element for that transaction and stores the element in the memory 222 of the receiver 220. The processor 221 removes that element from the special receive queue when the transaction is completed.
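The special receive queue can be modeled as a map from index indication to a per-transaction element. The sketch below is hypothetical. Besides the index indication, it assumes each packet also carries its offset within the transaction and a last-packet flag, extra fields not described above, so the receiver can tell when a transaction is complete.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """One receive-queue element per in-flight transaction."""
    index: int
    received: set = field(default_factory=set)  # offsets seen so far
    total: Optional[int] = None  # packet count, known once the last arrives


class OutOfOrderReceiveQueue:
    """Toy version of the special receive queue (hypothetical sketch).

    Assumes each packet carries its transaction's index indication plus, as
    extra hypothetical fields, its offset within the transaction and a
    last-packet flag, so the receiver can tell when a transaction completes.
    """

    def __init__(self):
        self.elements = {}   # index indication -> Element
        self.completed = []  # indexes in order of completion

    def on_packet(self, index, offset, is_last):
        element = self.elements.get(index)
        if element is None:  # first packet of this transaction: new element
            element = self.elements[index] = Element(index)
        element.received.add(offset)
        if is_last:
            element.total = offset + 1
        if element.total is not None and len(element.received) == element.total:
            del self.elements[index]  # transaction complete: remove element
            self.completed.append(index)

    def pending(self):
        """In-flight transactions, listed in delivery (index) order."""
        return sorted(self.elements)


# FIG. 1 packets as (index, offset, is_last); arrival order 1, 3, 2, 4, 5, 7, 8, 6.
PACKETS = {1: (0, 0, False), 2: (0, 1, False), 3: (0, 2, True),
           4: (1, 0, False), 5: (1, 1, False), 6: (1, 2, True),
           7: (2, 0, False), 8: (2, 1, True)}
queue = OutOfOrderReceiveQueue()
for psn in [1, 3, 2, 4, 5, 7, 8, 6]:
    queue.on_packet(*PACKETS[psn])
print(queue.completed, queue.pending())  # [0, 2, 1] []
```

Transaction C (index 2) completes before transaction B (index 1), yet both are tracked correctly. This toy model does not filter duplicate packets that arrive after their transaction has completed.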
Optionally, the index indication of the present disclosure is based on a host internal queue index, or on a counter that is incremented per particular transaction type.
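As a hedged illustration of the second option, a per-transaction-type counter can be kept with one running value per type. Each type is then numbered independently. For example, Send transactions are numbered 1, 2, 3, ... regardless of how many Read transactions are interleaved with them.

The class name and the string type labels are hypothetical. The first option would simply reuse an existing host queue index, such as a WQE index, so it needs no extra state.

```python
from collections import defaultdict
from typing import DefaultDict


class TypeCounterIndex:
    """Hypothetical per-transaction-type counter used as the index indication."""

    def __init__(self) -> None:
        # Every transaction type starts counting at 1.
        self._next: DefaultDict[str, int] = defaultdict(lambda: 1)

    def next_index(self, transaction_type: str) -> int:
        """Return the index for the next transaction of this type and advance."""
        idx = self._next[transaction_type]
        self._next[transaction_type] = idx + 1
        return idx
```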
According to some embodiments of the present disclosure, the index indication may be an absolute number, a relative number, and/or an unlimited (free-running) index.
According to some embodiments of the present disclosure, in each of the plurality of packet-based transactions, the sender receives from the receiver an acknowledgement (ACK) indicating that the first packet of the corresponding packet-based transaction of the operation was sent successfully. Since the first packet of a transaction typically includes the address for every packet of the transaction, the sender may, in response to the ACK, stop incorporating the index indication into the packets of that transaction. According to some other embodiments of the present disclosure, the sender may instead continue to incorporate the index indication into the packets of the transaction after receiving the ACK. This is because the information in the first packet of a transaction is not always sufficient to place all packets that are received out of order. For example, suppose six transactions of an operation have been transmitted and the receiver has received the first packets of three of them. From those three first packets alone, the receiver cannot tell how many of the other transactions are missing.
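The two ACK-related embodiments differ only in a sender-side policy flag. The sketch below is a hypothetical model of that decision and not the disclosed implementation. `stop_after_first_ack=True` stops incorporating the index once the first packet of a transaction has been acknowledged. `False` keeps incorporating it, which covers the case where the first packet alone does not let the receiver place every out-of-order packet.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class StampingPolicy:
    """Decides whether further packets of a transaction carry the index indication."""
    stop_after_first_ack: bool
    first_packet_acked: Set[int] = field(default_factory=set)

    def on_first_packet_ack(self, txn_index: int) -> None:
        """Record that the first packet of this transaction was acknowledged."""
        self.first_packet_acked.add(txn_index)

    def should_stamp(self, txn_index: int) -> bool:
        """True if the next packet of this transaction should carry the index."""
        if not self.stop_after_first_ack:
            return True  # "continue" embodiment: always incorporate the index
        return txn_index not in self.first_packet_acked
```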
According to some embodiments of the present disclosure, the index indication may be relative to the last ACK received for an index indication. When the sender receives ACKs for all packets of a particular transaction, meaning that all packets of that transaction have arrived at the receiver, the sender can restart counting the index indication. The transmitter thus uses numbers that are as small as possible as index indications, and these consume few bits of the header or payload when the data packets are transmitted. When the index indications are relative, the sender may restart counting the index indication each time, or every other time, it receives an ACK confirming that all packets of a particular transaction were successfully processed at the receiver. In this case, the host data holds a number indicating the total number of transactions. This number differs from the index indication, which restarts counting after the latest ACK is received. When the index indication is unlimited (free-running), according to some embodiments of the present disclosure, it is bounded by the largest number that can be represented by the available bits, i.e., the bits used to incorporate the index indication into the data packet. In this case, the index indication restarts counting once it reaches that maximum representable number.
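The two numbering schemes above can be sketched as follows. This is a hedged reading of the text rather than the disclosed method.

For the relative index, the sketch assumes the following:
- The base is the absolute number of the latest transaction whose packets were all acknowledged.
- The receiver, which sent that ACK, knows the same base and can add it back.
- Transactions still in flight when the base moves are ignored, which a real design would have to handle.

For the free-running index, the value is simply truncated to the number of bits available in the packet. As with any wrapping sequence number, this stays unambiguous only while fewer than 2**bits transactions are outstanding.

```python
class RelativeIndexer:
    """Sketch: index = absolute transaction number minus the last fully-ACKed one."""

    def __init__(self) -> None:
        self.total = 0  # host data: running count of all transactions sent
        self.base = 0   # absolute number of the latest fully acknowledged transaction

    def next_index(self) -> int:
        """Assign the relative index for the next transaction."""
        self.total += 1
        return self.total - self.base

    def on_fully_acked(self, absolute_txn: int) -> None:
        """All packets of `absolute_txn` were ACKed: restart the relative count."""
        self.base = max(self.base, absolute_txn)


def bounded_index(running: int, bits: int) -> int:
    """Free-running index truncated to the bits available in the packet."""
    return running % (1 << bits)
```

For example, three transactions receive the relative indices 1, 2, and 3. After an ACK covering transaction 3, the fourth transaction receives index 1 again, while `total` still records 4. With 16 available bits, `bounded_index(65536, 16)` wraps to 0.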
Fig. 3 schematically illustrates an example of receiving packet-based transactions out of order according to some embodiments of the present disclosure. In this example, sender 310 sent three transactions, transaction A3, transaction B3, and transaction C3, to receiver 320.
Transaction A3 is sent first and includes data packets 301, 302, and 303. The sender's processor identifies the index indication in the sender's host software and incorporates it into data packets 301, 302, and 303 of transaction A3. Since A3 is the first transaction, its index indication may be, for example, the absolute number 1, which is incorporated into all packets of transaction A3. Alternatively, the index indication may be a relative number, such as x+1, where x is a predefined number. Receiver 320 first receives data packet 301. The receiver's processor generates an element labeled "1", derived from the index indication in data packet 301 and corresponding to transaction A3. All packets belonging to transaction A3 are stored in element "1". After data packet 301, the receiver receives data packet 303. Since data packet 303 contains the same index indication as data packet 301, the receiver's processor stores data packet 303 in element "1". The receiver then receives data packet 302. Since data packet 302 contains the same index indication as data packets 301 and 303, the receiver stores data packet 302 in element "1". Once transaction A3 is complete and all of its packets have been received, element "1" is removed from the receiver's special receive queue.
The second transaction, transaction B3, contains data packets 304, 305, and 306. At the sender, the index indication of transaction B3 is identified in the host software and incorporated into each packet of transaction B3. Since B3 is the second transaction, its index indication may be the absolute number 2, or the relative number x+2, where x is a predefined number. The data packets of transaction B3 are sent in the order 304, 305, 306. Receiver 320 first receives data packet 304. Since this is the first packet of transaction B3 that the receiver has received, the receiver's processor generates a new element labeled "2" in the special receive queue, derived from data packet 304 and corresponding to transaction B3. Element "2" stores all data packets that belong to transaction B3, i.e., all packets that carry the index indication of transaction B3. The receiver then receives data packet 305, which contains the same index indication as data packet 304, and the processor stores it in element "2". Next, the receiver receives data packet 307, which does not belong to transaction B3. Data packet 307 belongs to a new transaction, C3, so it contains an index indication that differs from those of transactions A3 and B3. Since C3 is the third transaction, its index indication may be the absolute number 3, or the relative number x+3, where x is a predefined number. The receiver's processor therefore generates a new element labeled "3", derived from data packet 307 and corresponding to transaction C3. Once element "3" has been generated, the special receive queue contains both element "2" and element "3". After packet 307, the receiver receives packet 306, which is the last packet of transaction B3 and contains the index indication of transaction B3.
The receiver's processor stores data packet 306 in element "2". At this stage, since all packets of transaction B3 have been received, the processor of receiver 320 removes element "2" from the special receive queue. The receiver then receives data packet 308, which belongs to transaction C3 and therefore contains the same index indication as data packet 307. The processor of receiver 320 stores data packet 308 in element "3". Since all data packets of transaction C3 have now been received, it then removes element "3" from the special receive queue.
Fig. 4 schematically illustrates another example of receiving packet-based transactions out of order according to some embodiments of the present disclosure. In this example, sender 410 sent three transactions, transaction A4, transaction B4, and transaction C4, to receiver 420.
Transaction A4 is sent first and includes data packets 401, 402, and 403. The processor of sender 410 identifies the index indication in the sender's host software and incorporates it into data packets 401, 402, and 403 of transaction A4. Receiver 420 first receives data packet 401, and the receiver's processor generates an element labeled "1" that corresponds to transaction A4. All packets belonging to transaction A4 are stored in element "1". After data packet 401, the receiver receives data packet 403. Since data packet 403 contains the same index indication as data packet 401, the processor of receiver 420 stores data packet 403 in element "1". The receiver then receives data packet 402. Since data packet 402 contains the same index indication as data packets 401 and 403, the receiver's processor stores data packet 402 in element "1". Once transaction A4 is complete, with all of its packets received and stored in element "1", element "1" is removed from the receiver's special receive queue.
The second transaction, transaction B4, contains packets 404, 405, and 406. At the sender, the index indication of transaction B4 is identified in the host software and incorporated into each packet of transaction B4. The data packets of transaction B4 are transmitted in the order 404, 405, 406, but the first of them to reach receiver 420 is data packet 405. Since data packet 405 belongs to a new transaction and contains the index indication of transaction B4, the processor of receiver 420 generates a new element labeled "2" in the receiver's special receive queue, corresponding to the index indication of transaction B4. The receiver then receives data packet 406, which contains the same index indication as data packet 405, and the processor stores it in element "2" of the special receive queue. Next, the receiver receives data packet 407, which belongs to transaction C4. Since 407 is a packet of a new transaction, it contains a new index indication that differs from those of transactions A4 and B4. The processor of receiver 420 therefore generates a new element labeled "3" in the special receive queue. Data packet 407 is stored in element "3". Data packet 408, received after data packet 407, also belongs to transaction C4 and is likewise stored in element "3". Since transaction C4 contains only packets 407 and 408, element "3" is removed from the special receive queue once both have been received and stored. After element "3" is removed, the receiver receives packet 404. Since data packet 404 belongs to transaction B4, which already has a corresponding element "2", data packet 404 is stored in element "2".
Since packet 404 is the last packet of transaction B4 and all packets of transaction B4 have been received, element "2" is removed from the special receive queue of receiver 420.
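The arrival orders of Fig. 3 and Fig. 4 can be replayed to check which elements are open in the special receive queue after each packet. This sketch assumes that transactions A, B, and C carry index indications 1, 2, and 3 and contain 3, 3, and 2 packets respectively. It also assumes the receiver knows each transaction's packet count; the description does not say how the receiver learns this.

```python
from typing import Dict, List, Set, Tuple


def replay(arrivals: List[Tuple[int, int]], sizes: Dict[int, int]) -> List[List[int]]:
    """Replay (packet_id, index) arrivals; return the open elements after each one."""
    queue: Dict[int, Set[int]] = {}  # index indication -> packet ids in its element
    trace: List[List[int]] = []
    for pkt, idx in arrivals:
        queue.setdefault(idx, set()).add(pkt)  # element created on first packet
        if len(queue[idx]) == sizes[idx]:
            del queue[idx]                     # transaction complete: remove element
        trace.append(sorted(queue))
    return trace


SIZES = {1: 3, 2: 3, 3: 2}
# Fig. 3 arrival order: 301, 303, 302, 304, 305, 307, 306, 308
FIG3 = [(301, 1), (303, 1), (302, 1), (304, 2), (305, 2), (307, 3), (306, 2), (308, 3)]
# Fig. 4 arrival order: 401, 403, 402, 405, 406, 407, 408, 404
FIG4 = [(401, 1), (403, 1), (402, 1), (405, 2), (406, 2), (407, 3), (408, 3), (404, 2)]

print(replay(FIG4, SIZES))  # [[1], [1], [], [2], [2], [2, 3], [2], []]
```

In the Fig. 4 trace, element "2" stays open while element "3" is created and removed, and packet 404 completes transaction B4 last. Packets 405 and 406 were therefore buffered before the first packet of their transaction arrived.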
Optionally, the index indication is incorporated into only one data packet of the corresponding packet-based transaction.
According to some embodiments of the disclosure, the plurality of transactions follow an InfiniBand-based RDMA protocol, the RoCEv1 protocol, or the RoCEv2 protocol. Transactions may be used for tagged operations and/or untagged operations. For example, a transaction may be used for a Send operation, which is an untagged operation, or for a Read operation, which is a tagged operation.
FIG. 5 schematically illustrates an example of receiving the packet-based transactions of three RDMA Send operations out of order, according to some embodiments of the present disclosure. In receiver 520, the producer posts n WQEs (indexes) to the receive queue, and the consumer points to the WQE of the next transaction the receiver expects to receive. Thus, before the receiver receives any transaction, the consumer points to WQE index 1. Receiver 520 receives the first packet 501 of transaction A5, a transaction of an RDMA Send operation. It reads the index indication in packet 501 and finds that it is "1". The receiver therefore fetches WQE element "1" from the receive queue (RQ) and places packet 501 directly into the buffer represented by WQE element "1". Element "1" is taken from the index indication of packet 501, which is the index indication of transaction A5 of the first RDMA Send operation. When the receiver receives the first transaction (meaning any packet of the first transaction), the consumer in the RQ advances to the next expected transaction, in this case WQE index 2. However, after transaction A5 completes successfully, the next transaction to arrive is not the second RDMA Send operation that the consumer expects. Instead, the third RDMA Send operation, with packets 507 and 508, is received. The processor of receiver 520 therefore generates element "3", taken from the index indication field (in the payload or in at least one header) of packet 507, which corresponds to transaction C5. The receiver fetches WQE element "3" from the receive queue and places packet 507 directly into the buffer represented by WQE element "3". The consumer then jumps to index 4, the next expected index after index 3.
Transaction B5 (the second RDMA Send operation) is then received, with its packets arriving in the order 506, 505, 504. The receiver reads the index indication incorporated into data packet 506 and finds that it is "2". The receiver therefore fetches WQE element "2" from the RQ and places packet 506 directly into the buffer represented by WQE element "2". Element "2" is taken from the index indication of packet 506, which is the index indication of transaction B5 of the second RDMA Send operation. The consumer, however, still points to index 4 in the RQ, because 4 remains the next expected index after transaction B5 completes.
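The consumer behaviour in Fig. 5 can be summarized by one rule: a packet is steered to the WQE named by its index indication, and the consumer only ever moves forward to one past the highest index seen. The sketch below is a hypothetical model of that rule, not a real verbs API.

WQEs are numbered from 1 to n, and each WQE owns a list standing in for its buffer. The sketch appends packets in arrival order. A real implementation would place each packet at its offset within the buffer.

```python
from typing import Dict, List


class ReceiveQueue:
    """Hypothetical RQ model: WQE i owns a buffer; `consumer` is the next expected index."""

    def __init__(self, n: int) -> None:
        self.buffers: Dict[int, List[bytes]] = {i: [] for i in range(1, n + 1)}
        self.consumer = 1

    def on_packet(self, index: int, payload: bytes) -> None:
        # Steer the packet directly into the buffer of the WQE named by its index.
        self.buffers[index].append(payload)
        # Advance the consumer only if this index is at or beyond the current position.
        self.consumer = max(self.consumer, index + 1)
```

Replaying Fig. 5 with this model, packet 501 (index 1) moves the consumer to 2, and packet 507 (index 3) moves it to 4. The late packet 506 (index 2) is still placed in WQE 2's buffer, but the consumer stays at 4.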
Fig. 6 schematically illustrates a flow chart of a transmission method, according to some embodiments of the disclosure. The method handles a plurality of packet-based transactions that are sent from a transmitter to a receiver and received out of order at the receiver. At 601, in each of a plurality of packet-based transactions of an operation encoded according to a network transport protocol, a processor of the sender identifies a delivery index indication contained in the host data of the host software. At 602, the identified delivery index indication is incorporated into each of the plurality of data packets of the corresponding transaction. The index indication may be based on a host internal queue index, such as a Work Queue Element (WQE) index, or on a counter for a particular transaction type. The index indication may be an absolute number, a relative number, and/or an unlimited (free-running) index. In some embodiments of the present disclosure, the index indication is incorporated into at least one header of each data packet, or into the payload of each data packet. When incorporated into a header, the index indication may overwrite unused fields in at least one header of the data packets of the corresponding packet-based transaction. In some other embodiments of the present disclosure, the index indication is incorporated into a dedicated field, intended for transaction information, in at least one header of each data packet. Alternatively, the index indication may be incorporated into only one packet of the corresponding transaction.
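Steps 601 to 603 can be sketched as a small sender pipeline:
- take the index identified from host data (for example, a WQE index) as an input;
- segment the message into packets of at most `mtu` bytes;
- attach the index to each packet;
- hand the packets to a transmit callback.

All names here are hypothetical. For the "only one packet" alternative, the text does not say which packet carries the index. This sketch assumes it is the first.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class OutPacket:
    index: Optional[int]  # delivery index indication, or None if not incorporated
    payload: bytes


def build_transaction(index: int, message: bytes, mtu: int,
                      every_packet: bool = True) -> List[OutPacket]:
    """Steps 601-602: segment a message and incorporate the index indication."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    chunks = [message[i:i + mtu] for i in range(0, len(message), mtu)] or [b""]
    # every_packet=False models the "only one packet" embodiment (first packet, assumed).
    return [OutPacket(index if every_packet or n == 0 else None, chunk)
            for n, chunk in enumerate(chunks)]


def transmit(packets: List[OutPacket], send: Callable[[OutPacket], None]) -> None:
    """Step 603: instruct transmission by handing each packet to a send callback."""
    for pkt in packets:
        send(pkt)
```

For example, `build_transaction(2, b"abcdefg", 3)` yields three packets (`b"abc"`, `b"def"`, `b"g"`), each carrying index 2. With `every_packet=False`, only the first packet carries it.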
At 603, the processor of the transmitter issues an instruction to transmit the plurality of data packets incorporating the delivery index indication.
Fig. 7 schematically illustrates a flow chart of a method for processing a plurality of packet-based transactions that are received out of order at a receiver, according to some embodiments of the present disclosure. At 701, a plurality of packet-based transactions of an operation encoded according to a network transport protocol are received at a receiver, wherein a delivery index indication is incorporated in each of a plurality of packets of a respective packet-based transaction.
At 702, a processor of the receiver manages the processing of out-of-order packets in the plurality of packet-based transactions according to the delivery index indication. According to some embodiments of the present disclosure, when the out-of-order data packets do not conform to the order of the transactions, a special receive queue is created that corresponds to the delivery order of the packet-based transactions of the operation. In the special receive queue, an element is generated for each packet-based transaction of the operation when a packet of that transaction is first received. The element is removed from the queue when that transaction is completed. According to some embodiments of the present disclosure, when the out-of-order data packets conform to the transaction order, they are processed out of order according to the index indication.
According to some embodiments of the present disclosure, the receiver sends the sender an ACK for the first packet of the corresponding packet-based transaction of the operation once that packet has been sent successfully. In response, according to some embodiments of the present disclosure, the incorporation of the index indication into each data packet may stop. According to some other embodiments of the present disclosure, the incorporation of the index indication into each data packet may continue after the ACK is received.
According to some other embodiments of the present disclosure, the index indication is relative to the last ACK received for an index indication.
According to some embodiments of the present disclosure, the methods of sending and receiving out-of-order delivery transactions may indicate packet loss or out-of-order delivery of the first packet of a transaction. In RDMA and/or RoCE, only the first packet of a transaction provides the context, including the memory address at which the transaction takes place. The sending and receiving methods of the present disclosure may therefore be of great importance to RDMA implementations, because the data packets of a transaction can be stored in an element generated in the special receive queue even before the first data packet of the transaction is received.
According to some embodiments of the present disclosure, a computer program product is provided. The computer program product comprises computer readable code instructions which, when run in a computer, cause the computer to perform the method described in fig. 6 and 7 above.
The description of the various embodiments is for illustrative purposes only and is not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that, during the life of a patent maturing from this application, many relevant devices and methods for handling out-of-order delivery transactions will be developed. The scope of the terms "device" and "method for handling out-of-order delivery transactions" is intended to include all such new technologies a priori.
The term "about" as used herein means ± 10%.
The terms "comprising", "including", "having", and "with" mean "including but not limited to". This term encompasses the terms "consisting of" and "consisting essentially of".
The phrase "consisting essentially of" means that the composition or method may include additional ingredients and/or steps, provided that the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. For example, the term "a complex" or "at least one complex" may include a plurality of complexes, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.
The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". Any particular embodiment may include a plurality of "optional" features unless such features conflict.
Throughout this application, various embodiments may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the embodiments. Accordingly, the description of a range should be considered to have specifically disclosed all possible subranges as well as individual numerical values within that range. For example, a description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6, etc., as well as individual numbers within that range, such as 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "ranging between a first indicated number and a second indicated number" and "ranging from a first indicated number to a second indicated number" are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the embodiments, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the embodiments, which are, for brevity, described in the context of a single embodiment, may also be provided separately, in any suitable subcombination, or as suitable in any other described embodiment. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
Although the embodiments have been described in conjunction with specific embodiments, many alternatives, modifications, and variations will be apparent to those skilled in the art. Accordingly, the spirit and broad scope of the appended claims are intended to include all such alternatives, modifications, and variations.
All publications, patents, and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated herein by reference. Furthermore, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the embodiments. With respect to the use of section titles, the section titles should not be construed as necessarily limiting.

Claims (14)

1. An apparatus for transmitting a plurality of transactions, the apparatus being configured to:
in each of a plurality of packet-based transactions of an operation encoded according to a network transport protocol:
identify a delivery index indication contained in host data, the delivery index indication indicating the correct delivery order of the transactions and the data packets;
incorporate the delivery index indication into each of a plurality of data packets of the respective packet-based transaction; and
instruct sending of the plurality of data packets incorporating the delivery index indication.
2. The device of claim 1, wherein the index indication is based on at least one of: a host internal queue index, a counter incremented per specific transaction type, a specific packet-based transaction indication, an absolute number, and an unlimited (free-running) index indication.
3. The device of claim 1, wherein the index indication is incorporated into one of:
A plurality of payloads of the plurality of data packets of the respective packet-based transaction;
a packet of said corresponding packet-based transaction;
At least one header of the plurality of data packets of the corresponding packet-based transaction.
4. A device according to claim 3, wherein in each of the plurality of packet-based transactions, the device is configured to incorporate the index indication by overwriting unused fields in the at least one header of the plurality of packets of the respective packet-based transaction.
5. A device according to claim 3, wherein in each of the plurality of packet-based transactions, the device is configured to incorporate the index indication into a dedicated field in the at least one header of each of the plurality of packets of the respective packet-based transaction, the dedicated field being intended for representing transaction information.
6. The apparatus of claim 1, wherein in each of the plurality of packet-based transactions, the apparatus is further configured to receive an acknowledgement (ACK) indicating that a first packet of the respective packet-based transaction of the operation was sent successfully, and to stop incorporating the index indication into the plurality of packets of the respective packet-based transaction of the operation in response to receiving the ACK.
7. The apparatus of claim 1, wherein in each of the plurality of packet-based transactions, the apparatus is further configured to receive an acknowledgement (ACK) indicating that a first packet of the respective packet-based transaction of the operation was sent successfully, and to continue incorporating the index indication into the plurality of packets of the respective packet-based transaction of the operation in response to receiving the ACK.
8. The device of claim 1, wherein the index indication is relative to the last acknowledgement (ACK) received for the index indication.
9. An apparatus for receiving a plurality of transactions, the apparatus being configured to:
receive a plurality of packet-based transactions of an operation encoded according to a network transport protocol, wherein a delivery index indication is incorporated in each of a plurality of packets of a respective packet-based transaction, the delivery index indication indicating the correct delivery order of the transactions and the packets;
and manage the processing of out-of-order data packets in the plurality of packet-based transactions according to the index indication.
10. The apparatus of claim 9, wherein the apparatus is further configured to:
process at least one out-of-order packet of the plurality of packets of the packet-based transaction according to the index indication; or
create a queue corresponding to the delivery order of the packet-based transactions of the operation by generating an element for each packet-based transaction of the operation upon first receiving a packet of that transaction; and
remove the element of each packet-based transaction of the operation from the queue upon completion of that transaction.
11. A method for sending transactions, comprising:
in each of a plurality of packet-based transactions of an operation encoded according to a network transport protocol:
identifying a delivery index indication contained in host data, the delivery index indication indicating the correct delivery order of the transactions and the data packets;
incorporating the index indication into a plurality of data packets of the respective packet-based transaction; and
instructing sending of the plurality of data packets incorporating the index indication.
12. A method for receiving a transmitted transaction, comprising:
receiving a plurality of packet-based transactions of an operation encoded according to a network transport protocol, wherein a delivery index indication is incorporated in each of a plurality of packets of a respective packet-based transaction, the delivery index indication indicating the correct delivery order of the transactions and the packets;
and managing the processing of out-of-order data packets in the plurality of packet-based transactions according to the index indication.
13. The method of claim 12, wherein the step of managing comprises:
processing at least one out-of-order packet of the plurality of packets of the packet-based transaction according to the index indication; or
creating a queue corresponding to the delivery order of the packet-based transactions of the operation by generating an element for each packet-based transaction of the operation upon first receiving a packet of that transaction; and
removing the element of each packet-based transaction of the operation from the queue upon completion of that transaction.
14. A computer program product comprising computer readable code instructions which, when run in a computer, cause the computer to perform the method according to any of claims 11 to 13.
CN202080066674.5A 2020-04-16 2020-04-16 System and method for handling out-of-order delivery transactions Active CN114531907B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/060659 WO2021209131A1 (en) 2020-04-16 2020-04-16 A system and method for handling out of order delivery transactions

Publications (2)

Publication Number Publication Date
CN114531907A CN114531907A (en) 2022-05-24
CN114531907B true CN114531907B (en) 2024-07-26

Family

ID=70456740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080066674.5A Active CN114531907B (en) 2020-04-16 2020-04-16 System and method for handling out-of-order delivery transactions

Country Status (2)

Country Link
CN (1) CN114531907B (en)
WO (1) WO2021209131A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109936510A (en) * 2017-12-15 2019-06-25 微软技术许可有限责任公司 Multipath RDMA transmission

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US20030220936A1 (en) * 2002-05-22 2003-11-27 Gifford John H. Software architecture for managing binary objects
US8103809B1 (en) * 2009-01-16 2012-01-24 F5 Networks, Inc. Network devices with multiple direct memory access channels and methods thereof
US9692560B1 (en) * 2014-07-10 2017-06-27 Qlogic Corporation Methods and systems for reliable network communication
US9985903B2 (en) * 2015-12-29 2018-05-29 Amazon Technologies, Inc. Reliable, out-of-order receipt of packets

Also Published As

Publication number Publication date
WO2021209131A1 (en) 2021-10-21
CN114531907A (en) 2022-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant