WO2019140556A1 - Packet transmission method and apparatus - Google Patents

Packet transmission method and apparatus

Info

Publication number
WO2019140556A1
Authority
WO
WIPO (PCT)
Prior art keywords
message
packet
packets
bitmap
source
Prior art date
Application number
PCT/CN2018/072886
Other languages
English (en)
French (fr)
Inventor
苏德现
杨华
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN202010305935.6A (CN111654447B)
Priority to PCT/CN2018/072886 (WO2019140556A1)
Priority to CN201880003454.0A (CN109691039B)
Publication of WO2019140556A1
Priority to US16/895,791 (US11716409B2)


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 - Routing or path finding of packets in data switching networks
    • H04L45/24 - Multipath
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 - Data switching networks
    • H04L12/28 - Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/46 - Interconnection of networks
    • H04L12/4633 - Interconnection of networks using encapsulation techniques, e.g. tunneling
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 - Arrangements for monitoring or testing data switching networks
    • H04L43/08 - Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 - Errors, e.g. transmission errors
    • H04L43/0829 - Packet loss
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 - Routing or path finding of packets in data switching networks
    • H04L45/74 - Address processing for routing
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 - Traffic control in data switching networks
    • H04L47/10 - Flow control; Congestion control
    • H04L47/12 - Avoiding congestion; Recovering from congestion
    • H04L47/125 - Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 - Traffic control in data switching networks
    • H04L47/10 - Flow control; Congestion control
    • H04L47/34 - Flow control; Congestion control ensuring sequence integrity, e.g. using sequence numbers
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 - Packet switching elements
    • H04L49/90 - Buffering arrangements
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 - Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 - Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 - Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 - Parsing or analysis of headers

Definitions

  • the present invention relates to the field of message transmission technologies, and in particular, to a message transmission method and apparatus.
  • RDMA Remote Direct Memory Access
  • RDMA transfers data over the network directly into a computer's memory, moving data from one system to the memory of a remote system without involving the operating system.
  • RDMA eliminates the overhead of external memory copying and context switching, thus freeing up memory bandwidth and CPU cycles to improve application system performance.
  • RoCE Remote Direct Memory Access based on Converged Ethernet
  • RDMA over Converged Ethernet abbreviated as RoCE
  • RoCE has two protocol versions, v1 and v2.
  • The RoCE v1 protocol allows direct access between any two servers in the same broadcast domain.
  • the RoCE v2 protocol can implement routing functions.
  • Although the advantages of the RoCE protocol mainly derive from the characteristics of converged Ethernet, the RoCE protocol can also be applied to a traditional Ethernet network or a non-converged Ethernet network.
  • the forwarding path is usually selected according to the hash value of the quintuple information in the packet.
  • However, the packet traffic sent from one source port may be large during a certain period of time,
  • and the randomness of the hash may also cause the traffic on a certain path in the multi-path network to be large at a certain time,
  • which may cause congestion on that path of the multi-path network.
  • In view of this, an embodiment of the present application provides a packet transmission method, so that packets using the RoCE protocol are routed in a more balanced manner over the Ethernet.
  • In a first aspect, the present application provides a packet transmission method, applied to a data communication system in which remote direct memory access (RDMA) is performed between a source device and a destination device via Ethernet.
  • The network interface card of the source device includes at least a source queue pair, and the source queue pair includes a sending queue.
  • The packet transmission method includes: obtaining Q data segments from the sending queue of the source queue pair; encapsulating the Q data segments to obtain Q packets; and sending the Q packets respectively.
  • Each of the Q packets carries a first header and a second header. The first header carried in each packet indicates the write address of the packet in the memory of the destination device, the second header carried in each packet contains source port number information, the source port number information in the second headers carried in at least two of the Q packets is different, and Q is a positive integer greater than or equal to 2.
  • In this way, the group of packets is transmitted over at least two different network paths, so that the traffic on each path in the network is relatively balanced.
  • In addition, because each packet carries a first header indicating the write address of the packet in the memory of the destination device, the destination device can perform the RDMA operation directly according to the address information carried in each packet. The foregoing solution therefore not only optimizes the routing of the RDMA packets, but also ensures that the RDMA operation can actually be completed at the destination end.
  • A possible implementation manner is: sequentially encapsulating the Q data segments according to the source port number information of the source queue pair to obtain the Q packets, sending each packet as soon as it is encapsulated, and updating the source port number information of the source queue pair after every N packets are encapsulated, where the source port number information carried in the previous group of N packets is different from the source port number information carried in the next group of N packets, and N is greater than or equal to 1 and less than Q.
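  • As an illustration of this implementation, the following is a minimal Python sketch; the QueuePair structure, the encapsulate helper and the port-rotation rule are assumptions for illustration only, not the claimed protocol.

```python
from dataclasses import dataclass, field

@dataclass
class QueuePair:
    src_port: int = 49152            # current UDP source port of the QP (assumed start value)
    sent: list = field(default_factory=list)

def encapsulate(segment: bytes, src_port: int, psn: int) -> dict:
    # Stand-in for the real encapsulation: keep only the fields relevant
    # to path selection (source port) and ordering (PSN).
    return {"src_port": src_port, "psn": psn, "payload": segment}

def send_segments(qp: QueuePair, segments: list, n: int = 4) -> None:
    """Send each segment as soon as it is encapsulated; after every n
    packets, update the QP's source port so the next group of packets
    is likely hashed onto a different path."""
    for i, seg in enumerate(segments):
        qp.sent.append(encapsulate(seg, qp.src_port, psn=i))
        if (i + 1) % n == 0:
            # Any rotation scheme works, as long as consecutive groups differ.
            qp.src_port = 49152 + (qp.src_port - 49152 + 1) % 16384

# Example: 10 segments, the source port changes after every 4 packets.
qp = QueuePair()
send_segments(qp, [b"seg%d" % i for i in range(10)])
assert len({p["src_port"] for p in qp.sent}) >= 2
```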
  • Another possible implementation manner is: dividing the Q data segments into M groups, each group including at least one data segment, and sequentially encapsulating the data segments in each group to obtain the packets of each group, where the source port number information carried in the packets of the same group is the same, and the source port number information carried in the packets of at least two groups is different.
  • Another possible implementation manner is: before encapsulating the Q data segments to obtain the Q packets, determining the write address of each of the Q packets in the memory of the destination device according to the base address of the first data segment of the Q data segments and the length of each data segment. By calculating the write address of each packet in the memory of the destination device and encapsulating that address in the packet, the packet can be written directly into the corresponding memory address when it reaches the destination.
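  • A minimal sketch of how the per-packet write address could be derived from the base address and the segment lengths; the helper name and the flat back-to-back byte layout are assumptions for illustration.

```python
def write_addresses(base_addr: int, segment_lengths: list) -> list:
    """Return the destination-memory write address of each packet,
    assuming packet i carries segment i and the segments are laid out
    back-to-back starting at base_addr."""
    addrs, offset = [], 0
    for length in segment_lengths:
        addrs.append(base_addr + offset)
        offset += length
    return addrs

# Example: base address 0x1000, three segments of 4 KB, 4 KB and 2 KB.
assert write_addresses(0x1000, [4096, 4096, 2048]) == [0x1000, 0x2000, 0x3000]
```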
  • Another possible implementation manner is: each of the Q packets further carries a packet sequence number, and the packet sequence number carried in each packet indicates the sending order of that packet among the Q packets.
  • In this way, the destination end can use the packet sequence numbers to confirm whether the group of packets has been completely received and whether the packets arrived out of order, which improves the stability of the system.
  • A second aspect provides a packet transmission method, applied to a data communication system in which remote direct memory access (RDMA) is performed between a source device and a destination device,
  • where the network interface card of the destination device includes a destination queue pair, and the destination queue pair includes a receiving queue.
  • The packet transmission method includes: receiving Q packets, where each packet carries a first header and a second header, the first header carried in each packet indicates the write address of the packet in the memory of the destination device, the second header carried in each packet contains source port number information, the source port number information in the second headers carried in at least two of the Q packets is different, and Q is a positive integer greater than or equal to 2; and saving the Q packets from the destination queue pair into the memory of the destination device according to the write addresses, carried in the Q packets, of the packets in the memory of the destination device.
  • Because the packets sent by the source end may take different routes to the destination end, the order in which the destination end receives the packets may differ from the order in which the source end sent them.
  • With the above method, the destination end writes each received packet directly into memory according to the write address carried in the packet, instead of waiting for the entire group of packets to be received and reordered before writing them into memory. This improves system efficiency and avoids the situation in which, if packet loss occurs during the transmission of a group of packets, none of the packets of that group can be written into the destination memory.
  • A possible implementation manner is: receiving the Q packets includes receiving the Q packets one by one; and saving the Q packets into the memory of the destination device includes performing, each time a packet is received, the step of saving the received packet into the memory of the destination device.
  • Each of the Q packets further carries a packet sequence number, and the packet sequence number carried in each packet indicates the sending order of that packet among the Q packets.
  • A possible implementation manner includes: each time a packet is received, recording the packet sequence number of the currently received packet and determining, according to that sequence number, the packet sequence number of the next packet to be received; after the next packet is received, determining whether the packet sequence number of the received next packet is the same as the packet sequence number of the next packet to be received; if not, starting a packet loss detection process; and if the packet loss detection process determines that packet loss occurred during packet transmission, sending a packet retransmission indication to the source device.
  • In this way, when an out-of-order packet is received, a retransmission indication is not sent to the source end immediately; instead, the corresponding packet loss detection is started, and the source end retransmits the packet only after the packet loss detection determines that packet loss has occurred, which improves the stability of the system.
  • A possible implementation manner is: the destination queue pair is provided with a bitmap, the bitmap includes at least Q bitmap bits, and the Q bitmap bits correspond one-to-one to the Q packets according to the sending order of the Q packets; the bitmap is provided with a head pointer and a tail pointer, the head pointer points to the bitmap bit corresponding to the most recently received packet of the receiving queue, and the tail pointer points to the bitmap bit corresponding to the next packet to be received by the receiving queue.
  • Receiving a packet, recording the packet sequence number of the currently received packet, and determining the packet sequence number of the next packet to be received according to the packet sequence number of the currently received packet includes: setting the bitmap bit representing the currently received packet to valid according to the packet sequence number of the currently received packet, and pointing the head pointer to the bitmap bit representing the currently received packet; and determining, according to the packet sequence number of the currently received packet, whether the currently received packet is the packet corresponding to the bitmap bit currently pointed to by the tail pointer; if so, updating the tail pointer so that it points to the first invalid bitmap bit after the bitmap bit corresponding to the currently received packet; if not, keeping the bitmap bit currently pointed to by the tail pointer unchanged. In this way, the bitmap is used to track the reception status of the packets, which improves the efficiency of the system.
  • Another possible implementation manner is: confirming whether the packet sequence number of the received next packet is consistent with the packet sequence number of the next packet to be received includes: determining, according to the packet sequence number of the received next packet, whether the tail pointer currently points to the bitmap bit corresponding to the received next packet. In this way, it can be determined whether the received packets are out of order, and thus whether corresponding measures need to be taken.
  • A possible implementation manner is: the packet loss detection process includes starting a timer for the packet corresponding to the bitmap bit currently pointed to by the tail pointer; if the timer expires and the position pointed to by the tail pointer has not changed, it is determined that the packet corresponding to the bitmap bit currently pointed to by the tail pointer is lost. In this way, when a packet has not been received for a long time, the system can determine that the packet is lost, which improves the efficiency of the system.
  • Another possible implementation manner is: the packet loss detection process includes determining whether the bitmap bit currently pointed to by the head pointer is more than a predetermined value ahead of the bitmap bit currently pointed to by the tail pointer; if so, it is determined that packets corresponding to bitmap bits between the tail pointer and the head pointer have been lost. In this way, it can be effectively determined whether packet loss has occurred among the received packets.
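  • As an illustration, the sketch below combines the two detection rules described above: a timer on the packet at the tail pointer and a head/tail distance threshold. The function shape, timeout and threshold values are hypothetical.

```python
import time

def detect_loss(tail_idx: int, head_idx: int, tail_unchanged_since: float,
                timeout_s: float = 0.005, max_gap: int = 64) -> bool:
    """Return True if the packet at the tail pointer should be treated as
    lost: either its timer expired while the tail pointer did not move, or
    the head pointer has run too far ahead of the tail pointer."""
    timer_expired = (time.monotonic() - tail_unchanged_since) > timeout_s
    gap_too_large = (head_idx - tail_idx) > max_gap
    return timer_expired or gap_too_large

# Example: tail stuck at bit 3 for 10 ms while the head has reached bit 80.
assert detect_loss(3, 80, time.monotonic() - 0.010)
```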
  • A possible implementation manner is: sending the packet retransmission indication to the source device includes sending a packet retransmission indication that carries the packet sequence number of the packet corresponding to the bitmap bit currently pointed to by the tail pointer, so as to request the source device to resend that packet and all subsequent packets among the Q packets.
  • Another possible implementation manner is: when the values of all the bitmap bits corresponding to a group of packets are set to valid, it indicates that the group of packets has been completely received, and the destination end sends an acknowledgment response message to the source end. In this way, it can be determined when a group of packets has been completely received.
  • Another possible implementation manner is: when a packet received by the destination end does not carry the part indicating the write address of the packet at the destination end, the packet is first buffered, and out-of-order detection, packet loss detection and completeness checking are performed. After it is confirmed that the entire group of packets has been received, the packets are reordered according to their packet sequence numbers and then written into memory. In this way, packets that do not carry a write address at the destination end can also be received and reordered.
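  • For packets that carry no write address, the description above buffers them and reorders by packet sequence number once the group is complete; a minimal sketch under assumed data structures follows.

```python
def reorder_and_write(buffered: dict, q: int, memory: bytearray, base: int) -> bool:
    """buffered maps PSN -> payload. If all q packets of the group are
    present, write them into memory in PSN order starting at base and
    return True; otherwise keep buffering and return False."""
    if len(buffered) < q:
        return False                      # group not complete yet
    offset = 0
    for psn in range(q):                  # PSN order equals sending order
        payload = buffered[psn]
        memory[base + offset: base + offset + len(payload)] = payload
        offset += len(payload)
    return True

# Example: three packets arrive out of order (PSN 2, 0, 1).
mem = bytearray(16)
group = {2: b"CC", 0: b"AA", 1: b"BB"}
assert reorder_and_write(group, 3, mem, base=4)
assert mem[4:10] == b"AABBCC"
```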
  • In another aspect, a network interface card is provided, where the network interface card is located at the source device of remote direct memory access (RDMA), a source queue pair is set on the network interface card, and each queue pair includes a sending queue. The network interface card includes: an obtaining module, configured to obtain Q data segments from the sending queue of the source queue pair; and a sending module, configured to encapsulate the Q data segments to obtain Q packets and send the Q packets, where each of the Q packets carries a first header, a second header and a queue pair identifier, the first header carried in each packet indicates the write address of the packet in the memory of the destination device, the second header carried in each packet contains source port number information, the source port number information in the second headers carried in at least two of the Q packets is different, and Q is a positive integer greater than or equal to 2.
  • A possible implementation manner is: the sending module is specifically configured to sequentially encapsulate the Q data segments according to the source port number information of the source queue pair to obtain the Q packets, send each packet as soon as it is encapsulated, and update the source port number information of the source queue pair after every N packets are sent, where the source port number information carried in the previous group of N packets is different from the source port number information carried in the next group of N packets, and N is greater than or equal to 1 and less than Q.
  • Another possible implementation manner is: the sending module is specifically configured to divide the Q data segments into M groups, each group including at least one data segment, and sequentially encapsulate the data segments in each group to obtain the packets of each group, where the source port number information carried in the packets of the same group is the same, the source port number information carried in the packets of at least two groups is different, and M is less than or equal to Q.
  • Another possible implementation manner is: the network interface card further includes a determining module, configured to determine the write address of each of the Q packets in the memory of the destination device according to the base address of the first data segment of the Q data segments and the length of each data segment.
  • Each of the Q packets further carries a packet sequence number, and the packet sequence number carried in each packet indicates the sending order of that packet among the Q packets.
  • A fourth aspect provides a device, where the device includes a main processing system and a network interface card. The main processing system is configured to process services and, when service data needs to be sent to a destination device, send the service data to the network interface card.
  • The network interface card is configured to encapsulate Q data segments of the service data to obtain Q packets and send the Q packets, where each of the Q packets carries a first header and a second header, the first header carried in each packet indicates the write address of the packet in the memory of the destination device, the second header carried in each packet contains source port number information, the source port number information in the second headers carried in at least two of the Q packets is different, and Q is a positive integer greater than or equal to 2.
  • A possible implementation manner is: the network interface card encapsulating the Q data segments to obtain Q packets and sending the Q packets includes: sequentially encapsulating the Q data segments according to the source port number information configured for the source queue pair to obtain the Q packets, sending each packet as soon as it is encapsulated, and updating the source port number information of the source queue pair after every N packets are sent, where the source port number information carried in the previous group of N packets is different from the source port number information carried in the next group of N packets, and N is greater than or equal to 1 and less than Q.
  • Another possible implementation manner is: the network interface card encapsulating the Q data segments to obtain Q packets and sending the Q packets includes: dividing the Q data segments into M groups, each group including at least one data segment, and sequentially encapsulating the data segments in each group to obtain the packets of each group, where the source port number information carried in the packets of the same group is the same, and the source port number information carried in the packets of at least two groups is different.
  • The network interface card is further configured to determine the write address of each of the Q packets in the memory of the destination device according to the base address of the first data segment of the Q data segments and the length of each data segment.
  • Each of the Q packets further carries a packet sequence number, and the packet sequence number carried in each packet indicates the sending order of that packet among the Q packets.
  • A fifth aspect provides a network interface card, where the network interface card is located at the destination device of remote direct memory access (RDMA), a destination queue pair is set on the network interface card, and the destination queue pair includes a receiving queue. The network interface card includes: a receiving module, configured to receive Q packets, where each packet carries a first header and a second header, the first header carried in each packet indicates the write address of the packet in the memory of the destination device, the second header carried in each packet contains source port number information, the source port number information in the second headers carried in at least two of the Q packets is different, and Q is a positive integer greater than or equal to 2;
  • and an execution module, configured to save the Q packets from the destination queue pair into the memory of the destination device according to the write addresses, carried in the packets, of the packets in the memory of the destination device.
  • A possible implementation manner is: the receiving module is specifically configured to receive the Q packets one by one, and each time the receiving module receives a packet, the execution module performs the step of saving the received packet into the memory of the destination device.
  • Another possible implementation manner is: each of the Q packets further carries a packet sequence number, and the packet sequence number carried in each packet indicates the sending order of that packet among the Q packets.
  • The network interface card further includes a detection module. Each time the receiving module receives a packet, the detection module records the packet sequence number of the currently received packet and determines, according to that sequence number, the packet sequence number of the next packet to be received; after the next packet is received, the detection module determines whether the packet sequence number of the received next packet is the same as the packet sequence number of the next packet to be received; if not, the packet loss detection process is started; and if the packet loss detection process determines that packet loss occurred during packet transmission, a packet retransmission indication is sent to the source device.
  • A possible implementation manner is: the destination queue pair is provided with a bitmap, the bitmap includes at least Q bitmap bits, and the Q bitmap bits correspond one-to-one to the Q packets according to the sending order of the Q packets; the bitmap is provided with a head pointer and a tail pointer, the head pointer points to the bitmap bit corresponding to the most recently received packet of the receiving queue of the queue pair, and the tail pointer points to the bitmap bit corresponding to the next packet to be received by the receiving queue of the queue pair.
  • The detection module is specifically configured to: set the bitmap bit representing the currently received packet to valid according to the packet sequence number of the currently received packet, and point the head pointer to the bitmap bit representing the currently received packet; and determine, according to the packet sequence number of the currently received packet, whether the currently received packet is the packet corresponding to the bitmap bit currently pointed to by the tail pointer; if so, update the tail pointer so that it points to the first invalid bitmap bit after the bitmap bit corresponding to the currently received packet; if not, keep the bitmap bit currently pointed to by the tail pointer unchanged.
  • Another possible implementation manner is: the detection module confirming whether the packet sequence number of the received next packet is consistent with the packet sequence number of the next packet to be received includes: determining, according to the packet sequence number of the received next packet, whether the tail pointer currently points to the bitmap bit corresponding to the received next packet.
  • Another possible implementation manner is: the detection module performing the packet loss detection process specifically includes: starting a timer for the packet corresponding to the bitmap bit currently pointed to by the tail pointer; if the timer expires and the position pointed to by the tail pointer has not changed, determining that the packet corresponding to the bitmap bit currently pointed to by the tail pointer is lost.
  • Another possible implementation manner is: the detection module performing the packet loss detection process specifically includes: determining whether the bitmap bit currently pointed to by the head pointer is more than a predetermined value ahead of the bitmap bit currently pointed to by the tail pointer; if so, determining that packets corresponding to bitmap bits between the tail pointer and the head pointer have been lost.
  • Another possible implementation manner is: the detection module sending the packet retransmission indication to the source device includes: sending a packet retransmission indication that carries the packet sequence number of the packet corresponding to the bitmap bit currently pointed to by the tail pointer, so as to request the source device to resend that packet and all subsequent packets among the Q packets.
  • In another aspect, an apparatus is provided, including a main processing system and a network interface card. The main processing system is configured to obtain application data from the memory of the apparatus and process services according to the application data.
  • The network interface card is configured to receive Q packets, where each packet carries a first header and a second header, the first header carried in each packet indicates the write address of the packet in the memory of the destination device, the second header carried in each packet contains source port number information, the source port number information in the second headers carried in at least two of the Q packets is different, and Q is a positive integer greater than or equal to 2;
  • and to save the Q packets from the destination queue pair into the memory of the destination device according to the write addresses, carried in the Q packets, of the packets in the memory of the destination device.
  • A possible implementation manner is: the network interface card receiving the Q packets includes receiving the Q packets one by one; and saving the Q packets into the memory of the destination device includes performing, each time a packet is received, the step of saving the received packet into the memory of the destination device.
  • Each of the Q packets further carries a packet sequence number, and the packet sequence number carried in each packet indicates the sending order of that packet among the Q packets.
  • A possible implementation manner includes: each time a packet is received, recording the packet sequence number of the currently received packet and determining, according to that sequence number, the packet sequence number of the next packet to be received; after the next packet is received, determining whether the packet sequence number of the received next packet is the same as the packet sequence number of the next packet to be received; if not, starting a packet loss detection process; and if the packet loss detection process determines that packet loss occurred during packet transmission, sending a packet retransmission indication to the source device.
  • A possible implementation manner is: the destination queue pair is provided with a bitmap, the bitmap includes at least Q bitmap bits, and the Q bitmap bits correspond one-to-one to the Q packets according to the sending order of the Q packets; the bitmap is provided with a head pointer and a tail pointer, the head pointer points to the bitmap bit corresponding to the most recently received packet of the receiving queue, and the tail pointer points to the bitmap bit corresponding to the next packet to be received by the receiving queue.
  • Recording the packet sequence number of the currently received packet and determining the packet sequence number of the next packet to be received according to the packet sequence number of the currently received packet includes: setting the bitmap bit representing the currently received packet to valid according to the packet sequence number of the currently received packet, and pointing the head pointer to the bitmap bit representing the currently received packet; and determining, according to the packet sequence number of the currently received packet, whether the currently received packet is the packet corresponding to the bitmap bit currently pointed to by the tail pointer; if so, updating the tail pointer so that it points to the first invalid bitmap bit after the bitmap bit corresponding to the currently received packet; if not, keeping the bitmap bit currently pointed to by the tail pointer unchanged.
  • The network interface card confirming whether the packet sequence number of the received next packet is consistent with the packet sequence number of the next packet to be received includes: determining, according to the packet sequence number of the received next packet, whether the tail pointer currently points to the bitmap bit corresponding to the received next packet.
  • The network interface card performing the packet loss detection process includes: starting a timer for the packet corresponding to the bitmap bit currently pointed to by the tail pointer; if the timer expires and the position pointed to by the tail pointer has not changed, determining that the packet corresponding to the bitmap bit currently pointed to by the tail pointer is lost.
  • The network interface card performing the packet loss detection process includes: determining whether the bitmap bit currently pointed to by the head pointer is more than a predetermined value ahead of the bitmap bit currently pointed to by the tail pointer; if so, determining that packets corresponding to bitmap bits between the tail pointer and the head pointer have been lost.
  • The network interface card sending the packet retransmission indication to the source device includes: sending a packet retransmission indication that carries the packet sequence number of the packet corresponding to the bitmap bit currently pointed to by the tail pointer, so as to request the source device to resend that packet and all subsequent packets among the Q packets.
  • a communication apparatus comprising a processor and a memory coupled to the processor, the processor for performing the method of message transmission as described in the first aspect in accordance with program instructions loaded in the memory.
  • a communication apparatus comprising a processor and a memory coupled to the processor, the processor for performing the method of message transmission as described in the second aspect in accordance with program instructions loaded in the memory.
  • In another aspect, a communication system is provided, including a source device, a destination device and at least one routing device, where remote direct memory access (RDMA) is performed between the source device and the destination device, and the communication path between the source device and the destination device includes the at least one routing device.
  • The network interface card of the source device includes a source queue pair, and the source queue pair includes a sending queue; the network interface card of the destination device includes a destination queue pair, and the destination queue pair includes a receiving queue.
  • The source device is configured to obtain Q data segments from the sending queue of the source queue pair, encapsulate the Q data segments to obtain Q packets, and send the Q packets respectively, where each of the Q packets carries a first header and a second header, the first header carried in each packet indicates the write address of the packet in the memory of the destination device, the second header carried in each packet contains source port number information, the source port number information in the second headers carried in at least two of the Q packets is different, and Q is a positive integer greater than or equal to 2.
  • The at least one routing device is configured to receive the Q packets sent by the source device, determine a forwarding path for each packet according to the source port number information carried in each of the Q packets, and forward each packet along the determined path. The destination device is configured to receive the Q packets and save the Q packets from the destination queue pair into the memory of the destination device according to the write addresses, carried in the Q packets, of the packets in the memory of the destination device.
  • In a possible implementation, the source device is further configured to perform the method of the foregoing first aspect,
  • and the destination device is further configured to perform the method of the foregoing second aspect.
  • a computer readable storage medium comprising instructions, when executed on a computer, causing a computer to perform the method of message transmission as described in the first aspect.
  • a computer readable storage medium comprising instructions, when executed on a computer, causing a computer to perform the method of message transmission as described in the second aspect.
  • FIG. 1 is a schematic diagram of the composition of a data communication system in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a data communication system that uses the RoCE protocol for transmission.
  • FIG. 3 is a schematic diagram of load imbalance caused by packet transmission under the RoCE protocol between two servers in the prior art.
  • FIG. 4 is a schematic structural diagram of a system for transmitting a message under the RoCE protocol between two servers in the embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a source end in an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a destination end in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a frame structure of a message in the RoCEv2 protocol in the prior art.
  • FIG. 8 is a schematic diagram of a frame structure of an encapsulated packet according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a bitmap structure in an embodiment of the present application.
  • Figure 10 is a schematic illustration of the application of a bitmap in a data communication system in one embodiment of the present application.
  • FIG. 11 is a schematic diagram of a bitmap in an embodiment of the present application when an out-of-order message is received at a destination end.
  • FIG. 12 is a schematic diagram of a bitmap in the embodiment of the present application when the destination end receives the next message currently ready to be received.
  • FIG. 13 is a schematic flowchart of a source end in another embodiment of the present application.
  • FIG. 14 is a schematic flow chart of a destination end in another embodiment of the present application.
  • FIG. 15 is a schematic diagram of a functional structure of a network interface card of a source device in an embodiment of the present application.
  • FIG. 16 is a schematic diagram of a functional structure of a network interface card of a destination device in an embodiment of the present application.
  • Figure 17 is a schematic diagram showing the structure of a communication device in the embodiment of the present application.
  • FIG. 18 is a schematic structural diagram of a source device in an embodiment of the present application.
  • FIG. 19 is a schematic structural diagram of a destination device in an embodiment of the present application.
  • RDMA (Remote Direct Memory Access) transfers data over the network directly into the memory of a remote system, without involving the operating system or the computing resources of either system. It eliminates the overhead of external memory copying and context switching, thus freeing up memory bandwidth and CPU cycles to improve application performance.
  • RoCE is RDMA carried over Ethernet.
  • As shown in FIG. 1, a server can be roughly divided into a software layer and a hardware layer (two servers are illustrated in FIG. 1). The software layer includes at least one application, and the hardware layer mainly consists of a processor 111, a memory 121, a network interface card 131, and the like.
  • In this scenario, the data of one application on the server 101 needs to be shared, via the RoCE protocol, with another server 102 for use by an application on the server 102.
  • the data communication system 200 includes a server 201 and a server 202.
  • the server 201 includes a network interface card 241 and a main processing system 281.
  • The main processing system 281 includes a host CPU 261 and a host memory 271 (other conventional computer hardware, such as the hard disk and bus, is not shown in FIG. 2). The main processing system 281 also runs various software components, such as an operating system 251 and an application 211 running on the operating system 251.
  • the server 202 includes a network interface card 242 and a main processing system 282.
  • the main processing system 282 includes a host CPU 262 and a host memory 272.
  • The main processing system 282 also runs various software components, such as an operating system 252 and an application 212 running on the operating system 252.
  • The network interface card 241 (also referred to as a network adapter or a communication adapter) has a cache 221, and a queue pair (Queue Pair, QP) can be set in the cache 221. FIG. 2 shows one QP 231 (the QPs in a network interface card can be set according to the requirements of the upper-layer applications, and multiple QPs can be set; one QP is taken here as an example).
  • A QP is a virtual interface provided by the network interface card to the application. It consists of a send work queue (Send Work Queue) and a receive work queue (Receive Work Queue); the send work queue and the receive work queue are always created together.
  • The instructions sent by the application to the network interface card are called work queue elements (Work Queue Element, WQE).
  • Before communicating, the server 201 and the server 202 first establish QP pairing, that is, it is made explicit that data is transmitted between the application 211 and the application 212 through the QP 231 and the QP 232, and the corresponding queue pair identifier (QP ID) is added to the packets sent subsequently.
  • The working process of RDMA is usually divided into three steps.
  • In the first step, when the application 211 on the server 201 issues an RDMA request, no data is copied in the memory of the main processing system; the RDMA request is sent from the cache of the application 211 into the send queue of the queue pair in the cache 221 of the network interface card 241.
  • In the second step, the network interface card 241 reads the content (data) of the send queue in the cache 221 and sends it in the form of packets to the QP 232 in the server 202, so that the data is written into the cache 222 of the network interface card 242.
  • In the third step, after receiving the data, the network interface card 242 writes the data directly into the memory corresponding to the application 212 in the main processing system.
  • A routing device in the Ethernet selects the forwarding path according to the five-tuple information of a packet. Specifically, the routing device performs a hash calculation on the five-tuple of the packet, that is, the source port number, the destination port number, the source IP address, the destination IP address and the protocol type, and uses the calculated hash value as the basis for selecting the forwarding path of the packet.
  • As shown in FIG. 3, two servers in the data communication system 300, the server 301 and the server 302, are connected by a plurality of routers and communicate under the RoCE protocol. There are multiple QPs on each server,
  • for example, QP 351 and QP 352 on the server 301, and QP 353 and QP 354 on the server 302.
  • Because the five-tuple information of the packets sent by one QP remains unchanged, the hash value calculated for them is also the same, so all the data sent by the QP 351 selects the same path; for example, the router 321 is always selected to send the data to a QP in the server 302.
  • In addition, the source server transmits data at its maximum capability from the start of the data transmission,
  • so the probability of network congestion is greatly increased.
  • Once the network path connected to the router 321 is congested, the packet transmission of all servers connected to it is affected.
  • the RoCE network is sensitive to packet loss. As the network packet loss rate increases, the effective bandwidth of the network transmission decreases rapidly.
  • To this end, the present application provides a finer-grained packet transmission method and related devices.
  • In this method, the packets sent by the same QP are further grouped, so that packets belonging to different groups of the same QP carry different source port number information. The packets sent by the same QP are therefore mapped by the hash algorithm onto different paths, so even if the traffic sent by the QP increases abnormally within a certain period, that traffic is prevented from all passing through the same path, and the transmission imbalance and congestion of the entire multi-path network that congestion of a single path would cause are avoided.
  • In the foregoing solution, the source port numbers of the packets sent by the same QP are modified.
  • This modification allows packets carrying different source port numbers to travel over different paths in the multi-path network.
  • However, the lengths and forwarding efficiencies of these paths differ, so the order in which the packets reach the destination end may differ from the order in which they were sent at the source end, which may prevent the destination end from saving the received packets to their real destination addresses.
  • In the existing RoCEv2 protocol, the packets sent by the same QP carry the same source port number information and are forwarded along the same path, so the order of reception at the destination end is the same as the order of sending at the source end. The RoCEv2 protocol therefore specifies that, among the packets sent by one QP, only the first packet carries the write address of its data in the memory of the destination end; the other, non-first packets do not need to carry the associated write address, and the destination end can write the packets into the corresponding memory addresses according to the order in which they are received.
  • the data communication system 400 includes two servers, a server 401 and a server 402 (two shown in the figure, and the actual number may be two or more).
  • the server 401 and the server 402 are directly connected to the router 411 and the router 412, respectively, and the router 411 and the router 412 are connected by four routers, such as a router 421, a router 422, a router 423, and a router 424.
  • The server 401 includes a processor 431 and a network interface card 441.
  • the network interface card 441 includes a plurality of QPs, which are shown as QP451 and QP452, wherein each QP is correspondingly provided with a bitmap.
  • the configuration of the server 402 is similar to that of the server 401, and is composed of a processor 432 and a network interface card 442.
  • the network interface card 441 and the network interface card 442 support the RoCEv2 protocol, and the server 401 and the server 402 perform RDMA communication through the QP.
  • The bitmap in FIG. 4 is one specific implementation of receiving and ordering packets at the destination end in an embodiment of the present application; other embodiments may use other methods.
  • FIG. 5 and FIG. 6 are flowcharts showing a source server sending a message and a destination server receiving a message according to an embodiment of the present application.
  • S1: The network interface card 441 obtains Q data segments to be sent from the send queue of the QP 451.
  • The work request issued by the application is sent directly to the corresponding QP in the network interface card 441,
  • and the network interface card 441 reads the work request and has the QP execute it.
  • Here, the content of the work request is to send a set of application data, and the set of application data may include Q data segments, where Q is a positive integer greater than or equal to 2.
  • S2: Determine the address in the memory of the destination server 402 into which the packet encapsulated from the obtained data segment is to be written.
  • The address is calculated from the base address of the Q data segments and the lengths of the data segments, among the Q data segments, that precede the obtained data segment.
  • Before the source server 401 transmits data to the destination server 402 by means of RDMA, the source server 401 first communicates with the destination server 402, and the destination server 402 notifies the source server 401 of the base address for the packets to be encapsulated by the source server 401, where the base address refers to the first address, in the memory of the destination server, of the write address of the first packet of the group of packets.
  • S3: Encapsulate the obtained data segment to obtain an encapsulated packet.
  • FIG. 7 shows the existing RoCEv2 message format.
  • The RoCEv2 packet format adds a User Datagram Protocol (UDP) header to support the IP routing function of Ethernet and enhance the scalability of the RoCE network.
  • UDP User Datagram Protocol
  • The UDP datagram consists of five parts: the source port number, the destination port number, the length, the checksum, and the data.
  • According to the protocol, the destination port number of UDP is fixed to 4791. Since there are multiple servers in the data communication system, and there are multiple QPs on each server, the source port number of each QP is generally different.
  • the extension of the encapsulated message is mainly divided into two parts, as follows:
  • One part is to add a first header to the data segment, where the first header carries information indicating the write address of the packet in the memory of the destination end.
  • Specifically, if the data segment is the first data segment, an RDMA extended transport header (RDMA Extended Transport Header, RETH) is added after the base transport header (Base Transport Header, BTH) of the data segment; if it is not the first data segment, an extended header (Extended Header, EXH) is added after the BTH part of the data segment.
  • BTH Basic Transmission Header
  • RETH RDMA Extended Transport Header
  • EXH Extended Header
  • The RETH part includes three fields: a virtual address (Virtual Address), a remote key (Remote Key), and a DMA length (DMA Length).
  • The Virtual Address field is 64 bits long and records the virtual address at the destination end that the RDMA operation writes to;
  • the Remote Key field is 32 bits long and records the authorization information for allowing the RDMA operation;
  • the DMA Length field is 32 bits long and records the number of bytes of the packet on which the DMA operation is performed.
  • The EXH header includes four fields: a virtual address (Virtual Address), an immediate flag (Immediate), a WQE sequence number (WQE Number), and a reserved field (Reserved).
  • The Virtual Address field is the same as the Virtual Address field of the RETH header:
  • it is 64 bits long and records the memory address at the destination end into which the current packet needs to be written.
  • The Immediate field is 1 bit long and records whether the current packet carries an immediate value; the WQE Number field is 31 bits long and records the sequence number of the WQE sent by the QP; the Reserved field is 32 bits long and is reserved. Except for the Virtual Address field of the EXH header, the remaining three fields can be adjusted according to actual needs.
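  • A sketch of packing the EXH fields described above into their bit layout (64-bit virtual address, 1-bit immediate flag, 31-bit WQE number, 32-bit reserved field); the byte order and exact field packing are assumptions for illustration only.

```python
import struct

def pack_exh(virtual_address: int, immediate: bool, wqe_number: int) -> bytes:
    """Pack an EXH-style header: a 64-bit virtual address, then a 32-bit
    word holding a 1-bit immediate flag and a 31-bit WQE number, then a
    32-bit reserved field (big-endian, illustrative layout)."""
    if wqe_number >= 1 << 31:
        raise ValueError("WQE number must fit in 31 bits")
    flags_word = (int(immediate) << 31) | wqe_number
    return struct.pack(">QII", virtual_address, flags_word, 0)

hdr = pack_exh(0x00007F30_0000A000, immediate=False, wqe_number=7)
assert len(hdr) == 16   # 8 + 4 + 4 bytes
```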
  • By encapsulating a header containing the virtual address in the packet, the packet can be written into memory quickly when it reaches the destination. At the same time, because each packet carries a virtual address, even if the packets become out of order during network transmission, each packet can still be written into the corresponding position in the memory of the destination end according to its virtual address.
  • the other part is to add a second header to the data segment.
  • the second header carries the source port number information of the source queue pair.
  • at least two of the Q packets encapsulated by the Q data segments in the embodiment of the present application have different source port number information.
  • Because their source port number information differs, packets carrying different source port numbers may be assigned different forwarding paths. By setting different source port number information for the packets sent by the same QP, the traffic of that QP can be spread over different forwarding paths even when the traffic it sends is large, without causing congestion on any single path of the multi-path network.
  • In addition, a packet sequence number (Packet Sequence Number, PSN) may be added to the BTH part of the data segment, where the packet sequence number indicates the order of the data segment among the Q data segments.
  • PSN Packet Sequence Number
  • S4: Each time a data segment is encapsulated into a packet, the packet is sent.
  • S5: Determine whether a preset number of packets have been sent. If the preset number of packets have been sent, S6 is performed; if not, the process jumps to S1.
  • The preset number may vary; for example, the port information of the source queue pair is updated after three packets are sent, and then updated again after another four packets are sent.
  • The preset number may also be fixed; for example, the port information of the source queue pair is updated after every three packets are sent.
  • S6: Update the port information of the source queue pair.
  • After the above steps, the second headers of the packets encapsulated from the set of data carry different source port numbers. When the packets are transmitted in the network, the routers route them according to the hash values of the packets' five-tuple information; because the packets carry different source port numbers, the resulting hash values are likely to differ, so different paths are selected for transmission, and the traffic on each path in the network becomes more balanced.
  • In the prior art, each QP uses only a fixed source port number, so its forwarding path in the network is fixed, and as long as no packet loss occurs the packets do not arrive out of order.
  • In this embodiment of the present application, the source port number information corresponding to the QP is changed in order to achieve traffic balancing, so the forwarding path of the packets in the network changes as well. Because different network paths may take different amounts of time to deliver packets, packets may arrive out of order at the destination end.
  • To handle this, the source end encapsulates the RETH or EXH extension header into each packet and puts into the packet the virtual address of the destination server memory to be written. When a packet arrives at the destination end, it can be written directly into the corresponding memory location of the destination server according to the virtual address in the RETH or EXH extension header, which is equivalent to restoring the order in which the data was sent at the source end.
  • S7: Determine whether all of the Q data segments have been sent. If any data segment has not been sent, jump to S1.
  • Optionally, the source end may instead divide the Q data segments to be transmitted into at least two groups, each group including at least one data segment.
  • The data segments in each group are then encapsulated to obtain the packets of each group, where the source port number information carried in the packets of the same group is the same, and the source port number information carried in the packets of at least two groups is different.
  • After the Q packets are sent from the source end, they are forwarded by the routers.
  • A router selects a forwarding path according to the five-tuple information of each of the Q packets; when the source port number information of the Q packets differs, the packets may be forwarded along different paths.
  • As a result, the order in which the Q packets reach the destination end may differ from the order in which they were sent from the source end.
  • After receiving the Q packets, the destination end needs to save the data segments in the Q packets to the corresponding addresses.
  • The existing RoCE protocol stipulates that the destination end receives packets in the order in which they are sent; if a received packet is out of order, the destination end immediately sends a retransmission indication so that the source end retransmits the packets that may have been lost on the transmission path.
  • The foregoing embodiment of the present application changes, at the sending end, the source port number information of the packets sent by the same QP, so the order in which the Q packets arrive at the destination end may differ from the order in which they were sent.
  • If the destination end immediately sent a retransmission indication whenever it determined that received packets were out of order, retransmission would be costly.
  • Therefore, the present application also performs out-of-order detection on the received packets at the destination end; when an out-of-order condition is detected, the destination end does not immediately send a packet retransmission indication to the source end, but starts a packet loss detection process,
  • and sends a packet retransmission indication to the source end only when the packet loss detection process determines that packet loss has occurred, thereby improving the transmission efficiency of the system. See FIG. 6 for a specific process at the destination end.
  • In FIG. 6, the packet sequence number carried in each received packet is used to check whether the packets sent by the source server have arrived out of order, whether any packet has been lost, and whether all packets have been received.
  • This check can be implemented by means of a bitmap, an array, or a linked list. The embodiments of the present application are described using a bitmap as an example.
  • FIGS. 9-12 illustrate the principle of the bitmap algorithm in an embodiment of the present application, wherein:
  • FIG. 9 is a schematic diagram of the bitmap used to implement the bitmap algorithm.
  • Each QP corresponds to one bitmap, which is used to record the reception status of its packets.
  • Each bitmap includes a plurality of bitmap bits, each bitmap bit representing one packet. The bitmap bits are numbered from front to back and are mapped to the packet sequence numbers of the packets.
  • In other words, the bitmap bits correspond to the packets, from front to back, according to the order in which the packets are sent.
  • Each bitmap also has a tail pointer and a head pointer. The tail pointer points to the bitmap bit corresponding to the next packet to be received by the receive queue of the queue pair corresponding to the bitmap; the head pointer points to the bitmap bit corresponding to the most recently received packet.
  • When the value of a bitmap bit is valid, the packet corresponding to that bit has been received; when the value is invalid, the corresponding packet has not yet been received. Validity may be represented by a value of 1 or by a value of 0; in the embodiments of the present application, a valid bit is represented by 1.
  • The range of the bitmap is set according to the range of packet sequence numbers to be tracked. If the source end sends Q packets, the corresponding bitmap at the destination end includes at least Q bitmap bits. Within the range of the bitmap, the front end corresponds to the packet with the smallest packet sequence number.
  • The tail pointer points to the next packet to be received, that is, a packet that the receive queue of the queue pair corresponding to the bitmap has not yet received and expects to receive next.
  • In other words, the next packet to be received is the packet with the smallest packet sequence number among the packets that the destination end has not yet received.
  • For example, suppose the source sends Q packets whose sequence numbers run from 1 to Q, where sequence number 1 indicates the first packet sent and sequence number Q indicates the last packet sent. If the destination has received the packets with sequence numbers 1, 2, and 5, the next packet to be received is the packet with sequence number 3, and the tail pointer points to the bitmap bit corresponding to that packet. A minimal sketch of this bookkeeping follows this bullet.
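  • The bitmap bookkeeping described above can be sketched as follows; the names and the in-memory representation (a plain list of booleans plus two indices) are choices made for this example rather than a mandated implementation.

```python
class BitmapTracker:
    """Tracks receipt of Q packets whose sequence numbers are 1..Q."""

    def __init__(self, q):
        self.bits = [False] * (q + 2)  # bit i <-> sequence number i; index 0 unused
        self.head = 0                  # bit of the most recently received packet
        self.tail = 1                  # bit of the next packet expected

    def receive(self, seq):
        """Mark seq as received; return True if it was the next expected packet."""
        expected = (seq == self.tail)
        self.bits[seq] = True
        self.head = seq
        # The tail only moves when the packet it was waiting for arrives; it then
        # jumps to the first still-invalid bit after the newly received one.
        while self.bits[self.tail]:
            self.tail += 1
        return expected

    def complete(self, q):
        """True once all Q bits have been set valid."""
        return all(self.bits[1:q + 1])
```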
  • FIGS. 10-12 show how the values of the bitmap bits and the positions of the head and tail pointers change as packets are received.
  • For example, as shown in FIG. 10, QP451 in server 401 sends 10 packets to QP452 in server 402.
  • The sequence numbers of the packets are 1-10, and the corresponding bitmap also has 10 bitmap bits, numbered 1-10 from front to back (shown from right to left in the figure), which correspond to the packets one to one.
  • The order of the packets changes during transmission, and the order in which they reach the destination QP452 is 3, 1, 2, 4, 5, 6, 7, 8, 9, 10.
  • As shown in FIG. 11, when the destination receives the packet with sequence number 3, the head pointer moves to bitmap bit 3 and the value of that bit is set to valid. Because the position pointed to by the tail pointer is the bitmap bit corresponding to the next packet to be received, that is, the packet whose sequence number is 1, the tail pointer remains unchanged.
  • As shown in FIG. 12, when the destination then receives the packet with sequence number 1, the head pointer moves to bitmap bit 1 and the value of that bit is set to valid.
  • Because the packet the tail pointer was waiting for has now been received, the tail pointer moves, and its new position is the first invalid bitmap bit after the bitmap bit corresponding to the currently received packet, that is, bitmap bit 2. The short run after this bullet reproduces these movements.
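  • Running the `BitmapTracker` sketched earlier through the arrival order of this example reproduces the pointer movements described for FIGS. 10-12 (the printed positions are bit numbers, with head 0 meaning nothing received yet):

```python
tracker = BitmapTracker(10)               # 10 packets, sequence numbers 1..10
for seq in (3, 1, 2, 4, 5):               # first few arrivals from the example
    in_order = tracker.receive(seq)
    print(f"got {seq}: head={tracker.head} tail={tracker.tail} "
          f"{'in order' if in_order else 'out of order'}")
# got 3: head=3 tail=1 out of order   (tail still waits for packet 1)
# got 1: head=1 tail=2 in order       (tail jumps to the first invalid bit, 2)
# got 2: head=2 tail=4 in order       (bits 1-3 are now valid, so tail lands on 4)
```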
  • As shown in FIG. 6, the steps performed by the destination end are as follows:
  • The destination end receives the packets sent by the source end and buffers each packet in the corresponding destination queue pair.
  • When a packet is received, the value of the bitmap bit corresponding to that packet is set to valid, that is, to 1, and the head pointer of the bitmap is pointed to that bitmap bit.
  • The tail pointer of the bitmap points to the bitmap bit corresponding to the next packet currently expected to be received. Therefore, when the head pointer and the tail pointer point to different bitmap bits, it can be determined that the received packet is not the next expected packet, that is, the received packet is out of order.
  • If the received packet is the next expected packet, S5 is performed directly to determine whether all packets have been received; if the received packet is not the next expected packet, S4 is performed.
  • S4: The packet loss detection process is started to determine whether packet loss has occurred during transmission. For example, when a bitmap is used for checking, a timer is started once it is determined that the received packet is not the next expected packet. If the tail pointer has not moved by the time the timer expires, the destination has not received the packet corresponding to the bitmap bit pointed to by the tail pointer within the preset time, which indicates that this packet has been lost. If that packet is received, the tail pointer moves and the timer is reset.
  • Another way to detect packet loss is to determine whether the bitmap bit currently pointed to by the head pointer exceeds a predetermined value T; if it does, a packet corresponding to a bitmap bit between the head pointer and the tail pointer has been lost. The predetermined value T can be set according to actual needs. For example, T can be set to Q, the number of packets in the group; in that case, if the bitmap bit pointed to by the head pointer exceeds T, the destination end has already started receiving the next group of packets before the current group was complete, and packet loss can be determined to have occurred. Both checks are sketched after this bullet.
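  • Both loss checks can be layered on top of the same tracker, as in the sketch below; the timeout value, the threshold T, and the use of wall-clock time are illustrative assumptions.

```python
import time

class LossDetector:
    """Flags loss when the tail pointer stalls or the head pointer runs past T."""

    def __init__(self, tracker, timeout_s=0.2, threshold_t=10):
        self.tracker = tracker
        self.timeout_s = timeout_s
        self.threshold_t = threshold_t
        self.timer_start = None
        self.timer_tail = None

    def arm_timer(self):
        # Called when an out-of-order packet is detected: time how long the
        # packet that the tail pointer is waiting for stays missing.
        self.timer_start = time.monotonic()
        self.timer_tail = self.tracker.tail

    def packet_lost(self):
        if self.timer_tail is not None and self.timer_tail != self.tracker.tail:
            self.timer_start = None          # the awaited packet arrived; reset the timer
            self.timer_tail = None
        timer_expired = (
            self.timer_start is not None
            and time.monotonic() - self.timer_start > self.timeout_s
        )
        head_past_t = self.tracker.head > self.threshold_t
        return timer_expired or head_past_t
```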
  • If the packet loss detection process determines that packet loss occurred, the destination end sends a negative acknowledgement packet to the source end to notify it that an error occurred during packet transmission.
  • At the same time, a packet retransmission indication is sent to the source end. The retransmission indication carries the sequence number of the packet corresponding to the bitmap bit currently pointed to by the tail pointer, to request the source end to resend all packets after the packet corresponding to that sequence number. In this way, when the destination end receives out-of-order packets, it can determine more accurately which packets may have been lost, and it instructs the source end to retransmit only when packet loss is confirmed, thereby improving the efficiency of the system.
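  • A retransmission indication of this kind only needs to name the sequence number the tail pointer is stuck on; the message format below (a small dict) is purely illustrative, and this sketch has the source resend the missing packet itself together with everything after it, which is one possible reading of the boundary.

```python
def build_retransmit_request(tail_seq):
    # Negative acknowledgement naming the first packet the destination is missing.
    return {"type": "NAK", "retransmit_from": tail_seq}

def packets_to_resend(sent_packets, request):
    # Source side: replay every packet whose sequence number is at or beyond
    # the one named in the indication.
    return [p for p in sent_packets if p["psn"] >= request["retransmit_from"]]

sent = [{"psn": i, "payload": f"seg{i}".encode()} for i in range(1, 11)]
nak = build_retransmit_request(3)
print([p["psn"] for p in packets_to_resend(sent, nak)])   # [3, 4, ..., 10]
```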
  • S5: Determine whether all packets have been received. When the values of all bitmap bits corresponding to the group of packets have been set to valid, the group of packets has been fully received, and S6 is performed. If some packets have not yet been received, return to S1.
  • FIGS. 13 and 14 are flowcharts of the source end and the destination end in another embodiment of the present application.
  • As shown in FIG. 13, the steps performed by the source end are as follows:
  • S1: The network interface card 441 acquires the Q data segments to be sent from the send queue of QP451.
  • S2: Encapsulate each obtained data segment to obtain an encapsulated packet.
  • Unlike the foregoing embodiment, only the second header carrying the port information of the source queue pair is added to each data segment, and a RETH header carrying the write address in the destination memory is added only to the first data segment of each group of data; no EXH header carrying the write address is added to the remaining data segments.
  • S3: Each time a data segment is encapsulated into a packet, the packet is sent.
  • S4: Determine whether a preset number of packets have been sent. If the preset number of packets have been sent, proceed to S5; otherwise, jump to S1.
  • S5: After the preset number of packets have been sent, update the source port information of the source queue pair.
  • S6: Determine whether all the data of the group has been sent. If some data has not yet been encapsulated and sent, jump to S1. A compact sketch of this loop follows this bullet.
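  • A compact sketch of this source-side loop (S1-S6) under illustrative assumptions: the port-update rule simply steps through a small pool, N is fixed, and `send()` prints instead of handing the packet to a real NIC.

```python
def send(packet):
    # Stand-in for handing the encapsulated packet to the network interface card.
    kind = "with write address" if "write_addr" in packet else "no address header"
    print(packet["psn"], packet["src_port"], kind)

def send_group(segments, base_addr, n_per_port, port_pool):
    port_idx = 0
    for i, seg in enumerate(segments):
        pkt = {"psn": i + 1, "src_port": port_pool[port_idx], "payload": seg}
        if i == 0:
            pkt["write_addr"] = base_addr   # only the first packet carries the RETH-style address
        send(pkt)                           # S3: send as soon as one segment is encapsulated
        if (i + 1) % n_per_port == 0:       # S4/S5: after N packets, update the source port
            port_idx = (port_idx + 1) % len(port_pool)

send_group([f"seg{i}".encode() for i in range(1, 8)], base_addr=0x1000,
           n_per_port=3, port_pool=[49152, 49153, 49154])
```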
  • As shown in FIG. 14, the steps performed by the destination end are as follows:
  • S1: The destination end receives the packets sent by the source end and buffers them in the corresponding queue pair.
  • S2: Determine whether the received packet is the next packet currently expected to be received. If not, proceed to S3; if it is, proceed to S4.
  • S3: Determine whether packet loss has occurred. If a packet has been lost, the destination sends a negative acknowledgement packet to the source to notify it that an error occurred during packet transmission, and sends a packet retransmission indication to the source. If no packet loss has occurred, proceed to S4.
  • S4: Determine whether all packets have been received. If they have, proceed to S5; if not, return to S1.
  • S5: After all packets have been received, reorder them according to the packet sequence numbers they carry, so that the packets in the buffer are restored to their original order.
  • S6: After the packets in the buffer have been reordered, write them into memory (see the sketch after this list of steps).
  • S7: The destination sends an acknowledgement packet to the source.
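  • Because only the first packet of a group carries the write address in this embodiment, the destination buffers the group, reorders it by sequence number, and only then writes it out; a minimal sketch under those assumptions:

```python
def deliver_group(buffered_packets, memory, base_addr):
    # S5: restore the original order using the packet sequence numbers.
    ordered = sorted(buffered_packets, key=lambda p: p["psn"])
    # S6: write the concatenated payloads starting at the group's base address.
    offset = base_addr
    for pkt in ordered:
        memory[offset:offset + len(pkt["payload"])] = pkt["payload"]
        offset += len(pkt["payload"])
    return offset - base_addr                   # number of bytes written

memory = bytearray(64)
arrived = [{"psn": 3, "payload": b"CC"}, {"psn": 1, "payload": b"AA"},
           {"psn": 2, "payload": b"BB"}]
deliver_group(arrived, memory, base_addr=8)
print(memory[8:14])                             # b'AABBCC'
```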
  • Based on the foregoing technical solutions, and referring to FIG. 15, an embodiment of the present application provides a network interface card 1500. The network interface card 1500 is located in a source device of remote direct memory access (RDMA), and a source queue pair including a send queue is configured on the network interface card 1500. The network interface card 1500 includes:
  • The obtaining module 1510 is configured to obtain Q data segments from the send queue of a first source queue pair of at least two source queue pairs;
  • The sending module 1520 is configured to encapsulate the Q data segments to obtain Q packets and send the Q packets, where each of the Q packets carries a first header and a second header, the first header carried in each packet is used to indicate the write address of the packet in the memory of the destination device, and the second header carried in each packet includes source port number information;
  • The source port number information in the second headers carried in at least two of the Q packets is different, Q is a positive integer greater than or equal to 2, and the destination device is the destination device of the RDMA;
  • The determining module 1530 is configured to determine, according to the base address of the first data segment of the Q data segments and the length of each data segment, the write address of each of the Q packets in the memory of the destination device, as sketched below.
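  • The determining module's address arithmetic reduces to a running sum, as in the sketch below, which assumes the base address is where the first segment lands and that segment lengths may differ.

```python
def write_addresses(base_addr, segment_lengths):
    """Write address of packet i = base address + total length of the segments before it."""
    addrs, offset = [], 0
    for length in segment_lengths:
        addrs.append(base_addr + offset)
        offset += length
    return addrs

print(write_addresses(0x2000, [4096, 4096, 1024, 4096]))
# [8192, 12288, 16384, 17408]
```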
  • Referring to FIG. 16, an embodiment of the present application provides another network interface card 1600, which is located in a destination device of remote direct memory access (RDMA). A destination queue pair is configured on the network interface card 1600;
  • The destination queue pair includes a receive queue;
  • The network interface card 1600 includes:
  • The receiving module 1610 is configured to receive Q packets, where each packet carries a first header and a second header, and the first header carried in each packet is used to indicate the write address of the packet in the memory of the destination device;
  • The second header carried in each packet contains source port number information, the source port number information in the second headers carried in at least two of the Q packets is different, and Q is a positive integer greater than or equal to 2;
  • The executing module 1620 is configured to save the Q packets from the destination queue pair to the memory of the destination device according to the write addresses, carried by the Q packets themselves, in the memory of the destination device.
  • Each time the receiving module receives a packet, the detecting module 1630 is configured to record the sequence number of the currently received packet and to determine, according to that sequence number, the sequence number of the next packet expected to be received. After the next packet is received, the detecting module determines whether the sequence number of the received packet matches the sequence number of the next expected packet; if not, the packet loss detection process is started. If the packet loss detection process determines that packet loss occurred during packet transmission, a packet retransmission indication is sent to the source device.
  • FIG. 17 is a schematic structural diagram of a communication device 1700 according to an embodiment of the present application.
  • The communication device in this embodiment may be one specific implementation of the network interface card in the foregoing embodiments.
  • The communication device includes a processor 1701, and the processor 1701 is coupled to a memory 1705.
  • The processor 1701 can be a central processing unit (CPU), a field programmable gate array (FPGA), a digital signal processor (DSP), other computing logic, or a combination of any of the above computing logics.
  • The processor 1701 can also be a single-core processor or a multi-core processor.
  • Memory 1705 can be RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable hard drive, a CD-ROM, or any other form of storage medium known in the art. The memory can be used to store program instructions; when the program instructions are executed by the processor 1701, the processor performs the source-end or destination-end method of the foregoing embodiments.
  • The connection line 1709 is used to transfer information between the components of the communication device.
  • The connection line 1709 may be wired or wireless, which is not limited in this application.
  • The connection line 1709 is also coupled to the network interface 1704.
  • The network interface 1704 communicates with other devices or with the network 1711 using connection means such as, but not limited to, a cable or a twisted pair, and the network interface 1704 can also be interconnected with the network 1711 wirelessly.
  • Some features of embodiments of the present application may be completed/supported by the processor 1701 executing program instructions or software code in the memory 1705.
  • The software components loaded in the memory 1705 can be summarized functionally or logically, for example, as the function/logic modules such as the obtaining module and the sending module shown in FIG. 15, or the receiving module and the executing module shown in FIG. 16.
  • After the program instructions are loaded, the processor 1701 executes the transactions related to the above function/logic modules in the memory.
  • FIG. 17 is merely an example of a communication device; a communication device may include more or fewer components than shown in FIG. 17, or have a different component configuration.
  • The various components shown in FIG. 17 may be implemented in hardware, software, or a combination of hardware and software.
  • For example, the communication device may be implemented in the form of a single chip.
  • In that case, the memory and the processor may be implemented in one module, and the instructions in the memory may be written to the memory in advance or loaded by the processor later during execution.
  • An embodiment of the present application provides an apparatus, as shown in FIG. 18, which includes a main processing system 1810 and a network interface card 1830.
  • The main processing system 1810 is configured to process services and, when service data needs to be sent to a destination device, to send the service data to the send queue of the source queue pair corresponding to that service data in the network interface card 1830.
  • The network interface card 1830 is configured to obtain Q data segments from the send queue of the source queue pair corresponding to the service data, where the Q data segments belong to the service data, to encapsulate the Q data segments to obtain Q packets, and to send the Q packets.
  • Each of the Q packets carries a first header and a second header, and the first header carried in each packet is used to indicate the write address of the packet in the memory of the destination end.
  • The second header carried in each packet includes source port number information, the source port number information in the second headers carried in at least two of the Q packets is different, and Q is a positive integer greater than or equal to 2.
  • The network interface card 1830 is further configured to determine, according to the base address of the first data segment of the Q data segments and the length of each data segment, the write address of each of the Q packets in the memory of the destination device.
  • An embodiment of the present application further provides another apparatus, shown in FIG. 19; the apparatus 1900 includes a main processing system 1910 and a network interface card 1930.
  • The main processing system 1910 is configured to acquire application data from the memory 1920 of the apparatus 1900 and to process services according to the application data;
  • The network interface card 1930 is configured to receive the application data carried by Q packets and to write the received Q packets into the memory 1920.
  • For the method by which the network interface card 1930 receives the Q packets, refer to the packet transmission method shown in FIG. 6.
  • Embodiments of the present application also provide a computer readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the packet transmission method shown in FIG. 5.
  • Embodiments of the present application also provide another computer readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the packet transmission method shown in FIG. 6.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Communication Control (AREA)

Abstract

The present invention provides a packet transmission method and an apparatus implementing the method. In current practice, a routing device selects a forwarding path according to the hash value of the five-tuple information of the packets it forwards. Because the five-tuple information of packets sent by different queue pairs may hash to the same value or map to the same forwarding path, packet traffic is unevenly distributed across the paths. To address this potential imbalance, the present invention divides the packets to be sent into several groups, where packets in different groups carry different source port information, and the header carried by each packet carries the write address of that packet in the memory of the destination server. In this way, the packets to be sent are distributed over different paths for forwarding, which improves the balance of network traffic.

Description

一种报文传输的方法及装置 技术领域
本发明涉及报文传输技术领域,特别涉及一种报文传输的方法及装置。
背景技术
在数据通信系统,为了提高服务器之间报文传输的速度,通常采用远程直接内存访问(英文:Remote Direct Memory Access,简称:RDMA)技术进行连接。RDMA,是通过网络把数据直接传入计算机的存储区,将数据从一个系统快速移动到远程系统存储器中,而不对操作系统造成影响。RDMA消除了外部存储器复制和上下文切换的开销,因此能解放内存带宽和CPU周期用于改进应用系统性能。
基于融合以太网的远程直接内存访问(英文全称:RDMA over Converged Ethernet,缩写:RoCE)是RDMA技术的一种,允许服务器通过以太网进行远程直接内存访问。目前RoCE有两个协议版本,v1和v2。其中,RoCE v1协议允许在同一个广播域下的任意两台服务器直接访问。而RoCE v2协议则可以实现路由功能。虽然RoCE协议的优点主要是基于融合以太网的特性,但是RoCE协议也可以应用在传统以太网网络或者非融合以太网络中。
当RoCEv2协议的报文在多路径的网络中进行传输时,通常根据该报文中的五元组信息的哈希值来选择转发的路径,以此实现流量均衡。然而,根据RoCEv2协议的快启动特性,从某个源端端口发出的报文流量有可能在某个时间段可能比较大,另外,哈希的随机性也可能造成多路径网络中的某条路径在某时刻的流量较大,这都可能导致多路径网络中发生某条路径的拥塞。当网络产生拥塞之后,不仅会导致网络时延增加,也会导致网络丢包的可能性增加,从而导致网络传输的有效带宽下降。RoCE协议下的网络路由的路径均衡需要进一步优化。
发明内容
本申请的实施例提供一种报文传输方法,以使得采用RoCE协议的报文在以太网中实现更均衡的路由传输。
第一方面,本申请提供了一种报文传输的方法,该方法应用于数据通信系统,该数据通信系统中的源端设备与目的端设备之间通过以太网进行远程直接内存访问RDMA,源端设备的网络接口卡上包括至少源队列对,源队列对包括发送队列。该报文传输的方法包括:从源队列对的发送队列中获取Q个数据段;分别封装Q个数据段得到Q个报文,并分别发送该Q个报文,其中,这Q个报文中的每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示该报文在目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,Q个报文中至少两个报文分别携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数。
上述方案中,由于至少两个报文携带的第二报头中的源端口号信息不同,因此路由器根据五元组的哈希值进行选路时,该组报文分在至少两个不同的网络路径中进行传输,从而使得网络中各个路径的流量较为均衡。另一方面,由于相同的一组报文在不同传输 路径传输可能导致目的端接收到乱序的一组报文,上述方案通过在报文中携带具有指示该报文在目的端的内存中的写入地址的第一报头,可以使得目的端设备能够直接根据每个报文携带的地址信息进行RDMA操作,从而使得上述方案既能够实现RDMA操作的报文的路由的进一步优化,又能够保证RDMA操作能够在目的端真正实现。
对于上述第一方面,一种可能的实现方式是根据源队列对配置的源端口号信息,依次封装Q个数据段得到Q个报文,每封装完成一个报文就发送封装后的报文,并在每封装完成N个报文后,更新源队列对配置的源端口号信息,前一组N个报文携带的源端口号信息与后一组N个报文携带的源端口号信息不同,N大于等于1,小于Q。通过上述每封装一个报文就发送该报文的做法,可以提高系统的效率。
对于上述第一方面,另一种可能的实现方式是:将Q个数据段划分为M个分组,每个分组中包括至少一个数据段,依次封装每个分组中的数据段得到每个分组中的报文,其中,每个分组中的报文携带的源端口号信息相同,至少两个分组中的报文携带的源端口号信息不同。通过上述分组封装的方法,可以提高系统的效率。
对于上述第一方面,另一种可能的实现方式是分别封装Q个数据段得到Q个报文之前,还包括:根据Q个数据段的第一个数据段的基地址和每个数据段的长度,确定Q个报文中的每个报文在目的端设备的内存中的写入地址。采取计算每个报文在目的端设备的内存中的写入地址并将该地址封装在报文中的做法,可以使得报文在到达目的端时直接被写入内存相应的地址中。
对于上述第一方面,另一种可能的实现方式是Q个报文中的每个报文还分别携带报文序号,每个报文携带的报文序号用于指示本报文在Q个报文中的发送顺序。通过这种做法,可以便于目的端根据报文序号确认该组报文是否收齐或者进行报文的乱序重排等,提高了系统的稳定性。
第二方面,提供一种报文传输的方法,该方法应用于数据通信系统,该数据通信系统中的源端设备和目的端设备之间通过以太网进行远程直接内存访问RDMA,其中,目的端设备的网络接口卡上包括目的队列对,目的队列对包括接收队列;该报文传输的方法包括:接收Q个报文,其中,每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文在目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,Q个报文中至少两个报文携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数;根据Q个报文各自携带的本报文在目的端设备的内存中的写入地址,分别将Q个报文从目的队列对保存到目的端设备的内存中。
对于源端发送的一组报文,由于其在多路径网络中路由可能经过不同的传输路径,因此达到目的端的顺序与源端的发送顺序可能不同,目的端在接收到源端发送的报文后,直接根据报文携带的写入地址进行内存写入,而不是等待接收到全部一组报文后进行重排之后才进行内存写入,提高了系统效率,同时也避免了若一组报文在传输中发生丢包,将可能全部一组的报文将无法实现目的端内存写入的问题。
对于上述第二方面,一种可能的实现方式是,接收Q个报文包括:依次接收Q个报文;保存Q个报文到目的端设备的内存中包括:每接收到一个报文,就执行将接收到的报文保存到目的端设备的内存中的步骤。通过这种做法,可以每接收一个报文就进行相应的处理,提高了系统的效率。
对于上述第二方面,另一种可能的实现方式是,Q个报文中的每个报文还分别携带报文序号,每个报文携带的报文序号用于指明本报文在Q个报文中的发送顺序。该实现方式还包括:每接收到一个报文,记录当前接收到的报文携带的报文序号,并根据当前接收到的报文的报文序号,确定预备接收的下一个报文的报文序号;在接收到下一个报文后,确定接收到的下一个报文的报文序号是否与预备接收的下一个报文的报文序号是否一致,如果否,启动丢包检测流程;如果通过丢包检测流程确定在报文传输过程中发生丢包,则向源端设备发送报文重传指示。通过这种做法,可以当乱序、丢包等情况发生时,避免立刻向源端发送重传指示,而是启动相应的丢包检测,在丢包检测确定发生丢包的情况下,才引导源端进行报文重传,提升了系统的稳定性。
对于上述第二方面,另一种可能的实现方式是目的队列对设置有位图,该位图至少包括Q个位图位,该Q个位图位按照Q个报文的发送顺序从前往后对应于该Q个报文,位图设置有头指针和尾指针,头指针指向接收队列最新接收到的报文所对应的位图位,尾指针指向接收队列预备接收的下一个报文;每接收到一个报文,记录当前接收到的报文携带的报文序号,并根据当前接收到的报文的报文序号,确定预备接收的下一个报文的报文序号,包括:根据当前接收到的报文的报文序号,将位图中代表当前接收到的报文的位图位设置为有效,并将头指针指向代表当前接收到的报文的位图位;以及,根据当前接收到的报文的报文序号,确定当前接收到的报文是否是尾指针当前指向的位图位所对应的报文,如果是,更新尾指针的指向,尾指针新的指向为所述当前接收到的报文所对应的位图位之后的无效的位图位中的第一个位图位,如果否,保持尾指针当前指向的位图位不变。通过这种做法,利用位图来统计接收到的报文状况,提高了系统的效率。
对于上述第二方面,另一种可能的实现方式是,确认接收到的下一个报文的报文序号是否与预备接收的下一个报文的报文序号是否一致,包括:根据接收到的下一个报文的报文序号,确定尾指针当前是否指向接收到的下一个报文所对应的位图位。通过这种做法,可以判断接收到的报文是否发生乱序,从而决定是否要采取相应的措施。
对于上述第二方面,另一种可能的实现方式是,丢包检测流程包括:针对尾指针当前指向的位图位所对应的报文启动定时器,若在定时器超时后,尾指针的指向不发生改变,确定尾指针当前指向的位图位所对应的报文发生丢包。通过这种做法,当某个报文一直没有被接收到时,系统可以判定该报文丢失,提高了系统的效率。
对于上述第二方面,另一种可能的实现方式是,丢包检测流程包括:确定头指针当前指向的位图位是否超过预定值,如果超过,确定头指针和尾指针之间的位图位所对应的报文发生丢包。通过这种做法,可以有效判断接收到的报文是否有丢包产生。
对于上述第二方面,另一种可能的实现方式是,向源端设备发送报文重传指示包括:向源端设备发送报文重传指示,该重传指示携带尾指针当前指向的位图位所对应的报文的报文序号,以请求源端设备将Q个报文中的尾指针当前指向的位图位所对应的报文之后的所有报文进行重新发送。通过这种做法,只需要源端重传尾指针当前指向的位图位所对应的报文之后的所有报文,提高了系统的效率。
对于上述第二方面,另一种可能的实现方式是当一组报文所对应的位图位的值都被置为有效时,说明该组报文已经全部收齐,目的端向源端发送确认应答报文。通过这种做法,可以判断何时一组报文已经全部收齐。
对于上述第二方面,另一种可能的实现方式是当目的端接收到的报文没有携带含有指示该报文在目的端的写入地址的部分时,先缓存该报文,并确认报文是否发生乱序、丢包以及是否收齐。当确认整组报文都已经收齐后,根据报文的报文序号进行乱序重排,并在乱序重排后,将报文写入内存中。通过这种做法,可以接收没有携带含有指示该报文在目的端的写入地址的报文,并可以进行乱序重排。
第三方面,提供一种网络接口卡,该网络接口卡位于远程直接内存访问RDMA的源端设备,该网络接口卡上设置有源队列对,每队列对包括发送队列;该网络接口卡包括:获取模块,用于从源队列对的发送队列中获取Q个数据段;发送模块,用于封装Q个数据段得到Q个报文,并发送Q个报文,其中,Q个报文中的每个报文携带第一报头、第二报头和队列对标识,每个报文携带的第一报头用于指示本报文在目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,Q个报文中至少两个报文携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数。
对于上述第三方面,一种可能的实现方式是,发送模块具体用于根据源队列对配置的源端口号信息,依次封装Q个数据段得到Q个报文,每封装完成一个报文就发送封装后的报文,并在每封装完成N个报文后,更新源队列对配置的源端口号信息,前一组N个报文携带的源端口号信息与后一组N个报文携带的源端口号信息不同,N大于等于1,小于Q。
对于上述第三方面,另一种可能的实现方式是,发送模块具体用于将Q个数据段划分为M个分组,每个分组中包括至少一个数据段,依次封装每个分组中的数据段得到每个分组中的报文,其中,每个分组中的报文携带的源端口号信息相同,至少两个分组中的报文携带的源端口号信息不同,M小于等于Q。
对于上述第三方面,另一种可能的实现方式是,还包括确定模块,用于根据Q个数据段的第一个数据段的基地址和每个数据段的长度,确定Q个报文中的每个报文在目的端设备的内存中的写入地址。
对于上述第三方面,另一种可能的实现方式是,Q个报文中的每个报文还分别携带报文序号,每个报文携带的报文序号用于指示本报文在Q个报文中的发送顺序。
第四方面,提供一种设备,该设备包括主处理系统和网络接口卡;主处理系统用于处理业务,在需要将业务数据发送到目的端设备时,将业务数据发送到网络接口卡中的业务数据对应的源队列对的发送队列;网络接口卡用于从业务数据对应的源队列对的发送队列中获取Q个数据段,该Q个数据段属于业务数据,封装该Q个数据段得到Q个报文,并发送Q个报文,其中,Q个报文中的每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文在目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,Q个报文中至少两个报文携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数。
对于上述第四方面,一种可能的实现方式是,网络接口卡封装Q个数据段得到Q个报文,并发送Q个报文包括:根据源队列对配置的源端口号信息,依次封装Q个数据段得到Q个报文,每封装完成一个报文就发送封装后的报文,并在每封装完成N个报文后,更新源队列对配置的源端口号信息,前一组N个报文携带的源端口号信息与后一组N个报文携带的源端口号信息不同,N大于等于1,小于Q。
对于上述第四方面,另一种可能的实现方式是,网络接口卡封装Q个数据段得到Q个报文,并发送Q个报文包括:将Q个数据段划分为M个分组,每个分组中包括至少一个数据段,依次封装每个分组中的数据段得到每个分组中的报文,其中,每个分组中的报文携带的源端口号信息相同,至少两个分组中的报文携带的源端口号信息不同。
对于上述第四方面,另一种可能的实现方式是,网络接口卡还用于根据Q个数据段的第一个数据段的基地址和每个数据段的长度,确定Q个报文中的每个报文在目的端设备的内存中的写入地址。
对于上述第四方面,另一种可能的实现方式是,网络接口卡在封装Q个报文时,Q个报文中的每个报文还分别携带报文序号,每个报文携带的报文序号用于指示本报文在Q个报文中的发送顺序。
第五方面,提供一种网络接口卡,该网络接口卡位于远程直接内存访问RDMA的目的端设备,该网络接口卡上设置有目的队列对,目的队列对包括接收队列;该网络接口卡包括:接收模块,用于接收Q个报文,其中,每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文在目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,Q个报文中至少两个报文携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数,目的端设备为RDMA的目的端设备;执行模块,用于根据Q个报文各自携带的本报文在目的端设备的内存中的写入地址,分别将Q个报文从目的队列对保存到目的端设备的内存中。
对于上述第五方面,一种可能的实现方式是接收模块具体用于依次接收Q个报文;在接收模块每接收到一个报文,执行模块就执行将接收到的报文保存到目的端设备的内存中的步骤。
对于上述第五方面,另一种可能的实现方式是,Q个报文中的每个报文还分别携带报文序号,每个报文携带的报文序号用于指明本报文在所述Q个报文中的发送顺序;网络接口卡还包括:检测模块;在接收模块每接收到一个报文,检测模块用于记录当前接收到的报文携带的报文序号,并根据当前接收到的报文的报文序号,确定预备接收的下一个报文的报文序号;在接收到下一个报文后,确定接收到的下一个报文的报文序号是否与预备接收的下一个报文的报文序号是否一致,如果否,启动丢包检测流程;如果通过丢包检测流程确定在报文传输过程中发生丢包,则向源端设备发送报文重传指示。
对于上述第五方面,另一种可能的实现方式是,目的队列对设置有位图,位图至少包括Q个位图位,该Q个位图位按照Q个报文的发送顺序对应于Q个报文,位图设置有头指针和尾指针,头指针指向本队列的接收队列最新接收到的报文所对应的位图位,尾指针指向本队列对的接收队列预备接收的下一个报文;检测模块具体用于根据当前接收到的报文的报文序号,将位图中代表当前接收到的报文的位图位设置为有效,并将头指针指向代表当前接收到的报文的位图位;以及,根据当前接收到的报文的报文序号,确定当前接收到的报文是否是尾指针当前指向的位图位所对应的报文,如果是,更新尾指针的指向,尾指针新的指向为当前接收到的报文所对应的位图位之后的无效的位图位中的第一个位图位,如果否,保持尾指针当前指向的位图位不变。
对于上述第五方面,另一种可能的实现方式是,检测模块确认接收到的下一个报文的报文序号是否与预备接收的下一个报文的报文序号是否一致,包括:根据接收到的下 一个报文的报文序号,确定尾指针当前是否指向接收到的下一个报文所对应的位图位。对于上述第五方面,另一种可能的实现方式是,检测模块执行丢包检测流程具体包括:针对尾指针当前指向的位图位所对应的报文启动定时器,若在定时器超时后,尾指针的指向不发生改变,确定尾指针当前指向的位图位所对应的报文发生丢包。
对于上述第五方面,另一种可能的实现方式是,检测模块执行丢包检测流程具体包括:确定头指针当前指向的位图位是否超过预定值,如果超过,确定头指针和尾指针之间的位图位所对应的报文发生丢包。
对于上述第五方面,另一种可能的实现方式是,检测模块向源端设备发送报文重传指示包括:向源端设备发送报文重传指示,该重传指示携带尾指针当前指向的位图位所对应的报文的报文序号,以请求源端设备将Q个报文中的尾指针当前指向的位图位所对应的报文之后的所有报文进行重新发送。
第六方面,提供一种设备,该设备包括主处理系统和网络接口卡,该主处理系统用于从设备的内存中获取应用数据,以及根据该应用数据处理业务;该网络接口卡用于接收Q个报文,其中,每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文在目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,Q个报文中至少两个报文携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数;根据Q个报文各自携带的本报文在目的端设备的内存中的写入地址,分别将Q个报文从目的队列对保存到目的端设备的内存中。
对于上述第六方面,一种可能的实现方式是,网络接口卡接收Q个报文包括:依次接收Q个报文;保存Q个报文到目的端设备的内存中包括:每接收到一个报文,就执行将接收到的报文保存到目的端设备的内存中的步骤。
对于上述第六方面,另一种可能的实现方式是,Q个报文中的每个报文还分别携带报文序号,每个报文携带的报文序号用于指明本报文在Q个报文中的发送顺序。该实现方式还包括:每接收到一个报文,记录当前接收到的报文携带的报文序号,并根据当前接收到的报文的报文序号,确定预备接收的下一个报文的报文序号;在接收到下一个报文后,确定接收到的下一个报文的报文序号是否与预备接收的下一个报文的报文序号是否一致,如果否,启动丢包检测流程;如果通过丢包检测流程确定在报文传输过程中发生丢包,则向源端设备发送报文重传指示。
对于上述第六方面,另一种可能的实现方式是,目的队列对设置有位图,该位图至少包括Q个位图位,该Q个位图位按照Q个报文的发送顺序从前往后对应于该Q个报文,位图设置有头指针和尾指针,头指针指向接收队列最新接收到的报文所对应的位图位,尾指针指向接收队列预备接收的下一个报文;每接收到一个报文,记录当前接收到的报文携带的报文序号,并根据当前接收到的报文的报文序号,确定预备接收的下一个报文的报文序号,包括:根据当前接收到的报文的报文序号,将位图中代表当前接收到的报文的位图位设置为有效,并将头指针指向代表当前接收到的报文的位图位;以及,根据当前接收到的报文的报文序号,确定当前接收到的报文是否是尾指针当前指向的位图位所对应的报文,如果是,更新尾指针的指向,尾指针新的指向为所述当前接收到的报文所对应的位图位之后的无效的位图位中的第一个位图位,如果否,保持尾指针当前指向的位图位不变。
对于上述第六方面,另一种可能的实现方式是,网络接口卡确认接收到的下一个报文的报文序号是否与预备接收的下一个报文的报文序号是否一致,包括:根据接收到的下一个报文的报文序号,确定尾指针当前是否指向接收到的下一个报文所对应的位图位。
对于上述第六方面,另一种可能的实现方式是,网络接口卡进行丢包检测流程包括:针对尾指针当前指向的位图位所对应的报文启动定时器,若在定时器超时后,尾指针的指向不发生改变,确定尾指针当前指向的位图位所对应的报文发生丢包。
对于上述第六方面,另一种可能的实现方式是,网络接口卡进行丢包检测流程包括:确定头指针当前指向的位图位是否超过预定值,如果超过,确定头指针和尾指针之间的位图位所对应的报文发生丢包。
对于上述第六方面,另一种可能的实现方式是,网络接口卡向源端设备发送报文重传指示包括:向源端设备发送报文重传指示,该重传指示携带尾指针当前指向的位图位所对应的报文的报文序号,以请求源端设备将Q个报文中的尾指针当前指向的位图位所对应的报文之后的所有报文进行重新发送。
第七方面,提供一种通信装置,该通信装置包括处理器以及与该处理器耦合的存储器,处理器用于根据存储器中加载的程序指令执行如第一方面所述的报文传输的方法。
第八方面,提供一种通信装置,该通信装置包括处理器以及与该处理器耦合的存储器,处理器用于根据存储器中加载的程序指令执行如第二方面所述的报文传输的方法。
第九方面,提供一种通信系统,该通信系统包括源端设备、目的端设备和至少一个路由设备,源端设备和目的端设备之间通过以太网进行远程直接内存访问RDMA,源端设备和目的端设备之间的通信路径至少包括一个路由设备相连,源端设备的网络接口卡上包括源队列对,源队列包括发送队列;目的端设备的网络接口卡上包括目的队列对,目的队列对包括接收队列;源端设备,用于从源队列对的发送队列中获取Q个数据段,分别封装该Q个数据段得到Q个报文,并分别发送该Q个报文,其中,Q个报文中的每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文在目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,Q个报文中至少两个报文携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数;至少一个路由设备,用于接收源端设备发送的Q个报文,根据Q个报文中的每个报文携带的源端口号信息,分别为每个报文确定转发路径,并分别根据确定的转发路径转发每个报文;目的端设备,用于接收Q个报文,根据Q个报文各自携带的本报文在目的端设备的内存中的写入地址,分别将Q个报文从目的队列对保存到目的端设备的内存中。
所述源端设备还用于进一步执行上述第一方面的方法,所述目的端设备还用于执行上述第二方面的方法。
第十方面,提供一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行如第一方面所述的报文传输的方法。
第十一方面,提供一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行如第二方面所述的报文传输的方法。
附图说明
图1为本申请的实施例中数据通信系统的组成示意图。
图2为采取RoCE协议进行传输的数据通信系统的示意图。
图3为现有技术中两台服务器之间进行RoCE协议下的报文传输而产生负载不均衡的示意图。
图4为本申请的实施例中两台服务器之间进行RoCE协议下的报文传输的系统的组成示意图。
图5为本申请的一个实施例中的源端的流程示意图。
图6为本申请的一个实施例中的目的端的流程示意图。
图7为现有技术中RoCEv2协议下报文的帧结构示意图。
图8为本申请的一个实施例中封装后的报文的帧结构示意图。
图9为本申请的一个实施例中的位图结构的示意图。
图10为本申请的一个实施例中的位图在数据通信系统中的应用的示意图。
图11为本申请的一个实施例中的位图在目的端接收到乱序报文时的示意图。
图12为本申请的一个实施例中的位图在目的端接收到当前预备接收的下一个报文时的示意图。
图13为本申请的另一个实施例中的源端的流程示意图。
图14为本申请的另一个实施例中的目的端的流程示意图。
图15为本申请的实施例中的源端设备的网络接口卡的功能结构的示意图。
图16为本申请的实施例中的目的端设备的网络接口卡的功能结构的示意图。
图17为本申请的实施例中的通信装置的结构的示意图。
图18为本申请的实施例中的源端设备的结构示意图。
图19为本申请的实施例中的目的端设备的结构示意图。
具体实施方式
为了使本申请的上述目的、技术方案和优点更易于理解,下文提供了详细的描述。所述详细的描述通过使用方框图、流程图和/或示例提出了设备和/或过程的各种实施例。由于这些方框图、流程图和/或示例包含一个或多个功能和/或操作,所以本领域内人员将理解可以通过许多硬件、软件、固件或它们的任意组合单独和/或共同实施这些方框图、流程图或示例内的每个功能和/或操作。
以下为本申请文件中相关的术语:
RDMA(Remote Direct Memory Access)技术全称远程直接内存访问,是为了解决网络传输中服务器端数据处理的延迟而产生的。RDMA通过网络把一个服务器中的数据直接传入另一个服务器的存储区,将数据从一个系统快速移动到其它系统的存储器中,而不对本系统的操作系统造成影响,这样就不需要使用到多少本系统的计算处理功能。它消除了外部存储器复制和上下文切换的开销,因而能解放内存带宽和CPU周期用于改进应用系统性能。其中,采用以太网进行RDMA被称为RoCE。
如图1所示,在数据通信系统100,服务器可大致分为软件层和硬件层(图1中以两个服务器为例示出),其中,软件层包括至少一个应用程序,而硬件层则主要由处理器111、内存121和网络接口卡131等组成。本实施例中,服务器101上的一个应用程序的数据需要通过RoCE协议共享到另一个服务器102上,以供另一个服务器102上的应用程 序使用。
如图2所示,数据通信系统200中包括服务器201和服务器202,其中,服务器201中包含网络接口卡241和主处理系统281,主处理系统281包括主机CPU261以及主机内存271(其它计算机系统的常规硬件如硬盘、总线等图2未示出),主处理系统281上还运行各种软件组件,例如操作系统251,以及在操作系统251上运行的应用程序211。服务器202中包含网络接口卡242和主处理系统282,主处理系统282包括主机CPU262以及主机内存272,主处理系统282上还运行各种软件组件,例如操作系统252,以及在操作系统252上运行的应用程序212。
网络接口卡241(也可以称为网络适配器或通信适配器)中有缓存221,缓存221中可以设置队列对(英文全称:Queue Pair,缩写:QP),图2中所示为QP231(网络接口卡中的QP根据上层应用的需求设置,可以设置多个QP,图2中以一个QP为例)。QP是网络接口卡提供给应用程序的虚拟接口,由一个发送工作队列(英文:Send Work Queue)和接收工作队列(英文:Receive Work Queue)组成,发送工作队列和接收工作队列永远是一同产生并成对出现的,它们将在其存在的时间内一直保持成对的状态。应用程序向网络接口卡发送的指令被称为工作队列元素(英文全称:Work Queue Element,WQE)。在服务器201中的应用程序211通过RDMA的方式向服务器202中的应用程序212发送数据之前,服务器201和服务器202先建立QP配对,即明确由QP231与QP232共同实现应用程序211与应用程序212之间的数据传输,并在之后发送的报文中加入相应的队列对标识(英文:QP ID)。
RDMA的工作过程通常分为三个部分。第一步,当服务器201上的应用程序211执行RDMA请求时,不执行在主处理系统的内存上的任何数据复制,RDMA请求从应用程序211的缓存被发送至网络接口卡241中的缓存221中的队列对的发送队列。第二步,网络接口卡241读取缓存221中发送队列的内容(数据),将内容通过报文的形式发送到服务器202中的QP232中,从而写入网络接口卡242的缓存222中。第三步,网络接口卡242收到数据后,直接将该数据写入主处理系统的应用程序212对应的内存中。
在报文经过多路径的以太网从服务器201到达服务器202的过程中,以太网中的路由设备根据报文的五元组信息选择转发路径。具体来说,路由设备通过对报文的五元组信息,即报文的源端口号、目的端口号、源IP地址、目的IP地址和协议类型等五个部分进行哈希计算,根据计算出的哈希值作为报文转发路径的依据。如图3所示,该数据通信系统300中的两台服务器,服务器301和服务器302之间通过多台路由器连接在一起,并进行RoCE协议下的通信。每个服务器上有多个QP,例如,如图中所示,服务器301上有QP351和QP352,服务器302上有QP353和QP354。由于在现有技术中,同一个QP在发送数据时使用同一个源端口号,服务器301中的QP351向服务器302中的QP发送数据时,报文的五元组信息保持不变,所以作为选路依据的哈希值也相同,导致QP351发送的所有数据都会选择同样的路径,例如都选择经路由器321将数据发送至服务器302中的QP上。当QP351所发送的数据量较大时,会导致连接路由器321的网络路径的负载较大,从而使得整个报文传输系统的路径的负载不均衡。再加上RoCE网络的快启动特性,即在RoCE网络中,源服务器从启动数据发送开始,就以最大能力进行数据的发送。当网络流量达到一定值时,会导致网络产生拥塞的概率大幅增加。并且,数据通信系统300 中往往不止两台服务器,可能会有更多的服务器与路由器321相连。当与路由器321连接的网络路径产生拥塞时,会影响到所有与其相连的服务器的报文传输。当网络产生拥塞之后,不仅会导致网络的时延增加,也会导致网络丢包的可能性增加。而RoCE网络对于丢包比较敏感,随着网络丢包率的增大,网络传输的有效带宽就会快速下降。
基于希望达到RoCE协议的报文在多路径网络中的传输进一步均衡的目的,本申请提供一种粒度更细化的报文传输方法和相关的装置,在源端侧发送多个报文时,对相同QP所发出的的报文进行进一步的分组,使得属于相同QP的不同分组的报文,其报文中携带的源端口号信息不相同,从而使得由相同QP发出的报文,在经过多路径网络时,经过哈希算法获得了不同的路径,从而即使由该QP发出的报文在某个时间段内发生流量异常增大的情况,也能够避免这些流量都经过相同的路径,并避免某条路径的拥塞所引发的整个多路径网络的传输不均衡和拥塞现象。由于在源端发送报文时,本申请将相同QP发出的报文的源端口号信息进行修改,这种修改使得携带不同源端口号信息的报文在多路径网络中可能经过不同的路径到达目的端,由于各条路径的长短和效率不同,因此可能使得这些报文到达目的端的顺序与这些报文在源端发送的顺序不同,这样有可能导致目的端接收到报文后无法将这些报文保存到真正的目的地。在现有技术中,相同QP发出的报文携带相同的源端口号信息,这些报文在相同的路径进行转发,目的端的接收顺序与源端的发送顺序一致,因此RoCEv2协议规定源端相同QP发出的报文只在首包报文中携带报文中的数据在目的端的内存中的写入地址,其它非首包报文不需要携带相关的写入地址,目的端根据接收到的报文的顺序可以实现报文写入对应的内存地址。在本申请源端相同QP发出的报文采用不同的源端口号信息发出之后,为了避免乱序的报文在目的端无法写入真正的目的地的问题,本申请还对相同QP发出的报文进行扩展,使得报文的报头与现有技术存在一定区别,并由此解决对应的问题。
图4所示的是本申请的实施例的系统结构图。如图所示,数据通信系统400包括2台服务器,服务器401和服务器402(图中所示为2台,实际数量可能是2台或更多台)。服务器401和服务器402分别与路由器411和路由器412直接相连,而路由器411和路由器412之间又通过路由器421、路由器422、路由器423、路由器424等4台路由器相连。服务器501中包括处理器431和网络接口卡441,网络接口卡441中包含若干QP,图中所示为QP451和QP452,其中每个QP对应设置有一个位图。服务器402的构成情况和服务器401相似,由处理器432和网络接口卡442组成。其中,网络接口卡441和网络接口卡442支持RoCEv2协议,服务器401和服务器402之间通过QP进行RDMA通信。在图4中的位图是本申请的其中一个实施例在目的端进行报文的接收和排序的具体实现,在其他的实施例中也可以采用其他的方法来实现。
图5和图6所示的是本申请的一个实施例的源服务器发送报文和目的服务器接收报文的流程图。
如图5所示,源服务器所进行的步骤如下:
S1:网络接口卡441从QP451的发送队列中获取待发送的Q个数据段。一般来说,当源服务器401中的应用程序提交一个工作请求后,该工作请求将直接被发送至网络接口卡441中相应的QP上。网络接口卡441进而可以读取该工作请求,并使QP来执行该工作请求。在本实施例中,该工作请求的内容是发送一组应用数据,该组应用数据可以包 括Q个数据段,其中Q为大于等于2的正整数。
S2:确认该获取的数据段封装而成的报文将要写入目的服务器402的内存的地址。其中,该地址是根据Q个数据段的基地址和Q个数据段在该获取的数据段之前的数据段的长度计算出来的。
在源服务器401通过RDMA的方式向目的服务器402发送数据之前,源服务器401先和目的服务器402进行通信,目的服务器402将源服务器401将要发送的数据封装而成的报文的基地址通知源服务器401,其中,基地址指的是该组报文的第一个报文在目的服务器内存中的写入地址的首地址。
S3:封装该获取的数据段,得到封装后的报文。
图7所示的是现有的RoCEv2的报文格式。与RoCEv1的格式相比,RoCEv2的报文格式加入了用户数据报协议(英文全称:User Datagram Protocol,缩写:UDP)报头部分,从而支持以太网IP路由功能,增强了RoCE网络的扩展性。其中,UDP的报头由源端口号、目的端口号、长度、校验和以及数据等五个部分组成。其中,在RoCEv2的报文中,UDP的目的端口号的值根据协议规定,固定为4791。由于数据通信系统中有多个服务器,每个服务器上有多个QP,每个QP的源端口号的值一般不同。
在本申请的实施例中,对封装后的报文的扩展主要分为两部分,具体如下:
如图8所示,一部分是给数据段增加第一报头,该第一报头中携带有用于指示本报文在目的端的内存中的写入地址的信息。具体来说,如果该数据段是该组数据中的第一个数据段,则在数据段的基本传输报头(英文全称:Base Transport Header,缩写:BTH)部分后加入RDMA扩展传输报头(英文全称:RDMA Extended Transport Header,缩写RETH)部分;如果不是第一个数据段,则在数据段的BTH部分后加入扩展报头(英文全称:Extended Header,缩写:EXH)部分。其中,每个WQE的第一个和最后一个数据段的BTH部分分别包含相应的信息,用以指示该报文是WQE中的第一个数据段或者是最后一个数据段。
其中,RETH部分包括虚拟地址(英文:Virtual Address)、远程秘钥(英文:Remote Key)、DMA长度(英文:DMA Length)等三个部分。其中Virtual Address部分的长度是64比特,记载的是进行RDMA操作后对应的目的端的虚拟地址;Remote Key部分的长度是32比特,记载的是允许进行RDMA操作的授权信息;DMA Length部分的长度是32比特,记载的是进行DMA操作的报文的字节数。EXH头部包括Virtual Address、立即数(英文:Immediate)、WQE序号(英文:WQE Number)、保留字段(英文:Reserved)等四个部分。其中,Virtual Address部分和RETH头部中的Virtual Address部分一样,长度是64比特,记载的是当前报文需要写入的目的端的内存地址;Immediate部分的长度是1比特,记载的是当前报文是否携带立即数;WQE Number部分的长度是31比特,记载的是QP发送的WQE序号;Reserved部分的长度是32比特,为保留字段。其中,EXH头部除了必须要有Virtual Address部分外,剩下的三个部分可以根据实际需要进行调整。
采用将包含虚拟地址的报头封装在报文上的做法,可以使得报文到达目的端时可以快速地写入内存之中。同时,由于报文具有虚拟地址部分,即使报文在网络传输途中发生了乱序,也可以通过根据虚拟地址写入目的端的内存中的相应位置。
另一部分是给数据段增加第二报头。其中,第二报头中携带源队列对的源端口号信息。和现有技术相比,本申请的实施例中的Q个数据段封装而成的Q个报文中至少有两个报文具有不同的源端口号信息。路由器在根据五元组信息选择转发的路径时,由于其中的源端口信息不同,具有不同源端口号信息的报文很大可能会选择不同的转发路径。因为这种对同一QP发送的报文的不同的源端口号信息的设置方式,能够将同一QP发出的报文流量分担到不同的转发路径,即使该QP发出的报文流量出现较大的情况,也不会引起整个多路径网络的某条路径的拥塞。
可选的,还可以在数据段中的BTH部分添加报文序号(英文全称:Packet Sequence Number,缩写:PSN),该报文序号用来表示该数据段在Q个数据段中的顺序。
S4:每当一个数据段封装成报文后,发送该报文。
S5:判断是否已经发送预设数量的报文。当已经发送预设数量的报文后,进行S6;当还没有发送预设数量的报文后,跳转至S1。
可选的,判断是否已经发送预设数量的报文,并对源队列对的端口信息进行更新时,该预设数量可以是变化的。例如,可以先发送3个报文后,对源队列对的端口信息进行更新,再发送4个报文后,对源队列对的端口信息进行更新。该预设数量也可以是固定的,例如每次都在发送3个报文后,对源队列对的端口信息进行更新。
S6:当已经发送预设数量的报文后,对源队列对的端口信息进行更新。通过这种方法,使得该组数据封装而成的报文的第二报头,具有不同的源端口号。从而当报文在网络中进行传输时,路由器根据报文的五元组信息的哈希值进行选路。由于报文拥有不同的源端口号,所以得出的哈希值很可能会不相同,从而选择不同的路径进行传输,使得网络中各个路径的流量更加均衡。
对于当前RoCE协议的规定而言,每个QP只会采用固定的源端口号,其在网络中的转发路径是固定的,只要不发生丢包的情况,报文就不会出现乱序的问题。而在上述实施例中,为了达到流量均衡的目的,QP对应的源端口号信息是在发生变化的,因此在网络上报文转发的路径也在发生着变化。由于不同网络路径对报文的处理时间可能会不同,所以报文在目的端可能会出现乱序。而源端通过给报文封装RETH或EXH扩展报头,将欲写入的目的服务器的内存的虚拟地址放入报文中。当报文到达目的端时,可以直接根据RETH或EXH扩展报头里的虚拟地址写入目的服务器相应的内存位置,相当于已经恢复了其在源端发送时的顺序。
S7:判断该Q个数据段是否已经发送完。如果还有数据段没有发送,则跳转至S1。
需要注意的是,上述S1-S7的编号仅用来指代,并不意味着在本申请的实施例中,上述步骤需要按照特定顺序来执行。例如,S2确认写入地址的步骤也可以S1之前。
在本申请的另一个实施例中,源端还可以将待发送的Q个数据段分成至少两个分组,每个分组包含至少一个数据段。封装每个分组中的数据段得到每个分组中的报文,其中,每个分组中的报文携带的源端口号信息相同,至少两个分组中的报文携带的源端口号信息不同。在源端发出Q个报文之后,该Q个报文经过路由器进行转发。路由器根据该Q个报文的五元组信息选择转发路径。当Q个报文的源端口号信息不同时,可能会被选择以不同的路径进行转发,因此该Q个报文到达目的端的顺序可能与该Q个报文在源端发出的顺序不同。目的端在接收到该Q个报文后,需要将该Q个报文中的数据段保存到对 应的地址。另外,现有技术中,RoCE协议规定目的端按报文的发送顺序接收报文,如果接收到的报文发生乱序,目的端要马上发送重传指示,使得源端重新发送在传输路径中可能丢失的报文。然而,本申请的上述实施例在发送端改变了相同QP发出的报文的源端口号信息,使得上述Q个报文到达目的端的顺序与该Q个报文的发送顺序大概率不同,在这种情况下,如果目的端一旦确定接收的报文发生乱序,就马上发送重传指示,报文重传的代价会较大。本申请在目的端还对接收到的报文进行乱序检测,在检测到发生乱序情况时并不立刻向源端发送报文重传指示,而是启动报文丢包检测流程,在根据该丢包检测流程确定发生丢包时,才向源端发送报文重传指示,提高系统的传输效率。目的端的具体流程实施例参见图6。
图6中,在本申请的实施例中,通过接收到的报文携带的报文序号,用来检验源服务器发送的报文是否出现乱序、丢包以及判断是否收齐。其中,该方法可以通过位图、数组和链表等方式进行实现。本申请的实施例将以位图为例进行说明。
图9-12所示的是位图算法在本申请的实施例中的原理,其中:
图9所示的是实现位图算法的位图示意图。如图9所示,在本申请的实施例中,每个QP都对应一个位图,用来记录报文的接收情况。每个位图包括多个位图位,每个位图位代表一个报文,将位图的位图位从前向后进行标号,并与报文的报文序号的值建立起对应关系,每个位图位按照报文的发送顺序从前往后与报文进行对应。每个位图还具有尾指针和头指针,尾指针指向该位图对应的队列对的接收队列预备接收的下一个报文所对应的位图位;头指针指向当前最新接收到的报文对应的位图位。当位图中的位图位的值为有效时,代表着该位图位对应的报文已经收到;当位图中的位图位的值为无效时,代表着该位图位对应的报文还没有收到,其中,有效既可以用值为1来表示,也可以用值为0来表示,在本申请的实施例中,有效用值为1来表示。同时,根据所需要进行排序的报文的报文序号的值的范围,设定所使用的位图的范围,若源端发送Q个报文,则目的端对应的位图至少包括Q个位图位。在该位图的范围中,最前端对应具有最小的报文序号的值的报文。
其中,上述的尾指针的指向是预备接收的下一个报文,一般指该位图对应的队列对的接收队列当前未接收到,并在下一个即将接收到的报文中预备接收到的报文,并且,该预备接收到的下一个报文是当前未接收到的报文中最新发送的报文,也可以说,预备接收的下一个报文一般指目的端当前没有接收到的报文中具有最小报文序号的报文,例如,源端发送Q个报文,该Q个报文的发送顺序为1到Q,报文序号为1表示最先发送的报文,报文序号为Q表示最后发送的报文,如果目的端接收到了报文序号分别为1、2、5的报文,则预备接收的下一个报文为报文序号为3的报文,该尾指针也指向报文序号为3的报文对应的位图位。
图10-12所示的是位图中位图位的值以及头指针和尾指针的位置是如何根据接收到的报文而发生变化的。例如,如图10所示,服务器401中的QP451向服务器402中的QP452发送10个报文,报文的报文序号分别为1-10,对应的位图也有10个位图位,从前端到后端(图中所示为从右向左)分别编号为1-10,从而和报文一一对应。报文在传输过程中顺序发生了改变,到达目的端QP452的顺序为3、1、2、4、5、6、7、8、9、10。
如图11所示,当目的端QP453接收到报文序号的值为3的报文时,头指针移动到对 应的3号位图位上,并将该位图位的值置为有效。由于尾指针所指向的位置是预备接收的下一个报文所对应的位图位,也就是报文序号的值为1的报文,因此尾指针保持不动。
如图12所示,当目的端QP453接收到报文序号的值为1的报文时,头指针移动到对应的1号位图位上,并将该位图位的值置为有效。而尾指针收到了当前预备接收的下一个报文,因此发生移动,其新的指向为当前接收到的报文所对应的位图位之后的无效的位图位中的第一个位图位,即2号位图位。
如图6所示,在本申请的一个实施例中,目的端所进行的步骤如下:
S1:目的端依次接收源端发送的报文,将该报文缓存到相应的目标队列对中。
S2:由于接收的报文在源端时,报头被加入了RETH部分或者EXH部分,而RETH部分和EXH部分中都包含有该报文所应该写入的目的端的内存的地址。目的端根据报文中包含的虚拟地址,将该接收到的报文写入相应的内存的地址中。
S3:每接收到所述Q个报文中的一个报文,记录当前接收到的报文携带的报文序号,并根据当前接收到的报文的报文序号,确定预备接收的下一个报文的报文序号;在接收到所述Q个报文中的下一个报文后,确定接收到的下一个报文的报文序号是否与预备接收的下一个报文的报文序号是否一致。
以采取位图进行检验的方式为例,当接收到源端发送的报文后,根据该报文的报文序号,将位图中代表该报文所对应的位图位的值设置为有效,即为1,并将该位图的头指针指向该报文对应的位图位。而位图的尾指针指向的是当前预备接收的下一个报文所对应的位图位。因此,当位图中的头指针和尾指针指向不同的位图位时,可以判断接收到的报文不是当前预备接收的下一个报文,即接收到的报文发生乱序。当接收到的报文是当前预备接收的下一个报文时,直接进行S5,判断报文是否已经收齐;如果接收到的报文不是当前预备接收的下一个报文时,进行S4。
S4:当接收到的下一个报文的报文序号是否与预备接收的下一个报文的报文序号不一致时,启动丢包检测流程,判断报文在传输过程中是否发生丢包。仍然以采取位图进行检验的方式为例,当判断接收到的报文不是当前预备接收的下一个报文的情况产生时,启动定时器。如果定时器超时后,尾指针的指向没有发生改变,说明在预设的时间内,目的端没有接收到尾指针所指向的位图位所对应的报文,从而说明尾指针当前指向的位图位所对应的报文发生了丢包。而如果尾指针当前指向的位图位所对应的报文接收到了,尾指针发生移动,定时器也将进行重置。
判断报文在传输过程中是否发生丢包还有另一种方法。当判断有接收到的报文不是当前预备接收的下一个报文的情况产生时,确定头指针当前指向的位图位是否超过了预定值T。如果头指针当前指向的位图位超过了预定值T,说明当前头指针指向的位图位和尾指针指向的位图位之间的某个位图位所对应的报文发生了丢包。该预定值T可以根据实际需要进行设定,例如,该预定值T可以设定为Q,也就是该组报文的数目,在这种情况下,当头指针指向的位图位超过了预定值T时,说明目的端在该组报文还没有收齐的情况下,已经接收到了下一组报文,可以判断发生了丢包。
如果通过所述丢包检测流程确定在报文传输过程中发生丢包,目的端向源端发送否定应答报文,通知源端,报文的传输过程有错误。同时,向源端发送报文重传指示,该重传指示中携带有尾指针当前指向的位图位所对应的报文的报文序号,以请求源端重传 该报文序号所对应的报文之后的所有报文。通过这种做法,在目的端接收到乱序的报文时,可以较为准确的确定哪些情况可能发生丢包,并在确定发生丢包时,才指示源端进行报文重传,从而提高了系统的效率。
当判断没有出现丢包时,进行S5。
S5:判断报文是否已经收齐。当该组报文所对应的位图位的值都被置为有效时,说明该组报文已经被收齐,进行S6。如果报文还没有被收齐,则重新回到S1。
S6:当报文已经被收齐时,目的端向源端发送确认应答报文。
图13和图14所示的是本申请的另一个实施例的源端和目的端的流程图。
如图13所示,源端所进行的步骤如下:
S1:网络接口卡441从QP451的发送队列中获取待发送的Q个数据段。
S2:封装该获取的数据段,得到封装后的报文。和本申请的前述的实施例不同,在这里只向数据段添加携带该源队列对的端口信息的第二报头,以及给每组数据的第一个数据段添加携带写入目的端内存的RETH报头,而不向剩下的数据段添加携带写入目的端内存的EXH报头。
S3:每当一个数据段封装成报文后,发送该报文。
S4:判断是否已经发送预设数量的报文。当已经发送预设数量的报文后,进行S5;当还没有发送预设数量的报文后,跳转至S1。
S5:当已经发送预设数量的报文后,对源队列对的端口信息进行更新。
S6:判断该组数据是否已经发送完。如果还有数据没有封装并发送,则跳转至S1。
如图14所示,在本申请的第二个实施例中,目的端所进行的步骤如下:
S1:目的端依次接收源端发送的报文,并将这些报文缓存至相应的队列对中。
S2:判断接收到的报文是否是当前预备接收的下一个报文。如果不是,进行S3;如果报文是,则进行S4。
S3:判断报文是否发生丢包。如果报文发生丢包,目的端向源端发送否定应答报文,通知源端,报文的传输过程有错误,并向源端发送报文重传指示。如果报文没有发生丢包,则进行S4。
S4:判断报文是否已经收齐。如果报文已经收齐,进行S5;如果报文还没有收齐,则重新回到S1。
S5:当报文已经收齐后,根据报文所携带的报文序号进行乱序重排,使得缓存中的报文恢复顺序。
S6:当缓存中的报文被排好序后,将其写入内存之中。
S7:目的端向源端发送确认应答报文。
基于上述技术方案,参阅图15所示,本申请的实施例提供一种网络接口卡1500,该网络接口卡1500位于位于远程直接内存访问RDMA的源端设备,该网络接口卡1500上设置有源队列对,源队列对包括发送队列;该网络接口卡1500包括:
获取模块1510,用于从至少两个源队列对中的第一源队列对的发送队列中获取Q个数据段;
发送模块1520,用于封装Q个数据段得到Q个报文,并发送Q个报文,其中,Q个报文中的每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文 在目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,Q个报文中至少两个报文携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数,目的端设备为RDMA的目的端设备;
确定模块1530,用于根据Q个数据段的第一个数据段的基地址和每个数据段的长度,确定Q个报文中的每个报文在目的端设备的内存中的写入地址。
本申请的实施例所提供的网络接口卡1500,其功能的实现可以参考如图5所示的报文传输的方法。
基于上述技术方案,参阅图16所示,本申请的实施例提供另一种网络接口卡1600,该网络接口卡1600位于远程直接内存访问RDMA的目的端设备,该网络接口卡1600上设置有目的队列对,目的队列对包括接收队列;该网络接口卡1600包括:
接收模块1610,用于接收Q个报文,其中,每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文在目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,Q个报文中至少两个报文携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数;
执行模块1620,用于根据Q个报文各自携带的本报文在目的端设备的内存中的写入地址,分别将Q个报文从目的队列对保存到目的端设备的内存中。
检测模块1630,在接收模块每接收到一个报文,检测模块用于记录当前接收到的报文携带的报文序号,并根据当前接收到的报文的报文序号,确定预备接收的下一个报文的报文序号;在接收到下一个报文后,确定接收到的下一个报文的报文序号是否与预备接收的下一个报文的报文序号是否一致,如果否,启动丢包检测流程;如果通过丢包检测流程确定在报文传输过程中发生丢包,则向源端设备发送报文重传指示。
本申请的实施例所提供的网络接口卡1600,其功能的实现可以参考如图6所示的报文传输的方法。
图17为依据本申请的实施例的通信装置1700的结构示意图。本实施例中的通信装置可以是上述各实施例中的网络接口卡的其中一种具体实现方式。
如图17所示,通信装置包括处理器1701,处理器1701与存储器1705连接。处理器1701可以为中央处理单元CPU,或现场可编程门阵列(英文全称:Field Programmable Gate Array,缩写:FPGA),或数字信号处理器(英文全称:Digital Signal Processor,缩写:DSP)等计算逻辑或以上任意计算逻辑的组合。处理器1701也可以为单核处理器或多核处理器。
存储器1705可以是RAM存储器、闪存、ROM存储器、EPROM存储器、EEPROM存储器、寄存器、硬盘、移动硬盘、CD-ROM或者本领域熟知的任何其它形式的存储介质,存储器可以用于存储程序指令,该程序指令被处理器1701执行时,处理器执行上述实施例中的源端或目的端的方法。
连接线1709用于在通信装置的各部件之间传递信息,连接线1709可以使用有线的连接方式或采用无线的连接方式,本申请并不对此进行限定。连接1709还连接有网络接口1704。
网络接口1704使用例如但不限于电缆或电绞线一类的连接装置,来实现与其他设备或网络1711之间的通信,网络接口1704还可以通过无线的形式与网络1711互连。
本申请实施例的一些特征可以由处理器1701执行存储器1705中的程序指令或者软件代码来完成/支持。存储器1705上在加载的软件组件可以从功能或者逻辑上进行概括,例如,图15所示的获取模块、发送模块等功能/逻辑模块,或者图16所示的接收模块和执行模块等功能/逻辑模块等。
在本申请的一个实施例中,当存储器1705加载进程序指令后,处理器1701执行存储器中的上述功能/逻辑模块相关的事务。
此外,图17仅仅是一个通信装置的例子,通信装置可能包含相比于图17展示的更多或者更少的组件,或者有不同的组件配置方式。同时,图17中展示的各种组件可以用硬件、软件或者硬件与软件的结合方式实施,例如,该通信装置可以以一个芯片的形式来实现。在这种情况下,存储器和处理器可以在一个模块中实现,存储器中的指令可以是预先写入所述存储器的,也可以是后续处理器在执行的过程中加载的。
本申请的实施例提供一种设备,如图18所示,该设备1800包括主处理系统1810和网络接口卡1830。其中,主处理系统1810用于处理业务,在需要将业务数据发送到目的端设备时,将业务数据发送到网络接口卡1830中的所述业务数据对应的源队列对的发送队列;网络接口卡1830,用于从业务数据对应的源队列对的发送队列中获取Q个数据段,该Q个数据段属于业务数据,封装Q个数据段得到Q个报文,并发送该Q个报文,其中,Q个报文中的每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文在目的端的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,Q个报文中至少两个报文携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数。网络接口卡1830还用于根据Q个数据段的第一个数据段的基地址和每个数据段的长度,确定Q个报文中的每个报文在目的端设备的内存中的写入地址。
本申请的实施例还提供另一种设备,如图19所示,该设备1900包括主处理系统1910和网络接口卡1930。其中,主处理系统1910用于从设备1900的内存1920中获取应用数据,以及根据该应用数据处理业务;网络接口卡1930用于接收通过Q个报文实现的应用数据以及将接收到的Q个报文写入内存1920。其中,网络接口卡1930接收Q个报文的方法可参考如图6所示的报文传输的方法。
本申请的实施例还提供一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行如图5所示的报文传输的方法。
本申请的实施例还提供另一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行如图6所示的报文传输的方法。

Claims (28)

  1. 一种报文传输的方法,其特征在于,所述方法应用于源端设备,所述源端设备与目的端设备之间通过以太网进行远程直接内存访问RDMA,所述源端设备的网络接口卡上包括源队列对,所述源队列对包括发送队列;所述方法包括:
    从所述源队列对的发送队列中获取Q个数据段;
    分别封装所述Q个数据段得到Q个报文,并分别发送所述Q个报文,其中,所述Q个报文中的每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文在所述目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,所述Q个报文中至少两个报文分别携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数。
  2. 根据权利要求1所述的方法,其特征在于,所述分别封装所述Q个数据段得到Q个报文,并发送所述Q个报文包括:
    根据所述源队列对配置的源端口号信息,依次封装所述Q个数据段得到所述Q个报文,每封装完成一个报文就发送封装后的报文,并在每封装完成N个报文后,更新所述源队列对配置的源端口号信息,前一组N个报文携带的源端口号信息与后一组N个报文携带的源端口号信息不同,N大于等于1,小于Q。
  3. 根据权利要求1所述的方法,其特征在于,所述分别封装所述Q个数据段得到Q个报文,并发送所述Q个报文包括:
    将所述Q个数据段划分为M个分组,每个分组中包括至少一个数据段,依次封装每个分组中的数据段得到每个分组中的报文,其中,每个分组中的报文携带的源端口号信息相同,至少两个分组中的报文携带的源端口号信息不同,M小于等于Q。
  4. 根据权利要求1-3任意一项所述的方法,其特征在于,所述分别封装所述Q个数据段得到Q个报文之前,还包括:
    根据所述Q个数据段的第一个数据段的基地址和每个数据段的长度,确定所述Q个报文中的每个报文在所述目的端设备的内存中的写入地址。
  5. 根据权利要求1-4任意一项所述的方法,其特征在于,所述Q个报文中的每个报文还分别携带报文序号,每个报文携带的报文序号用于指示本报文在所述Q个报文中的发送顺序。
  6. 一种报文传输的方法,其特征在于,所述方法应用于目的端设备,源端设备与所述目的端设备之间通过以太网进行远程直接内存访问RDMA,所述目的端设备的网络接口卡上包括目的队列对,所述目的队列对包括接收队列;所述方法包括:
    接收Q个报文,其中,每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文在所述目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,所述Q个报文中至少两个报文携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数;
    根据所述Q个报文各自携带的本报文在所述目的端设备的内存中的写入地址,分别将所述Q个报文保存到所述目的端设备的内存中。
  7. 根据权利要求6所述的方法,其特征在于,所述接收Q个报文包括:依次接收Q个报文;则,所述保存所述Q个报文到所述目的端设备的内存中包括:每接收到一个报 文,就执行所述将接收到的报文保存到所述目的端设备的内存中的步骤。
  8. 根据权利要求7所述的方法,其特征在于,所述Q个报文中的每个报文还分别携带报文序号,每个报文携带的报文序号用于指明本报文在所述Q个报文中的发送顺序;所述方法还包括:
    每接收到所述Q个报文中的一个报文,记录当前接收到的报文携带的报文序号,并根据当前接收到的报文的报文序号,确定预备接收的下一个报文的报文序号;
    在接收到所述Q个报文中的下一个报文后,确定接收到的下一个报文的报文序号是否与预备接收的下一个报文的报文序号是否一致,如果否,启动丢包检测流程;
    如果通过所述丢包检测流程确定在报文传输过程中发生丢包,则向所述源端设备发送报文重传指示。
  9. 根据权利要求8所述的方法,其特征在于,所述目的队列对设置有位图,所述位图包括至少Q个位图位,所述Q个位图位按照所述Q个报文的发送顺序从前往后对应于所述Q个报文,所述位图设置有头指针和尾指针,所述头指针指向接收队列最新接收到的报文所对应的位图位,所述尾指针指向接收队列预备接收的下一个报文;
    所述每接收到一个报文,记录当前接收到的报文携带的报文序号,并根据当前接收到的报文的报文序号,确定预备接收的下一个报文的报文序号,包括:
    根据当前接收到的报文的报文序号,将所述位图中代表所述当前接收到的报文的位图位设置为有效,并将所述头指针指向代表所述当前接收到的报文的位图位;以及,
    根据所述当前接收到的报文的报文序号,确定所述当前接收到的报文是否是所述尾指针当前指向的位图位所对应的报文,如果是,更新所述尾指针的指向,所述尾指针新的指向为所述当前接收到的报文所对应的位图位之后的无效的位图位中的第一个位图位,如果否,保持所述尾指针当前指向的位图位不变。
  10. 根据权利要求9所述的方法,其特征在于,所述确定接收到的下一个报文的报文序号是否与预备接收的下一个报文的报文序号是否一致,包括:
    根据所述接收到的下一个报文的报文序号,确定所述尾指针当前是否指向所述接收到的下一个报文所对应的位图位。
  11. 根据权利要求9或10所述的方法,其特征在于,所述丢包检测流程包括:
    针对所述尾指针当前指向的位图位所对应的报文启动定时器,若在定时器超时后,所述尾指针的指向不发生改变,确定所述尾指针当前指向的位图位所对应的报文发生丢包。
  12. 根据权利要求9或10所述的方法,其特征在于,所述丢包检测流程包括:
    确定所述头指针当前指向的位图位是否超过预定值,如果超过,确定所述头指针和所述尾指针之间的位图位所对应的报文发生丢包。
  13. 根据权利要求9至12任一项所述的方法,其特征在于,所述向所述源端设备发送报文重传指示包括:
    向所述源端设备发送报文重传指示,所述重传指示携带所述尾指针当前指向的位图位所对应的报文的报文序号,以请求所述源端设备将所述Q个报文中所述尾指针当前指向的位图位所对应的报文之后的所有报文进行重新发送。
  14. 一种网络接口卡,其特征在于,所述网络接口卡位于远程直接内存访问RDMA的 源端设备,所述网络接口卡上设置源队列对,所述源队列对包括发送队列,所述发送队列用于缓存所述源端设备待发送到目的端设备的数据;
    所述网络接口卡包括:
    获取模块,用于从所述源队列对的发送队列中获取Q个数据段;
    发送模块,用于分别封装所述Q个数据段得到Q个报文,并分别发送所述Q个报文,其中,所述Q个报文中的每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文在所述目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,所述Q个报文中至少两个报文分别携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数。
  15. 根据权利要求14所述的网络接口卡,其特征在于,所述发送模块具体用于根据所述源队列对配置的源端口号信息,依次封装所述Q个数据段得到所述Q个报文,每封装完成一个报文就发送封装后的报文,并在每封装完成N个报文后,更新所述源队列对配置的源端口号信息,前一组N个报文携带的源端口号信息与后一组N个报文携带的源端口号信息不同,N大于等于1,小于Q。
  16. 根据权利要求14所述的网络接口卡,其特征在于,所述发送模块具体用于将所述Q个数据段划分为M个分组,每个分组中包括至少一个数据段,依次封装每个分组中的数据段得到每个分组中的报文,其中,每个分组中的报文携带的源端口号信息相同,至少两个分组中的报文携带的源端口号信息不同,M小于等于Q。
  17. 根据权利要求14-16任意一项所述的网络接口卡,其特征在于,还包括:
    确定模块,用于根据所述Q个数据段的第一个数据段的基地址和每个数据段的长度,确定所述Q个报文中的每个报文在所述目的端设备的内存中的写入地址。
  18. 根据权利要求14-17任意一项所述的网络接口卡,其特征在于,所述Q个报文中的每个报文还分别携带报文序号,每个报文携带的报文序号用于指示本报文在所述Q个报文中的发送顺序。
  19. 一种设备,其特征在于,所述设备包括主处理系统和网络接口卡;
    所述主处理系统用于处理业务,在需要将业务数据发送到目的端设备时,将所述业务数据发送到所述网络接口卡中的源队列对的发送队列;
    所述网络接口卡,用于从所述源队列对的发送队列中获取Q个数据段,所述Q个数据段属于所述业务数据,分别封装所述Q个数据段得到Q个报文,并分别发送所述Q个报文,其中,所述Q个报文中的每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文在所述目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,所述Q个报文中至少两个报文分别携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数。
  20. 根据权利要求19所述的设备,其特征在于,所述网络接口卡还用于根据所述Q个数据段的第一个数据段的基地址和每个数据段的长度,确定所述Q个报文中的每个报文在所述目的端设备的内存中的写入地址。
  21. 一种网络接口卡,其特征在于,所述网络接口卡位于远程直接内存访问RDMA的目的端设备,所述网络接口卡上设置有目的队列对,所述目的队列对包括接收队列;
    所述网络接口卡包括:
    接收模块,用于接收Q个报文,其中,每个报文携带第一报头和第二报头,每个报文携带的第一报头用于指示本报文在目的端设备的内存中的写入地址,每个报文携带的第二报头包含源端口号信息,所述Q个报文中至少两个报文分别携带的第二报头中的源端口号信息不相同,Q为大于等于2的正整数;
    执行模块,用于根据所述Q个报文各自携带的本报文在所述目的端设备的内存中的写入地址,分别将所述Q个报文从目的队列对保存到所述目的端设备的内存中。
  22. 根据权利要求21所述的网络接口卡,其特征在于,所述接收模块具体用于:依次接收Q个报文,所述接收模块每接收到一个报文,所述执行模块就执行所述将接收到的报文保存到所述目的端设备的内存中的步骤。
  23. 根据权利要求22所述的网络接口卡,其特征在于,所述Q个报文中的每个报文还分别携带报文序号,每个报文携带的报文序号用于指明本报文在所述Q个报文中的发送顺序;
    所述网络接口卡还包括:检测模块;所述接收模块每接收到一个报文,所述检测模块用于记录当前接收到的报文携带的报文序号,并根据当前接收到的报文的报文序号,确定预备接收的下一个报文的报文序号;在接收到下一个报文后,确定接收到的下一个报文的报文序号是否与预备接收的下一个报文的报文序号是否一致,如果否,启动丢包检测流程;如果通过所述丢包检测流程确定在报文传输过程中发生丢包,则向所述源端设备发送报文重传指示。
  24. 根据权利要求23所述的网络接口卡,其特征在于,所述目的队列对设置有位图,所述位图至少包括Q个位图位,所述Q个位图位按照所述Q个报文的发送顺序对应于所述Q个报文,所述位图设置有头指针和尾指针,所述头指针指向接收队列最新接收到的报文所对应的位图位,所述尾指针指向接收队列预备接收的下一个报文;
    所述检测模块具体用于根据当前接收到的报文的报文序号,将所述位图中代表所述当前接收到的报文的位图位设置为有效,并将所述头指针指向代表所述当前接收到的报文的位图位;以及,根据所述当前接收到的报文的报文序号,确定所述当前接收到的报文是否是所述尾指针当前指向的位图位所对应的报文,如果是,更新所述尾指针的指向,所述尾指针新的指向为所述当前接收到的报文所对应的位图位之后的无效的位图位中的第一个位图位,如果否,保持所述尾指针当前指向的位图位不变。
  25. 根据权利要求24所述的网络接口卡,其特征在于,所述检测模块执行丢包检测流程具体包括:针对所述尾指针当前指向的位图位所对应的报文启动定时器,若在定时器超时后,所述尾指针的指向不发生改变,确定所述尾指针当前指向的位图位所对应的报文发生丢包。
  26. 一种设备,其特征在于,所述设备包括主处理系统和网络接口卡;
    所述主处理系统用于从所述设备的内存中获取应用数据,以及根据所述应用数据处理业务;
    所述网络接口卡,用于执行如权利要求6-13任意一项所述的方法。
  27. 一种计算机存储介质,包括指令,当其在计算机上运行时,使得计算机执行如权利要求1-5任意一项所述的方法。
  28. 一种计算机存储介质,包括指令,当其在计算机上运行时,使得计算机执行如权利要求6-13任意一项所述的方法。
PCT/CN2018/072886 2018-01-16 2018-01-16 一种报文传输的方法及装置 WO2019140556A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202010305935.6A CN111654447B (zh) 2018-01-16 2018-01-16 一种报文传输的方法及装置
PCT/CN2018/072886 WO2019140556A1 (zh) 2018-01-16 2018-01-16 一种报文传输的方法及装置
CN201880003454.0A CN109691039B (zh) 2018-01-16 2018-01-16 一种报文传输的方法及装置
US16/895,791 US11716409B2 (en) 2018-01-16 2020-06-08 Packet transmission method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/072886 WO2019140556A1 (zh) 2018-01-16 2018-01-16 一种报文传输的方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/895,791 Continuation US11716409B2 (en) 2018-01-16 2020-06-08 Packet transmission method and apparatus

Publications (1)

Publication Number Publication Date
WO2019140556A1 true WO2019140556A1 (zh) 2019-07-25

Family

ID=66191852

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072886 WO2019140556A1 (zh) 2018-01-16 2018-01-16 一种报文传输的方法及装置

Country Status (3)

Country Link
US (1) US11716409B2 (zh)
CN (2) CN111654447B (zh)
WO (1) WO2019140556A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021083492A1 (en) * 2019-10-29 2021-05-06 Huawei Technologies Co., Ltd. Systems and methods for sorting data elements with approximation to o(1)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6800933B2 (ja) * 2018-10-11 2020-12-16 株式会社ソニー・インタラクティブエンタテインメント ネットワーク評価装置、評価方法およびプログラム
US20190253357A1 (en) * 2018-10-15 2019-08-15 Intel Corporation Load balancing based on packet processing loads
CN117336229A (zh) 2019-06-04 2024-01-02 华为技术有限公司 一种集合通信的方法、装置及系统
US11005770B2 (en) * 2019-06-16 2021-05-11 Mellanox Technologies Tlv Ltd. Listing congestion notification packet generation by switch
CN110647071B (zh) * 2019-09-05 2021-08-27 华为技术有限公司 一种控制数据传输的方法、装置及存储介质
CN110677220B (zh) * 2019-09-09 2022-06-14 无锡江南计算技术研究所 一种基于多轨冗余应答的rdma消息机制及其实现装置
CN111711577B (zh) * 2020-07-24 2022-07-22 杭州迪普信息技术有限公司 流控设备的报文转发方法及装置
CN112566183B (zh) * 2020-11-20 2023-04-21 北京直真科技股份有限公司 一种自动开通5g传输电路的sdn控制器
CN114727340A (zh) * 2021-01-06 2022-07-08 华为技术有限公司 传输报文的方法和装置
CN112954045B (zh) * 2021-02-07 2022-04-26 游密科技(深圳)有限公司 节点中的数据传输方法、装置、介质及电子设备
CN115701060A (zh) * 2021-07-29 2023-02-07 华为技术有限公司 一种报文传输方法及相关装置
CN114090484B (zh) * 2021-11-15 2023-08-08 深圳云豹智能有限公司 远程直接数据存取方法及装置
CN114448892A (zh) * 2022-02-10 2022-05-06 珠海星云智联科技有限公司 一种软硬件选路方法及装置
CN114785714B (zh) * 2022-03-01 2023-08-22 阿里巴巴(中国)有限公司 一种报文传输时延检测方法、存储介质及设备
CN114979040B (zh) * 2022-05-07 2024-02-20 成都数之联科技股份有限公司 一种udp报文写入方法及系统及装置及介质
CN114978986B (zh) * 2022-05-13 2024-05-14 中国联合网络通信集团有限公司 一种数据传输方法、装置及存储介质
CN115633104B (zh) * 2022-09-13 2024-02-13 江苏为是科技有限公司 数据发送方法、数据接收方法、装置及数据收发系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040049601A1 (en) * 2002-09-05 2004-03-11 International Business Machines Corporation Split socket send queue apparatus and method with efficient queue flow control, retransmission and sack support mechanisms
CN103441937A (zh) * 2013-08-21 2013-12-11 曙光信息产业(北京)有限公司 组播数据的发送方法和接收方法
CN105025070A (zh) * 2014-04-30 2015-11-04 英特尔公司 用于优化约束系统内的网络数据流的方法
CN106411739A (zh) * 2015-07-31 2017-02-15 华为技术有限公司 一种数据转发方法、装置及系统
CN107113298A (zh) * 2014-12-29 2017-08-29 Nicira股份有限公司 为rdma提供多租赁支持的方法

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6917987B2 (en) * 2001-03-26 2005-07-12 Intel Corporation Methodology and mechanism for remote key validation for NGIO/InfiniBand™ applications
US20060067346A1 (en) * 2004-04-05 2006-03-30 Ammasso, Inc. System and method for placement of RDMA payload into application memory of a processor system
US7984180B2 (en) * 2005-10-20 2011-07-19 Solarflare Communications, Inc. Hashing algorithm for network receive filtering
US7836220B2 (en) * 2006-08-17 2010-11-16 Apple Inc. Network direct memory access
US8265092B2 (en) * 2007-09-14 2012-09-11 International Business Machines Corporation Adaptive low latency receive queues
CN101227287B (zh) * 2008-01-28 2010-12-08 华为技术有限公司 一种数据报文处理方法及数据报文处理装置
US8019826B2 (en) * 2008-09-29 2011-09-13 Cisco Technology, Inc. Reliable reception of messages written via RDMA using hashing
CN101409715B (zh) * 2008-10-22 2012-04-18 中国科学院计算技术研究所 一种利用InfiniBand网络进行通信的方法及系统
CN101702689B (zh) * 2009-11-30 2012-07-04 迈普通信技术股份有限公司 组播业务数据负载均衡的传输控制方法及传输控制设备
US10164870B2 (en) * 2013-06-28 2018-12-25 Avago Technologies International Sales Pte. Limited Relaxed ordering network
US10152352B2 (en) * 2015-11-17 2018-12-11 Friday Harbor Llc Writing to contiguous memory addresses in a network on a chip architecture
US10320681B2 (en) * 2016-04-12 2019-06-11 Nicira, Inc. Virtual tunnel endpoints for congestion-aware load balancing
US10193810B2 (en) * 2016-11-08 2019-01-29 Vmware, Inc. Congestion-aware load balancing
US10411996B2 (en) * 2017-06-19 2019-09-10 Cisco Technology, Inc. Validation of routing information in a network fabric
CN107231316B (zh) * 2017-06-27 2020-03-13 中国联合网络通信集团有限公司 报文的传输方法及装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040049601A1 (en) * 2002-09-05 2004-03-11 International Business Machines Corporation Split socket send queue apparatus and method with efficient queue flow control, retransmission and sack support mechanisms
CN103441937A (zh) * 2013-08-21 2013-12-11 曙光信息产业(北京)有限公司 组播数据的发送方法和接收方法
CN105025070A (zh) * 2014-04-30 2015-11-04 英特尔公司 用于优化约束系统内的网络数据流的方法
CN107113298A (zh) * 2014-12-29 2017-08-29 Nicira股份有限公司 为rdma提供多租赁支持的方法
CN106411739A (zh) * 2015-07-31 2017-02-15 华为技术有限公司 一种数据转发方法、装置及系统

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021083492A1 (en) * 2019-10-29 2021-05-06 Huawei Technologies Co., Ltd. Systems and methods for sorting data elements with approximation to o(1)

Also Published As

Publication number Publication date
US20200304608A1 (en) 2020-09-24
CN109691039A (zh) 2019-04-26
CN109691039B (zh) 2020-04-28
CN111654447B (zh) 2023-04-18
CN111654447A (zh) 2020-09-11
US11716409B2 (en) 2023-08-01

Similar Documents

Publication Publication Date Title
WO2019140556A1 (zh) 一种报文传输的方法及装置
CN110661723B (zh) 一种数据传输方法、计算设备、网络设备及数据传输系统
CN109936510B (zh) 多路径rdma传输
CN108881008B (zh) 一种数据传输的方法、装置和系统
US9450867B2 (en) Apparatus and method for controlling transmission between relay devices
US20060203730A1 (en) Method and system for reducing end station latency in response to network congestion
US7685250B2 (en) Techniques for providing packet rate pacing
US10791054B2 (en) Flow control and congestion management for acceleration components configured to accelerate a service
KR20140030313A (ko) 신뢰가능한 세션 마이그레이션을 위한 방법 및 장치
US20210297351A1 (en) Fabric control protocol with congestion control for data center networks
US10701189B2 (en) Data transmission method and apparatus
US20100226384A1 (en) Method for reliable transport in data networks
US9356989B2 (en) Learning values of transmission control protocol (TCP) options
WO2017162117A1 (zh) 一种集群精确限速方法和装置
WO2020073907A1 (zh) 转发表项的更新方法及装置
US11799777B2 (en) Method for transferring information across a data center network
WO2021244450A1 (zh) 一种通信方法及装置
US10999210B2 (en) Load sharing method and network device
JP2016100721A (ja) 制御装置
US20210297343A1 (en) Reliable fabric control protocol extensions for data center networks with failure resilience
JP2012134907A (ja) Tcp転送装置およびそのプログラム
TW202335470A (zh) 網路流壅塞管理裝置及其方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18900984

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18900984

Country of ref document: EP

Kind code of ref document: A1