WO2024113830A1 - Data transmission method, apparatus, device, system and storage medium

Publication number
WO2024113830A1
Authority
WO
WIPO (PCT)
Prior art keywords
communication device
message
connection
sub
service
Prior art date
Application number
PCT/CN2023/103736
Other languages
English (en)
French (fr)
Inventor
晏思宇
袁辉
宋鹤翔
刘宁
曲迪
郑晓龙
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024113830A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/31 Flow control; Congestion control by tagging of packets, e.g. using discard eligibility [DE] bits
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/10 Packet switching elements characterised by the switching fabric construction
    • H04L49/111 Switch interfaces, e.g. port details

Definitions

  • the present application relates to the field of communication technology, and in particular to a data transmission method, apparatus, device, system and storage medium.
  • RDMA: remote direct memory access
  • data transmission of RDMA technology adopts a single connection mode, that is, the messages of the same service are carried on a queue pair (QP) connection for transmission.
  • QP: queue pair
  • One QP connection corresponds to one path, that is, the QP connection is used to describe the communication connection of the service between any two nodes.
  • the data transmission method in the related technology needs to establish a large number of QP connections to achieve data transmission.
  • the number of QP connections established is limited by resources such as network card memory, which in turn limits the performance of data transmission.
  • the present application provides a data transmission method, apparatus, device, system and storage medium for transmitting service messages through a shared connection group.
  • a data transmission method is provided. Taking a first communication device executing the method as an example, the first communication device obtains a first message corresponding to a first service, and the destination of the first message is a second communication device; the first communication device sends the first message to the second communication device according to a first connection group, and the first connection group is shared by the service transmitted by the first communication device.
  • This method transmits service messages by sharing and reusing the same connection group, thereby improving the scalability of connections.
  • the performance of data transmission is not limited by resources such as network card memory, nor is it affected by service interruptions caused by the failure of a single connection, which improves the reliability of data transmission.
  • the first connection group includes a first connection between the first communication device and the second communication device, and the first communication device can send a first message to the second communication device according to the first connection in the first connection group.
  • the first communication device obtains a second message corresponding to the second service, the destination of the second message is the same as the destination of the first message, which is the second communication device; the first communication device can also send the second message to the second communication device according to the first connection.
  • the performance of data transmission is not limited by resources such as network card memory, further improving the scalability of the connection.
  • the first connection group includes not only the first connection between the first communication device and the second communication device, but also the second connection between the first communication device and the second communication device, and the first communication device may send the first message to the second communication device according to the first connection in the first connection group.
  • the first communication device obtains the second message corresponding to the first service, the destination of the second message is the second communication device; the first communication device may send the second message to the second communication device according to the second connection.
  • the first connection group is not only used for sharing the service transmitted from the first communication device to the second communication device, but also for sharing the service transmitted from the first communication device to the fourth communication device.
  • the first communication device obtains a third message corresponding to the first service, and the destination of the third message is the fourth communication device; the first communication device sends the third message to the fourth communication device according to the first connection group.
  • the first communication device can also send messages of the same service to the second communication device and the fourth communication device through the shared first connection group, further improving the connection scalability and data transmission performance.
  • the first communication device obtains a third message corresponding to the second service, and the destination of the third message is the fourth communication device; the first communication device sends the third message to the fourth communication device according to the first connection group.
  • the first communication device can also send messages of different services to the second communication device and the fourth communication device through the shared first connection group, further improving the connection scalability and data transmission performance.
  • the first communication device may send the first message to the second communication device according to the first connection as follows: it selects the first connection in the first connection group based on whether the connection performance of the first connection meets a performance condition, where the connection performance includes at least one of the send queue length, delay performance, or packet loss performance.
  • the connection whose connection performance meets the performance condition is the connection with higher connection performance in the connection group. Therefore, when the first connection group includes multiple connections, the first connection is selected according to whether the performance condition is met, so that the transmission performance of data transmission through the first connection is higher.
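The selection step described above can be sketched as follows. This is a minimal illustration, not the method claimed in the application: the `Connection` record, the three threshold constants, and the fallback to the least-loaded connection when nothing meets the condition are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Connection:
    conn_id: int
    queue_len: int     # messages waiting in the send queue
    delay_us: float    # most recently measured delay, in microseconds
    loss_rate: float   # most recently measured loss fraction in [0, 1]


# Hypothetical performance condition; the source does not give concrete values.
MAX_QUEUE_LEN = 64
MAX_DELAY_US = 500.0
MAX_LOSS_RATE = 0.01


def meets_condition(c: Connection) -> bool:
    """True when all three measured metrics are within the assumed limits."""
    return (c.queue_len <= MAX_QUEUE_LEN
            and c.delay_us <= MAX_DELAY_US
            and c.loss_rate <= MAX_LOSS_RATE)


def select_connection(group: List[Connection]) -> Connection:
    """Pick one connection of the group for the next message."""
    if not group:
        raise ValueError("connection group is empty")
    # Prefer connections that meet the condition; otherwise fall back to all.
    candidates = [c for c in group if meets_condition(c)] or group
    return min(candidates, key=lambda c: (c.queue_len, c.delay_us, c.loss_rate))
```

A caller would refresh the three metrics periodically and call `select_connection` once per message (or per sub-message) to be sent.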
  • the first communication device includes a connection resource pool
  • the connection resource pool includes at least one connection group
  • each connection group corresponds to a destination communication device
  • the at least one connection group includes the first connection group.
  • before the first communication device sends the first message to the second communication device according to the first connection group, it first needs to determine the first connection group according to the first message. For example, when the destination communication device corresponding to the at least one connection group includes the destination of the first message, the connection group corresponding to the destination is used as the first connection group; when the destination communication device corresponding to the at least one connection group does not include the destination of the first message, the first connection group is established so that the first message can be transmitted based on the first connection group.
  • when the connection resource pool already includes the corresponding connection group, the connection group corresponding to the destination of the message to be transmitted can be quickly determined from the connection resource pool; and because no connection needs to be established at that point, data transmission is also faster.
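The lookup-or-establish behaviour of the connection resource pool can be sketched as below. The class name `ConnectionPool`, the string connection handles, and the `established` counter are hypothetical placeholders; a real implementation would create QP connections instead of strings.

```python
from typing import Dict, List


class ConnectionPool:
    """Maps each destination device to one connection group shared by all services."""

    def __init__(self, conns_per_group: int = 2) -> None:
        self._groups: Dict[str, List[str]] = {}
        self._conns_per_group = conns_per_group
        self.established = 0  # number of groups created so far (for illustration)

    def _establish_group(self, dest: str) -> List[str]:
        # Placeholder for real connection setup; returns opaque handles.
        self.established += 1
        return [f"{dest}#conn{i}" for i in range(self._conns_per_group)]

    def group_for(self, dest: str) -> List[str]:
        """Return the existing group for dest, establishing it only if absent."""
        group = self._groups.get(dest)
        if group is None:
            group = self._establish_group(dest)
            self._groups[dest] = group
        return group
```

Because the pool is keyed by destination rather than by service, messages of different services addressed to the same device resolve to the same group.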
  • the connection performance of each connection in the first connection group needs to be obtained before determining the first connection.
  • a detection message is sent based on any connection; at least one of the delay performance or the packet loss performance of any connection is obtained through the detection message.
  • the connection performance of each connection can be obtained in real time, which ensures the accuracy of the connection performance, and further ensures the accuracy of the first connection selected according to the connection performance, so that the message transmitted through the first connection has a higher transmission performance.
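One simple way to turn detection messages into delay and packet loss figures is sketched here. The source does not specify how the metrics are computed; the echo-based bookkeeping, the function name `probe_stats`, and the use of a plain mean are assumptions.

```python
from typing import Dict, Optional, Tuple


def probe_stats(sent_at: Dict[int, float],
                echoed_at: Dict[int, float]) -> Tuple[Optional[float], float]:
    """Estimate (mean delay, loss rate) for one connection.

    sent_at maps probe sequence number -> send time.
    echoed_at maps probe sequence number -> time its echo came back.
    Probes with no echo are counted as lost.
    """
    if not sent_at:
        raise ValueError("no probes were sent")
    delays = [echoed_at[seq] - t for seq, t in sent_at.items() if seq in echoed_at]
    loss_rate = 1.0 - len(delays) / len(sent_at)
    mean_delay = sum(delays) / len(delays) if delays else None
    return mean_delay, loss_rate
```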
  • the first communication device may add a first identifier to the first message.
  • An important packet refers to a message whose packet loss has a greater impact on data transmission performance.
  • the important packet may be the last message in the sending queue corresponding to at least one connection, a control signaling message, a detection message, a heartbeat keep-alive message, an address request message, a packet loss retransmission message, or a service-specified important packet.
  • the first identifier is used by the communication device receiving the first message to determine that the first message is an important packet based on the first identifier.
  • the second communication device that receives the first message can determine that the first message is an important packet based on the first identifier, and then the second communication device can perform protection processing on the important packet to ensure the transmission performance of the important packet, reduce the packet loss rate of the important packet, and effectively avoid the tail delay problem caused by the loss of the important packet.
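The marking of important packets can be sketched as follows. The message kinds mirror the list given above; representing the first identifier as a boolean field named `important` is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    DATA = auto()
    CONTROL_SIGNALING = auto()
    PROBE = auto()              # detection message
    HEARTBEAT = auto()          # heartbeat keep-alive message
    ADDRESS_REQUEST = auto()
    RETRANSMISSION = auto()     # packet loss retransmission message


@dataclass
class Message:
    kind: Kind
    last_in_queue: bool = False   # last message in the connection's send queue
    service_marked: bool = False  # the service itself declared it important
    important: bool = False       # hypothetical encoding of the first identifier


def tag_if_important(msg: Message) -> Message:
    """Set the identifier when the message falls in any important-packet class."""
    msg.important = (msg.kind is not Kind.DATA
                     or msg.last_in_queue
                     or msg.service_marked)
    return msg
```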
  • the first communication device may further divide the first message into a plurality of first sub-messages, and then send the plurality of first sub-messages to the second communication device according to the first connection group.
  • the first message with a data volume greater than the threshold is divided into a plurality of first sub-messages for transmission, so that fine-grained multi-connection scheduling can be more easily implemented, and data transmission performance can be improved.
  • the first message may be divided into the plurality of first sub-messages according to the memory address range corresponding to the first message, with each first sub-message corresponding to one memory address interval within that range.
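Splitting by memory address range can be sketched as below. The parameter names and the fixed `chunk` size are assumptions; the source only states that a message larger than a threshold is divided and that each sub-message covers one address interval.

```python
from typing import List, Tuple


def split_by_address(base: int, length: int,
                     chunk: int, threshold: int) -> List[Tuple[int, int]]:
    """Return (start_address, size) intervals covering [base, base + length).

    Messages no larger than threshold are left as a single interval.
    """
    if length <= 0 or chunk <= 0:
        raise ValueError("length and chunk must be positive")
    if length <= threshold:
        return [(base, length)]
    return [(base + off, min(chunk, length - off))
            for off in range(0, length, chunk)]
```

For example, a 10-byte message at address `0x1000` with a 4-byte chunk and an 8-byte threshold yields three intervals of 4, 4 and 2 bytes, each of which can be scheduled on a different connection of the group.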
  • the first communication device includes an application layer and a transport layer. The first communication device obtains the first message corresponding to the first service as follows: the transport layer receives the first message that the application layer sends by calling the logical interface of the first service. Similarly, after the first communication device sends the multiple first sub-messages to the second communication device according to the first connection group, when the transport layer has obtained the notification messages corresponding to each of the multiple first sub-messages, the transport layer calls the logical interface of the first service to send the notification message corresponding to the first message to the application layer.
  • the application layer still transmits the first message to the transport layer and receives the notification message corresponding to the first message sent by the transport layer.
  • the method can be implemented by directly using the existing upper-layer application logic interface, without modifying the original business code or establishing a new interface, that is, multi-connection sharing is transparent to the application layer, making the method more deployable and applicable.
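The way the transport layer can hide sub-message completions from the application layer is sketched here. `CompletionAggregator` and its callback are hypothetical names; the callback stands in for the logical interface of the service.

```python
from typing import Callable, Dict, Set


class CompletionAggregator:
    """Collects per-sub-message notifications and reports one per original message."""

    def __init__(self, notify_app: Callable[[int], None]) -> None:
        self._pending: Dict[int, Set[int]] = {}
        self._notify_app = notify_app  # stands in for the service's logical interface

    def register(self, msg_id: int, sub_count: int) -> None:
        """Record that msg_id was split into sub_count sub-messages."""
        self._pending[msg_id] = set(range(sub_count))

    def on_sub_complete(self, msg_id: int, sub_index: int) -> None:
        """Handle the notification for one sub-message."""
        remaining = self._pending.get(msg_id)
        if remaining is None:
            return  # unknown or already reported message
        remaining.discard(sub_index)
        if not remaining:
            del self._pending[msg_id]
            self._notify_app(msg_id)  # single notification for the whole message
```

The application layer sees exactly one notification per message it submitted, which is what keeps segmentation and multi-connection sharing transparent to it.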
  • the first communication device is middleware between application layer software and network card hardware.
  • the first message is transmitted based on the RDMA protocol, so that the method can improve the data transmission performance under the RDMA technology.
  • a data transmission method is provided, which is applied to a third communication device, wherein the third communication device receives a first message corresponding to a first service sent by the first communication device based on a first connection group, the destination of the first message is the second communication device, and the first connection group is shared by the service transmitted by the first communication device; the third communication device sends a first message to the second communication device.
  • the method receives messages by sharing and multiplexing the same connection group, thereby ensuring the effective implementation of connection sharing and multiplexing, improving the connection scalability of the first communication device, and improving the performance and reliability of data transmission.
  • the first connection group includes a first connection between a first communication device and a second communication device
  • the third communication device receives a first message corresponding to a first service sent by the first communication device based on the first connection in the first connection group.
  • the third communication device may also receive a second message corresponding to a second service sent by the first communication device based on the first connection, the destination of the second message being the second communication device; and the third communication device sends the second message to the second communication device.
  • the first connection group includes a first connection and a second connection between the first communication device and the second communication device
  • the third communication device receives a first message corresponding to the first service sent by the first communication device based on the first connection in the first connection group.
  • the third communication device may also receive a second message corresponding to the first service sent by the first communication device based on the second connection, and the destination of the second message is the second communication device; the third communication device sends the second message to the second communication device.
  • after the third communication device receives the first message of the first service sent by the first communication device, when the first message is an important packet, the third communication device processes the first message according to a first packet loss rate, where the first packet loss rate is lower than the packet loss rate of non-important packets.
  • the important packet is the last message in the sending queue corresponding to at least one connection, a control signaling message, a heartbeat keep-alive message, a detection message, an address request message, a packet loss retransmission message, or a service-specified important packet.
  • the first message carries a first identifier, which is added by the first communication device; before processing the first message according to the first packet loss rate, the third communication device can determine from the first identifier that the first message is an important packet.
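One possible way for a forwarding device to give important packets a lower loss rate is to reserve queue headroom for them, as sketched below. This mechanism is an assumption chosen for illustration; the source only requires that important packets see a lower packet loss rate than non-important ones. `ForwardingQueue`, `capacity` and `normal_limit` are hypothetical names.

```python
from collections import deque
from typing import Any, Deque


class ForwardingQueue:
    """Bounded queue that drops non-important packets earlier than important ones."""

    def __init__(self, capacity: int, normal_limit: int) -> None:
        if not 0 < normal_limit <= capacity:
            raise ValueError("need 0 < normal_limit <= capacity")
        self._q: Deque[Any] = deque()
        self._capacity = capacity          # hard limit, applies to important packets
        self._normal_limit = normal_limit  # lower limit for non-important packets

    def enqueue(self, payload: Any, important: bool) -> bool:
        """Return True if the packet was queued, False if it was dropped."""
        limit = self._capacity if important else self._normal_limit
        if len(self._q) >= limit:
            return False
        self._q.append(payload)
        return True

    def __len__(self) -> int:
        return len(self._q)
```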
  • a third communication device receives a detection message sent by the first communication device based on any connection in the first connection group, adds transmission information to the detection message, the transmission information includes at least one of a node identification, packet loss, delay or throughput; transmits the detection message with added transmission information, and the detection message is used by the first communication device to obtain at least one of delay performance or packet loss performance of any connection.
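The handling of a detection message at an intermediate device can be sketched as follows. The record layout (`HopInfo`, `Probe`) and the helper names are assumptions; the source lists node identification, packet loss, delay and throughput as possible transmission information without fixing a format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HopInfo:
    node_id: str      # identification of the forwarding node
    delay_us: float   # delay observed at this node
    dropped: int      # packets dropped at this node


@dataclass
class Probe:
    conn_id: int
    hops: List[HopInfo] = field(default_factory=list)


def stamp(probe: Probe, node_id: str, delay_us: float, dropped: int) -> Probe:
    """Called by each forwarding device before it passes the probe on."""
    probe.hops.append(HopInfo(node_id, delay_us, dropped))
    return probe


def path_delay(probe: Probe) -> float:
    """Sender-side summary: total delay accumulated along the path."""
    return sum(h.delay_us for h in probe.hops)
```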
  • the data volume of the first message is greater than a threshold value
  • the first message includes multiple first sub-messages
  • the multiple first sub-messages are obtained by dividing the first message by the first communication device.
  • an out-of-order flag is added to the first sub-message that is out of order among the multiple first sub-messages; the first sub-message with the out-of-order flag added is transmitted, and the out-of-order flag is used to indicate that the multiple first sub-messages are out of order but no packet loss occurs.
  • the first message is transmitted based on the RDMA protocol.
  • a data transmission method is provided, which is applied to a second communication device, wherein the second communication device receives a first message corresponding to a first service, the first message is sent by the first communication device based on a first connection group, the destination of the first message is the second communication device, and the first connection group is shared by a service transmitted by the first communication device.
  • the method receives messages by sharing and multiplexing the same connection group, thereby ensuring the effective implementation of connection sharing and multiplexing, improving the connection scalability of the first communication device, and improving the performance and reliability of data transmission.
  • the first message may be sent by the first communication device based on the first connection.
  • the second communication device may also receive a second message corresponding to a second service, the second message being sent by the first communication device based on the first connection, and the destination of the second message being the second communication device.
  • the first message may be sent by the first communication device based on the first connection.
  • the second communication device may also receive a second message corresponding to the first service, the second message being sent by the first communication device based on the second connection, and the destination of the second message being the second communication device.
  • the first message includes a plurality of first sub-messages
  • the second communication device includes an application layer and a transport layer
  • the second communication device may receive the first message corresponding to the first service as follows: the transport layer receives a plurality of first sub-messages sent by the first communication device based on the first connection group, each of which includes a corresponding sequence number, sub-sequence number and interface identifier, the plurality of first sub-messages having been obtained by the first communication device segmenting the first message; the transport layer orders and combines the plurality of first sub-messages according to the sequence numbers and sub-sequence numbers to obtain the first message; and the transport layer calls the logical interface corresponding to the interface identifier to send the first message to the application layer.
  • although the first message is divided into multiple first sub-messages for multi-connection shared transmission, the application layer still receives the first message from the transport layer, so the application layer is unaware of the data segmentation and multi-connection sharing process.
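Receiver-side reassembly by sequence number and sub-sequence number can be sketched as below. The `total` field (number of sub-messages per message) is an assumption added so the receiver knows when a message is complete; the source names only the sequence number, sub-sequence number and interface identifier. Callbacks stand in for the logical interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SubMessage:
    seq: int           # sequence number of the original message
    sub_seq: int       # position of this piece within the original message
    total: int         # hypothetical field: how many pieces the message has
    interface_id: str  # identifies the logical interface to deliver to
    payload: bytes


class Reassembler:
    def __init__(self, interfaces: Dict[str, Callable[[bytes], None]]) -> None:
        self._interfaces = interfaces
        self._parts: Dict[int, Dict[int, SubMessage]] = {}

    def on_receive(self, sub: SubMessage) -> None:
        """Buffer a piece; deliver the whole message once every piece is present."""
        parts = self._parts.setdefault(sub.seq, {})
        parts[sub.sub_seq] = sub
        if len(parts) < sub.total:
            return
        del self._parts[sub.seq]
        # Combine pieces in sub-sequence order, regardless of arrival order.
        message = b"".join(parts[i].payload for i in range(sub.total))
        self._interfaces[sub.interface_id](message)
```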
  • when any out-of-order first sub-message is received and that first sub-message carries an out-of-order flag, it is determined based on the out-of-order flag that the multiple first sub-messages are out of order but no packet loss has occurred. Identifying the out-of-order flag makes it possible to distinguish out-of-order messages from lost messages accurately, which avoids mistaking out-of-order messages for lost ones and triggering packet loss retransmission when no packet has been lost.
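The receiver's decision on an unexpected sub-sequence number can be sketched as follows. The function name and the three result labels are hypothetical, and treating an unflagged gap as a suspected loss is an assumption about the behaviour the flag is meant to suppress.

```python
def classify_arrival(expected_sub_seq: int,
                     arrived_sub_seq: int,
                     out_of_order_flag: bool) -> str:
    """Decide how to treat an arriving sub-message.

    "in_order":       the sub-message is the one the receiver was waiting for.
    "reordered":      a gap exists, but the flag says nothing was lost, so the
                      receiver buffers and waits instead of asking for a resend.
    "suspected_loss": a gap exists and no flag is present, so a retransmission
                      may be requested.
    """
    if arrived_sub_seq == expected_sub_seq:
        return "in_order"
    if out_of_order_flag:
        return "reordered"
    return "suspected_loss"
```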
  • the first communication device is middleware between application layer software and network card hardware.
  • the first message is transmitted based on the RDMA protocol, so that the method can improve the data transmission performance under the RDMA technology.
  • a data transmission device which is applied to a first communication device, and the device includes:
  • An acquisition module used to acquire a first message corresponding to a first service, the destination of the first message being a second communication device;
  • the sending module is used to send a first message to the second communication device according to the first connection group, and the first connection group is shared by the service transmitted by the first communication device.
  • the first connection group includes a first connection between the first communication device and the second communication device; the sending module is configured to send the first message to the second communication device according to the first connection;
  • the acquisition module is further used to acquire a second message corresponding to the second service, the destination of the second message is the second communication device;
  • the sending module is further used to send a second message to the second communication device according to the first connection.
  • the first connection group includes a first connection and a second connection between the first communication device and the second communication device; the sending module is configured to send the first message to the second communication device according to the first connection;
  • the sending module is further used to send a second message to the second communication device according to the second connection.
  • the sending module is used to select the first connection to send the first message to the second communication device according to whether the connection performance of the first connection meets the performance condition, and the connection performance includes at least one of the sending queue length, delay performance or packet loss performance.
  • the first communication device includes a connection resource pool
  • the connection resource pool includes at least one connection group
  • each connection group corresponds to a destination communication device
  • the at least one connection group includes the first connection group
  • the device further includes:
  • a splitting module configured to, when the data volume of the first message is greater than a threshold, cause the first communication device to split the first message into a plurality of first sub-messages
  • the sending module is used to send a plurality of first sub-messages to the second communication device according to the first connection group.
  • the first communication device includes an application layer and a transport layer; the acquisition module is configured so that the transport layer receives the first message corresponding to the first service that the application layer sends by calling the logical interface of the first service;
  • the sending module is used for, when the transport layer obtains notification messages respectively corresponding to a plurality of first sub-messages, the transport layer calls the logical interface of the first service to send the notification message corresponding to the first message to the application layer.
  • the first communication device is middleware between application layer software and network card hardware.
  • the first message is transmitted based on the RDMA protocol.
  • a data transmission device which is applied to a third communication device, and the device includes:
  • a receiving module configured to receive a first message corresponding to a first service sent by a first communication device based on a first connection group, the destination of the first message is a second communication device, and the first connection group is shared by a service transmitted by the first communication device;
  • the sending module is used to send a first message to the second communication device.
  • the first connection group includes a first connection between the first communication device and the second communication device; the receiving module is configured to receive the first message corresponding to the first service sent by the first communication device based on the first connection;
  • the receiving module is further used to receive a second message corresponding to a second service sent by the first communication device based on the first connection, where the destination of the second message is the second communication device;
  • the sending module is also used to send a second message to the second communication device.
  • the first connection group includes a first connection and a second connection between the first communication device and the second communication device; the receiving module is configured to receive the first message corresponding to the first service sent by the first communication device based on the first connection;
  • the receiving module is further used to receive a second message corresponding to the first service sent by the first communication device based on the second connection, where the destination of the second message is the second communication device;
  • the sending module is also used to send a second message to the second communication device.
  • the first message includes a plurality of first sub-messages
  • the device further includes: an adding module, configured to add an out-of-order flag to each out-of-order first sub-message among the plurality of first sub-messages when the plurality of first sub-messages are out of order;
  • the transmission module is used to transmit the first sub-message with an out-of-order flag added, where the out-of-order flag is used to indicate that a plurality of first sub-messages are out of order but no packet loss occurs.
  • the first message is transmitted based on the RDMA protocol.
  • a data transmission device which is applied to a second communication device, and the device includes:
  • the receiving module is used to receive a first message corresponding to a first service, the first message is sent by a first communication device based on a first connection group, the destination of the first message is a second communication device, and the first connection group is shared by the service transmitted by the first communication device.
  • the first connection group includes a first connection between the first communication device and the second communication device, and the first message is sent by the first communication device based on the first connection;
  • the receiving module is further used to receive a second message corresponding to a second service, the second message is sent by the first communication device based on the first connection, and the destination of the second message is the second communication device.
  • the receiving module is further used to receive a second message corresponding to the first service, the second message is sent by the first communication device based on the second connection, and the destination of the second message is the second communication device.
  • a receiving module is used for receiving multiple first sub-messages at the transport layer, wherein the multiple first sub-messages respectively include corresponding sequence numbers, sub-sequence numbers and interface identifiers, and the multiple first sub-messages are obtained by segmenting the first message by the first communication device; the transport layer arranges and combines the multiple first sub-messages according to the sequence numbers and sub-sequence numbers to obtain the first message; the transport layer calls the logical interface corresponding to the interface identifier to send the first message to the application layer.
  • the determination module is used to determine, when any out-of-order first sub-message is received and that first sub-message carries an out-of-order flag, that the multiple first sub-messages are out of order but no packet loss has occurred, based on the out-of-order flag.
  • the second communication device is middleware between application layer software and network card hardware.
  • the first message is transmitted based on the RDMA protocol.
  • the memory may be integrated with the processor, or the memory may be provided separately from the processor.
  • the memory can be a non-transitory memory, such as a read-only memory (ROM), which can be integrated with the processor on the same chip or can be set on different chips.
  • ROM: read-only memory
  • a communication device comprising: a transceiver, a memory, and a processor.
  • the transceiver, the memory, and the processor communicate with each other through an internal connection path, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory to control the transceiver to receive signals and control the transceiver to send signals, and when the processor executes the instructions stored in the memory, the communication device executes the method in the first aspect or any possible implementation of the first aspect, or executes the method in the second aspect or any possible implementation of the second aspect, or executes the method in the third aspect or any possible implementation of the third aspect.
  • a data transmission system comprising a first communication device and a second communication device;
  • the first communication device is used to execute the method of the first aspect or any possible implementation of the first aspect
  • the second communication device is used to execute the method of the third aspect or any possible implementation of the third aspect.
  • the data transmission system further includes a third communication device; the third communication device is used to execute the method of the second aspect or any possible implementation of the second aspect.
  • a computer program comprising: a computer program code, when the computer program code is executed by a computer, the computer executes the methods in the above aspects.
  • a chip comprising a processor for calling and executing instructions stored in a memory from the memory, so that a communication device equipped with the chip executes the methods in the above aspects.
  • another chip comprising: an input interface, an output interface, a processor and a memory, wherein the input interface, the output interface, the processor and the memory are connected via an internal connection path, and the processor is used to execute the code in the memory, and when the code is executed, the processor is used to execute the methods in the above aspects.
  • FIG1 is a schematic diagram of a data transmission provided in an embodiment of the present application.
  • FIG2 is a schematic diagram of an implementation environment of a data transmission method provided in an embodiment of the present application.
  • FIG3 is an interactive schematic diagram of a data transmission method provided in an embodiment of the present application.
  • FIG4 is a schematic diagram of a connection between a resource pool and a logical interface provided in an embodiment of the present application
  • FIG5 is a schematic diagram of functional modules of a communication device provided in an embodiment of the present application.
  • FIG7 is a schematic diagram of the structure of a data transmission device provided in an embodiment of the present application.
  • FIG9 is a schematic structural diagram of another data transmission device provided in an embodiment of the present application.
  • FIG10 is a schematic diagram of the structure of a network device provided in an embodiment of the present application.
  • FIG11 is a schematic diagram of the structure of another network device provided in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of the structure of a server provided in an embodiment of the present application.
  • the data transmission of RDMA technology is a single connection mode, that is, the message of a service is carried on a connection pair and sent out.
  • a connection pair can refer to a QP connection.
  • a QP connection includes a send queue (SQ), a receive queue (RQ) and a completion queue (CQ).
  • the five-tuple of the message of a service is the same, and the five-tuple includes the source Internet protocol (IP), the destination IP, the protocol number, the source port number and the destination port number. Therefore, on a communication device that enables the equal-cost multi-path routing (ECMP) function, the messages on the same QP connection will be transmitted on the same physical path.
  • ECMP is a routing algorithm based on five-tuple hash routing.
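  • a minimal sketch of why messages on one QP connection stay on one physical path: the five-tuple is hashed and the result selects one of the equal-cost paths. The hash function (SHA-256), the helper names `FiveTuple` and `ecmp_path`, and the sample addresses and ports are assumptions for illustration; real switches use their own hash.

```python
import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def ecmp_path(flow: FiveTuple, num_paths: int) -> int:
    """Pick an equal-cost path index by hashing the five-tuple.

    The concrete hash is an assumption; only the principle matters:
    identical five-tuples always map to the same path index.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


# Hypothetical flow: every message of one QP connection carries this five-tuple.
flow = FiveTuple("10.0.0.1", "10.0.0.2", 17, 49152, 4791)
path_first_message = ecmp_path(flow, 4)
path_second_message = ecmp_path(flow, 4)
```

  • in this sketch both messages resolve to the same path index, which is why a single QP connection cannot spread its traffic over several equal-cost paths.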
  • the data transmission method in the related technology needs to establish a large number of QP connections to achieve data transmission, and the number of QP connections established is limited by resources such as network card memory, which in turn limits the performance of data transmission.
  • the single connection mode is also limited by the risk of service interruption caused by a single connection failure, which reduces the performance and reliability of data transmission.
  • the single connection mode also has a low utilization rate of the network bandwidth of multiple equal-cost paths.
  • the communication device 1 as a sending node mainly includes SQ and CQ
  • the communication device 2 as a receiving node mainly includes RQ and CQ.
  • the communication device 1 and the communication device 2 communicate with each other through at least one communication device, or the communication device 1 and the communication device 2 are directly connected.
  • the upper layer business application (application, APP) submits tasks for sending and receiving data through work requests (WR); each WR is placed into the SQ or RQ, where it becomes a work queue element (WQE).
  • a CQE is converted into a work completion (WC) task and handed over to the business APP.
  • the transmission interface of the RDMA technology includes a reliable connection (RC) mode and a dynamically connected transmission (DC) mode.
  • the RC mode is connection-oriented, ensures reliability by retransmitting lost messages, and supports unilateral or bilateral operations.
  • the example in Figure 1 is data transmission in the RC mode under bilateral operation. The difference between the unilateral operation in the RC mode and the data transmission shown in Figure 1 is that the receiving node does not need to generate WR and WC.
  • a QP of a local node establishes a single connection and communicates with a QP of a remote node
  • the number of network card connections of the local node also increases. For example, if the network includes N (N is a positive integer) nodes and each node includes P (P is a positive integer) business processes, then in the scenario of full network connection, each node needs to create N*P*P QP connections. If the network card memory and other resources of the local node are exhausted, the performance of data transmission is likely to deteriorate.
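  • the N*P*P growth described above can be checked with a short calculation. The function name and the sample values (100 nodes, 16 business processes per node) are assumptions chosen only to illustrate the scale.

```python
def qp_connections_per_node(num_nodes: int, procs_per_node: int) -> int:
    """QP connections one node creates in a fully connected network.

    Each of the P local processes connects to the P processes on each of
    the N nodes, giving N * P * P connections per node.
    """
    return num_nodes * procs_per_node * procs_per_node


# Hypothetical network: 100 nodes, 16 business processes per node.
print(qp_connections_per_node(100, 16))  # 25600
```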
  • in data transmission in DC mode, when a local node needs to communicate with multiple remote nodes, the local node can share a QP connection and establish connections with different remote nodes in turn to transmit data, so that the network card connection scale of the local node does not increase rapidly with the number of nodes, thereby alleviating the growth in network card connection scale caused by the increase in network scale.
  • the overhead of tearing down and re-establishing the connection each time is large, which seriously degrades the latency and throughput of data transmission.
  • the implementation environment includes multiple electronic devices and multiple switches, and the multiple switches include leaf switches, spine switches, and core switches.
  • the electronic device can be a terminal or a server, which mainly carries high-performance services with different communication needs. Different services can be carried on the same electronic device, and there is a need for mutual communication between the services of different electronic devices.
  • the electronic device includes a network card, which is used to send and receive data to achieve communication between different electronic devices.
  • the number of electronic devices, leaf switches, spine switches, and core switches can be flexibly adjusted according to factors such as the network scale. It can be understood that as the network scale continues to increase, the number of electronic devices, leaf switches, spine switches, and core switches increases, and there may be multiple equal-cost paths between any two electronic devices. In this case, the single connection mode of RDMA cannot use multiple paths at the same time, so it cannot fully utilize the network bandwidth, which limits service performance, and it is prone to problems such as uneven network load across the paths and high connection re-establishment overhead after a single point of failure.
  • an embodiment of the present application provides a data transmission method, which uses the standard interface under the current RDMA, such as the RC interface or the DC interface, to solve the problems of connection scalability and link balancing through the sharing and multiplexing of the connection.
  • the method can be applied to data center network topology, interconnection between multiple data centers or wide area Internet, and the business scenario of the method can be distributed machine learning training, distributed storage, artificial intelligence (AI), high performance computing (HPC) or containers and other high-performance business scenarios.
  • the RDMA in the embodiment of the present application can be InfiniBand or RDMA over converged Ethernet (RoCE).
  • FIG 3 is an interactive schematic diagram of a data transmission method provided in an embodiment of the present application.
  • the transmission of data from a first communication device to a second communication device is taken as an example for explanation, wherein the first communication device and the second communication device can be any two electronic devices shown in Figure 2, or the first communication device and the second communication device can refer to the middleware between the application layer software and the network card hardware.
  • the third communication device can be any leaf switch, spine switch or core switch shown in Figure 2.
  • the data transmission method includes the following steps 301 to 305.
  • Step 301 A first communication device obtains a first message corresponding to a first service, and the destination of the first message is a second communication device.
  • a plurality of upper layer applications are running in the first communication device, each upper layer application corresponds to at least one service process, each service process corresponds to at least one service thread, and each service thread needs to communicate with the corresponding remote node.
  • the first service in the embodiment of the present application may refer to any service thread running in the first communication device.
  • the first communication device includes an application layer and a transport layer, which interact with each other. The application layer runs at least one application, and an application includes at least one service. The transport layer is used to send data from the upper-layer application, or to receive data and deliver it to the upper-layer application.
  • the application layer calls the logical interface of the first service to send a first message corresponding to the first service to the transport layer, so that the first communication device can obtain the first message to be transmitted, that is, the transport layer receives the first message corresponding to the first service sent by the application layer calling the logical interface of the first service.
  • the transport layer calls the logical interface of the first service to send a notification message corresponding to the first message to the application layer.
  • the notification message can be a sending completion message (such as the CQE shown in Figure 1), a packet loss notification message, an out-of-order notification message, and the like.
  • the first message is WR data
  • the APP shown in FIG1 represents the application layer of the first communication device
  • the QP shown in FIG1 represents the transport layer of the first communication device.
  • the first communication device calls the corresponding logical interface through the APP, submits the task of sending data to the sending queue SQ of the transport layer in WR mode, and sends data to the second communication device through the sending queue SQ of the transport layer through the established QP connection.
  • the completion queue CQ of the transport layer generates a CQE corresponding to the first message
  • the transport layer calls the corresponding logical interface to submit the CQE to the APP in WC mode.
  • Step 302 The first communication device sends the first message to the second communication device according to the first connection group, where the first connection group is shared by the service transmitted by the first communication device.
  • the method can be applied to the RDMA transmission technology scenario, and the first message can be transmitted based on the RDMA protocol, so that the method can improve the data transmission performance under the RDMA technology.
  • the first communication device includes a connection resource pool
  • the connection resource pool includes at least one connection group
  • one connection group corresponds to one destination communication device
  • at least one connection group includes a first connection group.
  • connection resource pool already includes the corresponding connection group
  • the connection group corresponding to the destination of the message to be transmitted can be quickly determined based on the connection resource pool, and since no connection needs to be established at transmission time, the speed of data transmission is also improved.
  • the at least one connection group included in the connection resource pool refers to all connection groups included in the connection resource pool.
  • the embodiment of the present application does not limit the way to establish the first connection group.
  • the first communication device initiates a request (REQ) message for establishing a QP connection.
  • the REQ message carries connection parameters, such as the sequence number of the QP connection, the starting packet sequence number (PSN), the upper limit of the number of retransmissions, the source port number, etc.
  • when the second communication device detects the connection request sent by the first communication device, it verifies and records the connection parameters in the REQ message. For example, the verification method is to determine whether the connection parameters in the REQ message are consistent with its own connection parameters.
  • a reply (REP) message is sent to indicate that the connection request sent by the first communication device is accepted, and the connection parameters of the second communication device are carried in the REP message.
  • when the first communication device receives the REP message replied by the second communication device, it verifies and records the connection parameters in the REP message, and then sends a ready to use (RTU) message to indicate that it agrees with the connection parameters of the first communication device.
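  • the REQ/REP/RTU exchange described above can be sketched as follows. This is a simplified model, not a real connection manager: the `ConnParams` fields mirror the parameters named in the text, while the consistency check (equal retransmission limits) and the `REJECTED` outcome are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConnParams:
    qp_number: int      # sequence number of the QP connection
    start_psn: int      # starting packet sequence number
    max_retries: int    # upper limit of the number of retransmissions
    src_port: int       # source port number


def handshake(initiator: ConnParams, responder: ConnParams) -> List[str]:
    """Return the sequence of messages exchanged while establishing a QP connection."""
    messages = ["REQ"]  # first device sends REQ carrying its parameters
    # Second device verifies the REQ parameters against its own
    # (assumed rule: retransmission limits must match).
    if initiator.max_retries != responder.max_retries:
        return messages + ["REJECTED"]  # hypothetical outcome, not from the source
    messages.append("REP")  # second device accepts and returns its parameters
    # First device verifies and records the REP parameters, then confirms.
    messages.append("RTU")
    return messages


ok = handshake(ConnParams(1, 100, 7, 50001), ConnParams(2, 200, 7, 50002))
bad = handshake(ConnParams(1, 100, 7, 50001), ConnParams(2, 200, 3, 50002))
```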
  • FIG 4 is a connection diagram of a connection resource pool and a logical interface provided by an embodiment of the present application.
  • a connection group corresponds to a destination address dst: IP1 of a destination communication device. Since different physical QP connections of the same connection group use different source ports, the second communication device can, through ECMP, route different physical QP connections onto different or the same physical paths according to their source ports. Therefore, different physical QP connections correspond to the same or different physical paths to the destination.
  • the business application may include business 1 of APP1, business 2 of APP1, and business 1 of APP2, wherein the QP connection established by any business is a logical QP connection, and the logical QP connection established by any business is mapped to the physical QP connections actually established in the connection resource pool. Therefore, the application business can send WR data directly through the original logical interface, and the WR data is then mapped to multiple physical QP connections based on the mapping between the logical QP connection and the physical QP connections. This ensures that the original business code does not need to be modified, that is, the connection resource pool is transparent to the upper-layer business, making the connection resource pool more deployable and adaptable, and compatible with multiple RDMA protocols.
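  • the mapping from logical QP connections to shared physical QP connections can be sketched as below. All class, method and variable names are hypothetical, and choosing the physical QP with the shortest send queue is only one possible mapping policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhysicalQP:
    dst_ip: str
    src_port: int
    send_queue: List[bytes] = field(default_factory=list)


class ConnectionResourcePool:
    """One connection group (a list of physical QPs) per destination."""

    def __init__(self) -> None:
        self.groups: Dict[str, List[PhysicalQP]] = {}
        self.logical_to_dst: Dict[str, str] = {}

    def add_group(self, dst_ip: str, src_ports: List[int]) -> None:
        # Different source ports let ECMP place the QPs on different paths.
        self.groups[dst_ip] = [PhysicalQP(dst_ip, port) for port in src_ports]

    def open_logical_qp(self, service: str, dst_ip: str) -> str:
        # The service only ever sees this logical handle.
        name = f"{service}->{dst_ip}"
        self.logical_to_dst[name] = dst_ip
        return name

    def post_send(self, logical_qp: str, wr: bytes) -> PhysicalQP:
        # Map the logical QP to the shared group, then pick a physical QP
        # (assumed policy: shortest send queue).
        group = self.groups[self.logical_to_dst[logical_qp]]
        target = min(group, key=lambda qp: len(qp.send_queue))
        target.send_queue.append(wr)
        return target


pool = ConnectionResourcePool()
pool.add_group("IP1", [50001, 50002, 50003])
qp_a = pool.open_logical_qp("APP1/business1", "IP1")
qp_b = pool.open_logical_qp("APP2/business1", "IP1")
first = pool.post_send(qp_a, b"wr-a")
second = pool.post_send(qp_b, b"wr-b")
```

  • in this sketch two different services send through their own logical handles, yet both WRs land on physical QPs of the same shared connection group for IP1.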
  • the first message can be sent based on the first connection included in the first connection group.
  • the first connection is selected from the connections included in the first connection group, and the first message is sent to the second communication device through the first connection.
  • the connection performance of the first connection meets the performance condition, and the connection performance includes at least one of the sending queue length, delay performance, or packet loss performance.
  • the performance condition when the connection performance includes the length of the sending queue, the performance condition may be that the sending queue length is the shortest; when the connection performance includes the delay performance or the packet loss performance, the performance condition may be that the delay is minimized or the packet loss rate is minimized.
  • the first connection may be selected from the connections in the first connection group by a round-robin scheduling (RR) algorithm.
  • the RR algorithm treats the connections in the first connection group equally and serves them in turn without discrimination.
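  • the two selection strategies above, shortest send queue and round-robin, can be sketched as follows. The function names and the sample queue lengths are assumptions for illustration.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def shortest_queue(queue_lengths: Dict[str, int]) -> str:
    """Select the connection whose send queue is currently the shortest."""
    return min(queue_lengths, key=queue_lengths.get)


def round_robin(connections: List[str]) -> Iterator[str]:
    """Serve the connections in turn, without regard to their state."""
    return cycle(connections)


# Hypothetical send queue lengths of three connections in one group.
picked = shortest_queue({"qp1": 5, "qp2": 2, "qp3": 7})

rr = round_robin(["qp1", "qp2", "qp3"])
rr_order = [next(rr) for _ in range(4)]
```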
  • before selecting the first connection in the first connection group, it is also necessary to obtain the connection performance of each connection included in the first connection group.
  • a probe message is sent based on any connection; at least one of the delay performance or packet loss performance of any connection is obtained through the probe message.
  • the five-tuple of the probe message is the same as the five-tuple corresponding to that connection.
  • the probe message can be a message transmitted based on any connection, or it can be a probe message sent specifically to obtain the connection performance.
  • the probe message can be sent periodically based on any connection to obtain the actual connection performance corresponding to any connection in real time; it is also possible to send a probe message based on any connection when there is a measurement requirement for any connection to conduct targeted detection of the connection performance and reduce communication pressure.
  • when the data volume of the first message is greater than a threshold, the first message may be divided into a plurality of first sub-messages.
  • the threshold may be a maximum transmission unit (MTU), where MTU refers to the maximum amount of data that an Ethernet frame can carry, and the first message may be divided using the granularity of the threshold MTU.
  • for example, when the data volume of the first message is 8000 bytes and the MTU is 4000 bytes, the first message may be divided into two first sub-messages of 4000 bytes each.
  • the embodiment of the present application does not limit the manner in which the first message is divided.
  • the first message may be divided into a plurality of first sub-messages according to a memory address range corresponding to the first message, and a first sub-message corresponds to a memory address interval in the memory address range.
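  • a minimal sketch of dividing a message by memory address range at MTU granularity, using the 8000-byte and 4000-byte figures from the example above. The function name, the base address 0x1000 and the half-open interval convention are assumptions.

```python
from typing import List, Tuple


def split_by_address_range(base_addr: int, length: int, mtu: int) -> List[Tuple[int, int]]:
    """Split [base_addr, base_addr + length) into intervals of at most mtu bytes.

    Each returned (start, end) pair is a half-open memory address interval
    that one sub-message would cover.
    """
    end_addr = base_addr + length
    return [(start, min(start + mtu, end_addr))
            for start in range(base_addr, end_addr, mtu)]


# 8000-byte message at a hypothetical base address, MTU of 4000 bytes.
intervals = split_by_address_range(0x1000, 8000, 4000)
print(intervals)  # [(4096, 8096), (8096, 12096)]
```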
  • the plurality of first sub-messages are sent based on the first connection group. Therefore, when the first connection group includes a plurality of connections, the first connections selected in the first connection group may be a plurality of connections, so as to respectively send the plurality of first sub-messages to the second communication device through the plurality of first connections.
  • a first number of first connections may be selected from the plurality of connections included in the first connection group according to the first number of the plurality of first sub-messages, that is, the plurality of first sub-messages are distributed to different connections for transmission, or a second number of first connections may be selected from the plurality of connections included in the first connection group according to the first number of the plurality of first sub-messages, and the second number is less than the first number.
  • the performance condition may be the shortest send queue length and the second shortest send queue length.
  • the principle of selecting the first connection may be the best connection performance or load balancing among multiple connections.
  • the WR data may be divided into corresponding multiple slice sub-WR data and distributed to the first physical QP connection, the second physical QP connection, and the third physical QP connection in connection group 1.
  • the WR data may be divided into corresponding multiple slice sub-WR data and distributed to the first physical QP connection and the third physical QP connection in connection group 1.
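  • distributing sub-WR data over a chosen number of connections can be sketched as below. The policy shown (take the connections with the shortest send queues, then assign slices in turn) and all names are assumptions; the sample state reproduces the case where the first and third physical QP connections are selected.

```python
from typing import Dict, List


def distribute(sub_messages: List[str], queues: Dict[str, List[str]],
               num_connections: int) -> List[str]:
    """Append sub-messages to the num_connections shortest send queues.

    Returns the names of the selected connections, shortest queue first.
    """
    chosen = sorted(queues, key=lambda name: len(queues[name]))[:num_connections]
    for index, sub in enumerate(sub_messages):
        queues[chosen[index % len(chosen)]].append(sub)
    return chosen


# Hypothetical send queues of connection group 1 before distribution.
queues = {"qp1": [], "qp2": ["x", "y", "z"], "qp3": ["x"]}
chosen = distribute(["s1", "s2", "s3", "s4"], queues, 2)
```

  • here four slices are spread over two connections, which corresponds to selecting a second number of connections smaller than the first number of sub-messages.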
  • each service in the embodiment of the present application can concurrently use multiple connections in a connection group to achieve load balancing between multiple connections, without being limited by the risk of service interruption under a single connection failure.
  • the load balancing between multiple paths of the network can also be improved, without being limited by the risk of service interruption under a single path failure.
  • an important packet is a message whose loss has a significant impact on transmission performance.
  • the important packet can be the last message in the sending queue corresponding to at least one connection, a control signaling message, a detection message, a heartbeat keep-alive message, an address request message, a packet loss retransmission message, or a service-specified important packet.
  • the embodiment of the present application does not limit the manner of adding the first identifier to the first message, so that the second communication device receiving the first message can determine that the first message is an important packet based on the first identifier. For example, a mark is made in the reserved field of the first message.
  • when the first message is divided into multiple first sub-messages: if the first message is an important packet, the multiple first sub-messages are all important packets, and the first identifier is added to each of them; if the first message is a non-important packet, each of the multiple first sub-messages is judged in turn as to whether it is an important packet, and the first identifier is added to each first sub-message that is determined to be an important packet.
  • when the first communication device sends the first message to the second communication device based on the first connection group, the first message will be sent, according to the routing algorithm, to the third communication device that acts as an intermediate forwarding node in the network.
  • the next hop for the first communication device to send the first message may be different third communication devices, but no matter which third communication device receives the first message in the middle, the operation performed by the third communication device is the same.
  • when the first connection group includes multiple connections, the first communication device also performs priority scheduling and congestion control among the multiple connections.
  • priority scheduling can set different priorities for different physical QP connections according to service performance requirements, service flow length, etc.
  • the physical QP connection with a high priority is preferentially scheduled for sending.
  • the reserved physical QP connections in the connection group are set to high priority to transmit small service flows, thereby reducing the packet loss rate of small service flows.
  • congestion control can use conventional congestion control algorithms.
  • the congestion control algorithm includes the following four stages: the first stage is slow start.
  • when the sending node begins sending data, it initially executes slow start.
  • the second stage is congestion avoidance.
  • a threshold is set.
  • the sending speed is reduced;
  • the third stage is fast retransmission.
  • the receiving node acknowledges the last message segment received in order.
  • the sending node executes fast retransmission; the fourth stage is fast recovery.
  • when the sender receives three duplicate acknowledgments in succession, the threshold is reduced and congestion avoidance is executed.
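  • the four stages can be illustrated with a simplified congestion window model in the style of classic TCP congestion control. This is an assumption-laden sketch, not the algorithm of the embodiment: the class name, the per-acknowledgment growth rules and the halving of the threshold are illustrative choices.

```python
class CongestionWindow:
    """Simplified congestion window with slow start, congestion avoidance,
    fast retransmission and fast recovery (all rules are assumptions)."""

    def __init__(self, ssthresh: float = 16.0) -> None:
        self.cwnd = 1.0           # congestion window, in messages
        self.ssthresh = ssthresh  # threshold between the first two stages
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start: fast growth
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: slow growth

    def on_duplicate_ack(self) -> bool:
        """Return True when a fast retransmission should be triggered."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            # Fast recovery: reduce the threshold and continue in
            # congestion avoidance from the reduced window.
            self.ssthresh = max(self.cwnd / 2.0, 2.0)
            self.cwnd = self.ssthresh
            return True
        return False


window = CongestionWindow(ssthresh=4.0)
for _ in range(4):
    window.on_new_ack()
cwnd_before_loss = window.cwnd
retransmit_flags = [window.on_duplicate_ack() for _ in range(3)]
```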
  • the first communication device can also send the second message or the third message based on the shared first connection group.
  • the first communication device sends the second message or the third message based on the shared first connection group, including but not limited to the following four scenarios.
  • the first connection group includes a first connection between a first communication device and a second communication device.
  • the first communication device obtains a second message corresponding to a second service.
  • the destination of the second message is the same as the destination of the first message, which is the second communication device.
  • the first communication device sends the second message to the second communication device according to the first connection in the first connection group.
  • the first connection group also includes a second connection between the first communication device and the second communication device.
  • the first communication device obtains a second message corresponding to the first service, and the destination of the second message is the second communication device; the first communication device can send the second message to the second communication device according to the second connection in the first connection group.
  • the way of transmitting the same service through multiple connections can achieve load balancing between multiple connections, and can avoid service interruption problems caused by single connection failures.
  • when the first connection in the first connection group fails, transmission can switch to the second connection in the first connection group, ensuring the reliability and stability of service transmission.
  • when different connections correspond to different physical paths in the network, this also achieves load balancing between multiple paths in the network and avoids service interruption caused by a single path failure, further ensuring the reliability and stability of service transmission.
  • the first communication device obtains a third message corresponding to the first service, and the destination of the third message is the fourth communication device; the first communication device sends the third message to the fourth communication device according to the first connection group.
  • the first communication device can send messages of the same service to the second communication device and the fourth communication device through the shared first connection group, further improving the connection scalability and data transmission performance.
  • the first communication device obtains a second message corresponding to a second service, and the destination of the second message is a fourth communication device; the first communication device sends the second message to the fourth communication device according to the first connection group.
  • the first communication device can send messages of different services to the second communication device and the fourth communication device through the shared first connection group, further improving the connection scalability and data transmission performance.
  • the first connection group is used not only for sharing services transmitted from the first communication device to the second communication device, but also for sharing services transmitted from the first communication device to the fourth communication device. If the destination of the second message is different from the destination of the first message, the above-mentioned DC mode can be adopted, that is, the first communication device can share a connection in a connection group and cyclically establish connections with different destination communication devices to transmit data.
  • Step 303 The third communication device receives a first message corresponding to the first service sent by the first communication device based on the first connection group.
  • when the first communication device sends a first message corresponding to the first service based on the first connection in the first connection group, the third communication device also receives the first message corresponding to the first service sent by the first communication device based on the first connection in the first connection group.
  • after the third communication device receives the first message corresponding to the first service sent by the first communication device to the second communication device, it performs certain processing on the first message, mainly protection processing of important packets, to reduce the packet loss rate of important packets on the third communication device and thereby reduce the impact of packet loss on transmission performance.
  • when the first message is an important packet, the third communication device performs protection processing on the important packet.
  • the method of protection processing is not limited in the embodiment of the present application, and it is sufficient that the packet loss rate of the important packet can be reduced.
  • the method of protection processing by the third communication device on the important packet can be to process the first message according to a first packet loss rate, and the first packet loss rate is lower than the packet loss rate of non-important packets.
  • the first message carries a first mark
  • the third communication device determines that the first message is an important packet based on the first mark. Configuring a packet loss rate lower than that of non-important packets for important packets and processing important packets according to a packet loss rate lower than that of non-important packets can reduce the number of times important packets are lost.
  • the embodiment of the present application does not limit the value of the configured first packet loss rate, as long as it is lower than the packet loss rate of non-important packets.
  • the packet loss rate refers to the probability of the message being discarded. The greater the packet loss rate, the greater the possibility of the message being discarded.
  • for example, if the packet loss rate of non-important packets is 80%, the first packet loss rate can be any value between 0 and 80%.
  • the first packet loss rate can be 10%.
  • the method of processing the first message according to the first packet loss rate can be that when the third communication device needs to lose packets, a random number is generated based on the first packet loss rate; when the random number is greater than the threshold, the first message is discarded; when the random number is not greater than the threshold, the first message is forwarded, and the threshold can be flexibly adjusted according to the application scenario.
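  • the random-number comparison described above can be sketched as follows. Setting the threshold to one minus the packet loss rate, with a uniform random number in [0, 1), is an assumption that makes the drop probability equal to the configured rate; the function name and the rates 10% and 80% follow the examples in the text.

```python
import random


def should_drop(loss_rate: float, rng: random.Random) -> bool:
    """Decide whether to discard a message when packets must be dropped.

    Assumed rule: draw a uniform number in [0, 1) and discard when it is
    greater than the threshold 1 - loss_rate.
    """
    threshold = 1.0 - loss_rate
    return rng.random() > threshold


rng = random.Random(0)  # fixed seed so the sketch is reproducible
important_drops = sum(should_drop(0.10, rng) for _ in range(10000))
normal_drops = sum(should_drop(0.80, rng) for _ in range(10000))
```

  • with these rates, important packets are discarded far less often than non-important packets whenever the third communication device has to drop traffic.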
  • the important packets may be processed at a lower packet loss rate than that of non-important packets by placing the important packets in a reserved buffer or raising the priority of the important packets. Since the packets in the reserved buffer will not be lost and the high-priority packets will not be preferentially lost, the effect of reducing the packet loss rate of the important packets can be achieved.
  • the reduction in the packet loss rate of the important packets also reduces the number of timeout retransmissions caused by the loss of the important packets. Since the timeout retransmission will cause a large network delay, the reduction in the number of timeout retransmissions reduces the impact of packet loss on network delay.
  • the third communication device can accurately identify the important packets whose loss would seriously affect business performance.
  • by having the third communication device protect the important packets, the transmission stability of the important packets is improved, thereby ensuring the stability of business performance in a lossy network.
  • the third communication device also receives a detection message sent by the first communication device based on any connection in the first connection group, and adds transmission information to the detection message, the transmission information including at least one of node identification, packet loss, delay or throughput; and transmits the detection message with the added transmission information.
  • After the detection message is transmitted back to the first communication device, the first communication device obtains at least one of delay performance or packet loss performance based on the transmission information added to the detection message.
  • For example, the detection message includes the packet loss and delay of each node on the physical path corresponding to a given connection, and the first communication device performs statistical analysis on the packet loss and delay of each node to obtain the delay performance and packet loss performance corresponding to that connection.
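  • A minimal sketch of this measurement loop is given below. The class and function names (`HopInfo`, `ProbeMessage`, `add_transmission_info`, `summarize`) are hypothetical, and the statistics chosen here (summing per-node delay, and combining per-node loss rates as independent events) are only one possible form of the "statistical analysis"; the text does not specify the method.

```python
from dataclasses import dataclass, field


@dataclass
class HopInfo:
    node_id: str       # identification of the forwarding node
    loss_rate: float   # fraction of packets lost at this node (assumed unit)
    delay_us: float    # delay contributed by this node, in microseconds


@dataclass
class ProbeMessage:
    connection_id: int
    hops: list = field(default_factory=list)


def add_transmission_info(probe, node_id, loss_rate, delay_us):
    """Called on each forwarding node: append its own transmission info."""
    probe.hops.append(HopInfo(node_id, loss_rate, delay_us))
    return probe


def summarize(probe):
    """Called on the sender once the probe has been returned."""
    total_delay = sum(h.delay_us for h in probe.hops)
    delivered = 1.0
    for h in probe.hops:
        delivered *= 1.0 - h.loss_rate  # assumes independent loss per node
    return {"delay_us": total_delay, "loss_rate": 1.0 - delivered}
```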
  • When the data volume of the first message is greater than a threshold value, the third communication device receives multiple first sub-messages obtained by segmenting the first message.
  • The third communication device also includes a disorder marking function. For example, when multiple first sub-messages are out of order, an out-of-order mark is added to the first sub-message that is out of order among the multiple first sub-messages, and the first sub-message with the out-of-order mark is transmitted. The out-of-order mark is used to indicate that the multiple first sub-messages are out of order but no packet loss has occurred. In this way, the receiving node does not initiate packet retransmission because it has mistaken disorder for packet loss.
  • the multiple first sub-messages received by the third communication device respectively include sequence numbers sent in sequence.
  • the third communication device can identify the sequence number of each message in the sending queue and determine whether disorder occurs according to the order of the sequence numbers of the messages in the sending queue. For example, if the order of the sequence numbers of the messages in the sending queue is 1, 2, 3, 5, 4, it is determined that the message with sequence number 5 is out of order but no packet loss has occurred.
  • In this case, a disorder flag is added to the message with sequence number 5. When the receiving node receives the message with sequence number 5 before the message with sequence number 4, it determines from the disorder flag in the message with sequence number 5 that the message with sequence number 4 has not been lost. If instead the order of the sequence numbers of the messages in the sending queue is 1, 2, 3, 5, 6, it is determined that no disorder has occurred and that the message with sequence number 4 has been lost, so the disorder flag is not added to the message with sequence number 5. When the receiving node then receives the message with sequence number 5 without having received the message with sequence number 4, it determines from the absence of the disorder flag that the message with sequence number 4 has been lost.
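  • The two sending-queue examples above (1, 2, 3, 5, 4 and 1, 2, 3, 5, 6) can be reproduced with the following sketch. The function name `mark_out_of_order` and the representation of the sending queue as a list of sequence numbers are assumptions for illustration; the rule applied is that a message is marked when a smaller, skipped sequence number is still present later in the same queue.

```python
def mark_out_of_order(queue):
    """Return the set of sequence numbers that should carry a disorder flag.

    `queue` is the order of sequence numbers in the sending queue.
    A message is flagged when it jumps ahead of a sequence number that
    is still waiting later in the queue (reordered, not lost).
    """
    position = {seq: i for i, seq in enumerate(queue)}
    marked = set()
    expected = queue[0] if queue else 0
    for i, seq in enumerate(queue):
        # Sequence numbers skipped by this message.
        skipped = range(expected, seq)
        if any(position.get(s, -1) > i for s in skipped):
            marked.add(seq)
        expected = max(expected, seq + 1)
    return marked
```

  • With `[1, 2, 3, 5, 4]` the result contains only 5; with `[1, 2, 3, 5, 6]` nothing is marked, because sequence number 4 is not in the queue at all and is therefore treated as lost.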
  • the third communication device can also receive a second message corresponding to the second service sent by the first communication device based on the first connection, and the destination of the second message is the second communication device.
  • the third communication device can also receive a second message corresponding to the first service sent by the first communication device based on the second connection, and the destination of the second message is the second communication device.
  • the third communication device can also receive a third message corresponding to the first service sent by the first communication device based on the first connection group, and the destination of the third message is the fourth communication device.
  • the third communication device can also receive a third message corresponding to the second service sent by the first communication device based on the first connection group, and the destination of the third message is the fourth communication device.
  • Step 304 The third communication device sends a first message to the second communication device.
  • the third communication device may send the first message to the second communication device according to the destination of the first message being the second communication device.
  • the third communication device sends the first message to the second communication device based on the first connection.
  • the first communication device sends the second message or the third message based on the shared first connection group.
  • Scenario one: after receiving the second message corresponding to the second service sent by the first communication device based on the first connection, the third communication device sends the second message corresponding to the second service to the second communication device based on the fact that the destination of the second message is the second communication device.
  • Scenario two: after receiving the second message corresponding to the first service sent by the first communication device based on the second connection, the third communication device sends the second message corresponding to the first service to the second communication device based on the fact that the destination of the second message is the second communication device.
  • After receiving the third message corresponding to the first service sent by the first communication device based on the first connection group, the third communication device sends the third message corresponding to the first service to the fourth communication device based on the fact that the destination of the third message is the fourth communication device.
  • the third communication device is a fourth communication device, and the third message corresponding to the first service is sent to the fourth communication device.
  • the third communication device after receiving the third message corresponding to the second service sent by the first communication device based on the first connection group, the third communication device sends the third message corresponding to the second service to the fourth communication device based on the fact that the destination of the third message is the fourth communication device.
  • Step 305 The second communication device receives a first message corresponding to the first service.
  • the first message corresponding to the first service sent by the first communication device based on the first connection group can be sent to the second communication device through the intermediate forwarding node.
  • the second communication device receives the first message corresponding to the first service forwarded by the third communication device.
  • the second communication device can also directly receive the first message corresponding to the first service sent by the first communication device based on the first connection group.
  • the second communication device can receive the first message corresponding to the first service based on the first connection group.
  • The second communication device also includes a corresponding connection resource pool, and a mapping is established between the physical QP connections in the connection resource pool and the logical QP connections of the upper-layer application. After the second communication device receives the first message of the first service based on the first connection group, it delivers the first message to the upper-layer application through the corresponding logical QP connection.
  • the second communication device when the data volume of the first message is greater than a threshold, receives multiple first sub-messages obtained by segmenting the first message.
  • the first communication device also adds a corresponding sequence number, sub-sequence number and interface identifier to each first sub-message, so that after the second communication device receives multiple first sub-messages, it can arrange and combine the multiple first sub-messages according to the sequence number and the sub-sequence number to obtain the first message, and send the first message to the application through the logical interface corresponding to the interface identifier.
  • The second communication device also supports the disorder marking function.
  • When the second communication device receives a first sub-message that arrives out of order among the multiple first sub-messages: if that first sub-message does not carry a disorder identifier, it is determined that a first sub-message has been lost, and a packet loss notification message is sent to trigger packet loss retransmission; if that first sub-message carries a disorder identifier, it is determined based on the disorder identifier that the multiple first sub-messages are disordered but no packet loss has occurred, and the packet loss notification message is not sent immediately. If the earlier first sub-message is still not received after a period of time, the packet loss notification message is then sent.
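  • The receiver-side decision described above can be sketched as a small state machine. The class name `SubMessageReceiver`, the `wait_time` parameter, and the use of explicit timestamps are assumptions for this sketch; the text only says that the notification is delayed "for a period of time".

```python
class SubMessageReceiver:
    """Hypothetical receiver that separates disorder from packet loss."""

    def __init__(self, wait_time=1.0):
        self.wait_time = wait_time   # assumed grace period for reordered data
        self.next_expected = 1
        self.received = set()
        self.pending = {}            # missing seq -> deadline for late arrival
        self.notifications = []      # seqs reported in loss notifications

    def on_receive(self, seq, disorder_flag, now):
        self.received.add(seq)
        self.pending.pop(seq, None)  # a late sub-message finally arrived
        for missing in range(self.next_expected, seq):
            if missing in self.received or missing in self.notifications:
                continue
            if disorder_flag:
                # Disordered but not lost: wait before reporting.
                self.pending.setdefault(missing, now + self.wait_time)
            else:
                # No disorder identifier: treat as loss immediately.
                self.pending.pop(missing, None)
                self.notifications.append(missing)
        while self.next_expected in self.received:
            self.next_expected += 1

    def on_timer(self, now):
        # Report sub-messages that did not arrive within the grace period.
        for missing, deadline in sorted(self.pending.items()):
            if now >= deadline:
                del self.pending[missing]
                self.notifications.append(missing)
```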
  • the second communication device can also receive a second message corresponding to a second service sent by the first communication device based on the first connection in the first connection group. At this time, the destination of the second message is the same as the destination of the first message, but the second message and the first message belong to different services.
  • the second communication device can also receive a second message corresponding to a first service sent by the first communication device based on the second connection in the first connection group. At this time, the destination of the second message is the same as the destination of the first message, and the second message and the first message belong to the same service.
  • the second service and the first service may be two service threads in the same service process, or two service threads in different service processes of the same upper-layer application, or two service threads in different upper-layer applications.
  • the data transmission method provided in the embodiment of the present application can share and reuse the same connection group to transmit messages, thereby improving the scalability of the connection and improving the performance and reliability of data transmission.
  • The number of connection groups established in the connection resource pool in the embodiment of the present application is determined by the number of destination communication devices, and the number of physical QP connections included in a connection group can be flexibly adjusted. Therefore, the number of physical QP connections does not grow with the number of services of the upper-layer application, which decouples the number of local connections from the number of upper-layer services. As the network scale increases while the number of connections on the local network card remains limited, establishing a connection resource pool enhances the connection expansion capability.
  • When the connection group includes multiple connections, the RDMA multi-connection capability improves load balancing between the multiple connections and allows a timely switch to other connections when a single connection fails, thereby improving transmission reliability.
  • the transmission tail delay caused by the loss of important packets is effectively reduced, and the stability of data transmission performance under lossy networks is further improved.
  • FIG. 5 is a schematic diagram of the functional modules of a communication device provided in an embodiment of the present application.
  • the communication device shown in Figure 5 can be used to perform the operations performed by the first communication device or the second communication device in the data transmission method shown in Figure 3 above, that is, the various functional modules of the communication device shown in Figure 5 are used to implement the functional operations performed by the first communication device or the second communication device in the data transmission method shown in Figure 3 above.
  • the functional modules of the communication device can be divided into an application layer, a transport layer, and a network card.
  • the application layer may include N (N is a positive integer) application services such as AI/HPC, distributed storage, containers, or big data, and one application service includes at least one service.
  • each application service is connected to an application program interface (API) through a corresponding connector (connector, Conn), and the API may be a socket API or a verb API.
  • The embodiment of the present application designs a new RDMA loss-tolerant transport layer (i.e., the transport layer middleware shown in FIG. 5) between the application service of the communication device and the network card.
  • The transport layer middleware includes a WR segmentation module, a connection sharing module, a connection measurement module, a multi-connection selection module, a coloring module, an out-of-order reordering module and a scheduling module.
  • Some functional modules in the transport layer middleware can also be offloaded to the network card; for example, the connection measurement module, coloring module, out-of-order reordering module and scheduling module can be offloaded to the network card.
  • the first communication device or the second communication device can refer to the middleware between the application layer software and the network card hardware, for example, the transport layer middleware shown in Figure 5.
  • the WR segmentation module is used to segment a large block of WR data into multiple WR sub-blocks.
  • the WR segmentation module determines whether the WR data is a large block of WR data by a threshold value (such as MTU).
  • The manner of segmenting the large block of WR data into multiple WR sub-blocks is not limited. For example, the entire memory address range of the large block of WR data is segmented into multiple WR sub-blocks by offsets from the first address.
  • the segmentation of WR sub-blocks can more easily realize fine-grained multi-connection scheduling and congestion control, thereby improving data transmission performance.
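  • As a rough sketch of segmentation by address offset, the function below splits a memory region into sub-blocks no larger than a threshold. The name `segment_wr`, the `(address, length)` tuple representation, and the parameter `max_block` (standing in for a threshold such as the MTU) are assumptions; the actual WR structure is not described in the text.

```python
def segment_wr(base_addr, length, max_block):
    """Split a WR memory region into sub-blocks of at most `max_block` bytes.

    Returns a list of (address, length) pairs. A WR that does not exceed
    the threshold is returned unchanged as a single block.
    """
    if length <= max_block:
        return [(base_addr, length)]
    blocks = []
    offset = 0
    while offset < length:
        size = min(max_block, length - offset)
        # Each sub-block starts at an offset from the first address.
        blocks.append((base_addr + offset, size))
        offset += size
    return blocks
```

  • For instance, a 10000-byte region at address 0x1000 with a 4096-byte threshold yields three sub-blocks starting at 0x1000, 0x2000 and 0x3000.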
  • a connection sharing module is used to establish a connection resource pool.
  • the connection resource pool includes at least one connection group, and one connection group corresponds to one destination communication device, that is, the remote receiving node corresponding to each connection group is the same.
  • Each connection group includes at least one QP connection. The sending node and the receiving node of each QP connection are the same, and the source port of each QP connection is different. Therefore, for any connection group, a network node performing equal-cost multi-path (ECMP) routing can select different paths for the different QP connections in the group according to their different source ports. In this case, different QP connections can correspond to different physical paths.
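  • To illustrate why different source ports can lead to different physical paths, the sketch below hashes the five-tuple and uses the result to pick one of several equal-cost paths. The function name `ecmp_path` and the use of CRC32 as the hash are assumptions; real switches use vendor-specific hash functions.

```python
import zlib


def ecmp_path(src_ip, dst_ip, src_port, dst_port, protocol, num_paths):
    """Pick an equal-cost path index from the five-tuple (illustrative hash).

    Only the source port differs between QP connections in one connection
    group, so it is the field that spreads them across paths.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    return zlib.crc32(key) % num_paths
```

  • The same five-tuple always maps to the same path, which is why a detection message that reuses the five-tuple of a physical QP connection follows that connection's path.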
  • The connection sharing module first searches the connection resource pool for a connection group corresponding to the destination communication device. If such a connection group exists, no new connection group is established; if it does not exist, a new connection group is established with the IP of the destination communication device. The original connection established by the application service is used as a logical QP connection, and the QP connections in the connection resource pool are the physical QP connections that actually send data.
  • For a WR sub-block from a logical QP connection, the connection sharing module uses the destination IP of the WR sub-block to place it on the connection group of the destination communication device corresponding to that destination IP for transmission.
  • the WR sub-blocks of the logical QP connection from different upper-layer services can be sent using the physical QP connection in the same connection group as long as the destination communication device corresponding to the destination IP of the WR sub-block is the same, thereby realizing the mutual mapping between the physical QP connection and the logical QP connection in the connection resource pool, and achieving the effect of connection sharing multiplexing.
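  • The lookup-or-create behaviour of the connection resource pool can be sketched as below. The class name `ConnectionResourcePool`, the dictionary representation of a physical QP connection, and the starting source port are hypothetical; no real RDMA verbs calls are made.

```python
import itertools


class ConnectionResourcePool:
    """Hypothetical pool: one connection group per destination IP."""

    def __init__(self, connections_per_group=2, first_source_port=49152):
        self.connections_per_group = connections_per_group
        self._ports = itertools.count(first_source_port)
        self.groups = {}  # destination IP -> list of physical QP connections

    def get_group(self, dest_ip):
        # Reuse the existing group for this destination if there is one;
        # otherwise establish a new group with distinct source ports.
        if dest_ip not in self.groups:
            self.groups[dest_ip] = [
                {"dest_ip": dest_ip, "src_port": next(self._ports)}
                for _ in range(self.connections_per_group)
            ]
        return self.groups[dest_ip]

    def route(self, logical_qp_id, dest_ip):
        # Sub-blocks from any logical QP with the same destination IP
        # share the same group of physical QP connections.
        return {"logical_qp_id": logical_qp_id, "group": self.get_group(dest_ip)}
```

  • Two services that send to the same destination IP therefore receive the same group object, so the number of groups tracks the number of destinations rather than the number of services.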
  • The connection mode of the physical QP connections in the connection resource pool can be, for example, reliable connection (RC) or dynamically connected (DC).
  • For a service flow of a service, if there is no corresponding connection group in the connection resource pool, the flow can first be sent over an established physical QP connection in DC mode, and when the data volume sent for the service flow exceeds a certain threshold, it can be switched to an established physical QP connection in RC mode. Since a physical QP connection in DC mode is established faster than one in RC mode, sending over the DC-mode connection first speeds up data transmission and reduces transmission delay. Switching to the RC-mode connection afterwards avoids the inefficiency that DC mode incurs when switching back and forth between different destination communication devices.
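  • A minimal sketch of the DC-then-RC switch follows. The class name `FlowSender` and the byte-count threshold are assumptions; the text only states that the switch happens once the data volume sent exceeds "a certain threshold".

```python
class FlowSender:
    """Hypothetical per-flow state choosing between DC and RC mode."""

    def __init__(self, switch_threshold_bytes):
        self.switch_threshold_bytes = switch_threshold_bytes
        self.bytes_sent = 0

    def choose_mode(self, size):
        # Start on the quickly established DC connection; move to RC once
        # the flow has already sent more than the threshold.
        mode = "RC" if self.bytes_sent > self.switch_threshold_bytes else "DC"
        self.bytes_sent += size
        return mode
```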
  • the connection sharing module is also used to add an identifier (identity, ID) of the logical QP connection in each WR sub-block, so that the receiving node can hand over the large block of WR data before the WR sub-block is split to the original connection called by the application service based on the ID of the logical QP connection in each WR sub-block.
  • the connection measurement module is used to send a detection message to measure the connection information.
  • the connection measurement module can perform connection measurement periodically, or it can perform connection measurement according to an event trigger.
  • The detection message can be an along-path (in-band) message transmitted on the physical QP connection corresponding to the connection to be measured, for example, any WR sub-block sent over that physical QP connection; or it can be a dedicated detection message whose five-tuple is the same as that of the physical QP connection corresponding to the connection to be measured, so that it follows the same physical path through the intermediate forwarding nodes. The connection measurement module can therefore measure the delay, packet loss and other information corresponding to different physical QP connections, so that the connection performance of the physical QP connections in the connection resource pool can be calibrated in real time.
  • a multi-connection selection module is used to select at least one physical QP connection in a determined connection group.
  • the multi-connection selection module distributes different WR sub-blocks to different physical QP connections according to the connection performance of different physical QP connections.
  • The distribution method can be the round-robin (RR) method, weight-based distribution, the shortest QP connection, or the shortest SQ queue in the QP connection.
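  • Three of the distribution methods named above can be sketched as follows. The dictionary fields `sq_len` and `weight` and the function names are hypothetical, and how the weights are derived from measured connection performance is left open here.

```python
import itertools
import random


def round_robin(connections):
    """Yield connections in turn, wrapping around indefinitely."""
    return itertools.cycle(connections)


def shortest_send_queue(connections):
    """Pick the connection whose send queue (SQ) is currently shortest."""
    return min(connections, key=lambda c: c["sq_len"])


def weighted_choice(connections, rng=random):
    """Pick a connection at random in proportion to its weight."""
    weights = [c["weight"] for c in connections]
    return rng.choices(connections, weights=weights, k=1)[0]
```

  • A better-performing connection can be given a larger `weight` so that it receives more WR sub-blocks, while `shortest_send_queue` reacts directly to backlog.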
  • the multi-connection selection module is also used to prune and replace any physical QP connection in the connection resource pool when the connection performance of the physical QP connection deteriorates, that is, to delete the physical QP connection from the connection group.
  • The multi-connection selection module adds, in each WR sub-block, the identifier WR_ID of the large block of WR data before segmentation and the sequence number WR_SEQ of the WR sub-block after segmentation, so that the receiving node can, based on the WR_ID and WR_SEQ in each WR sub-block, sort and combine the WR sub-blocks with the same WR_ID in the order of WR_SEQ to obtain the large block of WR data before segmentation.
  • the coloring module is used to color important packets, that is, to mark the reserved fields of important packets.
  • important packets refer to packets whose loss will have a serious deterioration effect on service performance.
  • important packets may include detection packets for connection measurement, WR sub-blocks at the end of the sending queue SQ, control signaling packets for service transmission, packet loss retransmission packets, heartbeat keep-alive packets, address request packets or important packets specified by the service.
  • Since packet loss retransmission packets are determined when the network card sends data, the coloring module needs to be implemented on the network card.
  • the out-of-order reordering module is used to sort the WR sub-blocks after segmentation.
  • the time for each WR sub-block to complete the transmission may be different, and therefore, the completion time of the CQE of each WR sub-block may be different.
  • the out-of-order reordering module is used to record the number of segmented WR sub-blocks.
  • The application service calls the API to send the WQE of the large block of WR data and receives the CQE of the large block of WR data, so the application service is unaware of the segmentation; that is, the segmentation is transparent to the application.
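  • One way to keep segmentation transparent is to count the outstanding sub-blocks per WR and report a single completion once all of them have completed. The class `CompletionAggregator` below is a hypothetical sketch of that bookkeeping, not the module's actual implementation.

```python
class CompletionAggregator:
    """Hypothetical tracker that merges sub-block CQEs into one CQE."""

    def __init__(self):
        self.remaining = {}  # WR_ID -> number of sub-blocks still outstanding

    def register(self, wr_id, num_sub_blocks):
        # Record how many sub-blocks the large WR was segmented into.
        self.remaining[wr_id] = num_sub_blocks

    def on_sub_block_cqe(self, wr_id):
        """Return True when the CQE for the whole WR can be delivered."""
        self.remaining[wr_id] -= 1
        if self.remaining[wr_id] == 0:
            del self.remaining[wr_id]
            return True
        return False
```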
  • the out-of-order reordering module of the receiving node is used to sort and combine WR sub-blocks with the same WR_ID in the order of WR_SEQ based on the WR_ID and WR_SEQ in each WR sub-block.
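  • The sort-and-combine step at the receiving node can be sketched as follows. The dictionary keys `wr_id`, `wr_seq` and `data` mirror the WR_ID and WR_SEQ fields in the text, but the sub-block representation and the function name `reassemble` are assumptions.

```python
from collections import defaultdict


def reassemble(sub_blocks):
    """Group sub-blocks by WR_ID and join them in WR_SEQ order.

    Returns a dict mapping each WR_ID to the reconstructed WR data.
    """
    by_wr = defaultdict(list)
    for block in sub_blocks:
        by_wr[block["wr_id"]].append(block)
    return {
        wr_id: b"".join(
            b["data"] for b in sorted(blocks, key=lambda b: b["wr_seq"])
        )
        for wr_id, blocks in by_wr.items()
    }
```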
  • the out-of-order reordering module is also used to maintain the order of two unconfirmed WR data with the same destination IP, that is, after any one WR data is confirmed, the other WR data is sent.
  • The out-of-order reordering module is also used to implement a selective retransmission mechanism, that is, a mechanism that retransmits only the packets that were lost.
  • the scheduling module is used to perform priority scheduling and congestion control between different physical QP connections.
  • the connections selected by the multi-connection selection module are scheduled to be sent to meet the load balancing between the multiple connections and ensure the transmission performance of the multiple connections.
  • By sharing and multiplexing connection groups, the existing RDMA interface can be used to provide high-performance, high-reliability services in large-scale lossy RDMA networks, which improves the scalability of RDMA connections and solves the problem of limited connection scale.
  • When the connection group includes multiple connections, the service interruption problem caused by single-connection transmission can be avoided through load sharing and scheduling between the multiple connections, further ensuring the high reliability of data transmission.
  • When the multiple connections correspond to multiple different physical paths, this can also solve the load balancing problem between multiple equal-cost paths in the network.
  • FIG. 6 is a schematic diagram of the functional modules of a communication device provided in an embodiment of the present application.
  • the communication device shown in Figure 6 can be used to perform the operations performed by the third communication device in the data transmission method shown in Figure 3 above, that is, the various functional modules of the communication device shown in Figure 6 are used to implement the functional operations performed by the third communication device in the data transmission method shown in Figure 3 above.
  • the functional modules of the communication device include an important packet identification module, an important packet protection module, a notification message generation module, a connection measurement module and an out-of-order calibration module.
  • the communication device shown in Figure 6 is used to cooperate with the communication device shown in Figure 5 to implement the data transmission method provided in the embodiment of the present application.
  • the communication device shown in Figure 6 is mainly used to perform differentiated processing on important packets.
  • The important packet identification module is used to identify important packets, that is, to identify the messages colored by the coloring module.
  • The received message is parsed, and when the reserved field of the message includes a coloring mark, the received message is determined to be an important packet.
  • The important packet protection module is used to protect the important packets to reduce the packet loss rate of the important packets. For example, the important packets are placed in a reserved buffer, or the priority of the important packets is raised.
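  • The identification and protection steps can be sketched together as below. The bit position `COLOR_BIT`, the `reserved_field` key, and the two fixed-capacity buffers of `ForwardingQueue` are all assumptions for illustration; the text does not specify which reserved bits carry the coloring mark or how the reserved buffer is sized.

```python
from collections import deque

COLOR_BIT = 0x01  # hypothetical bit in the reserved field used as the mark


def is_important(packet):
    """Identify a colored (important) packet from its reserved field."""
    return bool(packet["reserved_field"] & COLOR_BIT)


class ForwardingQueue:
    """Hypothetical buffer with a reserved area for important packets."""

    def __init__(self, shared_capacity, reserved_capacity):
        self.shared = deque()
        self.reserved = deque()
        self.shared_capacity = shared_capacity
        self.reserved_capacity = reserved_capacity

    def enqueue(self, packet):
        """Return True if the packet was buffered, False if it was dropped."""
        if is_important(packet):
            buffer, capacity = self.reserved, self.reserved_capacity
        else:
            buffer, capacity = self.shared, self.shared_capacity
        if len(buffer) >= capacity:
            return False
        buffer.append(packet)
        return True
```

  • Because ordinary traffic never occupies the reserved buffer, a burst that fills the shared buffer does not by itself cause important packets to be dropped.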
  • The notification message generation module is used, as required, to construct a notification message without payload for the receiving node, to construct a negative acknowledgement (NAK) packet or an acknowledgement (ACK) packet for the sending node, or, when network congestion occurs, to construct a congestion notification message for the sending node.
  • the connection measurement module is used to receive the detection message sent by the connection measurement module in the communication device shown in Figure 5, add the identification, packet loss, delay, throughput and other information of the communication device in the detection message, and help the detection message be brought back to the communication device shown in Figure 5 to complete the connection measurement.
  • The out-of-order calibration module is used to add an out-of-order mark to an out-of-order message when the communication device itself causes messages to be forwarded out of order, so as to help the receiving node distinguish between packet loss and message disorder. For example, if the receiving node receives an out-of-order message without an out-of-order mark, it determines that packet loss has occurred and sends a packet loss notification message; if the receiving node receives an out-of-order message with an out-of-order mark, it determines that disorder has occurred but packet loss has not, and delays sending the packet loss notification message.
  • Figure 7 is a schematic diagram of the structure of a data transmission device provided in the embodiment of the present application, and the device is applied to the first communication device shown in Figure 3. Based on the following multiple modules shown in Figure 7, the data transmission device shown in Figure 7 can perform all or part of the operations performed by the first communication device. It should be understood that the device may include more additional modules than the modules shown or omit some of the modules shown therein, and the embodiment of the present application does not limit this. As shown in Figure 7, the device includes:
  • An acquisition module 701 is used to acquire a first message corresponding to a first service, where the destination of the first message is a second communication device;
  • the sending module 702 is used to send a first message to the second communication device according to the first connection group, and the first connection group is shared by the service transmitted by the first communication device.
  • the first connection group includes a first connection between the first communication device and the second communication device, and the sending module 702 is configured to send a first message to the second communication device according to the first connection;
  • the acquisition module 701 is further used to acquire a second message corresponding to a second service, where the destination of the second message is a second communication device;
  • the sending module 702 is further configured to send a second message to the second communication device according to the first connection.
  • the first connection group includes a first connection and a second connection between the first communication device and the second communication device, and the sending module 702 is configured to send a first message to the second communication device according to the first connection;
  • the acquisition module 701 is further configured to acquire a second message corresponding to the first service, where the destination of the second message is the second communication device;
  • the sending module 702 is further configured to send a second message to the second communication device according to the second connection.
  • the sending module 702 is used to select the first connection to send the first message to the second communication device according to whether the connection performance of the first connection meets the performance condition, and the connection performance includes at least one of the sending queue length, delay performance or packet loss performance.
  • the first communication device includes a connection resource pool
  • the connection resource pool includes at least one connection group
  • each connection group corresponds to a destination communication device
  • the at least one connection group includes the first connection group
  • the device further includes:
  • a splitting module configured to, when the data volume of the first message is greater than a threshold, cause the first communication device to split the first message into a plurality of first sub-messages
  • the sending module 702 is configured to send a plurality of first sub-messages to the second communication device according to the first connection group.
  • the first communication device includes an application layer and a transport layer; an acquisition module 701 is configured for the transport layer to receive a first message corresponding to the first service sent by the application layer calling a logical interface of the first service;
  • the sending module 702 is used for, when the transport layer obtains notification messages respectively corresponding to a plurality of first sub-messages, the transport layer calls the logical interface of the first service to send the notification message corresponding to the first message to the application layer.
  • the first communication device is middleware between application layer software and network card hardware.
  • the first message is transmitted based on the RDMA protocol.
  • FIG. 8 is a schematic diagram of the structure of a data transmission device provided in an embodiment of the present application, and the device is applied to the third communication device shown in FIG. 3. Based on the following multiple modules shown in FIG. 8, the data transmission device shown in FIG. 8 can perform all or part of the operations performed by the third communication device. It should be understood that the device may include more additional modules than the modules shown or omit some of the modules shown therein, and the embodiment of the present application does not limit this. As shown in FIG. 8, the device includes:
  • a receiving module 801 is configured to receive a first message corresponding to a first service sent by a first communication device based on a first connection group, the destination of the first message is a second communication device, and the first connection group is shared by a service transmitted by the first communication device;
  • the sending module 802 is configured to send a first message to a second communication device.
  • the first connection group includes a first connection between the first communication device and the second communication device, and the receiving module 801 is configured to receive a first message corresponding to the first service sent by the first communication device based on the first connection;
  • the receiving module 801 is further configured to receive a second message corresponding to a second service sent by the first communication device based on the first connection, where the destination of the second message is the second communication device;
  • the sending module 802 is further configured to send a second message to the second communication device.
  • the first connection group includes a first connection and a second connection between the first communication device and the second communication device, and the receiving module 801 is configured to receive a first message corresponding to the first service sent by the first communication device based on the first connection;
  • the receiving module 801 is further configured to receive a second message corresponding to the first service sent by the first communication device based on the second connection, where the destination of the second message is the second communication device;
  • the sending module 802 is further configured to send a second message to the second communication device.
  • the first message includes a plurality of first sub-messages
  • the device further includes: an adding module, configured to add an out-of-order identifier to an out-of-order first sub-message among the plurality of first sub-messages when the plurality of first sub-messages are out of order;
  • a transmission module, configured to transmit the first sub-message to which the out-of-order identifier has been added, where the out-of-order identifier indicates that the plurality of first sub-messages are out of order but that no packet loss has occurred.
  • the first message is transmitted based on the RDMA protocol.
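The behavior of the adding module can be illustrated with a small sketch. The text does not state how the third communication device decides which sub-message counts as out of order, so the rule used here is an assumption: a sub-message is flagged when it is forwarded ahead of a lower-numbered sub-message of the same message that the device still holds. The `SubMessage` type and the `mark_out_of_order` function are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class SubMessage:
    seq: int                    # sequence number of the parent message
    sub_seq: int                # position of this sub-message within the message
    payload: bytes
    out_of_order: bool = False  # the out-of-order identifier


def mark_out_of_order(forward_order: List[SubMessage]) -> List[SubMessage]:
    """Flag sub-messages that are forwarded before a lower-numbered one.

    Assumes all entries belong to one message and that the device holds every
    sub-message in the list, so any reordering is known not to be packet loss.
    """
    marked: List[SubMessage] = []
    min_later: Optional[int] = None  # smallest sub_seq forwarded after the current one
    for sub in reversed(forward_order):
        jumps_ahead = min_later is not None and min_later < sub.sub_seq
        marked.append(replace(sub, out_of_order=jumps_ahead))
        min_later = sub.sub_seq if min_later is None else min(min_later, sub.sub_seq)
    marked.reverse()
    return marked
```

With a forwarding order of sub-sequence numbers 0, 2, 1, 3, only the sub-message numbered 2 is flagged, because it is the one the receiver will see arriving ahead of a gap.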
  • FIG9 is a schematic diagram of the structure of a data transmission device provided in an embodiment of the present application, and the device is applied to the second communication device shown in FIG3. Based on the following multiple modules shown in FIG9, the data transmission device shown in FIG9 can perform all or part of the operations performed by the second communication device. It should be understood that the device may include more additional modules than the modules shown or omit some of the modules shown therein, and the embodiment of the present application does not limit this. As shown in FIG9, the device includes:
  • the receiving module 901 is configured to receive a first message corresponding to a first service, where the first message is sent by a first communication device based on a first connection group, the destination of the first message is a second communication device, and the first connection group is shared by the services transmitted by the first communication device.
  • the first connection group includes a first connection between the first communication device and the second communication device, and the first message is sent by the first communication device based on the first connection;
  • the receiving module 901 is further configured to receive a second message corresponding to a second service, where the second message is sent by the first communication device based on the first connection, and the destination of the second message is the second communication device.
  • the first connection group includes a first connection and a second connection between the first communication device and the second communication device, and the first message is sent by the first communication device based on the first connection;
  • the receiving module 901 is further configured to receive a second message corresponding to the first service, where the second message is sent by the first communication device based on the second connection, and the destination of the second message is the second communication device.
  • the first message includes a plurality of first sub-messages
  • the second communication device includes an application layer and a transport layer
  • Receiving module 901 is used for receiving multiple first sub-messages at the transport layer, where the multiple first sub-messages respectively include corresponding sequence numbers, sub-sequence numbers and interface identifiers, and the multiple first sub-messages are obtained by segmenting the first message by the first communication device; the transport layer arranges and combines the multiple first sub-messages according to the sequence numbers and sub-sequence numbers to obtain the first message; the transport layer calls the logical interface corresponding to the interface identifier to send the first message to the application layer.
  • the device further includes:
  • a determination module, configured to, when any out-of-order first sub-message is received and that first sub-message carries an out-of-order identifier, determine based on the out-of-order identifier that the plurality of first sub-messages are out of order but that no packet loss has occurred.
  • the second communication device is middleware between application layer software and network card hardware.
  • the first message is transmitted based on the RDMA protocol.
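The receive-side handling described for the receiving module 901 and the determination module can be sketched as below. This is an illustration under stated assumptions, not the method of the present application: the `total` field (number of sub-messages per message), the `ReceiverTransport` class, the handler dictionary standing in for logical interfaces, and the `suspected_loss` list are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SubMessage:
    seq: int                    # sequence number of the parent message
    sub_seq: int                # sub-sequence number, assumed to start at 0
    total: int                  # hypothetical field: sub-messages per message
    interface_id: int           # identifies the logical interface of the service
    payload: bytes
    out_of_order: bool = False  # out-of-order identifier added in transit


class ReceiverTransport:
    """Hypothetical transport layer that reassembles and dispatches messages."""

    def __init__(self, interfaces: Dict[int, Callable[[bytes], None]]) -> None:
        self._interfaces = interfaces
        self._buffers: Dict[Tuple[int, int], Dict[int, bytes]] = {}
        self.suspected_loss: List[Tuple[int, int]] = []  # (seq, missing sub_seq)

    def on_sub_message(self, sub: SubMessage) -> Optional[bytes]:
        if not 0 <= sub.sub_seq < sub.total:
            return None  # ignore malformed sub-sequence numbers
        key = (sub.interface_id, sub.seq)
        parts = self._buffers.setdefault(key, {})
        expected = 0
        while expected in parts:
            expected += 1
        # A gap without the identifier is treated as possible packet loss;
        # a gap with the identifier is treated as reordering only.
        if sub.sub_seq > expected and not sub.out_of_order:
            self.suspected_loss.append((sub.seq, expected))
        parts[sub.sub_seq] = sub.payload
        if len(parts) < sub.total:
            return None
        message = b"".join(parts[i] for i in range(sub.total))
        del self._buffers[key]
        self._interfaces[sub.interface_id](message)  # deliver to the application layer
        return message
```

The design choice illustrated here is that the out-of-order identifier lets the receiver wait for the delayed sub-message instead of reacting as if it had been lost.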
  • the data transmission device realizes the shared reuse of the connection group, so that the number of local connections will not increase with the increase of business, and the data transmission performance is not limited by resources such as network card memory.
  • when the connection group includes multiple connections, the RDMA multi-connection capability improves load balancing across the connections and allows traffic to be switched to another connection promptly when a single connection fails, thereby improving transmission reliability.
  • the transmission tail delay caused by the loss of important packets can be effectively reduced, further improving the stability of data transmission performance under lossy networks.
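The sharing of one connection group by several services, together with load balancing and failover, can be sketched as follows. The text does not specify a balancing policy, so round-robin selection is an assumption, as are the `ConnectionGroup` class, the connection names, and the in-memory `sent` log used in place of real RDMA connections.

```python
from typing import Dict, List, Tuple


class ConnectionGroup:
    """Hypothetical group of connections shared by all services of one device."""

    def __init__(self, connection_ids: List[str]) -> None:
        if not connection_ids:
            raise ValueError("a connection group needs at least one connection")
        self._ids = list(connection_ids)
        self._healthy: Dict[str, bool] = {cid: True for cid in self._ids}
        self._next = 0
        # Records what was sent on each connection, for illustration only.
        self.sent: Dict[str, List[Tuple[int, bytes]]] = {cid: [] for cid in self._ids}

    def mark_failed(self, connection_id: str) -> None:
        self._healthy[connection_id] = False

    def send(self, service_id: int, payload: bytes) -> str:
        """Send on the next healthy connection (round-robin, an assumed policy)."""
        for _ in range(len(self._ids)):
            cid = self._ids[self._next]
            self._next = (self._next + 1) % len(self._ids)
            if self._healthy[cid]:
                self.sent[cid].append((service_id, payload))
                return cid
        raise RuntimeError("no healthy connection in the group")
```

Because every service sends through the same group, adding a service in this sketch adds no connection; the number of connections is fixed when the group is created.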
  • FIG. 10 shows a schematic diagram of the structure of a data transmission device 2000 provided by an exemplary embodiment of the present application.
  • the data transmission device 2000 shown in FIG. 10 is used to perform the operations involved in the data transmission method shown in FIG. 3 .
  • the data transmission device 2000 is, for example, a switch, a router, etc., and the data transmission device 2000 can be implemented by a general bus architecture.
  • the data transmission device 2000 includes at least one processor 2001 , a memory 2003 , and at least one communication interface 2004 .
  • the processor 2001 is, for example, a general-purpose central processing unit (CPU), a digital signal processor (DSP), a network processor (NP), a graphics processing unit (GPU), a neural-network processing unit (NPU), a data processing unit (DPU), a microprocessor, or one or more integrated circuits for implementing the solution of the present application.
  • the processor 2001 includes an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • PLD is, for example, a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) or any combination thereof.
  • the processor can implement or execute various logic blocks, modules and circuits described in conjunction with the disclosure of the embodiments of the present invention.
  • the processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like.
  • the data transmission device 2000 further includes a bus.
  • the bus is used to transmit information between the components of the data transmission device 2000.
  • the bus may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus.
  • the bus may be divided into an address bus, a data bus, a control bus, etc.
  • for ease of representation, the bus is shown as a single line in FIG10, but this does not mean that there is only one bus or only one type of bus.
  • the memory 2003 is, for example, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, or an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory 2003 is, for example, independent and connected to the processor 2001 through a bus.
  • the memory 2003 can also be integrated with the processor 2001.
  • the communication interface 2004 uses any transceiver-like device to communicate with other devices or communication networks, and the communication network can be Ethernet, radio access network (RAN) or wireless local area network (WLAN), etc.
  • the communication interface 2004 can include a wired communication interface and a wireless communication interface.
  • the communication interface 2004 can be an Ethernet interface, a Fast Ethernet (FE) interface, a Gigabit Ethernet (GE) interface, an Asynchronous Transfer Mode (ATM) interface, a wireless local area network (WLAN) interface, a cellular network communication interface or a combination thereof.
  • the Ethernet interface can be an optical interface, an electrical interface or a combination thereof.
  • the communication interface 2004 can be used for the data transmission device 2000 to communicate with other devices.
  • the processor 2001 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG10 . Each of these processors may be a single-core CPU processor or a multi-core CPU processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
  • the data transmission device 2000 may include multiple processors, such as the processor 2001 and the processor 2005 shown in FIG10 . Each of these processors may be a single-core CPU or a multi-core CPU.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (such as computer program instructions).
  • the data transmission device 2000 may also include an output device and an input device.
  • the output device communicates with the processor 2001 and can display information in a variety of ways.
  • the output device may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector.
  • the input device communicates with the processor 2001 and can receive user input in a variety of ways.
  • the input device may be a mouse, a keyboard, a touch screen device, or a sensor device.
  • the memory 2003 is used to store the program code 2010 for executing the solution of the present application
  • the processor 2001 can execute the program code 2010 stored in the memory 2003. That is, the data transmission device 2000 can implement the data transmission method provided by the method embodiment through the processor 2001 and the program code 2010 in the memory 2003.
  • the program code 2010 may include one or more software modules.
  • the processor 2001 itself can also store the program code or instruction for executing the solution of the present application.
  • the data transmission device 2000 of the embodiment of the present application may correspond to the third communication device in the above-mentioned method embodiments.
  • the processor 2001 in the data transmission device 2000 reads the instructions in the memory 2003, so that the data transmission device 2000 shown in Figure 10 can execute all or part of the operations performed by the third communication device.
  • the processor 2001 is used to receive a first message corresponding to a first service sent by a first communication device based on a first connection group, where the destination of the first message is a second communication device and the first connection group is shared by the services transmitted by the first communication device, and to send the first message to the second communication device.
  • the data transmission device 2000 may also correspond to the data transmission apparatus shown in FIG8 , and each functional module in the data transmission apparatus is implemented by the software of the data transmission device 2000.
  • the functional modules included in the data transmission apparatus are generated after the processor 2001 of the data transmission device 2000 reads the program code 2010 stored in the memory 2003.
  • each step of the data transmission method shown in Figure 3 is completed by the hardware integrated logic circuit or software instructions in the processor of the data transmission device 2000.
  • the steps of the method disclosed in conjunction with the embodiment of the present application can be directly embodied as a hardware processor, or a combination of hardware and software modules in the processor.
  • the software module can be located in a storage medium that is well established in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in conjunction with its hardware. To avoid repetition, it is not described in detail here.
  • Fig. 11 shows a schematic diagram of the structure of a data transmission device 2100 provided by another exemplary embodiment of the present application.
  • the data transmission device 2100 shown in Fig. 11 is used to perform all or part of the operations involved in the data transmission method shown in Fig. 3.
  • the data transmission device 2100 is, for example, a switch, a router, etc., and the data transmission device 2100 can be implemented by a general bus architecture.
  • the data transmission device 2100 includes: a main control board 2110 and an interface board 2130 .
  • the main control board is also called the main processing unit (MPU) or route processor card.
  • the main control board 2110 is used to control and manage various components in the data transmission device 2100, including routing calculation, device management, device maintenance, and protocol processing functions.
  • the main control board 2110 includes: a central processing unit 2111 and a memory 2112.
  • the interface board 2130 is also called a line processing unit (LPU), a line card or a service board.
  • the interface board 2130 is used to provide various service interfaces and realize the forwarding of data packets.
  • the service interface includes but is not limited to an Ethernet interface, a POS (Packet over SONET/SDH) interface, etc.
  • the Ethernet interface is, for example, a Flexible Ethernet Clients (FlexE Clients) service interface.
  • the interface board 2130 includes: a central processor 2131, a network processor 2132, a forwarding table entry memory 2134 and a physical interface card (PIC) 2133.
  • the central processor 2131 on the interface board 2130 is used to control and manage the interface board 2130 and communicate with the central processor 2111 on the main control board 2110 .
  • the network processor 2132 is used to implement the forwarding processing of the message.
  • the network processor 2132 can be in the form of a forwarding chip.
  • the forwarding chip can be a network processor (NP).
  • the forwarding chip can be implemented by an application-specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
  • the network processor 2132 is used to forward the received message based on the forwarding table stored in the forwarding table entry memory 2134.
  • if the destination address of the message is the address of the data transmission device 2100, the message is sent to the CPU (such as the central processor 2131) for processing; if the destination address of the message is not the address of the data transmission device 2100, the next hop and the output interface corresponding to the destination address are found in the forwarding table according to the destination address, and the message is forwarded to the output interface corresponding to the destination address.
  • the processing of the uplink message may include: processing of the message input interface, forwarding table search; the processing of the downlink message may include: forwarding table search, etc.
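The forwarding decision made by the network processor 2132 can be sketched as follows. The text only states that the next hop and output interface are found in the forwarding table according to the destination address; the longest-prefix-match lookup, the `"drop"` result for a table miss, and the `Forwarder` class are assumptions made for this example.

```python
import ipaddress
from typing import Dict, Optional, Tuple

Route = Tuple[str, str]  # (next hop, output interface)


class Forwarder:
    """Hypothetical forwarding logic driven by a forwarding table."""

    def __init__(self, local_address: str, table: Dict[str, Route]) -> None:
        self._local = ipaddress.ip_address(local_address)
        self._table = {ipaddress.ip_network(p): r for p, r in table.items()}

    def handle(self, destination: str) -> Tuple[str, Optional[Route]]:
        dst = ipaddress.ip_address(destination)
        if dst == self._local:
            # Addressed to this device: hand the message to the CPU.
            return ("to_cpu", None)
        matches = [net for net in self._table if dst in net]
        if not matches:
            return ("drop", None)  # assumed behavior; not specified in the text
        best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
        return ("forward", self._table[best])
```

For instance, with entries for `10.1.0.0/16` and `10.1.2.0/24`, a message to `10.1.2.9` is forwarded using the more specific `/24` entry.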
  • the central processing unit may also perform the functions of the forwarding chip, such as implementing software forwarding based on a general-purpose CPU, so that a forwarding chip is not required in the interface board.
  • the physical interface card 2133 is used to implement the physical layer docking function, whereby the original traffic enters the interface board 2130, and the processed message is sent out from the physical interface card 2133.
  • the physical interface card 2133 also called a daughter card, can be installed on the interface board 2130, and is responsible for converting the photoelectric signal into a message and forwarding the message to the network processor 2132 for processing after checking the legitimacy of the message.
  • the central processor 2131 can also perform the functions of the network processor 2132, such as implementing software forwarding based on a general-purpose CPU, so that the network processor 2132 is not required in the interface board 2130.
  • the data transmission device 2100 includes a plurality of interface boards, for example, the data transmission device 2100 further includes an interface board 2140, and the interface board 2140 includes: a central processor 2141, a network processor 2142, a forwarding table entry memory 2144, and a physical interface card 2143.
  • the functions and implementation methods of the components in the interface board 2140 are the same or similar to those of the interface board 2130, and are not described in detail herein.
  • the data transmission device 2100 further includes a switching fabric board 2120.
  • the switching fabric board 2120 may also be referred to as a switch fabric unit (SFU).
  • the switching fabric board 2120 is used to complete data exchange between the interface boards.
  • the interface board 2130 and the interface board 2140 may communicate through the switching fabric board 2120.
  • the main control board 2110 is coupled to the interface board.
  • the main control board 2110, the interface board 2130, the interface board 2140, and the switching fabric board 2120 are connected to the system backplane through the system bus to achieve intercommunication.
  • an inter-process communication (IPC) channel is established between the main control board 2110 and each of the interface board 2130 and the interface board 2140, and the main control board 2110 communicates with the interface board 2130 and the interface board 2140 through the IPC channel.
  • the data transmission device 2100 includes a control plane and a forwarding plane.
  • the control plane includes a main control board 2110 and a central processing unit 2111.
  • the forwarding plane includes various components for performing forwarding, such as a forwarding table entry memory 2134, a physical interface card 2133, and a network processor 2132.
  • the control plane performs functions such as routing, generating a forwarding table, processing signaling and protocol messages, and configuring and maintaining the status of the data transmission device.
  • the control plane sends the generated forwarding table to the forwarding plane.
  • the network processor 2132 forwards the message received by the physical interface card 2133 based on the forwarding table sent by the control plane.
  • the forwarding table sent by the control plane can be stored in the forwarding table entry memory 2134. In some embodiments, the control plane and the forwarding plane can be completely separated and not on the same data transmission device.
  • there may be one or more main control boards, and when there are multiple main control boards, they may include a primary main control board and a backup main control board. There may be one or more interface boards. The stronger the data processing capability of the data transmission device, the more interface boards are provided. There may also be one or more physical interface cards on the interface board. There may be no switching network board, or there may be one or more switching network boards. When there are multiple switching network boards, they can jointly realize load sharing and redundant backup. In a centralized forwarding architecture, the data transmission device may not need a switching network board, and the interface board is responsible for processing the service data of the entire system.
  • in a distributed forwarding architecture, the data transmission device may have at least one switching network board, and the switching network board is used to realize data exchange between multiple interface boards, providing large-capacity data exchange and processing capabilities. Therefore, the data access and processing capabilities of the data transmission device with a distributed architecture are greater than those of the data transmission device with a centralized architecture.
  • the data transmission device may be in the form of only one board, that is, without a switching network board, and the functions of the interface board and the main control board are integrated on the board.
  • the central processor on the interface board and the central processor on the main control board can be combined into one central processor on the board to perform the functions of the two.
  • This form of data transmission device has low data exchange and processing capabilities (for example, low-end switches or routers and other data transmission devices). Which architecture to use depends on the specific networking deployment scenario, and no limitation is made here.
  • the data transmission device 2100 corresponds to the data transmission apparatus shown in FIG8 .
  • the receiving module 801 in the data transmission apparatus shown in FIG8 corresponds to the physical interface card 2133 in the data transmission device 2100 .
  • the embodiment of the present application also provides a communication device, which includes: a transceiver, a memory, and a processor.
  • the transceiver, the memory, and the processor communicate with each other through an internal connection path, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory to control the transceiver to receive signals and control the transceiver to send signals, and when the processor executes the instructions stored in the memory, the processor executes the method required to be executed by the first communication device, or the processor executes the method required to be executed by the second communication device, or the processor executes the method required to be executed by the third communication device.
  • the embodiment of the present application also provides a data transmission device, which includes: a processor, the processor is coupled to a memory, the memory stores at least one program instruction or code, and the at least one program instruction or code is loaded and executed by the processor, so that the data transmission device implements the data transmission method shown in Figure 3.
  • the memory may be integrated with the processor, or the memory may be provided separately from the processor.
  • the memory can be a non-transitory memory, such as a read-only memory (ROM), which can be integrated with the processor on the same chip or can be set on different chips.
  • FIG12 is a schematic diagram of the structure of a server provided in an embodiment of the present application, and the server 1400 shown in FIG12 is used to perform the operations involved in the first communication device or the second communication device in the data transmission method shown in FIG3 above.
  • the server 1400 may vary considerably depending on its configuration or performance, and may include one or more processors 1401 and one or more memories 1402, wherein the one or more memories 1402 store at least one computer program, and the at least one computer program is loaded and executed by the one or more processors 1401, so that the server implements the data transmission method provided by the above-mentioned method embodiments.
  • the server 1400 may also have components such as a wired or wireless network interface, a keyboard, and an input and output interface for input and output, and the server 1400 may also include other components for implementing device functions, which will not be repeated here.
  • the embodiment of the present application also provides a data transmission system, which includes: a first communication device and a second communication device.
  • the data transmission system also includes a third communication device.
  • the data transmission method performed by the first communication device, the second communication device and the third communication device can refer to the relevant description of the embodiment shown in Figure 3 above, and will not be repeated here.
  • the first communication device and the second communication device are the server 1400 shown in Figure 12, and the third communication device is the data transmission device 2000 shown in Figure 10 or the data transmission device 2100 shown in Figure 11.
  • the processor may be a CPU, or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general purpose processor may be a microprocessor or any conventional processor. It is worth noting that the processor may be a processor supporting an advanced RISC machines (ARM) architecture.
  • the memory may include a read-only memory and a random access memory, and provide instructions and data to the processor.
  • the memory may also include a non-volatile random access memory.
  • the memory may also store information about the device type.
  • the memory may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memory.
  • the nonvolatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • An embodiment of the present application further provides a computer-readable storage medium, in which at least one instruction is stored.
  • the instruction is loaded and executed by a processor so that a computer implements any of the above data transmission methods.
  • the embodiments of the present application also provide a computer program (product), which, when executed by a computer, can enable a processor or a computer to execute the corresponding steps and/or processes in the above method embodiments.
  • An embodiment of the present application also provides a chip, including a processor, for calling and executing instructions stored in the memory from the memory, so that a communication device equipped with the chip executes any of the above data transmission methods.
  • An embodiment of the present application also provides another chip, including: an input interface, an output interface, a processor and a memory, wherein the input interface, the output interface, the processor and the memory are connected via an internal connection path, and the processor is used to execute the code in the memory.
  • the processor is used to execute any of the above data transmission methods.
  • a computer program product includes one or more computer instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • Computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • computer instructions can be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave, etc.) means.
  • the computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. Available media can be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk), etc.
  • the method of the embodiment of the present application can be described in the context of machine-executable instructions, such as those included in program modules executed in a device on a target real or virtual processor.
  • a program module includes a routine, a program, a library, an object, a class, a component, a data structure, etc., which performs a specific task or implements a specific abstract data type.
  • the function of the program module can be merged or divided between the described program modules.
  • the machine executable instruction for the program module can be executed in a local or distributed device. In a distributed device, the program module can be located in both a local and a remote storage medium.
  • the computer program code for realizing the method for the embodiment of the present application can be written in one or more programming languages. These computer program codes can be provided to the processor of a general-purpose computer, a special-purpose computer or other programmable data processing device, so that the program code, when executed by a computer or other programmable data processing device, causes the function/operation specified in the flow chart and/or block diagram to be implemented.
  • the program code can be executed entirely on a computer, partly on a computer, as a stand-alone software package, partly on a computer and partly on a remote computer, or entirely on a remote computer or server.
  • computer program codes or related data may be carried by any appropriate carrier to enable a device, apparatus or processor to perform the various processes and operations described above.
  • Examples of carriers include signals, computer readable media, and the like.
  • Examples of signals may include electrical, optical, radio, acoustic or other forms of propagated signals, such as carrier waves, infrared signals, etc.
  • a machine-readable medium may be any tangible medium that contains or stores a program for or related to an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination thereof. More detailed examples of machine-readable storage media include an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical storage device, a magnetic storage device, or any suitable combination thereof.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the module is only a logical function division. There may be other division methods in actual implementation, such as multiple modules or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices or modules, or it can be an electrical, mechanical or other form of connection.
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules, that is, they may be located in one place or distributed on multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiments of the present application.
  • each functional module in each embodiment of the present application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software functional modules.
  • when the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that enable a computer device (which can be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the method in each embodiment of the present application.
  • the aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
  • first, second, etc. are used to distinguish between identical or similar items having substantially the same effects and functions. It should be understood that there is no logical or temporal dependency between “first”, “second”, and “nth”, nor is there a limitation on quantity and execution order. It should also be understood that although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another. For example, without departing from the scope of various examples, a first image may be referred to as a second image, and similarly, a second image may be referred to as a first image. Both the first image and the second image may be images, and in some cases, may be separate and different images.
  • the size of the serial number of each process does not mean the order of execution.
  • the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • determining B based on A does not mean determining B only based on A.
  • B can also be determined based on A and/or other information.
  • references to “one embodiment”, “an embodiment”, or “a possible implementation” throughout the specification mean that specific features, structures, or characteristics related to the embodiment or implementation are included in at least one embodiment of the present application. Therefore, the references to “in one embodiment” or “in an embodiment”, or “a possible implementation” throughout the specification do not necessarily refer to the same embodiment. In addition, these specific features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)

Abstract

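This application discloses a data transmission method, apparatus, device, system, and storage medium, and relates to the field of communication technologies. Taking execution of the method by a first communication device as an example, the first communication device obtains a first packet corresponding to a first service, where the destination of the first packet is a second communication device; the first communication device sends the first packet to the second communication device according to a first connection group, where the first connection group is shared by the services transmitted by the first communication device. The method allows service packets to be transmitted over one shared, multiplexed connection group, which improves connection scalability. Compared with single-connection transmission, data transmission performance is no longer constrained by resources such as network interface card (NIC) memory, and a single-connection failure no longer interrupts the service, which improves the reliability of data transmission.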

Description

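Data transmission method, apparatus, device, system, and storage medium
This application claims priority to Chinese Patent Application No. 202211517066.9, filed on November 29, 2022 and entitled "Data transmission method, apparatus, device, system, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communication technologies, and in particular to a data transmission method, apparatus, device, system, and storage medium.
Background
With the rapid deployment and application of high-performance services (for example, high-performance computing, distributed machine learning, and distributed storage), remote direct memory access (RDMA), as the main network transmission technology supporting such services, has gradually become a mainstream solution for network communication.
In the related art, RDMA data transmission uses a single-connection mode: the packets of one service are carried on one queue pair (QP) connection, and one QP connection corresponds to one path. In other words, a QP connection describes the communication connection for a service between any two nodes.
However, as networks keep growing, the number of nodes, or the number of services between any two nodes, increases. The data transmission method of the related art then has to establish a large number of QP connections, while the number of QP connections that can be established is limited by resources such as NIC memory, so data transmission performance is limited.
Summary
This application provides a data transmission method, apparatus, device, system, and storage medium for transmitting service packets over a shared connection group.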
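According to a first aspect, a data transmission method is provided. Taking execution of the method by a first communication device as an example, the first communication device obtains a first packet corresponding to a first service, where the destination of the first packet is a second communication device; the first communication device sends the first packet to the second communication device according to a first connection group, where the first connection group is shared by the services transmitted by the first communication device.
The method allows service packets to be transmitted over one shared, multiplexed connection group, which improves connection scalability. Compared with single-connection transmission, data transmission performance is no longer constrained by resources such as NIC memory, and a single-connection failure no longer interrupts the service, which improves the reliability of data transmission.
In a possible implementation, the first connection group includes a first connection between the first communication device and the second communication device, and the first communication device may send the first packet to the second communication device over the first connection in the first connection group. In addition, when the first communication device obtains a second packet corresponding to a second service, where the destination of the second packet is the same as that of the first packet, namely the second communication device, the first communication device may also send the second packet to the second communication device over the first connection.
In this way, packets of different services that go to the same destination can share and multiplex the same connection in the same connection group, so that a single connection is shared among multiple services. The number of local connections does not grow with the number of services, so data transmission performance is not limited by resources such as NIC memory, which further improves connection scalability.
In a possible implementation, in addition to the first connection between the first communication device and the second communication device, the first connection group also includes a second connection between the two devices. The first communication device may send the first packet to the second communication device over the first connection in the first connection group. When the first communication device obtains a second packet corresponding to the first service, where the destination of the second packet is the second communication device, it may send the second packet to the second communication device over the second connection.
In this way, different packets of the same service going to the same destination can be transmitted over different connections in the same connection group, so that a single service shares multiple connections. Transmitting one service over multiple connections enables load balancing among the connections and avoids service interruption caused by a single-connection failure: when the first connection in the first connection group fails, transmission can switch to the second connection in the group, which safeguards the reliability and stability of service transmission. Moreover, when different connections correspond to different physical paths in the network, load balancing across multiple network paths is also achieved and service interruption caused by a single-path failure is avoided, which further safeguards reliability and stability.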
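In a possible implementation, the first connection group is shared not only by the services that the first communication device transmits to the second communication device, but also by the services that it transmits to a fourth communication device. The first communication device obtains a third packet corresponding to the first service, where the destination of the third packet is the fourth communication device, and sends the third packet to the fourth communication device according to the first connection group. The first communication device can thus send packets of the same service to both the second and the fourth communication device over the shared first connection group, which further improves connection scalability and data transmission performance.
In a possible implementation, the first communication device obtains a third packet corresponding to the second service, where the destination of the third packet is the fourth communication device, and sends the third packet to the fourth communication device according to the first connection group. The first communication device can thus send packets of different services to the second and the fourth communication device over the shared first connection group, which further improves connection scalability and data transmission performance.
In a possible implementation, the first communication device sends the first packet over the first connection as follows: because the connection performance of the first connection satisfies a performance condition, the first connection is selected from the first connection group to send the first packet to the second communication device. The connection performance includes at least one of send queue length, delay performance, or packet loss performance. A connection whose performance satisfies the performance condition is one of the better-performing connections in the group. Therefore, when the first connection group contains multiple connections, selecting the first connection by checking the performance condition yields higher transmission performance for data sent over it.
In a possible implementation, the first communication device includes a connection resource pool. The connection resource pool contains at least one connection group, each connection group corresponds to one destination communication device, and the at least one connection group includes the first connection group. Establishing a connection resource pool makes shared, multiplexed connections easier to realize.
In a possible implementation, before sending the first packet to the second communication device according to the first connection group, the first communication device first determines the first connection group from the first packet. For example, when the destination communication devices corresponding to the at least one connection group include the destination of the first packet, the connection group corresponding to that destination is used as the first connection group; when they do not include the destination of the first packet, the first connection group is established so that the first packet can be transmitted over it. When the connection resource pool already contains the corresponding connection group, the group matching the destination of the packet to be sent is found quickly in the pool, and because no connection has to be established on the spot, data transmission is also faster.
In a possible implementation, before the first connection is determined, the connection performance of every connection in the first connection group is obtained. Optionally, for any connection in the first connection group, a probe packet is sent over that connection, and at least one of the delay performance or the packet loss performance of the connection is obtained from the probe packet. Sending probe packets allows the performance of each connection to be obtained in real time, which keeps the measured performance accurate, keeps the choice of the first connection accurate, and gives packets sent over the first connection high transmission performance.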
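In a possible implementation, when the first packet is an important packet, the first communication device may add a first identifier to the first packet. An important packet is a packet whose loss has a large impact on data transmission performance. For example, an important packet may be the last packet in the send queue of each of at least one connection, a control signaling packet, a probe packet, a heartbeat keepalive packet, an address request packet, a loss retransmission packet, or a packet that the service designates as important. The first identifier allows the communication device that receives the first packet to determine that the first packet is an important packet.
Because the important packet carries the first identifier, the second communication device that receives the first packet can determine from the first identifier that it is an important packet and can then apply protective handling to it. This safeguards the transmission performance of important packets, lowers their loss rate, and effectively avoids the tail latency caused by losing them.
In a possible implementation, when the data volume of the first packet exceeds a threshold, the first communication device may also split the first packet into multiple first sub-packets and send the first sub-packets to the second communication device according to the first connection group. Splitting a first packet whose data volume exceeds the threshold into multiple first sub-packets makes fine-grained multi-connection scheduling easier and improves data transmission performance.
In a possible implementation, the first packet is split into multiple first sub-packets according to the memory address range of the first packet, with each first sub-packet corresponding to one memory address interval within that range. Because the data is split by offsetting the start address, the split data does not have to be copied, which reduces memory usage and central processing unit (CPU) usage and makes the split more efficient.
In a possible implementation, the first communication device includes an application layer and a transport layer. The first communication device obtains the first packet corresponding to the first service as follows: the transport layer receives the first packet that the application layer sends by calling the logical interface of the first service. Likewise, after the first communication device sends the multiple first sub-packets to the second communication device according to the first connection group, when the transport layer has obtained the notification message for each of the first sub-packets, the transport layer calls the logical interface of the first service to send the notification message corresponding to the first packet to the application layer.
As a result, even when the first packet is split into multiple first sub-packets for transmission, the application layer still hands the first packet to the transport layer and still receives from the transport layer the notification message corresponding to the first packet. The method works directly with the existing logical interfaces of upper-layer applications: the original service code does not need to be modified and no new interface has to be created. Multi-connection sharing is transparent to the application layer, which makes the method easier to deploy and more widely applicable.
In a possible implementation, the first communication device is middleware between application-layer software and NIC hardware.
In a possible implementation, the first packet is transmitted based on the RDMA protocol, so the method can improve data transmission performance under RDMA.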
第二方面,提供了一种数据传输方法,应用于第三通信装置,第三通信装置接收第一通信装置基于第一连接组发送的对应于第一业务的第一报文,第一报文的目的地是第二通信装置,第一连接组由第一通信装置传输的业务共享;第三通信装置向第二通信装置发送第一报文。
该方法通过共享复用同一个连接组来接收报文,保障了连接共享复用的有效实施,提高了第一通信装置的连接可扩展性,提高了数据传输的性能和可靠性。
在一种可能的实施方式中,第一连接组包括第一通信装置和第二通信装置间的第一连接,第三通信装置接收第一通信装置基于第一连接组中的第一连接发送的对应于第一业务的第一报文。第三通信装置还可以接收第一通信装置基于第一连接发送的对应于第二业务的第二报文,第二报文的目的地是第二通信装置;第三通信装置向第二通信装置发送第二报文。
在一种可能的实施方式中,第一连接组包括第一通信装置和第二通信装置间的第一连接和第二连接,第三通信装置接收第一通信装置基于第一连接组中的第一连接发送的对应于第一业务的第一报文。第三通信装置还可以接收第一通信装置基于第二连接发送的对应于第一业务的第二报文,第二报文的目的地是第二通信装置;第三通信装置向第二通信装置发送第二报文。
在一种可能的实施方式中,第三通信装置接收第一通信装置发送的第一业务的第一报文之后,当第一报文为重要包时,按照第一丢包率处理第一报文,第一丢包率低于非重要包的丢包率。重要包为至少一个连接分别对应的发送队列中的最后一个报文、控制信令报文、心跳保活报文、探测报文、地址请求报文、丢包重传报文或业务指定重要包。
在一种可能的实施方式中,第一报文携带有第一标记,第一标记由第一通信装置添加;按照第一丢包率处理第一报文之前,可以基于第一标记确定第一报文为重要包。
在一种可能的实施方式中,第三通信装置接收第一通信装置基于第一连接组中的任一连接发送的探测报文,在探测报文中添加传输信息,传输信息包括节点标识、丢包、时延或吞吐量中的至少一种;传输添加传输信息的探测报文,探测报文用于第一通信装置获取任一连接的时延性能或丢包性能中的至少一种。
在一种可能的实施方式中,第一报文的数据量大于阈值,第一报文包括多个第一子报文,多个第一子报文由第一通信装置对第一报文切分得到。在该情况下,当多个第一子报文发生乱序时,在多个第一子报文中的发生乱序的第一子报文中添加乱序标识;传输添加乱序标识的第一子报文,乱序标识用于指示多个第一子报文发生乱序但未发生丢包。
在一种可能的实施方式中,第一报文基于RDMA协议传输。
第三方面,提供了一种数据传输方法,应用于第二通信装置,第二通信装置接收对应于第一业务的第一报文,第一报文由第一通信装置基于第一连接组发送,第一报文的目的地是第二通信装置,第一连接组由第一通信装置传输的业务共享。
该方法通过共享复用同一个连接组来接收报文,保障了连接共享复用的有效实施,提高了第一通信装置的连接可扩展性,提高了数据传输的性能和可靠性。
在一种可能的实施方式中,若第一连接组包括第一通信装置和第二通信装置间的第一连接,第一报文可以由第一通信装置基于第一连接发送。第二通信装置还可以接收对应于第二业务的第二报文,第二报文由第一通信装置基于第一连接发送,第二报文的目的地是第二通信装置。
在一种可能的实施方式中,若第一连接组包括第一通信装置和第二通信装置间的第一连接和第二连接,第一报文可以由第一通信装置基于第一连接发送。第二通信装置还可以接收对应于第一业务的第二报文,第二报文由第一通信装置基于第二连接发送,第二报文的目的地是第二通信装置。
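In a possible implementation, the first packet includes multiple first sub-packets, and the second communication device includes an application layer and a transport layer. The second communication device receives the first packet corresponding to the first service as follows: the transport layer receives the multiple first sub-packets sent by the first communication device over the first connection group, where each first sub-packet carries its sequence number, sub-sequence number, and interface identifier, and the first sub-packets were obtained by the first communication device splitting the first packet; the transport layer orders and combines the first sub-packets by sequence number and sub-sequence number to obtain the first packet; and the transport layer calls the logical interface corresponding to the interface identifier to send the first packet to the application layer.
Even when the first packet is split into multiple first sub-packets and transmitted over shared multiple connections, the application layer still receives the first packet from the transport layer, so the application layer is unaware of the splitting and the multi-connection sharing.
In a possible implementation, when any out-of-order first sub-packet is received and that first sub-packet carries an out-of-order flag, it is determined from the flag that the first sub-packets were reordered but none was lost. Recognizing the out-of-order flag makes it possible to distinguish reordered packets from lost packets accurately, which avoids mistaking reordering for loss and triggering loss retransmission when no packet was lost.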
在一种可能的实施方式中,第一通信装置是应用层软件和网卡硬件之间的中间件。
在一种可能的实施方式中,第一报文基于RDMA协议传输。使得该方法能够提高RDMA技术下的数据传输性能。
第四方面,提供了一种数据传输装置,应用于第一通信装置,该装置包括:
获取模块,用于获取对应于第一业务的第一报文,第一报文的目的地是第二通信装置;
发送模块,用于根据第一连接组向第二通信装置发送第一报文,第一连接组由第一通信装置传输的业务共享。
在一种可能的实施方式中,第一连接组包括第一通信装置和第二通信装置间的第一连接,发送模块,用于根据第一连接向第二通信装置发送第一报文;
获取模块,还用于获取对应于第二业务的第二报文,第二报文的目的地是第二通信装置;
发送模块,还用于根据第一连接向第二通信装置发送第二报文。
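In a possible implementation, the first connection group includes a first connection and a second connection between the first communication device and the second communication device, and the sending module is configured to send the first packet to the second communication device over the first connection;
the obtaining module is further configured to obtain a second packet corresponding to the first service, where the destination of the second packet is the second communication device;
the sending module is further configured to send the second packet to the second communication device over the second connection.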
在一种可能的实施方式中,发送模块,用于根据第一连接的连接性能满足性能条件选择第一连接向第二通信装置发送第一报文,连接性能包括发送队列长度、时延性能或丢包性能中的至少一种。
在一种可能的实施方式中,第一通信装置包括连接资源池,连接资源池包括至少一个连接组,其中每个连接组对应一个目的通信装置,至少一个连接组包括第一连接组。
在一种可能的实施方式中,该装置还包括:
切分模块,用于当第一报文的数据量大于阈值时,第一通信装置将第一报文切分为多个第一子报文;
发送模块,用于根据第一连接组向第二通信装置发送多个第一子报文。
在一种可能的实施方式中,第一通信装置包括应用层和传输层;获取模块,用于传输层接收应用层调用第一业务的逻辑接口发送的对应于第一业务的第一报文;
发送模块,用于当传输层获取到多个第一子报文分别对应的通告消息时,传输层调用第一业务的逻辑接口向应用层发送第一报文对应的通告消息。
在一种可能的实施方式中,第一通信装置是应用层软件和网卡硬件之间的中间件。
在一种可能的实施方式中,第一报文基于RDMA协议传输。
第五方面,提供了一种数据传输装置,应用于第三通信装置,该装置包括:
接收模块,用于接收第一通信装置基于第一连接组发送的对应于第一业务的第一报文,第一报文的目的地是第二通信装置,第一连接组由第一通信装置传输的业务共享;
发送模块,用于向第二通信装置发送第一报文。
在一种可能的实施方式中,第一连接组包括第一通信装置和第二通信装置间的第一连接,接收模块,用于接收第一通信装置基于第一连接发送的对应于第一业务的第一报文;
接收模块,还用于接收第一通信装置基于第一连接发送的对应于第二业务的第二报文,第二报文的目的地是第二通信装置;
发送模块,还用于向第二通信装置发送第二报文。
在一种可能的实施方式中,第一连接组包括第一通信装置和第二通信装置间的第一连接和第二连接,接收模块,用于接收第一通信装置基于第一连接发送的对应于第一业务的第一报文;
接收模块,还用于接收第一通信装置基于第二连接发送的对应于第一业务的第二报文,第二报文的目的地是第二通信装置;
发送模块,还用于向第二通信装置发送第二报文。
在一种可能的实施方式中,第一报文包括多个第一子报文,装置还包括:添加模块,用于当多个第一子报文发生乱序时,在多个第一子报文中的发生乱序的第一子报文中添加乱序标识;
传输模块,用于传输添加乱序标识的第一子报文,乱序标识用于指示多个第一子报文发生乱序但未发生丢包。
在一种可能的实施方式中,第一报文基于RDMA协议传输。
第六方面,提供了一种数据传输装置,应用于第二通信装置,该装置包括:
接收模块,用于接收对应于第一业务的第一报文,第一报文由第一通信装置基于第一连接组发送,第一报文的目的地是第二通信装置,第一连接组由第一通信装置传输的业务共享。
在一种可能的实施方式中,第一连接组包括第一通信装置和第二通信装置间的第一连接,第一报文由第一通信装置基于第一连接发送;
接收模块,还用于接收对应于第二业务的第二报文,第二报文由第一通信装置基于第一连接发送,第二报文的目的地是第二通信装置。
在一种可能的实施方式中,第一连接组包括第一通信装置和第二通信装置间的第一连接和第二连接,第一报文由第一通信装置基于第一连接发送;
接收模块,还用于接收对应于第一业务的第二报文,第二报文由第一通信装置基于第二连接发送,第二报文的目的地是第二通信装置。
在一种可能的实施方式中,第一报文包括多个第一子报文,第二通信装置包括应用层和传输层;
接收模块,用于传输层接收多个第一子报文,多个第一子报文分别包括对应的序列号、子序列号和接口标识,多个第一子报文由第一通信装置对第一报文切分得到;传输层按照序列号和子序列号将多个第一子报文排列组合,得到第一报文;传输层调用接口标识对应的逻辑接口向应用层发送第一报文。
在一种可能的实施方式中,该装置还包括:
确定模块,用于当接收到发生乱序的任一第一子报文时,若任一第一子报文携带有乱序标识,基于乱序标识确定多个第一子报文发生乱序但未发生丢包。
在一种可能的实施方式中,第二通信装置是应用层软件和网卡硬件之间的中间件。
在一种可能的实施方式中,第一报文基于RDMA协议传输。
第七方面,提供了一种数据传输设备,该数据传输设备包括:处理器,处理器与存储器耦合,存储器中存储有至少一条程序指令或代码,至少一条程序指令或代码由处理器加载并执行,以使数据传输设备实现如上第一方面至第三方面任一的数据传输方法。
可选地,处理器为一个或多个,存储器为一个或多个。
可选地,存储器可以与处理器集成在一起,或者存储器与处理器分离设置。
在具体实现过程中,存储器可以为非瞬时性(non-transitory)存储器,例如只读存储器(read only memory,ROM),其可以与处理器集成在同一块芯片上,也可以分别设置在不同的芯片上,本申请对存储器的类型以及存储器与处理器的设置方式不做限定。
第八方面,提供了一种通信装置,该装置包括:收发器、存储器和处理器。其中,该收发器、该存储器和该处理器通过内部连接通路互相通信,该存储器用于存储指令,该处理器用于执行该存储器存储的指令,以控制收发器接收信号,并控制收发器发送信号,并且当该处理器执行该存储器存储的指令时,使得该通信装置执行第一方面或第一方面的任一种可能的实施方式中的方法,或者执行第二方面或第二方面的任一种可能的实施方式中的方法,或者执行第三方面或第三方面的任一种可能的实施方式中的方法。
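According to a ninth aspect, a data transmission system is provided, and the data transmission system includes a first communication device and a second communication device;
the first communication device is configured to perform the method of the first aspect or any possible implementation of the first aspect, and the second communication device is configured to perform the method of the third aspect or any possible implementation of the third aspect.
Optionally, the data transmission system further includes a third communication device, and the third communication device is configured to perform the method of the second aspect or any possible implementation of the second aspect.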
第十方面,提供了一种计算机可读存储介质,所述存储介质中存储有至少一条指令,所述指令由处理器加载并执行,以使计算机实现上述第一方面或第一方面的任一种可能的实施方式中的方法,或者实现上述第二方面或第二方面的任一种可能的实施方式中的方法,或者实现上述第三方面或第三方面的任一种可能的实施方式中的方法。
第十一方面,提供了一种计算机程序(产品),所述计算机程序(产品)包括:计算机程序代码,当所述计算机程序代码被计算机运行时,使得所述计算机执行上述各方面中的方法。
第十二方面,提供了一种芯片,包括处理器,用于从存储器中调用并运行所述存储器中存储的指令,使得安装有所述芯片的通信设备执行上述各方面中的方法。
第十三方面,提供另一种芯片,包括:输入接口、输出接口、处理器和存储器,所述输入接口、输出接口、所述处理器以及所述存储器之间通过内部连接通路相连,所述处理器用于执行所述存储器中的代码,当所述代码被执行时,所述处理器用于执行上述各方面中的方法。
应当理解的是,本申请的第二方面至第十三方面技术方案及对应的可能的实施方式所取得的有益效果可以参见上述对第一方面及其对应的可能的实施方式的技术效果,此处不再赘述。
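Brief Description of the Drawings
FIG. 1 is a schematic diagram of data transmission according to an embodiment of this application;
FIG. 2 is a schematic diagram of an implementation environment of a data transmission method according to an embodiment of this application;
FIG. 3 is an interaction diagram of a data transmission method according to an embodiment of this application;
FIG. 4 is a schematic diagram of the connection between a connection resource pool and logical interfaces according to an embodiment of this application;
FIG. 5 is a schematic diagram of the functional modules of a communication device according to an embodiment of this application;
FIG. 6 is a schematic diagram of the functional modules of another communication device according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a data transmission apparatus according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of another data transmission apparatus according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of yet another data transmission apparatus according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a network device according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of another network device according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of a server according to an embodiment of this application.
Detailed Description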
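To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the drawings.
In the related art, RDMA data transmission uses a single-connection mode: the packets of one service are sent over one connection pair. A connection pair may refer to one QP connection, and a QP connection includes a send queue (SQ), a receive queue (RQ), and a completion queue (CQ). The packets of one service share the same 5-tuple, which consists of the source Internet Protocol (IP) address, destination IP address, protocol number, source port number, and destination port number. Therefore, on a communication device with equal-cost multi-path routing (ECMP) enabled, the packets on one QP connection are transmitted along the same physical path. ECMP is a routing algorithm that selects a path by hashing the 5-tuple.
However, as networks keep growing, the number of nodes, or the number of services between any two nodes, increases. The data transmission method of the related art then has to establish a large number of QP connections, while the number of QP connections that can be established is limited by resources such as NIC memory, so data transmission performance is limited. The single-connection mode is also exposed to service interruption when the single connection fails, which lowers the performance and reliability of data transmission. In addition, when multiple equal-cost paths exist between two nodes, the single-connection mode makes poor use of their network bandwidth.
Refer to FIG. 1, a schematic diagram of data transmission according to an embodiment of this application. As shown in FIG. 1, communication device 1, the sending node, mainly includes an SQ and a CQ, and communication device 2, the receiving node, mainly includes an RQ and a CQ. Communication device 1 and communication device 2 communicate through at least one other communication device, or are directly connected. The upper-layer service application (APP) submits send and receive tasks as work requests (WRs), which are converted into work queue elements (WQEs) in the SQ or RQ. When a packet has been sent or received, the CQ produces a corresponding completion queue element (CQE), and each CQE is converted into one work completion (WC) task that is handed to the service APP.
Optionally, the transport interfaces of RDMA include a reliable connection (RC) mode and a dynamically connected (DC) transport mode. The RC mode is connection-oriented, ensures reliability by retransmitting lost packets, and supports one-sided and two-sided operations. FIG. 1 shows RC-mode data transmission with two-sided operations; a one-sided operation in RC mode differs from FIG. 1 in that the receiving node does not need to produce a WR or a WC. In RC mode, one QP of the local node establishes a single connection with one QP of a remote node, so as the number of nodes grows, or as the number of services on the local node grows, the number of NIC connections on the local node grows with it. For example, if the network contains N nodes (N is a positive integer) and each node runs P service processes (P is a positive integer), then in a fully connected network each node has to create N*P*P QP connections. If resources such as the local node's NIC memory are exhausted, data transmission performance is likely to degrade.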
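To make the scaling concern concrete, the following minimal sketch evaluates the N*P*P estimate quoted above and, for contrast, a hypothetical per-destination pooled layout. The function names and the `conns_per_group` parameter are illustrative assumptions and do not appear in the application.

```python
def rc_full_mesh_qps(nodes: int, procs: int) -> int:
    """QP connections per node under the N*P*P estimate quoted in the text."""
    return nodes * procs * procs


def pooled_qps(nodes: int, conns_per_group: int) -> int:
    """Assumed pooled layout: one connection group per destination node,
    each holding conns_per_group physical QP connections."""
    return nodes * conns_per_group


if __name__ == "__main__":
    print(rc_full_mesh_qps(100, 8))   # per-node QP count in RC full mesh
    print(pooled_qps(100, 4))         # per-node QP count with pooled groups
```

In the pooled layout the count no longer depends on the number of service processes P, which is the decoupling the remainder of the description aims for.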
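In DC mode, when the local node needs to communicate with multiple remote nodes, it can reuse one QP connection and establish a connection with each remote node in turn to transmit data. The number of NIC connections on the local node then does not grow quickly with the number of nodes, which mitigates the growth in NIC connections caused by a larger network. However, the connection has to be torn down and rebuilt repeatedly so that the same QP connection can switch between remote nodes, and each teardown and setup is expensive, so the latency and throughput of data transmission degrade severely.
Refer to FIG. 2, a schematic diagram of an implementation environment of a data transmission method according to an embodiment of this application. As shown in FIG. 2, the implementation environment includes multiple electronic devices and multiple switches, and the switches include leaf switches, spine switches, and core switches. An electronic device may be a terminal or a server and mainly carries high-performance services with different communication requirements. One electronic device can carry different services, and services on different electronic devices need to communicate with each other. Each electronic device includes a NIC, which sends and receives data to enable communication between electronic devices.
In an actual deployment, the numbers of electronic devices, leaf switches, spine switches, and core switches can be adjusted flexibly according to factors such as network scale. As the network grows, these numbers increase, and multiple equal-cost paths may exist between any two electronic devices. In that case, because the RDMA single-connection mode cannot use multiple paths at the same time, it cannot make full use of the network bandwidth, service performance is limited, and problems such as uneven load across paths and costly connection re-establishment after a single-point failure arise easily. Even when no equal-cost paths exist between two electronic devices, that is, when multiple connections map to the same physical path, some network faults affect individual connections, so the RDMA single-connection mode can still suffer service interruption from a fault on its single connection, which again makes data transmission less reliable.
On this basis, the embodiments of this application provide a data transmission method that uses the current standard RDMA interfaces, for example the RC interface or the DC interface, and addresses connection scalability and link balancing through shared, multiplexed connections. Optionally, the method can be applied to data center network topologies, interconnection between multiple data centers, or the wide-area Internet. Its service scenarios include high-performance scenarios such as distributed machine learning training, distributed storage, artificial intelligence (AI), high performance computing (HPC), and containers. RDMA in the embodiments of this application may be InfiniBand or RDMA over Converged Ethernet (RoCE).
Refer to FIG. 3, an interaction diagram of a data transmission method according to an embodiment of this application. As an example, the first communication device transmits data to the second communication device. The first and second communication devices may be any two electronic devices shown in FIG. 2, or may refer to middleware between application-layer software and NIC hardware. When the first communication device transmits data to the second communication device through a third communication device, the third communication device may be any leaf switch, spine switch, or core switch shown in FIG. 2. As shown in FIG. 3, the data transmission method includes steps 301 to 305.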
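Step 301: The first communication device obtains a first packet corresponding to a first service, where the destination of the first packet is the second communication device.
In the embodiments of this application, multiple upper-layer applications run on the first communication device. Each upper-layer application has at least one service process, each service process has at least one service thread, and each service thread needs to communicate with its remote node. The first service in the embodiments of this application may refer to any service thread running on the first communication device. When the first communication device transmits data to the second communication device, the first communication device is called the local node, the second communication device is called the remote node, destination, or destination communication device, and a node between the two is called a third communication device.
In a possible implementation, the first communication device includes an application layer and a transport layer that interact with each other. The application layer runs at least one application, and an application includes at least one service. The transport layer sends data coming from the upper layer and receives data going to upper-layer applications. For example, when the first service in the application layer needs to transmit data, the application layer calls the logical interface of the first service to send the first packet corresponding to the first service to the transport layer, so that the first communication device obtains the first packet to be transmitted; that is, the transport layer receives the first packet that the application layer sends by calling the logical interface of the first service. Likewise, when the transport layer obtains the notification message for the first packet, it calls the logical interface of the first service to send that notification message to the application layer. Optionally, the notification message may be a send-completion message (for example, the CQE shown in FIG. 1), a packet loss notification message, an out-of-order notification message, and so on.
As an example, the first packet is WR data, the APP shown in FIG. 1 represents the application layer of the first communication device, and the QP shown in FIG. 1 represents its transport layer. The first communication device calls the corresponding logical interface through the APP and submits the send task as a WR to the send queue (SQ) of the transport layer, and the SQ sends the data to the second communication device over the established QP connection. After the first communication device has sent the first packet, the completion queue (CQ) of the transport layer produces the CQE for the first packet, and the transport layer calls the corresponding logical interface to submit the CQE to the APP as a WC.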
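Step 302: The first communication device sends the first packet to the second communication device according to the first connection group, where the first connection group is shared by the services transmitted by the first communication device.
In the embodiments of this application, the method can be applied to RDMA transmission scenarios, in which case the first packet may be transmitted based on the RDMA protocol, so the method can improve data transmission performance under RDMA.
The first communication device includes a connection resource pool. The connection resource pool contains at least one connection group, one connection group corresponds to one destination communication device, and the at least one connection group includes the first connection group. Before sending the first packet over the first connection group, the first communication device first determines the first connection group in the connection resource pool from the destination of the first packet. Optionally, when the destination communication devices corresponding to the at least one connection group include the destination of the first packet, the connection group corresponding to that destination is used as the first connection group; when they do not include the destination of the first packet, the first connection group is established so that the first packet can be transmitted over it. When the connection resource pool already contains the corresponding connection group, the group matching the destination of the packet to be sent is found quickly in the pool, and because no connection has to be established on the spot, data transmission is also faster. The at least one connection group contained in the connection resource pool means all connection groups in the pool.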
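The lookup-or-create behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions: `ConnectionPool`, `ConnectionGroup`, `group_for`, and the string placeholders standing in for physical QP connections are all hypothetical names, and real connection setup is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionGroup:
    dst_ip: str
    # Placeholder handles standing in for physical QP connections.
    connections: list = field(default_factory=list)


class ConnectionPool:
    """One connection group per destination, created on first use."""

    def __init__(self, conns_per_group: int = 2):
        self.conns_per_group = conns_per_group
        self.groups = {}        # dst_ip -> ConnectionGroup
        self.groups_created = 0

    def group_for(self, dst_ip: str) -> ConnectionGroup:
        group = self.groups.get(dst_ip)
        if group is None:
            # No group for this destination yet: establish one.
            conns = [f"qp-{dst_ip}-{i}" for i in range(self.conns_per_group)]
            group = ConnectionGroup(dst_ip, conns)
            self.groups[dst_ip] = group
            self.groups_created += 1
        return group


if __name__ == "__main__":
    pool = ConnectionPool()
    a = pool.group_for("10.0.0.2")   # service 1, first packet
    b = pool.group_for("10.0.0.2")   # service 2, same destination
    print(a is b, pool.groups_created)
```

Two services sending to the same destination receive the same group object, so the number of groups tracks the number of destinations rather than the number of services.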
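The embodiments of this application do not limit how the first connection group is established. For example, the first communication device initiates a request (REQ) message for establishing a QP connection. The REQ message carries connection parameters, such as the sequence number of the QP connection, the starting packet sequence number (PSN), the maximum number of retransmissions, and the source port number. After the second communication device detects the connection request from the first communication device, it verifies and records the connection parameters in the REQ message, for example by checking whether they are consistent with its own connection parameters. Once verification passes, it sends a reply (REP) message, which indicates that it accepts the connection request and carries the connection parameters of the second communication device. After receiving the REP message, the first communication device verifies and records the connection parameters in it, and then sends a ready to use (RTU) message to indicate that it accepts the connection parameters of the second communication device. This completes one QP connection between the first and second communication devices, over which data can be transmitted. Using different connection parameters, for example different source port numbers, multiple QP connections can be established between the two devices.
Refer to FIG. 4, a schematic diagram of the connection between a connection resource pool and logical interfaces according to an embodiment of this application. As shown in FIG. 4, one connection group corresponds to the destination address dst:IP1 of one destination communication device. Because the physical QP connections in one connection group use different source ports, network nodes can, based on those source ports, hash the physical QP connections through ECMP onto different or identical physical paths. The physical paths by which different physical QP connections reach the destination may therefore be the same or different.
As FIG. 4 shows, the service applications may include service 1 of APP1, service 2 of APP1, and service 1 of APP2. The QP connection established by any service is a logical QP connection, and it is mapped to the physical QP connections actually established in the connection resource pool. A service application can therefore send WR data directly through its original logical interface, and the WR data is then mapped onto multiple physical QP connections according to the mapping between logical and physical QP connections. The original service code does not need to be modified: the connection resource pool is transparent to upper-layer services, which makes it easier to deploy, more adaptable, and compatible with multiple RDMA protocols.
After the first connection group has been determined in the connection resource pool, the first packet can be sent over the first connection included in the first connection group. Optionally, the first connection is selected from the connections in the first connection group, and the first packet is then sent to the second communication device over it. The connection performance of the first connection satisfies a performance condition, and the connection performance includes at least one of send queue length, delay performance, or packet loss performance.
As an example, when the connection performance includes the send queue length, the performance condition may be the shortest send queue; when it includes delay performance or packet loss performance, the performance condition may be the lowest delay or the lowest loss rate. Alternatively, the first connection may be selected from the connections in the first connection group with a round-robin (RR) scheduling algorithm, which serves the connections in the group cyclically and without differentiation.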
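The selection policies named above can be sketched as follows. The `Conn` record and its field names are assumptions made for illustration; how the metrics are actually measured is outside this sketch.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Conn:
    name: str
    sq_len: int        # current send-queue length
    delay_us: float    # measured delay
    loss_rate: float   # measured packet-loss rate


def pick_shortest_queue(conns):
    """Performance condition: shortest send queue."""
    return min(conns, key=lambda c: c.sq_len)


def pick_lowest_delay(conns):
    """Performance condition: lowest delay."""
    return min(conns, key=lambda c: c.delay_us)


def pick_lowest_loss(conns):
    """Performance condition: lowest loss rate."""
    return min(conns, key=lambda c: c.loss_rate)


def round_robin(conns):
    """Undifferentiated cyclic service over the group's connections."""
    return cycle(conns)


if __name__ == "__main__":
    group = [Conn("a", 5, 10.0, 0.00),
             Conn("b", 2, 30.0, 0.02),
             Conn("c", 7, 5.0, 0.01)]
    print(pick_shortest_queue(group).name)
    rr = round_robin(group)
    print([next(rr).name for _ in range(4)])
```

When several connections tie on a metric, `min` returns the first one in list order; a real scheduler would probably want an explicit tie-break.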
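In a possible implementation, before the first connection is selected from the first connection group, the connection performance of each connection in the group is obtained. Optionally, for any connection in the first connection group, a probe packet is sent over that connection, and at least one of the delay performance or the packet loss performance of the connection is obtained from the probe packet. The 5-tuple of the probe packet is the same as the 5-tuple of that connection. The probe packet may be an ordinary packet transmitted over the connection, or a probe packet sent specifically to measure connection performance. Optionally, probe packets may be sent over a connection periodically to obtain its actual performance in real time, or sent only when there is a need to measure that connection, which targets the measurement and reduces communication load.
In a possible implementation, when the data volume of the first packet exceeds a threshold, the first packet may be split into multiple first sub-packets. Optionally, the threshold may be the maximum transmission unit (MTU), which is the largest amount of data one Ethernet frame can carry, so the first packet can be split at MTU granularity. For example, when the first packet is 8000 bytes and the MTU is 4000 bytes, the first packet can be split into two 4000-byte first sub-packets. The embodiments of this application do not limit how the first packet is split. For example, it may be split according to its memory address range, with each first sub-packet corresponding to one memory address interval within that range.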
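A minimal sketch of splitting by address offset, assuming the packet is described only by a base address and a length. The function name `split_by_offset` is hypothetical. Each sub-packet is just an (address, size) interval, so no payload bytes are copied.

```python
def split_by_offset(base_addr: int, length: int, chunk: int):
    """Return (start_addr, size) intervals that cover
    [base_addr, base_addr + length) in pieces of at most `chunk` bytes."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    if length <= chunk:
        # At or below the threshold: the packet is not split.
        return [(base_addr, length)]
    return [(base_addr + off, min(chunk, length - off))
            for off in range(0, length, chunk)]


if __name__ == "__main__":
    # The example from the text: 8000 bytes with a 4000-byte MTU.
    for addr, size in split_by_offset(0x1000, 8000, 4000):
        print(hex(addr), size)
```

A packet whose length is not a multiple of the chunk size ends with a shorter final interval.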
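When the first packet has been split into multiple first sub-packets, the sub-packets are sent over the first connection group. Therefore, when the first connection group contains multiple connections, more than one first connection may be selected from it, and the first sub-packets are sent to the second communication device over those first connections. For example, given a first quantity of first sub-packets, the same quantity of first connections may be selected from the connections in the group, so that the sub-packets are distributed over different connections; alternatively, a second quantity of first connections may be selected, where the second quantity is smaller than the first quantity.
In this case, taking send queue length as the connection performance, the performance condition may be the shortest and the next-shortest send queues. In short, the first connection is selected on the principle of best connection performance or load balancing among multiple connections.
As an example, refer to FIG. 4. The WR data that APP1 delivers to the transport layer can be split into multiple sliced sub-WRs that are distributed to the first, second, and third physical QP connections in connection group 1. The WR data that APP2 delivers to the transport layer can be split into multiple sliced sub-WRs that are distributed to the first and third physical QP connections in connection group 1.
Compared with the per-service single-connection mode of the related art, each service in the embodiments of this application can therefore use several connections of one connection group concurrently, which balances load among the connections and removes the risk of service interruption from a single-connection failure. When the connections map to multiple paths, load balancing across network paths also improves, and a single-path failure no longer interrupts the service.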
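In a possible implementation, when the first packet is an important packet, a first identifier is added to the first packet. The first identifier allows the second communication device that receives the first packet to determine that it is an important packet. Optionally, an important packet is a packet whose loss has a large impact on transmission performance. For example, an important packet may be the last packet in the send queue of each of at least one connection, a control signaling packet, a probe packet, a heartbeat keepalive packet, an address request packet, a loss retransmission packet, or a packet that the service designates as important.
The embodiments of this application do not limit how the first identifier is added to the first packet, as long as the second communication device that receives the first packet can determine from the first identifier that it is an important packet; for example, a mark may be set in a reserved field of the first packet. When the first packet has been split into multiple first sub-packets and the first packet is an important packet, all of its first sub-packets are important packets and the first identifier is added to each of them. When the first packet is not an important packet, each first sub-packet is checked in turn, and the first identifier is added to every sub-packet that is an important packet.
When the first communication device sends the first packet to the second communication device over the first connection group, it forwards the first packet, according to the routing algorithm, to a third communication device acting as an intermediate forwarding node in the network. Because different connections in the first connection group may be routed onto different physical paths, the next hop of the first packet may be a different third communication device depending on the connection selected. Whichever communication device receives the first packet in between, the operations that the third communication device performs are the same.
In a possible implementation, when the first connection group contains multiple connections, the first communication device also performs priority scheduling and congestion control among them. For priority scheduling, different physical QP connections can be given different priorities according to service performance requirements, service flow length, and so on; when the send queue (SQ) of a higher-priority physical QP connection holds packets, that connection is scheduled to send first. As an example, some physical QP connections in a connection group are reserved and set to high priority to carry small service flows, which lowers the loss rate of small flows.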
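The priority rule above can be sketched as a strict-priority pick over the connections whose send queues are non-empty. The dictionary layout and the function name `next_to_send` are assumptions for illustration; congestion control is not modelled.

```python
def next_to_send(conns):
    """Pick the highest-priority physical QP connection with a non-empty SQ.

    conns: list of dicts with 'name', 'priority' (larger means higher)
    and 'sq' (a list of queued packets).
    Returns (connection name, packet), or None if every SQ is empty.
    """
    ready = [c for c in conns if c["sq"]]
    if not ready:
        return None
    best = max(ready, key=lambda c: c["priority"])
    return best["name"], best["sq"].pop(0)


if __name__ == "__main__":
    group = [
        {"name": "bulk", "priority": 0, "sq": ["b1", "b2"]},
        {"name": "small-flow", "priority": 1, "sq": ["s1"]},
    ]
    while (item := next_to_send(group)) is not None:
        print(item)
```

Strict priority can starve low-priority connections under sustained high-priority load, so a deployed scheduler would likely combine it with a fairness mechanism.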
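Congestion control may load a conventional congestion control algorithm, for example one with the following four phases. Phase 1, slow start: the sending node starts sending slowly and speeds up after receiving acknowledgements from the receiving node, which helps keep the network out of congestion. Phase 2, congestion avoidance: because the amount of data sent under slow start still grows very quickly and the probability of network congestion rises, a threshold is set, and once the amount of data sent exceeds the threshold the sending rate is reduced. Phase 3, fast retransmit: the receiving node acknowledges the last in-order segment it has received; when it returns three identical acknowledgements in a row, the next segment has been lost and the sending node performs a fast retransmit. Phase 4, fast recovery: when the sender receives three duplicate acknowledgements in a row, it lowers the threshold and performs congestion avoidance.
In addition to sending the first packet to the second communication device according to the first connection group, the first communication device can also send a second packet or a third packet over the shared first connection group, because the group is shared among services. Depending on the second or third packet involved, this includes but is not limited to the following four scenarios.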
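Scenario 1: The first connection group includes a first connection between the first communication device and the second communication device. The first communication device obtains a second packet corresponding to a second service, where the destination of the second packet is the same as that of the first packet, namely the second communication device, and sends the second packet to the second communication device over the first connection in the first connection group.
In this way, packets of different services that go to the same destination can share and multiplex the same connection in the same connection group. The number of local connections does not grow with the number of services, so data transmission performance is not limited by resources such as NIC memory, which further improves connection scalability.
Scenario 2: In addition to the first connection between the first and second communication devices, the first connection group also includes a second connection between them. The first communication device obtains a second packet corresponding to the first service, where the destination of the second packet is the second communication device, and may send the second packet to the second communication device over the second connection in the first connection group.
In this way, different packets of the same service going to the same destination can be transmitted over different connections in the same connection group. Transmitting one service over multiple connections enables load balancing among the connections and avoids service interruption caused by a single-connection failure: when the first connection in the first connection group fails, transmission can switch to the second connection in the group, which safeguards the reliability and stability of service transmission. Moreover, when different connections correspond to different physical paths in the network, load balancing across network paths is also achieved and service interruption caused by a single-path failure is avoided, which further safeguards reliability and stability.
Scenario 3: The first communication device obtains a third packet corresponding to the first service, where the destination of the third packet is a fourth communication device, and sends the third packet to the fourth communication device according to the first connection group. The first communication device can thus send packets of the same service to the second and the fourth communication device over the shared first connection group, which further improves connection scalability and data transmission performance.
Scenario 4: The first communication device obtains a third packet corresponding to the second service, where the destination of the third packet is the fourth communication device, and sends the third packet to the fourth communication device according to the first connection group. The first communication device can thus send packets of different services to the second and the fourth communication device over the shared first connection group, which further improves connection scalability and data transmission performance.
Optionally, in scenarios 3 and 4 the first connection group is shared not only by the services that the first communication device transmits to the second communication device but also by the services that it transmits to the fourth communication device. When the destination of the third packet differs from that of the first packet, the DC mode described above can be used: the first communication device reuses the connections in one connection group and establishes a connection with each destination communication device in turn to transmit data.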
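Step 303: The third communication device receives the first packet corresponding to the first service that the first communication device sent over the first connection group.
When the first communication device sends the first packet over the first connection in the first connection group, the third communication device likewise receives the first packet sent over that first connection. In the embodiments of this application, after receiving the first packet that the first communication device is sending to the second communication device, the third communication device processes it. The processing mainly consists of protecting important packets, so as to reduce their loss rate at the third communication device and thereby limit the impact of packet loss on transmission performance.
In a possible implementation, when the first packet is an important packet, the third communication device applies protective handling to it. The embodiments of this application do not limit the form of the protective handling, as long as it lowers the loss rate of important packets. As an example, the third communication device processes the first packet according to a first loss rate that is lower than the loss rate of non-important packets. Optionally, the first packet carries the first identifier, and the third communication device determines from it that the first packet is an important packet. Configuring a loss rate for important packets that is lower than that of non-important packets, and processing important packets accordingly, reduces how often important packets are lost.
The embodiments of this application do not limit the value of the first loss rate, as long as it is lower than the loss rate of non-important packets. Optionally, the loss rate is the probability that a packet is dropped: the higher the loss rate, the more likely the packet is to be dropped. For example, if the loss rate of non-important packets is 80%, the first loss rate may be any value between 0 and 80%, for instance 10%. The first packet may then be processed according to the first loss rate as follows: when the third communication device needs to drop packets, it generates a random number based on the first loss rate; when the random number is greater than a threshold, the first packet is dropped; when the random number is not greater than the threshold, the first packet is forwarded. The threshold can be adjusted flexibly for the application scenario.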
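One concrete reading of the random-number rule above is sketched below. It is an assumption, not the application's stated procedure: a uniform value in [0, 1) is drawn and the packet is dropped when the value falls below the configured loss rate, which makes the drop probability equal to that rate. The function name and the default rates (taken from the 10% and 80% figures in the text) are illustrative.

```python
import random


def should_drop(is_important: bool, rng: random.Random,
                important_rate: float = 0.10,
                normal_rate: float = 0.80) -> bool:
    """Decide whether to drop a packet when the node has to shed load.

    Important packets use the lower 'first loss rate'; all other
    packets use the higher default rate.
    """
    rate = important_rate if is_important else normal_rate
    return rng.random() < rate


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 10_000
    important = sum(should_drop(True, rng) for _ in range(trials))
    normal = sum(should_drop(False, rng) for _ in range(trials))
    print(important / trials, normal / trials)
```

Over many trials the observed drop fractions approach the two configured rates, so important packets are shed far less often than ordinary ones.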
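Optionally, important packets may also be handled at a lower loss rate than non-important packets by placing them in a reserved buffer or by raising their priority. Packets in the reserved buffer are not dropped, and high-priority packets are not the first to be dropped, so both measures reduce the loss rate of important packets. A lower loss rate for important packets means fewer timeout retransmissions caused by losing them. Because timeout retransmission introduces large network delay, fewer timeout retransmissions reduce the impact of packet loss on network delay.
Thus, the first identifier that the first communication device adds to important packets lets the third communication device accurately identify the packets whose loss would seriously affect service performance, and the protective handling by the third communication device improves the transmission stability of important packets and keeps service performance stable on a lossy network.
In a possible implementation, the third communication device also receives a probe packet that the first communication device sent over any connection in the first connection group, adds transmission information to the probe packet, and forwards the probe packet with the added information. The transmission information includes at least one of a node identifier, packet loss, delay, or throughput. When the probe packet returns to the first communication device, the first communication device obtains at least one of the delay performance or the packet loss performance from the transmission information added to it. As an example, the probe packet carries the packet loss and delay of every node on the physical path of the connection, and the first communication device analyses these per-node values statistically to obtain the delay performance and packet loss performance of the connection.
In the embodiments of this application, when the data volume of the first packet exceeds the threshold, what the second communication device receives is the multiple first sub-packets obtained by splitting the first packet. In a possible implementation, the third communication device also has an out-of-order marking function. As an example, when the first sub-packets are reordered, an out-of-order flag is added to the reordered first sub-packet, and the first sub-packet carrying the flag is forwarded. The flag indicates that the first sub-packets were reordered but none was lost. This prevents the receiving node from mistaking reordering for loss and initiating loss retransmission.
For example, the first sub-packets received by the third communication device carry sequence numbers assigned in sending order. The third communication device can read the sequence number of each packet in its send queue and determine from the order of those sequence numbers whether reordering has occurred. If the sequence numbers in the send queue are 1, 2, 3, 5, 4, the device determines that the packet with sequence number 5 was reordered but that no packet was lost, so it adds the out-of-order flag to packet 5. When the receiving node then receives packet 5 without having received packet 4, it determines from the flag in packet 5 that packet 4 was not lost. If the sequence numbers in the send queue are 1, 2, 3, 5, 6, the device determines that no reordering occurred and that packet 4 was lost, so it does not add the flag to packet 5. When the receiving node then receives packet 5 without having received packet 4, it determines from the absence of the flag that packet 4 was lost.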
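The two worked sequences above can be reproduced with the following sketch. It assumes the forwarding node can see the whole send queue at once; the function name `mark_out_of_order` and the list-of-integers model are hypothetical simplifications.

```python
def mark_out_of_order(queue):
    """Return (seq, flagged) for every packet in the send queue.

    A packet is flagged when it arrives ahead of lower sequence
    numbers that are still present later in the queue, i.e. the
    gap is caused by reordering rather than by loss.
    """
    if not queue:
        return []
    present = set(queue)
    seen = set()
    next_expected = queue[0]
    result = []
    for seq in queue:
        gap = [s for s in range(next_expected, seq) if s not in seen]
        flagged = bool(gap) and all(s in present for s in gap)
        result.append((seq, flagged))
        seen.add(seq)
        while next_expected in seen:
            next_expected += 1
    return result


if __name__ == "__main__":
    print(mark_out_of_order([1, 2, 3, 5, 4]))  # packet 5 is flagged
    print(mark_out_of_order([1, 2, 3, 5, 6]))  # nothing is flagged
```

A streaming implementation would not have the later packets available when packet 5 is forwarded, so it would need a bounded lookahead window; that trade-off is not addressed here.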
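The four scenarios above, in which the first communication device sends a second or third packet over the shared first connection group, also apply here. For scenario 1, the third communication device may also receive a second packet corresponding to the second service that the first communication device sent over the first connection, where the destination of the second packet is the second communication device. For scenario 2, it may also receive a second packet corresponding to the first service sent over the second connection, where the destination is the second communication device. For scenario 3, it may also receive a third packet corresponding to the first service sent over the first connection group, where the destination is the fourth communication device. For scenario 4, it may also receive a third packet corresponding to the second service sent over the first connection group, where the destination is the fourth communication device.
Step 304: The third communication device sends the first packet to the second communication device.
After receiving the first packet, the third communication device sends it to the second communication device because the destination of the first packet is the second communication device. Optionally, the third communication device sends the first packet to the second communication device over the first connection.
The same applies to the four scenarios. For scenario 1, after receiving the second packet corresponding to the second service that was sent over the first connection, the third communication device sends it to the second communication device because that is its destination. For scenario 2, after receiving the second packet corresponding to the first service that was sent over the second connection, it sends that packet to the second communication device. For scenario 3, after receiving the third packet corresponding to the first service that was sent over the first connection group, it sends that packet to the fourth communication device because that is its destination. For scenario 4, after receiving the third packet corresponding to the second service that was sent over the first connection group, it sends that packet to the fourth communication device.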
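Step 305: The second communication device receives the first packet corresponding to the first service.
In the embodiments of this application, the first packet that the first communication device sends over the first connection group can reach the second communication device through intermediate forwarding nodes; for example, through steps 301 to 304 above, the second communication device receives the first packet forwarded by the third communication device. Optionally, if the first communication device is directly connected to the second communication device, the second communication device can also receive the first packet directly from the first communication device.
The second communication device can thus receive the first packet corresponding to the first service over the first connection group. Like the first communication device, the second communication device has its own connection resource pool, in which physical QP connections are mapped to the logical QP connections of upper-layer applications. After receiving the first packet of the first service over the first connection group, the second communication device passes it up to the upper-layer application through the corresponding logical QP connection.
In a possible implementation, when the data volume of the first packet exceeds the threshold, what the second communication device receives is the multiple first sub-packets obtained by splitting the first packet. In this case, the first communication device also adds a sequence number, a sub-sequence number, and an interface identifier to each first sub-packet, so that after receiving the first sub-packets the second communication device can order and combine them by sequence number and sub-sequence number to obtain the first packet, and pass the first packet up to the application through the logical interface corresponding to the interface identifier.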
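Reassembly on the receiving side can be sketched as follows. The dictionary keys (`iface_id`, `seq`, `sub_seq`, `total`, `payload`) are assumptions for illustration; in particular, the `total` field that tells the receiver how many sub-packets to expect is not mentioned in the text and is introduced here only so that completeness can be checked.

```python
from collections import defaultdict


def reassemble(sub_packets):
    """Group sub-packets by (interface id, sequence number), order them
    by sub-sequence number, and return the packets that are complete.

    Returns {(iface_id, seq): payload_bytes}.
    """
    parts = defaultdict(dict)   # key -> {sub_seq: payload}
    totals = {}                 # key -> expected number of sub-packets
    for p in sub_packets:
        key = (p["iface_id"], p["seq"])
        parts[key][p["sub_seq"]] = p["payload"]
        totals[key] = p["total"]
    complete = {}
    for key, by_sub_seq in parts.items():
        if len(by_sub_seq) == totals[key]:
            complete[key] = b"".join(by_sub_seq[i] for i in sorted(by_sub_seq))
    return complete


if __name__ == "__main__":
    arrived = [
        {"iface_id": "svc1", "seq": 7, "sub_seq": 1, "total": 2, "payload": b"world"},
        {"iface_id": "svc1", "seq": 7, "sub_seq": 0, "total": 2, "payload": b"hello"},
        {"iface_id": "svc1", "seq": 8, "sub_seq": 0, "total": 2, "payload": b"partial"},
    ]
    print(reassemble(arrived))
```

The interface identifier in each key tells the transport layer which logical interface to use when handing the rebuilt packet to the application.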
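Where the second communication device also supports the out-of-order marking function, when it receives any first sub-packet that is out of order, it proceeds as follows. If that first sub-packet does not carry the out-of-order flag, the second communication device determines that a first sub-packet was lost and sends a packet loss notification message to trigger loss retransmission. If that first sub-packet carries the out-of-order flag, the second communication device determines from the flag that the first sub-packets were reordered but none was lost, so it does not send a packet loss notification message yet; if the first sub-packets that should have preceded the reordered one have still not arrived after a period of time, it then sends the packet loss notification message.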
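The receiver's decision can be summarised in a small function. The name `handle_gap`, the returned action strings, and the default timeout value are hypothetical; the text only says that the notification is delayed for "a period of time".

```python
def handle_gap(has_out_of_order_flag: bool, waited_s: float,
               timeout_s: float = 0.001) -> str:
    """Decide what to do when a sub-packet arrives ahead of a missing one.

    Without the flag the gap is treated as loss immediately. With the
    flag the receiver waits, and only reports loss once the missing
    sub-packet has failed to arrive within timeout_s.
    """
    if not has_out_of_order_flag:
        return "notify_loss"
    return "notify_loss" if waited_s >= timeout_s else "wait"


if __name__ == "__main__":
    print(handle_gap(False, 0.0))     # unflagged gap
    print(handle_gap(True, 0.0002))   # flagged, still within the wait window
    print(handle_gap(True, 0.005))    # flagged, wait window expired
```

The timeout is a tuning knob: too short and reordering is again mistaken for loss, too long and genuine losses are reported late.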
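Of the scenarios above in which the first communication device sends a second packet over the shared first connection group, scenarios 1 and 2 involve the second communication device as follows. For scenario 1, the second communication device may also receive a second packet corresponding to the second service that the first communication device sent over the first connection in the first connection group; the second packet has the same destination as the first packet but belongs to a different service. For scenario 2, the second communication device may also receive a second packet corresponding to the first service that the first communication device sent over the second connection in the first connection group; the second packet has the same destination as the first packet and belongs to the same service.
In the embodiments of this application, the second service and the first service may be two service threads in the same service process, two service threads in different service processes of the same upper-layer application, or two service threads in different upper-layer applications.
The data transmission method provided in the embodiments of this application allows packets to be transmitted over one shared, multiplexed connection group, which improves connection scalability and the performance and reliability of data transmission. The number of connection groups in the connection resource pool is determined by the number of destination communication devices, and the number of physical QP connections in a connection group can be adjusted flexibly. The number of physical QP connections therefore does not grow with the number of upper-layer application services, which decouples the number of local connections from the number of upper-layer services. As the network grows and the number of local NIC connections remains bounded, the connection resource pool strengthens connection scalability. Next, when a connection group contains multiple connections, the RDMA multi-connection capability improves load balancing among the connections, and on a single-connection failure transmission can switch to another connection in time, which improves reliability. In addition, marking, identifying, and protecting important packets effectively reduces the tail latency caused by losing them and further stabilizes data transmission performance on a lossy network.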
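Refer to FIG. 5, a schematic diagram of the functional modules of a communication device according to an embodiment of this application. The communication device shown in FIG. 5 can perform the operations of the first or second communication device in the data transmission method shown in FIG. 3; that is, its functional modules implement the functions performed by the first or second communication device in that method.
As shown in FIG. 5, the functional modules of the communication device can be divided into an application layer, a transport layer, and a NIC. As an example, the application layer may include N application services (N is a positive integer) such as AI/HPC, distributed storage, containers, or big data, and one application service includes at least one service. Optionally, each application service is connected through its connector (Conn) to an application program interface (API), which may be the Socket API or the Verbs API.
Taking the RDMA scenario as an example, the embodiments of this application introduce a new loss-tolerant RDMA transport layer (the transport-layer middleware shown in FIG. 5) between the application services and the NIC of the communication device. The transport-layer middleware includes a WR splitting module, a connection sharing module, a connection measurement module, a multi-connection selection module, a coloring module, an out-of-order reordering module, and a scheduling module. Some of these functional modules can also be offloaded to the NIC, for example the connection measurement module, coloring module, out-of-order reordering module, and scheduling module. In the embodiments of this application, the first or second communication device may refer to middleware between application-layer software and NIC hardware, for example the transport-layer middleware shown in FIG. 5.
The WR splitting module splits large blocks of WR data into multiple WR sub-blocks. Optionally, after receiving WR data delivered by an application service through the API, the WR splitting module uses a threshold (for example, the MTU) to decide whether the WR data is a large block. The embodiments of this application do not limit how a large block of WR data is split into WR sub-blocks; for example, the whole memory address range of the large block may be split into multiple WR sub-blocks by offsetting the start address. Splitting into WR sub-blocks makes fine-grained multi-connection scheduling and congestion control easier and improves data transmission performance.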
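The connection sharing module establishes the connection resource pool. The connection resource pool contains at least one connection group, and one connection group corresponds to one destination communication device; that is, all connections in a group have the same remote receiving node. Each connection group contains at least one QP connection. Within a group, every QP connection has the same sending node and receiving node but a different source port. For any connection group, network nodes can therefore use the different source ports of its QP connections to hash those QP connections through ECMP onto different paths, in which case different QP connections can correspond to different physical paths.
Optionally, when an application service on the local communication device needs to communicate with an application service on a destination communication device, for example when WR data of an upper-layer application service is obtained through the API, the connection sharing module first looks in the connection resource pool for a connection group corresponding to that destination communication device. If such a group exists, no new connection group is established. If it does not exist, a new connection group is established with the IP address of the destination communication device. The original connection established by the application service is treated as a logical QP connection, and the QP connections in the pool are the physical QP connections that actually send the data.
In the embodiments of this application, for a WR sub-block coming from a logical QP connection, the connection sharing module uses the destination IP address of the sub-block to carry it on the connection group of the corresponding destination communication device. WR sub-blocks from the logical QP connections of different upper-layer services can be sent over the physical QP connections of the same connection group as long as their destination IP addresses correspond to the same destination communication device. This realizes the mapping between the physical QP connections in the pool and the logical QP connections and achieves shared, multiplexed connections.
The embodiments of this application do not limit the connection mode of the physical QP connections in the connection resource pool; it may be RC, DC, or another mode. In a possible implementation, if no corresponding connection group exists in the pool for a service flow, the flow is first sent over a newly established DC-mode physical QP connection and, once the amount of data sent for the flow exceeds a threshold, is switched to a newly established RC-mode physical QP connection. A DC-mode physical QP connection is established faster than an RC-mode one, so sending over DC first speeds up data transmission and reduces delay, and switching to RC afterwards removes the inefficiency that DC mode incurs by switching back and forth between destination communication devices.
Optionally, for data transmission with two-sided operations, the connection sharing module also adds the identity (ID) of the logical QP connection to each WR sub-block, so that the receiving node can use that ID to hand the original large block of WR data to the original connection called by the application service.
The connection measurement module sends probe packets to measure connection information. It may measure connections periodically or when triggered by an event. A probe packet may be an in-band packet transmitted over the physical QP connection being measured, for example any WR sub-block sent over that connection. It may also be a dedicated probe packet whose 5-tuple is the same as that of the physical QP connection being measured, which ensures that it follows the same physical path through the intermediate forwarding nodes. The connection measurement module can thus measure the delay, packet loss, and similar metrics of each physical QP connection and calibrate the performance of the physical QP connections in the pool in real time.
The multi-connection selection module selects at least one physical QP connection from the determined connection group. Optionally, it distributes different WR sub-blocks to different physical QP connections according to the connection performance of each. For example, the distribution may use round-robin, weighted allocation, the shortest QP connection, or the shortest SQ queue among the QP connections. In the embodiments of this application, the multi-connection selection module also prunes and replaces any physical QP connection in the pool whose performance degrades, that is, it removes that connection from its connection group.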
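Optionally, for data transmission with two-sided operations, the multi-connection selection module adds to each WR sub-block the identifier WR_ID of the original large block of WR data and the sequence number WR_SEQ of the sub-block, so that the receiving node can sort and combine the WR sub-blocks with the same WR_ID in WR_SEQ order to rebuild the original large block of WR data.
The coloring module colors important packets, that is, it sets a mark in a reserved field of the packet. Optionally, an important packet is a packet whose loss would seriously degrade service performance. For example, important packets may include probe packets used for connection measurement, the WR sub-block at the tail of a send queue (SQ), control signaling packets delivered by the service, loss retransmission packets, heartbeat keepalive packets, address request packets, or packets that the service designates as important. To color loss retransmission packets, the coloring module has to be implemented on the NIC, which identifies the loss retransmission packets as it sends data.
The out-of-order reordering module orders the WR sub-blocks produced by splitting. After a large block of WR data has been split into multiple WR sub-blocks and these are transmitted over several shared, multiplexed physical QP connections, each sub-block may finish at a different time, so the CQE of each sub-block may also complete at a different time. The out-of-order reordering module records the number of WR sub-blocks produced by the split. Once the number of completed sub-block CQEs equals that number, it aggregates the sub-block CQEs into a single CQE for the original large block of WR data and submits it to the upper-layer application service. Throughout this process the application service sends the WQE of the large block through the API and receives the CQE of the large block, so the split is invisible to the application service; in other words, it is application-transparent.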
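The CQE aggregation described above amounts to a per-WR countdown, sketched below. The class name `CqeAggregator` and its methods are hypothetical, and error completions are not modelled.

```python
class CqeAggregator:
    """Collapse per-sub-block completions into one completion per WR."""

    def __init__(self):
        self._remaining = {}   # wr_id -> sub-block CQEs still outstanding

    def register(self, wr_id: int, num_sub_blocks: int) -> None:
        """Record how many sub-blocks the WR was split into."""
        self._remaining[wr_id] = num_sub_blocks

    def on_sub_cqe(self, wr_id: int):
        """Handle one sub-block CQE.

        Returns wr_id when the last sub-block of that WR has completed
        (the moment to hand a single CQE to the application),
        otherwise None.
        """
        self._remaining[wr_id] -= 1
        if self._remaining[wr_id] == 0:
            del self._remaining[wr_id]
            return wr_id
        return None


if __name__ == "__main__":
    agg = CqeAggregator()
    agg.register(wr_id=42, num_sub_blocks=3)
    print([agg.on_sub_cqe(42) for _ in range(3)])
```

Because only a counter is kept, the sub-block completions can arrive in any order across the physical QP connections.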
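In the embodiments of this application, for data transmission with two-sided operations, the out-of-order reordering module of the receiving node sorts and combines the WR sub-blocks with the same WR_ID in WR_SEQ order, based on the WR_ID and WR_SEQ in each sub-block. The module also preserves order between two unacknowledged WR data blocks with the same destination IP address, that is, one is sent only after the other has been acknowledged. It also implements a selective retransmission mechanism, which retransmits only the packets that were lost.
The scheduling module performs priority scheduling and congestion control among different physical QP connections. It schedules sending across the connections chosen by the multi-connection selection module, so as to balance load among them and maintain multi-connection transmission performance.
With the functional modules above, the existing RDMA interfaces can be used, through shared and multiplexed connection groups, to deliver high-performance, highly reliable services on a large-scale lossy RDMA network. RDMA connection scalability improves and the limit on connection scale is lifted. When a connection group contains multiple connections, load sharing and scheduling among them avoid the service interruption that single-connection transmission can cause, which further safeguards reliability. When the connections correspond to different physical paths, load can be balanced across the equal-cost paths in the network.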
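Refer to FIG. 6, a schematic diagram of the functional modules of another communication device according to an embodiment of this application. The communication device shown in FIG. 6 can perform the operations of the third communication device in the data transmission method shown in FIG. 3; that is, its functional modules implement the functions performed by the third communication device in that method.
As shown in FIG. 6, the functional modules of the communication device include an important-packet identification module, an important-packet protection module, a notification packet generation module, a connection measurement module, and an out-of-order marking module. The communication device shown in FIG. 6 works together with the one shown in FIG. 5 to implement the data transmission method provided in the embodiments of this application, and its main role is to handle important packets differently.
The important-packet identification module identifies important packets, that is, the packets colored by the coloring module described above. Optionally, it parses each received packet and, when the reserved field of the packet contains the coloring mark, determines that the packet is an important packet.
The important-packet protection module applies protective handling to important packets to reduce their loss rate, for example by placing them in a reserved buffer or raising their priority.
The notification packet generation module constructs, as required, a notification packet without payload for the receiving node, or a negative acknowledgement (NAK) packet or an acknowledgement (ACK) packet for the sending node, or, when network congestion occurs, a congestion notification packet for the sending node.
The connection measurement module receives the probe packets sent by the connection measurement module of the communication device shown in FIG. 5 and adds information such as the identifier of this communication device, packet loss, delay, and throughput to them, so that the probe packets carry this information back to the communication device shown in FIG. 5 to complete the connection measurement.
The out-of-order marking module adds an out-of-order flag to a reordered packet when the communication device itself causes packets to be forwarded out of order, which helps the receiving node distinguish packet loss from reordering. For example, if the receiving node receives an out-of-order packet without the flag, it concludes that a packet was lost and sends a packet loss notification message; if it receives an out-of-order packet with the flag, it concludes that reordering occurred without loss and delays sending the packet loss notification message.
With the functional modules above, important packets are handled differently to limit the harm of packet loss, which reduces tail-latency degradation such as the timeout retransmissions caused by network packet loss.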
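The data transmission method of the embodiments of this application has been described above. Corresponding to that method, the embodiments of this application also provide data transmission apparatuses. FIG. 7 is a schematic structural diagram of a data transmission apparatus provided in an embodiment of this application; the apparatus is applied to the first communication apparatus shown in FIG. 3. Based on the modules shown in FIG. 7, the apparatus can perform all or some of the operations performed by the first communication apparatus. It should be understood that the apparatus may include additional modules beyond those shown or may omit some of the modules shown; the embodiments of this application impose no limitation on this. As shown in FIG. 7, the apparatus includes:
an obtaining module 701, configured to obtain a first message corresponding to a first service, where the destination of the first message is a second communication apparatus; and
a sending module 702, configured to send the first message to the second communication apparatus according to a first connection group, where the first connection group is shared by the services transmitted by the first communication apparatus.
In a possible implementation, the first connection group includes a first connection between the first communication apparatus and the second communication apparatus, and the sending module 702 is configured to send the first message to the second communication apparatus over the first connection;
the obtaining module 701 is further configured to obtain a second message corresponding to a second service, where the destination of the second message is the second communication apparatus; and
the sending module 702 is further configured to send the second message to the second communication apparatus over the first connection.
In a possible implementation, the first connection group includes a first connection and a second connection between the first communication apparatus and the second communication apparatus, and the sending module 702 is configured to send the first message to the second communication apparatus over the first connection;
the obtaining module 701 is further configured to obtain a second message corresponding to the first service, where the destination of the second message is the second communication apparatus; and
the sending module 702 is further configured to send the second message to the second communication apparatus over the second connection.
In a possible implementation, the sending module 702 is configured to select the first connection for sending the first message to the second communication apparatus because the connection performance of the first connection satisfies a performance condition, where the connection performance includes at least one of send queue length, latency performance, or packet loss performance.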
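Selecting a connection whose performance satisfies a condition can be sketched as a filter followed by a ranking. The thresholds, the field names, and the tie-breaking order (queue length, then latency, then loss rate) are assumptions for illustration; the source only states that the performance includes at least one of these three metrics.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    name: str
    sq_len: int         # current send queue length
    latency_us: float   # measured latency in microseconds
    loss_rate: float    # measured packet loss ratio, 0.0 to 1.0


def select_connection(conns, max_sq_len=64, max_latency_us=50.0, max_loss=0.01):
    """Return the best connection that meets every threshold, or None if none does."""
    eligible = [
        c for c in conns
        if c.sq_len <= max_sq_len
        and c.latency_us <= max_latency_us
        and c.loss_rate <= max_loss
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: (c.sq_len, c.latency_us, c.loss_rate))
```

In this sketch, a connection with a short queue but excessive latency is rejected outright rather than merely ranked lower.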
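In a possible implementation, the first communication apparatus includes a connection resource pool, the connection resource pool includes at least one connection group, each connection group corresponds to one destination communication apparatus, and the at least one connection group includes the first connection group.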
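A connection resource pool keyed by destination can be sketched as below. The class name, the string placeholders standing in for physical QP connections, and the lazy creation of groups are assumptions for the example; the point is only that every service sending to the same destination receives the same group.

```python
class ConnectionPool:
    """Keeps one connection group per destination, shared by all services."""

    def __init__(self, conns_per_group: int = 2):
        self._groups = {}             # destination address -> list of connections
        self._conns_per_group = conns_per_group

    def group_for(self, dest: str):
        """Return the group for a destination, creating it on first use."""
        if dest not in self._groups:
            # Placeholder strings stand in for real physical QP connections.
            self._groups[dest] = [
                f"qp-{dest}-{i}" for i in range(self._conns_per_group)
            ]
        return self._groups[dest]

    def group_count(self) -> int:
        return len(self._groups)
```

Because groups are keyed by destination rather than by service, adding services toward an existing destination does not add connections.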
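In a possible implementation, the apparatus further includes:
a splitting module, configured so that, when the data volume of the first message exceeds a threshold, the first communication apparatus splits the first message into multiple first sub-messages; and
the sending module 702 is configured to send the multiple first sub-messages to the second communication apparatus according to the first connection group.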
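The threshold-based split can be illustrated with a short function. The tuple layout `(wr_id, wr_seq, total, payload)` and the separate `chunk_size` parameter are assumptions made for the sketch; the source states only that a message larger than a threshold is split into multiple sub-messages.

```python
def split_message(wr_id: int, payload: bytes, threshold: int, chunk_size: int):
    """Split a payload larger than the threshold into numbered sub-messages.

    Each sub-message is a tuple (wr_id, wr_seq, total, chunk).
    """
    if len(payload) <= threshold:
        return [(wr_id, 0, 1, payload)]  # small message: sent as a single piece
    chunks = [
        payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)
    ]
    return [(wr_id, seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]
```

Carrying the sequence number and total count in every sub-message lets the receiver detect completion without any extra control message.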
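In a possible implementation, the first communication apparatus includes an application layer and a transport layer; the obtaining module 701 is configured so that the transport layer receives the first message corresponding to the first service, which the application layer sends by invoking the logical interface of the first service; and
the sending module 702 is configured so that, when the transport layer has obtained the notification message corresponding to each of the multiple first sub-messages, the transport layer invokes the logical interface of the first service to send the notification message corresponding to the first message to the application layer.
In a possible implementation, the first communication apparatus is middleware between application-layer software and NIC hardware.
In a possible implementation, the first message is transmitted based on the RDMA protocol.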
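FIG. 8 is a schematic structural diagram of a data transmission apparatus provided in an embodiment of this application; the apparatus is applied to the third communication apparatus shown in FIG. 3. Based on the modules shown in FIG. 8, the apparatus can perform all or some of the operations performed by the third communication apparatus. It should be understood that the apparatus may include additional modules beyond those shown or may omit some of the modules shown; the embodiments of this application impose no limitation on this. As shown in FIG. 8, the apparatus includes:
a receiving module 801, configured to receive a first message that corresponds to a first service and that the first communication apparatus sends based on a first connection group, where the destination of the first message is the second communication apparatus and the first connection group is shared by the services transmitted by the first communication apparatus; and
a sending module 802, configured to send the first message to the second communication apparatus.
In a possible implementation, the first connection group includes a first connection between the first communication apparatus and the second communication apparatus, and the receiving module 801 is configured to receive the first message that corresponds to the first service and that the first communication apparatus sends over the first connection;
the receiving module 801 is further configured to receive a second message that corresponds to a second service and that the first communication apparatus sends over the first connection, where the destination of the second message is the second communication apparatus; and
the sending module 802 is further configured to send the second message to the second communication apparatus.
In a possible implementation, the first connection group includes a first connection and a second connection between the first communication apparatus and the second communication apparatus, and the receiving module 801 is configured to receive the first message that corresponds to the first service and that the first communication apparatus sends over the first connection;
the receiving module 801 is further configured to receive a second message that corresponds to the first service and that the first communication apparatus sends over the second connection, where the destination of the second message is the second communication apparatus; and
the sending module 802 is further configured to send the second message to the second communication apparatus.
In a possible implementation, the first message includes multiple first sub-messages, and the apparatus further includes: an adding module, configured to, when the multiple first sub-messages become out of order, add an out-of-order flag to each first sub-message that is out of order; and
a transmission module, configured to transmit the first sub-messages to which the out-of-order flag has been added, where the out-of-order flag indicates that the multiple first sub-messages are out of order but that no packet loss has occurred.
In a possible implementation, the first message is transmitted based on the RDMA protocol.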
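FIG. 9 is a schematic structural diagram of a data transmission apparatus provided in an embodiment of this application; the apparatus is applied to the second communication apparatus shown in FIG. 3. Based on the modules shown in FIG. 9, the apparatus can perform all or some of the operations performed by the second communication apparatus. It should be understood that the apparatus may include additional modules beyond those shown or may omit some of the modules shown; the embodiments of this application impose no limitation on this. As shown in FIG. 9, the apparatus includes:
a receiving module 901, configured to receive a first message corresponding to a first service, where the first message is sent by the first communication apparatus based on a first connection group, the destination of the first message is the second communication apparatus, and the first connection group is shared by the services transmitted by the first communication apparatus.
In a possible implementation, the first connection group includes a first connection between the first communication apparatus and the second communication apparatus, and the first message is sent by the first communication apparatus over the first connection;
the receiving module 901 is further configured to receive a second message corresponding to a second service, where the second message is sent by the first communication apparatus over the first connection and the destination of the second message is the second communication apparatus.
In a possible implementation, the first connection group includes a first connection and a second connection between the first communication apparatus and the second communication apparatus, and the first message is sent by the first communication apparatus over the first connection;
the receiving module 901 is further configured to receive a second message corresponding to the first service, where the second message is sent by the first communication apparatus over the second connection and the destination of the second message is the second communication apparatus.
In a possible implementation, the first message includes multiple first sub-messages, and the second communication apparatus includes an application layer and a transport layer;
the receiving module 901 is configured so that the transport layer receives the multiple first sub-messages, each of which includes a corresponding sequence number, sub-sequence number, and interface identifier, the multiple first sub-messages being obtained by the first communication apparatus splitting the first message; the transport layer arranges and combines the multiple first sub-messages according to the sequence numbers and sub-sequence numbers to obtain the first message; and the transport layer invokes the logical interface corresponding to the interface identifier to send the first message to the application layer.
In a possible implementation, the apparatus further includes:
a determining module, configured to, upon receiving any first sub-message that is out of order, determine based on the out-of-order flag, if that sub-message carries one, that the multiple first sub-messages are out of order but that no packet loss has occurred.
In a possible implementation, the second communication apparatus is middleware between application-layer software and NIC hardware.
In a possible implementation, the first message is transmitted based on the RDMA protocol.
The data transmission apparatuses provided in the embodiments of this application implement shared multiplexing of connection groups, so that the number of local connections does not grow as the number of services grows and data transmission performance is not limited by resources such as NIC memory. In addition, when a connection group includes multiple connections, the RDMA multi-connection capability improves load balancing across the connections and allows a timely switch to another connection when a single connection fails, which improves transmission reliability. Furthermore, marking, identifying, and protecting important packets effectively reduces the transmission tail latency caused by the loss of important packets, further improving the stability of data transmission performance in a lossy network.
It should be understood that, when the apparatuses provided in FIG. 7 to FIG. 9 implement their functions, the division into the functional modules above is used only as an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of a device may be divided into different functional modules to perform all or some of the functions described above. In addition, the apparatuses provided in the foregoing embodiments belong to the same concept as the method embodiments; for their specific implementation, see the method embodiments. Details are not repeated here.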
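Refer to FIG. 10, which shows a schematic structural diagram of a data transmission device 2000 provided in an exemplary embodiment of this application. The data transmission device 2000 shown in FIG. 10 is configured to perform the operations involved in the data transmission method shown in FIG. 3. The data transmission device 2000 is, for example, a switch or a router, and may be implemented with a general bus architecture.
As shown in FIG. 10, the data transmission device 2000 includes at least one processor 2001, a memory 2003, and at least one communication interface 2004.
The processor 2001 is, for example, a general-purpose central processing unit (CPU), a digital signal processor (DSP), a network processor (NP), a graphics processing unit (GPU), a neural-network processing unit (NPU), a data processing unit (DPU), a microprocessor, or one or more integrated circuits for implementing the solutions of this application. For example, the processor 2001 includes an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The PLD is, for example, a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor can implement or execute the various logical blocks, modules, and circuits described in connection with the disclosure of the embodiments of this application. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Optionally, the data transmission device 2000 further includes a bus, which carries information between the components of the data transmission device 2000. The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on. For ease of illustration, only one line is drawn in FIG. 10, but this does not mean that there is only one bus or only one type of bus.
The memory 2003 is, for example, a read-only memory (ROM) or another type of static storage device that can store static information and instructions; a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; or an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that a computer can access, without being limited to these. The memory 2003 may, for example, exist independently and be connected to the processor 2001 through the bus. The memory 2003 may also be integrated with the processor 2001.
The communication interface 2004 uses any transceiver-type apparatus to communicate with other devices or with a communication network, which may be an Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or the like. The communication interface 2004 may include a wired communication interface and may also include a wireless communication interface. Specifically, the communication interface 2004 may be an Ethernet interface, a Fast Ethernet (FE) interface, a Gigabit Ethernet (GE) interface, an Asynchronous Transfer Mode (ATM) interface, a WLAN interface, a cellular network communication interface, or a combination thereof. An Ethernet interface may be an optical interface, an electrical interface, or a combination thereof. In the embodiments of this application, the communication interface 2004 may be used by the data transmission device 2000 to communicate with other devices.
In a specific implementation, as an embodiment, the processor 2001 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 10. Each of these may be a single-core processor or a multi-core processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
In a specific implementation, as an embodiment, the data transmission device 2000 may include multiple processors, such as the processor 2001 and the processor 2005 shown in FIG. 10. Each of these may be a single-core processor or a multi-core processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
In a specific implementation, as an embodiment, the data transmission device 2000 may further include an output device and an input device. The output device communicates with the processor 2001 and can display information in multiple ways. For example, the output device may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device communicates with the processor 2001 and can receive user input in multiple ways. For example, the input device may be a mouse, a keyboard, a touchscreen device, or a sensing device.
In some embodiments, the memory 2003 stores program code 2010 for executing the solutions of this application, and the processor 2001 can execute the program code 2010 stored in the memory 2003. That is, the data transmission device 2000 can implement the data transmission method provided in the method embodiments through the processor 2001 and the program code 2010 in the memory 2003. The program code 2010 may include one or more software modules. Optionally, the processor 2001 itself may also store program code or instructions for executing the solutions of this application.
The data transmission device 2000 in the embodiments of this application may correspond to the third communication apparatus in the foregoing method embodiments. The processor 2001 in the data transmission device 2000 reads the instructions in the memory 2003, so that the data transmission device 2000 shown in FIG. 10 can perform all or some of the operations performed by the third communication apparatus.
Specifically, the processor 2001 is configured to receive a first message that corresponds to a first service and that the first communication apparatus sends based on a first connection group, where the destination of the first message is the second communication apparatus and the first connection group is shared by the services transmitted by the first communication apparatus; and to send the first message to the second communication apparatus.
For brevity, other optional implementations are not described here.
The data transmission device 2000 may also correspond to the data transmission apparatus shown in FIG. 8, with each functional module of the data transmission apparatus implemented in software of the data transmission device 2000. In other words, the functional modules of the data transmission apparatus are generated after the processor 2001 of the data transmission device 2000 reads the program code 2010 stored in the memory 2003.
The steps of the data transmission method shown in FIG. 3 are completed by integrated logic circuits of hardware in the processor of the data transmission device 2000 or by instructions in the form of software. The steps of the method disclosed in connection with the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may reside in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, details are not described here.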
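Refer to FIG. 11, which shows a schematic structural diagram of a data transmission device 2100 provided in another exemplary embodiment of this application. The data transmission device 2100 shown in FIG. 11 is configured to perform all or some of the operations involved in the data transmission method shown in FIG. 3. The data transmission device 2100 is, for example, a switch or a router, and may be implemented with a general bus architecture.
As shown in FIG. 11, the data transmission device 2100 includes a main control board 2110 and an interface board 2130.
The main control board is also called a main processing unit (MPU) or a route processor card. The main control board 2110 controls and manages the components of the data transmission device 2100, with functions including route computation, device management, device maintenance, and protocol processing. The main control board 2110 includes a central processing unit 2111 and a memory 2112.
The interface board 2130 is also called a line processing unit (LPU), a line card, or a service board. The interface board 2130 provides various service interfaces and forwards data packets. The service interfaces include, but are not limited to, Ethernet interfaces and POS (Packet over SONET/SDH) interfaces; an Ethernet interface is, for example, a Flexible Ethernet service interface (Flexible Ethernet Clients, FlexE Clients). The interface board 2130 includes a central processing unit 2131, a network processor 2132, a forwarding entry memory 2134, and a physical interface card (PIC) 2133.
The central processing unit 2131 on the interface board 2130 controls and manages the interface board 2130 and communicates with the central processing unit 2111 on the main control board 2110.
The network processor 2132 implements packet forwarding. The network processor 2132 may take the form of a forwarding chip, and the forwarding chip may be a network processor (NP). In some embodiments, the forwarding chip may be implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). Specifically, the network processor 2132 forwards received packets based on the forwarding table stored in the forwarding entry memory 2134. If the destination address of a packet is an address of the data transmission device 2100, the packet is sent up to a CPU (such as the central processing unit 2131) for processing. If the destination address is not an address of the data transmission device 2100, the next hop and outbound interface corresponding to the destination address are looked up in the forwarding table, and the packet is forwarded to that outbound interface. Processing of an upstream packet may include inbound-interface processing and forwarding table lookup; processing of a downstream packet may include forwarding table lookup, and so on. In some embodiments, a central processing unit may also perform the functions of the forwarding chip, for example by implementing software forwarding on a general-purpose CPU, so that no forwarding chip is needed on the interface board.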
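The forwarding decision described above can be sketched as a single lookup function. Two simplifications are assumptions of the example: the table is an exact-match dictionary (real devices typically use longest-prefix matching), and a table miss is treated as a drop, which the source does not specify.

```python
def forward(dst: str, local_addrs: set, fib: dict):
    """Return (action, next_hop, out_interface) for a destination address.

    fib maps a destination address to (next_hop, out_interface).
    """
    if dst in local_addrs:
        return ("punt_to_cpu", None, None)  # packet is addressed to the device itself
    entry = fib.get(dst)
    if entry is None:
        return ("drop", None, None)         # assumed behavior on a table miss
    next_hop, out_if = entry
    return ("forward", next_hop, out_if)
```

Checking the local addresses before the table lookup mirrors the order in the text: packets for the device go to the CPU, and everything else is resolved to a next hop and outbound interface.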
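The physical interface card 2133 implements the physical-layer interconnection function: raw traffic enters the interface board 2130 through it, and processed packets are sent out from it. The physical interface card 2133, also called a subcard, can be installed on the interface board 2130 and is responsible for converting optical or electrical signals into packets, checking the validity of the packets, and forwarding them to the network processor 2132 for processing. In some embodiments, the central processing unit 2131 may also perform the functions of the network processor 2132, for example by implementing software forwarding on a general-purpose CPU, so that the interface board 2130 does not need the network processor 2132.
Optionally, the data transmission device 2100 includes multiple interface boards. For example, the data transmission device 2100 further includes an interface board 2140, which includes a central processing unit 2141, a network processor 2142, a forwarding entry memory 2144, and a physical interface card 2143. The functions and implementations of the components of the interface board 2140 are the same as or similar to those of the interface board 2130 and are not described again here.
Optionally, the data transmission device 2100 further includes a switch fabric board 2120, which may also be called a switch fabric unit (SFU). When the data transmission device 2100 has multiple interface boards, the switch fabric board 2120 handles data exchange between the interface boards. For example, the interface board 2130 and the interface board 2140 can communicate through the switch fabric board 2120.
The main control board 2110 is coupled to the interface boards. For example, the main control board 2110, the interface board 2130, the interface board 2140, and the switch fabric board 2120 are connected to the system backplane through the system bus for interworking. In a possible implementation, an inter-process communication (IPC) channel is established between the main control board 2110 and the interface boards 2130 and 2140, and they communicate over the IPC channel.
Logically, the data transmission device 2100 includes a control plane and a forwarding plane. The control plane includes the main control board 2110 and the central processing unit 2111. The forwarding plane includes the components that perform forwarding, such as the forwarding entry memory 2134, the physical interface card 2133, and the network processor 2132. The control plane performs functions such as route computation, forwarding table generation, signaling and protocol packet processing, and configuration and maintenance of the state of the data transmission device. The control plane delivers the generated forwarding table to the forwarding plane, where the network processor 2132 looks up the table to forward the packets received by the physical interface card 2133. The forwarding table delivered by the control plane may be stored in the forwarding entry memory 2134. In some embodiments, the control plane and the forwarding plane may be completely separate and not located on the same data transmission device.
It is worth noting that there may be one or more main control boards; when there is more than one, they may include an active main control board and a standby main control board. There may be one or more interface boards; the stronger the data processing capability of the data transmission device, the more interface boards it provides. There may also be one or more physical interface cards on an interface board. There may be no switch fabric board, or one or more; when there is more than one, they can jointly provide load sharing and redundancy. In a centralized forwarding architecture, the data transmission device may not need a switch fabric board, and the interface board handles the service data processing of the entire system. In a distributed forwarding architecture, the data transmission device may have at least one switch fabric board, which exchanges data between multiple interface boards and provides large-capacity data exchange and processing. Therefore, the data access and processing capability of a data transmission device with a distributed architecture is greater than that of one with a centralized architecture. Optionally, the data transmission device may also take the form of a single board: there is no switch fabric board, and the functions of the interface board and the main control board are integrated on that board. In this case, the central processing unit on the interface board and the central processing unit on the main control board may be combined into one central processing unit on that board, which performs the functions of both. A data transmission device of this form has lower data exchange and processing capability (for example, a low-end switch or router). Which architecture is used depends on the specific networking deployment scenario and is not limited here.
In a specific embodiment, the data transmission device 2100 corresponds to the data transmission apparatus shown in FIG. 8. In some embodiments, the receiving module 801 in the data transmission apparatus shown in FIG. 8 is equivalent to the physical interface card 2133 in the data transmission device 2100.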
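An embodiment of this application further provides a communication apparatus, which includes a transceiver, a memory, and a processor. The transceiver, the memory, and the processor communicate with one another over an internal connection path. The memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory to control the transceiver to receive and send signals. When the processor executes the instructions stored in the memory, the processor performs the method required of the first communication apparatus, the method required of the second communication apparatus, or the method required of the third communication apparatus.
An embodiment of this application further provides a data transmission device, which includes a processor coupled to a memory. The memory stores at least one program instruction or piece of code, which is loaded and executed by the processor so that the data transmission device implements the data transmission method shown in FIG. 3. Optionally, there are one or more processors and one or more memories.
Optionally, the memory may be integrated with the processor, or the memory and the processor may be provided separately.
In a specific implementation, the memory may be a non-transitory memory, such as a read-only memory (ROM), which may be integrated on the same chip as the processor or provided on a different chip. This application does not limit the type of memory or the way the memory and the processor are arranged.
FIG. 12 is a schematic structural diagram of a server provided in an embodiment of this application. The server 1400 shown in FIG. 12 is configured to perform the operations involving the first communication apparatus or the second communication apparatus in the data transmission method shown in FIG. 3. The server 1400 may vary considerably with configuration or performance and may include one or more processors 1401 and one or more memories 1402, where the one or more memories 1402 store at least one computer program that is loaded and executed by the one or more processors 1401 so that the server implements the data transmission method provided in the foregoing method embodiments. The server 1400 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for implementing device functions, which are not described here.
An embodiment of this application further provides a data transmission system, which includes a first communication apparatus and a second communication apparatus. Optionally, the data transmission system further includes a third communication apparatus. For the data transmission method performed by the first, second, and third communication apparatuses, see the description of the embodiment shown in FIG. 3; details are not repeated here. For example, the first communication apparatus and the second communication apparatus are each the server 1400 shown in FIG. 12, and the third communication apparatus is the data transmission device 2000 shown in FIG. 10 or the data transmission device 2100 shown in FIG. 11.
It should be understood that the processor may be a CPU or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. It is worth noting that the processor may be a processor that supports the advanced RISC machines (ARM) architecture.
Further, in an optional embodiment, the memory may include a read-only memory and a random access memory and provides instructions and data to the processor. The memory may also include a non-volatile random access memory. For example, the memory may also store device type information.
The memory may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).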
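An embodiment of this application further provides a computer-readable storage medium. The storage medium stores at least one instruction, which is loaded and executed by a processor so that a computer implements any of the data transmission methods described above.
An embodiment of this application further provides a computer program (product). When the computer program is executed by a computer, it can cause a processor or the computer to perform the corresponding steps and/or procedures in the foregoing method embodiments.
An embodiment of this application further provides a chip, which includes a processor configured to invoke and run instructions stored in a memory, so that a communication device on which the chip is installed performs any of the data transmission methods described above.
An embodiment of this application further provides another chip, which includes an input interface, an output interface, a processor, and a memory connected to one another over an internal connection path. The processor is configured to execute code in the memory, and when the code is executed, the processor performs any of the data transmission methods described above.
The foregoing embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired means (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk).
A person of ordinary skill in the art will appreciate that the method steps and modules described in the embodiments disclosed herein can be implemented in software, hardware, firmware, or any combination thereof. To clearly illustrate the interchangeability of hardware and software, the steps and components of the embodiments have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. A person of ordinary skill in the art may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of this application.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing embodiments may be completed by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
When software is used, implementation may be wholly or partly in the form of a computer program product that includes one or more computer program instructions. As an example, the methods of the embodiments of this application may be described in the context of machine-executable instructions, such as those included in program modules executed in a device on a real or virtual target processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, and the like that perform particular tasks or implement particular abstract data structures. In various embodiments, the functions of program modules may be combined or split among the described program modules. Machine-executable instructions for program modules may be executed locally or within a distributed device. In a distributed device, program modules may be located in both local and remote storage media.
Computer program code for implementing the methods of the embodiments of this application may be written in one or more programming languages. The computer program code may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the computer or other programmable data processing apparatus, causes the functions or operations specified in the flowcharts and/or block diagrams to be carried out. The program code may execute entirely on a computer, partly on a computer, as a stand-alone software package, partly on a computer and partly on a remote computer, or entirely on a remote computer or server.
In the context of the embodiments of this application, computer program code or related data may be carried by any suitable carrier so that a device, an apparatus, or a processor can perform the various processing and operations described above. Examples of carriers include signals, computer-readable media, and the like.
Examples of signals may include electrical, optical, radio, acoustic, or other forms of propagated signals, such as carrier waves and infrared signals.
A machine-readable medium may be any tangible medium that contains or stores a program for or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination thereof. More specific examples of machine-readable storage media include an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical storage device, a magnetic storage device, or any suitable combination thereof.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and modules described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into modules is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected as needed to achieve the purpose of the solutions of the embodiments of this application.
In addition, the functional modules in the embodiments of this application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.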
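In this application, terms such as "first" and "second" are used to distinguish between identical or similar items whose roles and functions are essentially the same. It should be understood that "first", "second", and "nth" have no logical or temporal dependency on one another and do not limit quantity or execution order. It should also be understood that, although the description uses the terms first, second, and so on to describe various elements, these elements should not be limited by the terms, which are used only to distinguish one element from another. For example, without departing from the scope of the various examples, a first message could be called a second message, and, similarly, a second message could be called a first message. Both the first message and the second message may be messages and, in some cases, may be separate and different messages.
It should also be understood that, in the embodiments of this application, the sequence numbers of the processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic and should not limit the implementation of the embodiments of this application in any way.
In this application, the term "at least one" means one or more, and the term "multiple" means two or more; for example, multiple second messages means two or more second messages. The terms "system" and "network" are often used interchangeably herein.
It should be understood that the terms used in the description of the various examples herein are intended only to describe particular examples and are not intended to be limiting. As used in the description of the various examples and in the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used herein refers to and covers any and all possible combinations of one or more of the associated listed items. The term "and/or" describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may represent three cases: A alone, both A and B, and B alone. In addition, the character "/" in this application generally indicates an "or" relationship between the associated objects before and after it.
It should also be understood that the term "include" (also "includes", "including", "comprises", and/or "comprising"), when used in this specification, specifies the presence of stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "if" may be interpreted to mean "when" or "upon", or "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined..." or "if [a stated condition or event] is detected" may be interpreted to mean "upon determining..." or "in response to determining..." or "upon detecting [the stated condition or event]" or "in response to detecting [the stated condition or event]".
It should be understood that determining B according to A does not mean determining B according to A alone; B may also be determined according to A and/or other information.
It should also be understood that references throughout the specification to "one embodiment", "an embodiment", or "a possible implementation" mean that a particular feature, structure, or characteristic related to the embodiment or implementation is included in at least one embodiment of this application. Therefore, the appearances of "in one embodiment", "in an embodiment", or "a possible implementation" throughout the specification do not necessarily refer to the same embodiment. Furthermore, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The foregoing describes only optional embodiments of this application and is not intended to limit this application. Any modification, equivalent replacement, or improvement made within the principles of this application shall fall within the scope of protection of this application.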

Claims (47)

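  1. A data transmission method, applied to a first communication apparatus, the method comprising:
    obtaining, by the first communication apparatus, a first message corresponding to a first service, wherein the destination of the first message is a second communication apparatus; and
    sending, by the first communication apparatus, the first message to the second communication apparatus according to a first connection group, wherein the first connection group is shared by the services transmitted by the first communication apparatus.
  2. The method according to claim 1, wherein the first connection group comprises a first connection between the first communication apparatus and the second communication apparatus, and the sending, by the first communication apparatus, of the first message to the second communication apparatus according to the first connection group comprises:
    sending, by the first communication apparatus, the first message to the second communication apparatus over the first connection;
    and the method further comprises:
    obtaining, by the first communication apparatus, a second message corresponding to a second service, wherein the destination of the second message is the second communication apparatus; and
    sending, by the first communication apparatus, the second message to the second communication apparatus over the first connection.
  3. The method according to claim 1, wherein the first connection group comprises a first connection and a second connection between the first communication apparatus and the second communication apparatus, and the sending, by the first communication apparatus, of the first message to the second communication apparatus according to the first connection group comprises:
    sending, by the first communication apparatus, the first message to the second communication apparatus over the first connection;
    and the method further comprises:
    obtaining, by the first communication apparatus, a second message corresponding to the first service, wherein the destination of the second message is the second communication apparatus; and
    sending, by the first communication apparatus, the second message to the second communication apparatus over the second connection.
  4. The method according to claim 2 or 3, wherein the sending, by the first communication apparatus, of the first message to the second communication apparatus over the first connection comprises:
    selecting, by the first communication apparatus, the first connection for sending the first message to the second communication apparatus because the connection performance of the first connection satisfies a performance condition, wherein the connection performance comprises at least one of send queue length, latency performance, or packet loss performance.
  5. The method according to any one of claims 1 to 4, wherein the first communication apparatus comprises a connection resource pool, the connection resource pool comprises at least one connection group, each connection group corresponds to one destination communication apparatus, and the at least one connection group comprises the first connection group.
  6. The method according to any one of claims 1 to 5, wherein before the first communication apparatus sends the first message to the second communication apparatus according to the first connection group, the method further comprises:
    splitting, by the first communication apparatus, the first message into multiple first sub-messages when the data volume of the first message exceeds a threshold;
    and the sending, by the first communication apparatus, of the first message to the second communication apparatus according to the first connection group comprises:
    sending, by the first communication apparatus, the multiple first sub-messages to the second communication apparatus according to the first connection group.
  7. The method according to claim 6, wherein the first communication apparatus comprises an application layer and a transport layer, and the obtaining, by the first communication apparatus, of the first message corresponding to the first service comprises:
    receiving, by the transport layer, the first message corresponding to the first service, which the application layer sends by invoking the logical interface of the first service;
    and after the first communication apparatus sends the multiple first sub-messages to the second communication apparatus according to the first connection group, the method further comprises:
    when the transport layer has obtained the notification message corresponding to each of the multiple first sub-messages, invoking, by the transport layer, the logical interface of the first service to send the notification message corresponding to the first message to the application layer.
  8. The method according to any one of claims 1 to 7, wherein the first communication apparatus is middleware between application-layer software and NIC hardware.
  9. The method according to any one of claims 1 to 8, wherein the first message is transmitted based on the remote direct memory access (RDMA) protocol.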
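  10. A data transmission method, applied to a third communication apparatus, the method comprising:
    receiving, by the third communication apparatus, a first message that corresponds to a first service and that a first communication apparatus sends based on a first connection group, wherein the destination of the first message is a second communication apparatus and the first connection group is shared by the services transmitted by the first communication apparatus; and
    sending, by the third communication apparatus, the first message to the second communication apparatus.
  11. The method according to claim 10, wherein the first connection group comprises a first connection between the first communication apparatus and the second communication apparatus, and the receiving, by the third communication apparatus, of the first message that corresponds to the first service and that the first communication apparatus sends based on the first connection group comprises:
    receiving, by the third communication apparatus, the first message that corresponds to the first service and that the first communication apparatus sends over the first connection;
    and the method further comprises:
    receiving, by the third communication apparatus, a second message that corresponds to a second service and that the first communication apparatus sends over the first connection, wherein the destination of the second message is the second communication apparatus; and
    sending, by the third communication apparatus, the second message to the second communication apparatus.
  12. The method according to claim 10, wherein the first connection group comprises a first connection and a second connection between the first communication apparatus and the second communication apparatus, and the receiving, by the third communication apparatus, of the first message that corresponds to the first service and that the first communication apparatus sends based on the first connection group comprises:
    receiving, by the third communication apparatus, the first message that corresponds to the first service and that the first communication apparatus sends over the first connection;
    and the method further comprises:
    receiving, by the third communication apparatus, a second message that corresponds to the first service and that the first communication apparatus sends over the second connection, wherein the destination of the second message is the second communication apparatus; and
    sending, by the third communication apparatus, the second message to the second communication apparatus.
  13. The method according to any one of claims 10 to 12, wherein the first message comprises multiple first sub-messages, and the method further comprises:
    when the multiple first sub-messages become out of order, adding an out-of-order flag to each first sub-message that is out of order; and
    transmitting the first sub-messages to which the out-of-order flag has been added, wherein the out-of-order flag indicates that the multiple first sub-messages are out of order but that no packet loss has occurred.
  14. The method according to any one of claims 10 to 13, wherein the first message is transmitted based on the remote direct memory access (RDMA) protocol.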
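  15. A data transmission method, applied to a second communication apparatus, the method comprising:
    receiving, by the second communication apparatus, a first message corresponding to a first service, wherein the first message is sent by a first communication apparatus based on a first connection group, the destination of the first message is the second communication apparatus, and the first connection group is shared by the services transmitted by the first communication apparatus.
  16. The method according to claim 15, wherein the first connection group comprises a first connection between the first communication apparatus and the second communication apparatus, and the first message is sent by the first communication apparatus over the first connection; and the method further comprises:
    receiving, by the second communication apparatus, a second message corresponding to a second service, wherein the second message is sent by the first communication apparatus over the first connection and the destination of the second message is the second communication apparatus.
  17. The method according to claim 15, wherein the first connection group comprises a first connection and a second connection between the first communication apparatus and the second communication apparatus, and the first message is sent by the first communication apparatus over the first connection; and the method further comprises:
    receiving, by the second communication apparatus, a second message corresponding to the first service, wherein the second message is sent by the first communication apparatus over the second connection and the destination of the second message is the second communication apparatus.
  18. The method according to any one of claims 15 to 17, wherein the first message comprises multiple first sub-messages, the second communication apparatus comprises an application layer and a transport layer, and the receiving, by the second communication apparatus, of the first message corresponding to the first service comprises:
    receiving, by the transport layer, the multiple first sub-messages, wherein each of the multiple first sub-messages comprises a corresponding sequence number, sub-sequence number, and interface identifier, and the multiple first sub-messages are obtained by the first communication apparatus splitting the first message;
    arranging and combining, by the transport layer, the multiple first sub-messages according to the sequence numbers and the sub-sequence numbers to obtain the first message; and
    invoking, by the transport layer, the logical interface corresponding to the interface identifier to send the first message to the application layer.
  19. The method according to any one of claims 15 to 18, wherein the first message comprises multiple first sub-messages, and the method further comprises:
    upon receiving any first sub-message that is out of order, determining based on the out-of-order flag, if that first sub-message carries one, that the multiple first sub-messages are out of order but that no packet loss has occurred.
  20. The method according to any one of claims 15 to 19, wherein the second communication apparatus is middleware between application-layer software and NIC hardware.
  21. The method according to any one of claims 15 to 20, wherein the first message is transmitted based on the remote direct memory access (RDMA) protocol.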
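  22. A data transmission apparatus, applied to a first communication apparatus, the apparatus comprising:
    an obtaining module, configured to obtain a first message corresponding to a first service, wherein the destination of the first message is a second communication apparatus; and
    a sending module, configured to send the first message to the second communication apparatus according to a first connection group, wherein the first connection group is shared by the services transmitted by the first communication apparatus.
  23. The apparatus according to claim 22, wherein the first connection group comprises a first connection between the first communication apparatus and the second communication apparatus, and the sending module is configured to send the first message to the second communication apparatus over the first connection;
    the obtaining module is further configured to obtain a second message corresponding to a second service, wherein the destination of the second message is the second communication apparatus; and
    the sending module is further configured to send the second message to the second communication apparatus over the first connection.
  24. The apparatus according to claim 22, wherein the first connection group comprises a first connection and a second connection between the first communication apparatus and the second communication apparatus, and the sending module is configured to send the first message to the second communication apparatus over the first connection;
    the obtaining module is further configured to obtain a second message corresponding to the first service, wherein the destination of the second message is the second communication apparatus; and
    the sending module is further configured to send the second message to the second communication apparatus over the second connection.
  25. The apparatus according to claim 23 or 24, wherein the sending module is configured to select the first connection for sending the first message to the second communication apparatus because the connection performance of the first connection satisfies a performance condition, wherein the connection performance comprises at least one of send queue length, latency performance, or packet loss performance.
  26. The apparatus according to any one of claims 22 to 25, wherein the first communication apparatus comprises a connection resource pool, the connection resource pool comprises at least one connection group, each connection group corresponds to one destination communication apparatus, and the at least one connection group comprises the first connection group.
  27. The apparatus according to any one of claims 22 to 26, wherein the apparatus further comprises:
    a splitting module, configured so that, when the data volume of the first message exceeds a threshold, the first communication apparatus splits the first message into multiple first sub-messages; and
    the sending module is configured to send the multiple first sub-messages to the second communication apparatus according to the first connection group.
  28. The apparatus according to claim 27, wherein the first communication apparatus comprises an application layer and a transport layer; the obtaining module is configured so that the transport layer receives the first message corresponding to the first service, which the application layer sends by invoking the logical interface of the first service; and
    the sending module is configured so that, when the transport layer has obtained the notification message corresponding to each of the multiple first sub-messages, the transport layer invokes the logical interface of the first service to send the notification message corresponding to the first message to the application layer.
  29. The apparatus according to any one of claims 22 to 28, wherein the first communication apparatus is middleware between application-layer software and NIC hardware.
  30. The apparatus according to any one of claims 22 to 29, wherein the first message is transmitted based on the remote direct memory access (RDMA) protocol.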
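  31. A data transmission apparatus, applied to a third communication apparatus, the apparatus comprising:
    a receiving module, configured to receive a first message that corresponds to a first service and that a first communication apparatus sends based on a first connection group, wherein the destination of the first message is a second communication apparatus and the first connection group is shared by the services transmitted by the first communication apparatus; and
    a sending module, configured to send the first message to the second communication apparatus.
  32. The apparatus according to claim 31, wherein the first connection group comprises a first connection between the first communication apparatus and the second communication apparatus, and the receiving module is configured to receive the first message that corresponds to the first service and that the first communication apparatus sends over the first connection;
    the receiving module is further configured to receive a second message that corresponds to a second service and that the first communication apparatus sends over the first connection, wherein the destination of the second message is the second communication apparatus; and
    the sending module is further configured to send the second message to the second communication apparatus.
  33. The apparatus according to claim 31, wherein the first connection group comprises a first connection and a second connection between the first communication apparatus and the second communication apparatus, and the receiving module is configured to receive the first message that corresponds to the first service and that the first communication apparatus sends over the first connection;
    the receiving module is further configured to receive a second message that corresponds to the first service and that the first communication apparatus sends over the second connection, wherein the destination of the second message is the second communication apparatus; and
    the sending module is further configured to send the second message to the second communication apparatus.
  34. The apparatus according to any one of claims 31 to 33, wherein the first message comprises multiple first sub-messages, and the apparatus further comprises: an adding module, configured to, when the multiple first sub-messages become out of order, add an out-of-order flag to each first sub-message that is out of order; and
    a transmission module, configured to transmit the first sub-messages to which the out-of-order flag has been added, wherein the out-of-order flag indicates that the multiple first sub-messages are out of order but that no packet loss has occurred.
  35. The apparatus according to any one of claims 31 to 34, wherein the first message is transmitted based on the remote direct memory access (RDMA) protocol.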
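  36. A data transmission apparatus, applied to a second communication apparatus, the apparatus comprising:
    a receiving module, configured to receive a first message corresponding to a first service, wherein the first message is sent by a first communication apparatus based on a first connection group, the destination of the first message is the second communication apparatus, and the first connection group is shared by the services transmitted by the first communication apparatus.
  37. The apparatus according to claim 36, wherein the first connection group comprises a first connection between the first communication apparatus and the second communication apparatus, and the first message is sent by the first communication apparatus over the first connection;
    the receiving module is further configured to receive a second message corresponding to a second service, wherein the second message is sent by the first communication apparatus over the first connection and the destination of the second message is the second communication apparatus.
  38. The apparatus according to claim 36, wherein the first connection group comprises a first connection and a second connection between the first communication apparatus and the second communication apparatus, and the first message is sent by the first communication apparatus over the first connection;
    the receiving module is further configured to receive a second message corresponding to the first service, wherein the second message is sent by the first communication apparatus over the second connection and the destination of the second message is the second communication apparatus.
  39. The apparatus according to any one of claims 36 to 38, wherein the first message comprises multiple first sub-messages, and the second communication apparatus comprises an application layer and a transport layer;
    the receiving module is configured so that the transport layer receives the multiple first sub-messages, wherein each of the multiple first sub-messages comprises a corresponding sequence number, sub-sequence number, and interface identifier, and the multiple first sub-messages are obtained by the first communication apparatus splitting the first message; the transport layer arranges and combines the multiple first sub-messages according to the sequence numbers and the sub-sequence numbers to obtain the first message; and the transport layer invokes the logical interface corresponding to the interface identifier to send the first message to the application layer.
  40. The apparatus according to any one of claims 36 to 39, wherein the first message comprises multiple first sub-messages, and the apparatus further comprises:
    a determining module, configured to, upon receiving any first sub-message that is out of order, determine based on the out-of-order flag, if that first sub-message carries one, that the multiple first sub-messages are out of order but that no packet loss has occurred.
  41. The apparatus according to any one of claims 36 to 40, wherein the second communication apparatus is middleware between application-layer software and NIC hardware.
  42. The apparatus according to any one of claims 36 to 41, wherein the first message is transmitted based on the remote direct memory access (RDMA) protocol.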
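  43. A data transmission device, comprising a processor coupled to a memory, wherein the memory stores at least one program instruction or piece of code, and the at least one program instruction or piece of code is loaded and executed by the processor so that the data transmission device implements the data transmission method according to any one of claims 1 to 21.
  44. A data transmission system, comprising a first communication apparatus and a second communication apparatus, wherein
    the first communication apparatus is configured to perform the data transmission method according to any one of claims 1 to 9, and the second communication apparatus is configured to perform the data transmission method according to any one of claims 15 to 21.
  45. The system according to claim 44, wherein the system further comprises a third communication apparatus, and
    the third communication apparatus is configured to perform the data transmission method according to any one of claims 10 to 14.
  46. A computer-readable storage medium, wherein the computer storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor so that a computer implements the data transmission method according to any one of claims 1 to 21.
  47. A computer program product, comprising computer program code, wherein the computer program code is loaded and executed by a computer so that the computer implements the data transmission method according to any one of claims 1 to 21.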
PCT/CN2023/103736 2022-11-29 2023-06-29 Data transmission method, apparatus, device, system, and storage medium WO2024113830A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211517066.9A CN118118424A (zh) 2022-11-29 2022-11-29 Data transmission method, apparatus, device, system, and storage medium
CN202211517066.9 2022-11-29

Publications (1)

Publication Number Publication Date
WO2024113830A1 true WO2024113830A1 (zh) 2024-06-06

Family

ID=91217637

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/103736 WO2024113830A1 (zh) 2022-11-29 2023-06-29 Data transmission method, apparatus, device, system, and storage medium

Country Status (2)

Country Link
CN (1) CN118118424A (zh)
WO (1) WO2024113830A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080109526A1 (en) * 2006-11-06 2008-05-08 Viswanath Subramanian Rdma data to responder node coherency domain
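CN108075915A (zh) * 2016-11-16 2018-05-25 Northwestern Polytechnical University (西北工业大学) RDMA communication connection pool management method based on an adaptive control strategy
CN108600349A (zh) * 2018-04-11 2018-09-28 Beijing Xiaomi Mobile Software Co., Ltd. (北京小米移动软件有限公司) Method and apparatus for managing connections in a connection pool
CN113596085A (zh) * 2021-06-24 2021-11-02 Alibaba Cloud Computing Co., Ltd. (阿里云计算有限公司) Data processing method, system, and apparatus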

Also Published As

Publication number Publication date
CN118118424A (zh) 2024-05-31
