WO2024046151A1 - Data stream processing method and related apparatus - Google Patents

Data stream processing method and related apparatus

Info

Publication number
WO2024046151A1
Authority
WO
WIPO (PCT)
Prior art keywords
message
communication device
data stream
network
source
Application number
PCT/CN2023/113931
Other languages
English (en)
French (fr)
Inventor
林艺宏
姚学军
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2024046151A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00: Arrangements for detecting or preventing errors in the information received
    • H04L1/12: Arrangements for detecting or preventing errors in the information received by using return channel
    • H04L1/16: Arrangements for detecting or preventing errors in the information received by using return channel in which the return channel carries supervisory signals, e.g. repetition request signals
    • H04L1/18: Automatic repetition systems, e.g. Van Duuren systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16: Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/164: Adaptation or special uses of UDP protocol

Definitions

  • the present application relates to the field of communication technology, and in particular, to a data stream processing method and related devices.
  • RDMA over Converged Ethernet (RoCE) is a network protocol that allows the use of remote direct memory access (RDMA) technology on an Ethernet network.
  • the RoCE protocol has two versions: RoCE v1 protocol and RoCE v2 protocol.
  • RoCE v1 is an Ethernet link layer protocol, thus allowing communication between any two hosts in the same Ethernet broadcast domain.
  • RoCE v2 is a network layer protocol, so RoCE v2 packets can be routed.
  • the RoCE protocol can also be applied to traditional or non-converged Ethernet networks.
  • the RoCEv2 protocol is built on the user datagram protocol (UDP). That is, RoCE messages are transmitted based on UDP.
  • UDP is a connectionless protocol. It does not provide packet grouping, reassembly, or ordering; in other words, after a message is sent, the sender cannot know whether it arrived safely and completely. Therefore, in existing implementations, packet loss handling for RoCE packets relies on the retransmission mechanism based on the retry counter (RC) in the inner layer of the computer system. However, this retransmission mechanism has low retransmission efficiency and degrades the processing performance of specific services.
  • Embodiments of the present application disclose a data flow processing method and related devices, which can improve the retransmission efficiency of messages and improve the processing performance of services corresponding to messages.
  • embodiments of the present application provide a data flow processing method.
  • the method includes: when a network failure occurs on the transmission path of the first data flow, the first communication device on the transmission path generates a first message, where the first message instructs the source node of the first data stream to retransmit one or more messages in the first data stream; and the first communication device sends the first message.
  • the above-mentioned first message is a negative acknowledgment NAK message.
  • the communication device may be a network device on the transmission path, a network interface board in the network device, a processor in the network device, or a switching chip in the network device.
  • the above-mentioned communication device is a device included in any node (ie, network equipment) on the data stream transmission path.
  • when the network device senses that a fault occurs on the transmission path, it can generate a response message for the data flow and send it to the source device of the data flow (i.e., the above-mentioned source node) to instruct the source device to retransmit the message.
  • the network failure is a failure of a link or device connected to the port that transmits the first data flow; before the first communication device generates the first message, the method further includes: the first communication device determines the first data flow according to the port.
  • network faults can usually be associated with the egress port of the network device.
  • the network fault can be a fault of a link connected to the egress port or a fault of the connected next-hop device. Therefore, by storing the information corresponding to each data flow sent from an egress port in association with the identifier of that egress port, when the link or next hop corresponding to the egress port fails, the data flows whose messages need to be retransmitted can be quickly determined from the egress port identifier. The information corresponding to those data flows can then be quickly found to generate a response message indicating message retransmission, thereby further improving the efficiency of message retransmission.
  • before a network failure occurs in the transmission path of the first data stream, the method further includes: the first communication device receives the second message in the first data stream, and obtains the first information corresponding to the first data flow according to the second message.
  • the first information includes the source IP address, destination IP address, source port number, destination queue pair (DQP) and packet sequence number (PSN) in the second message; the first information is used to generate the first message.
  • the communication device can obtain and store the above-mentioned first information from the messages of the passing data flow, so that it can quickly respond to network faults, generate a response message indicating retransmission of the message, and send it to the source end, improving the efficiency of message retransmission.
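The extraction of the first information described above can be sketched as follows. This is a minimal illustration, not the implementation in the application: the names `FirstInfo` and `extract_first_info` are hypothetical, and the 12-byte base transport header (BTH) layout and UDP destination port 4791 reflect the commonly documented RoCEv2 encapsulation rather than anything stated in this text.

```python
import struct
from dataclasses import dataclass

ROCEV2_UDP_PORT = 4791  # UDP destination port commonly used for RoCEv2 (assumption)


@dataclass(frozen=True)
class FirstInfo:
    """Hypothetical container for the 'first information' of a data flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dqp: int  # destination queue pair, 24 bits
    psn: int  # packet sequence number, 24 bits


def extract_first_info(src_ip, dst_ip, src_port, dst_port, udp_payload):
    """Return FirstInfo for a RoCEv2 packet, or None if it does not look like one."""
    if dst_port != ROCEV2_UDP_PORT or len(udp_payload) < 12:
        return None
    # Assumed BTH layout: opcode(1) flags(1) P_Key(2) rsvd+DestQP(4) ack/rsvd+PSN(4)
    _opcode, _flags, _pkey, qp_word, psn_word = struct.unpack(
        "!BBHII", udp_payload[:12]
    )
    # DestQP and PSN occupy the low 24 bits of their 32-bit words.
    return FirstInfo(src_ip, dst_ip, src_port, qp_word & 0xFFFFFF, psn_word & 0xFFFFFF)
```

A forwarding device would call such a function on each passing packet and store the result for later use when generating the first message.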
  • the first information further includes the source queue pair SQP of the first data flow; the SQP is calculated based on the source port number and the DQP.
  • the communication device can calculate the SQP of the first data stream through the relational expression.
  • the source queue identifier and the above-mentioned target identifier obtained from the second message can be used to identify the first data flow, so as to assist in quickly generating a NAK message corresponding to the first data flow.
  • the first information also includes the SQP of the first data stream; before a network failure occurs in the transmission path of the first data stream, the method further includes:
  • the first communication device receives the first acknowledgment (ACK) message corresponding to the first data stream;
  • the first communication device obtains the DQP in the ACK message;
  • the first communication device stores the DQP in the ACK message as the SQP of the first data stream.
  • the SQP of the first data stream can be obtained through the ACK message corresponding to the first data stream.
  • the SQP and the above-mentioned target identifier obtained from the second message can be used to identify the first data flow, so as to assist in quickly generating a NAK message corresponding to the first data flow.
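Learning the SQP from an ACK, as described above, can be sketched as follows. The sketch assumes that an ACK is matched to a flow purely by its reversed IP address pair, which is a simplification for illustration; the names `FlowInfo` and `learn_sqp_from_ack` are hypothetical and a real device would likely need additional discriminators when several queue pairs exist between the same hosts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowInfo:
    """Hypothetical per-flow record kept by the forwarding device."""
    src_ip: str
    dst_ip: str
    dqp: int
    sqp: Optional[int] = None  # unknown until an ACK for the flow is observed


def learn_sqp_from_ack(flows, ack_src_ip, ack_dst_ip, ack_dqp):
    """Store the ACK's DQP as the SQP of matching flows; return how many were updated."""
    updated = 0
    for flow in flows:
        # An ACK travels from the flow's destination back to its source,
        # so its addresses are the flow's addresses swapped.
        if (flow.src_ip == ack_dst_ip and flow.dst_ip == ack_src_ip
                and flow.sqp is None):
            flow.sqp = ack_dqp
            updated += 1
    return updated
```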
  • the information corresponding to the first data stream also includes the SQP of the first data stream; before a network failure occurs in the transmission path of the first data stream, the method further includes:
  • the aforementioned communication device receives the third message from the aforementioned source end, and the aforementioned third message includes the aforementioned SQP;
  • the communication device stores the SQP.
  • the aforementioned third message is a link layer discovery protocol message.
  • the communication device can obtain the SQP of the first data stream by interacting with the source of the first data stream.
  • the SQP and the above-mentioned target identifier obtained from the second message can be used to identify the first data flow, so as to assist in quickly generating a NAK message corresponding to the first data flow.
  • after the first communication device obtains the first information from the second message, the method further includes:
  • the first communication device receives the second acknowledgment (ACK) message corresponding to the first data stream;
  • the first communication device obtains the first data packet sequence number in the ACK message;
  • the first communication device updates the data packet sequence number in the first information stored in the communication device based on the first data packet sequence number.
  • the communication device can update the stored PSN using the PSN in the received ACK message corresponding to the first data stream. Because the PSN carried in the ACK message is the PSN of a message that the destination has confirmed receiving, using it to update the stored PSN reduces the number of retransmitted messages indicated by subsequently generated NAK messages. This reduces the number of packets the source retransmits, saving computing and network resources.
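The PSN update above can be sketched as follows. The text does not say exactly how the stored value is derived from the ACKed PSN, so this sketch assumes the stored PSN is advanced to one past the ACKed PSN and never moves backwards, with 24-bit wraparound handled by serial-number comparison; the function names are hypothetical.

```python
PSN_MODULUS = 1 << 24  # PSNs are assumed to be 24-bit values that wrap around


def psn_is_newer(candidate, current):
    """True if candidate is ahead of current in 24-bit serial-number order."""
    return 0 < (candidate - current) % PSN_MODULUS < PSN_MODULUS // 2


def update_stored_psn(stored_psn, acked_psn):
    """Advance the stored PSN to one past the ACKed PSN, ignoring stale ACKs."""
    next_unacked = (acked_psn + 1) % PSN_MODULUS
    return next_unacked if psn_is_newer(next_unacked, stored_psn) else stored_psn
```

With this rule, a NAK generated later would ask only for messages the destination has not yet confirmed, which is the saving the paragraph above describes.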
  • the information corresponding to the first data flow is stored in a flow information table.
  • the flow information table includes the port identifiers of the N outgoing ports in the intermediate node and the information corresponding to the M data flows.
  • Each of the foregoing N port identifiers is stored in association with information corresponding to one or more of the foregoing M data streams.
  • the foregoing N and M are integers greater than 0, and N ≤ M.
  • the information corresponding to the data flow is stored in the form of a flow information table.
  • through the flow information table, the information corresponding to the data flows of each egress port can be quickly found, so the data flow that needs message retransmission can be quickly determined and its information quickly retrieved to generate a response message instructing message retransmission, thereby further improving the efficiency of message retransmission.
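One possible shape for such a flow information table is sketched below: a mapping from egress port identifier to the flows sent through that port. The class and method names are hypothetical, and keying a flow by source IP, destination IP, source port and DQP is an assumption made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowInfo:
    """Hypothetical per-flow record."""
    src_ip: str
    dst_ip: str
    src_port: int
    dqp: int
    psn: int


class FlowInfoTable:
    """Maps each egress port identifier to the flows sent through it."""

    def __init__(self):
        self._by_port = defaultdict(dict)

    def record(self, egress_port, flow):
        # A later record for the same flow replaces the earlier one.
        key = (flow.src_ip, flow.dst_ip, flow.src_port, flow.dqp)
        self._by_port[egress_port][key] = flow

    def flows_on_port(self, egress_port):
        """Return all flows associated with a (possibly failed) egress port."""
        return list(self._by_port.get(egress_port, {}).values())
```

When a link or next hop behind a port fails, a single `flows_on_port` lookup yields every flow that needs a retransmission request.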
  • the first data stream is a data stream transmitted based on the RoCEv2 protocol; the second message is the first message in the message MSG included in the first data stream.
  • the second message received by the communication device may be the first message in a certain MSG in the data stream.
  • the packet sequence number in the received first message is used as the error packet sequence number (ePSN) in the generated NAK message.
  • the ePSN in the NAK message instructs retransmission of the message with the ePSN and the messages with the PSNs following the ePSN. Therefore, obtaining the above information from the first message of the MSG enables subsequent retransmission to start from the first message of the MSG, ensuring as far as possible that all messages in the MSG are sent to the destination.
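How the stored information could be turned into a NAK is sketched below at the field level only. The sketch assumes the NAK is addressed as if it came from the flow's destination (addresses swapped), targets the source queue pair, and carries the stored PSN as the ePSN; `FlowInfo`, `NakFields` and `build_nak` are hypothetical names, and no wire encoding is attempted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowInfo:
    """Hypothetical per-flow record, including the learned SQP."""
    src_ip: str
    dst_ip: str
    src_port: int
    dqp: int
    sqp: int
    psn: int  # PSN taken from the first message of the MSG


@dataclass(frozen=True)
class NakFields:
    """Hypothetical field-level description of the generated NAK."""
    src_ip: str
    dst_ip: str
    dest_qp: int
    epsn: int


def build_nak(flow):
    """Build NAK fields that ask the flow's source to retransmit from flow.psn."""
    return NakFields(
        src_ip=flow.dst_ip,  # assumed: NAK appears to come from the destination
        dst_ip=flow.src_ip,  # and is delivered to the source node
        dest_qp=flow.sqp,    # the source's queue pair receives the NAK
        epsn=flow.psn,       # retransmission starts at this PSN
    )
```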
  • this application provides a data stream processing method, which method includes:
  • the first communication device receives the first message from the second communication device; the first message is generated when a network failure occurs in the transmission path of the first data stream, and the first message indicates retransmission of one or more messages in the first data stream; the second communication device is on the transmission path of the first data stream, and the first communication device is the source node of the first data stream;
  • the first communication device retransmits the one or more messages.
  • the above-mentioned second communication device is a device included in any node (ie, network equipment) on the data stream transmission path.
  • when the network device senses a fault on the transmission path, it can generate a response message for the data flow and send it to the source device of the data flow (i.e., the above-mentioned first communication device) to instruct the source device to retransmit the message.
  • before the first communication device receives the first message from the second communication device, the method further includes:
  • the first communication device sends a second message to the second communication device.
  • the second message includes the SQP of the first data stream, and the SQP is used to generate the first message.
  • the first communication device which is the source of the first data stream, may send the SQP to the second communication device after generating the SQP of the first data stream.
  • the source queue identifier and the above-mentioned target identifier obtained from the second message can be used to identify the first data flow, so as to assist the second communication device in quickly generating the NAK message corresponding to the first data flow (the above-mentioned first message) and improve the efficiency of message retransmission.
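On the source side, the retransmission triggered by the first message can be sketched as selecting every buffered message from the ePSN onward. This is a go-back-N style illustration under assumptions: the send buffer is modelled as a dictionary from PSN to payload, PSNs are 24-bit, and the function name is hypothetical.

```python
PSN_MODULUS = 1 << 24  # PSNs are assumed to be 24-bit values that wrap around


def packets_to_retransmit(send_buffer, epsn, next_psn):
    """Return [(psn, payload)] for buffered packets from epsn up to next_psn (exclusive).

    send_buffer maps PSN -> payload for packets not yet acknowledged;
    next_psn is the PSN the source would assign to its next new packet.
    """
    result = []
    psn = epsn
    while psn != next_psn:
        if psn in send_buffer:
            result.append((psn, send_buffer[psn]))
        psn = (psn + 1) % PSN_MODULUS  # step forward, wrapping at 2**24
    return result
```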
  • this application provides a communication device, which includes:
  • a generating unit configured to generate a first message when a network failure occurs in the transmission path of the first data stream, where the first message instructs the source node of the first data stream to retransmit one or more messages in the first data stream, and the communication device is on the transmission path;
  • the sending unit is used to send the aforementioned first message.
  • the aforementioned network failure is a link or device failure connected to the port that transmits the aforementioned first data flow
  • the communication device further includes a determining unit configured to determine the first data flow according to the port before the generating unit generates the first message.
  • the communication device further includes a receiving unit, configured to receive the second message in the first data stream before a network failure occurs in the transmission path of the first data stream;
  • the communication device further includes an acquisition unit, configured to acquire first information corresponding to the first data stream according to the second message, where the first information includes the source IP address, destination IP address, source port number, destination queue pair (DQP) and data packet sequence number in the second message, and the first information is used to generate the first message.
  • the first information further includes the source queue pair SQP of the first data flow; the SQP is calculated based on the source port number and the DQP.
  • the first information also includes the SQP of the first data stream; before a network failure occurs in the transmission path of the first data stream,
  • the aforementioned receiving unit is also used to receive the first confirmation ACK message corresponding to the aforementioned first data stream;
  • the aforementioned acquisition unit is also used to obtain the DQP in the aforementioned ACK message
  • the communication device further includes a storage unit configured to save the DQP in the ACK message as the SQP of the first data stream.
  • the first information also includes the SQP of the first data stream; before a network failure occurs in the transmission path of the first data stream,
  • the aforementioned receiving unit is also used to receive the third message from the aforementioned source node, where the aforementioned third message includes the aforementioned SQP;
  • the aforementioned communication device also includes a storage unit for storing the aforementioned SQP.
  • the third message is a link layer discovery protocol message.
  • after the acquisition unit obtains the first information from the second message,
  • the aforementioned receiving unit is also used to receive the second confirmation ACK message corresponding to the aforementioned first data stream;
  • the aforementioned acquisition unit is also used to obtain the first data packet sequence number in the aforementioned ACK message
  • the communication device further includes an update unit configured to update the data packet sequence number in the first information stored in the communication device based on the first data packet sequence number.
  • the first message is a negative acknowledgment NAK message.
  • the first communication device is a network device on the transmission path, or a network interface board in the network device, or a processor in the network device, or a switching chip in the network device.
  • the first data stream is a data stream transmitted based on the RoCEv2 protocol; the second message is the first message in the message MSG included in the first data stream.
  • this application provides a data stream processing device, which includes:
  • a receiving unit configured to receive a first message from the second communication device; the aforementioned first message is generated when a network failure occurs in the transmission path of the aforementioned first data stream, and the aforementioned first message indicates retransmission of the aforementioned One or more messages in the first data stream, the aforementioned second communication device is on the transmission path of the aforementioned first data stream, and the aforementioned data stream processing device is the source node of the aforementioned first data stream;
  • the retransmission unit is used to retransmit the aforementioned one or more messages.
  • the data stream processing device further includes a sending unit, configured to send a second message to the second communication device before the receiving unit receives the first message from the second communication device.
  • the second message includes the source queue pair SQP of the first data flow, and the SQP is used to generate the first message.
  • the present application provides a communication device, which includes a processor and a memory.
  • the memory is coupled to a processor.
  • when the processor executes the computer program stored in the memory, the data stream processing method described in any one of the above first aspects can be implemented.
  • the communication device may also include a communication interface, which is used for the device to communicate with other devices.
  • the communication interface may be a transceiver, a circuit, a bus, a module or other types of communication interfaces.
  • the communication device may include:
  • Memory used to store computer programs
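  • a processor, configured to execute the computer program so as to: when a network failure occurs on the transmission path of the first data flow, generate a first message, where the first message instructs the source node of the first data flow to retransmit one or more messages in the first data flow; and send the first message through the communication interface.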
  • the computer program in the memory in this application can be stored in advance or downloaded from the Internet when using the device.
  • This application does not specifically limit the source of the computer program in the memory.
  • the coupling in the embodiment of this application is an indirect coupling or connection between devices, units or modules, which may be in electrical, mechanical or other forms, and is used for information interaction between devices, units or modules.
  • the present application provides a communication device, which includes a processor and a memory.
  • the memory is coupled to a processor.
  • when the processor executes the computer program stored in the memory, the data flow processing method described in any one of the above second aspects can be implemented.
  • the communication device may also include a communication interface, which is used for the device to communicate with other devices.
  • the communication interface may be a transceiver, a circuit, a bus, a module or other types of communication interfaces.
  • the communication device may include:
  • Memory used to store computer programs
  • a processor, configured to execute the computer program so as to: receive, through the communication interface, the first message from the second communication device, where the first message is generated when a network failure occurs in the transmission path of the first data stream, the first message indicates retransmission of one or more messages in the first data stream, the second communication device is on the transmission path of the first data stream, and the first communication device is the source node of the first data stream; and retransmit the one or more messages.
  • the computer program in the memory in this application can be stored in advance or downloaded from the Internet when using the device.
  • This application does not specifically limit the source of the computer program in the memory.
  • the coupling in the embodiment of this application is an indirect coupling or connection between devices, units or modules, which may be in electrical, mechanical or other forms, and is used for information interaction between devices, units or modules.
  • the present application provides a computer-readable storage medium that stores a computer program, and the computer program is executed by a processor to implement the method described in any one of the above-mentioned first aspects.
  • the present application provides a computer-readable storage medium that stores a computer program, and the computer program is executed by a processor to implement the method described in any one of the above-mentioned second aspects.
  • a ninth aspect of the present application is a computer program product.
  • when the computer program product is read and executed by a computer, any of the methods described in the first aspect will be executed.
  • a tenth aspect of the present application is a computer program product.
  • when the computer program product is read and executed by a computer, any of the methods described in the second aspect will be executed.
  • the present application provides a communication system.
  • the communication system includes a first communication device and a second communication device.
  • the first communication device is used to perform the method described in any one of the above first aspects.
  • the second communication device is configured to perform the method described in any one of the above second aspects.
  • FIGS 1 and 2 show schematic diagrams of communication networks in embodiments of the present application
  • Figure 3 shows a schematic structural diagram of network equipment in the embodiment of the present application
  • Figure 4 shows a schematic flow chart of the data stream processing method provided by the embodiment of the present application.
  • Figure 5 shows a schematic diagram of the format of a message in an embodiment of the present application
  • Figure 6A shows a schematic diagram of the format of another message in an embodiment of the present application.
  • Figure 6B shows a schematic diagram of the format of a message header in an embodiment of the present application
  • FIGS. 7 to 10 show schematic structural diagrams of the device provided by embodiments of the present application.
  • Figure 11 shows a schematic structural diagram of a communication system provided by an embodiment of the present application.
  • the communication network 100 includes a processing device 110, a processing device 120 and a forwarding network 130.
  • the processing device 110 and the processing device 120 communicate through the forwarding network 130 .
  • the processing device 110 and the processing device 120 support the RDMA over Converged Ethernet (RoCE) protocol, and specifically can support the RoCEv2 protocol. That is, the processing device 110 and the processing device 120 can read data from or write data to each other's memory according to the RoCEv2 protocol without the central processing units (CPUs) of either party being aware of the access.
  • the above-mentioned forwarding network 130 includes one or more network devices, which may be used to forward communication data between the processing device 110 and the processing device 120.
  • for example, RoCE messages exchanged between the processing device 110 and the processing device 120 are forwarded.
  • the RoCEv2 protocol is widely used in high performance computing (HPC), artificial intelligence (AI) computing, centralized storage, distributed storage and other scenarios. Therefore, the above-mentioned communication network 100 may be an HPC system network, an AI computing cluster network, a data center network (which may be a centralized storage data center or a distributed storage data center), etc.
  • FIG. 2 shows an example in which the communication network 100 is a data center network.
  • the data center network may exemplarily include a processing device 201, a processing device 202, a processing device 203, a processing device 204 and a forwarding network 205.
  • the processing device 201, the processing device 202, the processing device 203 and the processing device 204 may support the RoCEv2 protocol.
  • any two of the four processing devices may be equivalent to the processing device 110 and the processing device 120 shown in FIG. 1 above.
  • the above-mentioned forwarding network 205 may be equivalent to the above-mentioned forwarding network 130 shown in FIG. 1 .
  • the forwarding network 205 may include multiple network devices, including network devices 2051 to 2057, for example.
  • Network device 2051 and network device 2052 may be spine switches, and network devices 2053 to 2057 may be leaf switches.
  • the leaf switch may be, for example, a Top of Rack (TOR) switch.
  • Each spine switch can communicate with all leaf switches.
  • the network device 2051 and the network device 2052 are both connected to the network devices 2053 to 2057.
  • the leaf switch communicates with the processing device.
  • network device 2053 is connected to the processing device 201;
  • network device 2054 is connected to the processing device 202;
  • network device 2055 is connected to the processing device 203;
  • network device 2056 and network device 2057 are both connected to the processing device 204.
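The spine-leaf connectivity listed above can be written down as a small data structure, which also makes the number of alternative forwarding paths between two processing devices easy to enumerate. The reference numerals come from the description; the variable and function names are hypothetical, and the sketch assumes traffic between different leaf switches always crosses exactly one spine switch.

```python
SPINES = [2051, 2052]  # spine switches, each connected to every leaf switch
LEAVES_OF_DEVICE = {   # leaf switches attached to each processing device
    201: [2053],
    202: [2054],
    203: [2055],
    204: [2056, 2057],
}


def paths(src_dev, dst_dev):
    """Enumerate forwarding paths between two processing devices as node lists."""
    result = []
    for src_leaf in LEAVES_OF_DEVICE[src_dev]:
        for dst_leaf in LEAVES_OF_DEVICE[dst_dev]:
            if src_leaf == dst_leaf:
                # Both devices hang off the same leaf: no spine hop needed.
                result.append([src_dev, src_leaf, dst_dev])
                continue
            for spine in SPINES:
                result.append([src_dev, src_leaf, spine, dst_leaf, dst_dev])
    return result
```

Each network device on such a path is a place where a link or next-hop failure could be detected for the flows it forwards.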
  • the processing device shown in FIG. 1 or FIG. 2 may be a client, and the client may include a server or a workstation.
  • the server may include any server that can implement computing functions, such as a backend computing server or a data storage server.
  • Clients can also include handheld devices (such as mobile phones, tablets, PDAs, etc.), vehicle-mounted devices, wearable devices (such as smart watches, smart bracelets, pedometers, etc.), laptops, desktop computers, smart home devices (for example, refrigerators, TVs, air conditioners, electricity meters, etc.), intelligent robots, workshop equipment, and various forms of user equipment (UE), mobile stations (MS), terminal equipment, and so on.
  • the network device shown in FIG. 1 or FIG. 2 may be a switch or router.
  • the network device may include a main control board 310 and an interface board 320.
  • the main control board can also be called the main processing unit (MPU).
  • the interface board can also be called a line processing unit (LPU).
  • the main control board 310 provides a control plane and a management plane.
  • the control plane completes functions such as protocol processing, business processing, routing calculations, forwarding control, business scheduling, traffic statistics, or system security in the network equipment.
  • the management plane completes the system's operating status monitoring, environment monitoring, log and alarm information processing, system loading or system upgrade and other functions.
  • the interface board 320 provides interfaces of different types (such as optical ports or electrical ports) and different rates, and forwards data through a distributed data plane.
  • the distributed data plane may be, for example, a switch fabric unit (SFU).
  • the switching network board can connect the main control board 310 and the interface board 320 to enable communication between them.
  • the above-mentioned main control board 310 may include a processor 3101, a memory 3102, an Ethernet interface 3103, an Ethernet interface 3104, a physical layer (PHY) Ethernet interface 3105 and a PHY Ethernet interface 3106, where:
  • the processor 3101 is the control unit of the main control board 310, that is, it is the final execution unit for information processing and program execution in the main control board 310.
  • the processor 3101 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • the processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and so on.
  • the memory 3102 is used to store computer programs, configured parameters and data in the main control board 310.
  • the memory 3102 can include random access memory (RAM) and flash memory (flash).
  • This flash memory can be used to store computer programs and statically configured parameters, etc.
  • This RAM is equivalent to running memory, where computer programs and data executed during runtime can be stored.
  • the Ethernet interface 3103 may be a management network port. This interface can be used for system program loading and debugging, and can also be connected to remote network management workstations and other equipment to achieve remote management of the system.
  • the Ethernet interface 3103 can be connected to the above-mentioned switching network board through the PHY Ethernet interface 3105 to achieve communication with the remote network management workstation.
  • the Ethernet interface 3104 can be connected to the above-mentioned switching network board through the PHY Ethernet interface 3106, and implement communication with the above-mentioned interface board 320 through the switching network board.
  • the switching network board is connected to the Ethernet interface 3203 in the interface board 320 to implement communication with the processor 3201 in the interface board 320 .
  • the above-mentioned interface board 320 may include a processor 3201, a memory 3202, an Ethernet interface 3203, a LAN Switch (LSW) chip 3204, and a PHY Ethernet interface 3205.
  • a processor 3201 may include a central processing unit (CPU) for executing instructions.
  • the processor 3201 is the control unit of the interface board 320, that is, it is the final execution unit for information processing and program execution in the interface board 320.
  • the processor 3201 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • the processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and so on.
  • the memory 3202 is used to store computer programs, configured parameters and data in the interface board 320.
  • the memory 3202 may include random access memory (RAM) and flash memory (flash).
  • This flash memory can be used to store computer programs and statically configured parameters, etc.
  • This RAM is equivalent to running memory, where computer programs and data executed during runtime can be stored.
  • the Ethernet interface 3203 can be connected to the PHY Ethernet interface 3106 in the main control board 310 through the switching network board to achieve communication with the main control board 310 .
  • the LSW chip 3204 can be controlled by the processor 3201 to perform initialization, service table delivery, protocol message sending and receiving, or control of various interrupts (including port connection (link up) and disconnection (link down) processing), etc.
  • the interface board 320 may also include another memory, which may be a dedicated memory of the LSW chip. This memory can be used to store the contents of forwarded messages, etc.
  • the memory may be RAM.
  • the PHY Ethernet interface 3205 is an interface connected to the LSW chip 3204, and can be used to connect the optical or electrical Ethernet interfaces between network devices.
  • the interface board 320 includes a coprocessor of the processor 3201 .
  • the co-processor can assist the processor 3201 in completing preset processing tasks.
  • the coprocessor can be used to implement the creation and management functions of the flow information table.
  • the coprocessor can also be used to implement functions such as message generation.
  • the structure of the network device shown in FIG. 3 is only an example and does not constitute a limitation on the embodiments of the present application.
  • the structure shown in Figure 3 above takes a frame-type network device as an example.
  • the network device may also be a box-type network device, etc., and the embodiment of the present application does not limit this.
  • the communication network introduced above supports the RoCEv2 protocol.
  • the RoCEv2 protocol is built on the user datagram protocol (UDP). That is, RoCE messages are transmitted based on UDP.
  • UDP is a protocol that can communicate without establishing a connection. Its disadvantages are that it does not provide data packet grouping or assembly and cannot order data packets. In other words, after a message is sent, the sender cannot know whether it has arrived safely and completely.
  • a retransmission mechanism based on the retry counter (RC) in the inner layer of the computer system is designed. Two currently commonly used RC-based retransmission schemes are as follows:
  • the destination can notify the source of the error packet sequence number (ePSN) through a negative acknowledgment (NAK) message, so that the source end retransmits the lost packets.
  • the above source end is the device that generates and sends the MSG.
  • the source end may be, for example, the processing device 110 shown in FIG. 1 above.
  • the above destination is the destination device to which the MSG goes.
  • the source end may also be called a source node.
  • the destination may be, for example, the processing device 120 shown in FIG. 1 above.
  • the MSG may include one or more messages.
  • Each packet includes a packet sequence number (PSN).
  • the destination finds that the messages following a certain PSN have been received, but the message with that PSN has not been received. That PSN is the above-mentioned ePSN.
  • the destination can generate a NAK message and send it to the source to instruct the source to retransmit the message with the ePSN and the messages whose PSNs follow the ePSN.
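  • As a minimal sketch of the gap detection described above, the following Python function returns the first missing PSN (the ePSN) given the PSNs received so far. The function name and arguments are hypothetical, and the sketch ignores PSN wrap-around for brevity.

```python
def find_epsn(expected_first_psn, received_psns):
    """Return the first missing PSN (the ePSN), or None if no gap is visible.

    Hypothetical helper: assumes PSNs increase by 1 per packet and do not
    wrap around within the inspected window.
    """
    received = set(received_psns)
    if not received:
        # Nothing arrived at all, so the destination has nothing to NAK.
        return None
    highest = max(received)
    for psn in range(expected_first_psn, highest):
        if psn not in received:
            return psn
    return None
```

  • For example, if packets 100, 101, 103 and 104 have arrived, the function reports 102, and the source would be asked to resend from PSN 102 onward.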
  • since the destination does not receive any message in the MSG, it cannot respond according to the first retransmission scheme to instruct the source to retransmit. Because the source receives no response, it can only wait for a timeout: only when the timer of the RoCEv2 network card in the source expires does the source retransmit the unacknowledged packets.
  • the timeout time of the source network card is usually relatively long (for example, 2 seconds, etc.). This will bring a poor experience to some time-sensitive application scenarios (such as transaction storage and other delay-sensitive scenarios), making it difficult to meet business performance requirements.
  • the data stream processing method provided by the embodiment of the present application may include but is not limited to the following steps.
  • When a network failure occurs on the transmission path of the first data flow, the communication device on the transmission path generates a first message; the first message indicates retransmission of one or more messages in the first data flow.
  • the communication device is a device included in the first node on the transmission path.
  • the first node is an intermediate node in the transmission path or the destination of the first data flow (or is also called a destination node). If the first node is an intermediate node in the transmission path, the intermediate node in the transmission path may be a network device on the transmission path.
  • the communication device may be the network device itself, or may be an interface board in the network device, or may be a processor on the interface board in the network device, or may be a switching chip in the network device, or may be a combination of the processor and the switching chip.
  • the communication device may be the destination device itself, or may be an interface board in the destination device, or may be a processor on the interface board in the destination device, or may be a switching chip in the destination device, or may be a combination of the processor and the switching chip.
  • the communication device sends the first message to the source end of the first data flow.
  • the above-mentioned first data stream may include one or more MSGs. That is, the first data stream may be composed of packets in the one or more MSGs. These packets may be RoCEv2 packets.
  • the source end of the first data stream may be the processing device 110 shown in FIG. 1 above.
  • the destination of the first data flow may be the processing device 120 shown in FIG. 1 above.
  • the transmission path of the first data stream may be a path between the processing device 110 and the processing device 120 for transmitting the first data stream.
  • the processor may be the above-mentioned main control unit of the interface board, or may be a co-processor of the main control unit.
  • the processor may be the processor 3201 shown in FIG. 3 above, or may be a coprocessor of the processor 3201 .
  • the first data flow is forwarded out of the intermediate node through a certain outgoing port of the intermediate node.
  • the certain egress port may be referred to as the target egress port for short.
  • the network failure may be a link failure or equipment failure on the path between the target egress port and the destination end of the first data flow in the transmission path of the first data flow.
  • the network failure may be a failure of a link connected to the target egress port.
  • the network failure may be a failure of a next-hop network device connected to the target egress port.
  • the source end of the first data flow is the processing device 201 and the destination end is the processing device 204 .
  • the transmission path of the first data stream is: processing device 201 - network device 2053 - network device 2051 - network device 2056 - processing device 204.
  • the above-mentioned intermediate node is the network device 2053, and the next-hop device connected to the outbound port of the network device 2053 for sending the first data flow is the network device 2051.
  • the above-mentioned network failure may be, for example, a link failure between the network device 2053 and the network device 2051 .
  • the network failure may be, for example, that the network device 2051 fails.
  • the network failure may be, for example, a link or device failure between the network device 2051 - the network device 2056 - the processing device 204. It can be understood that this is only an example and does not constitute a limitation on the embodiments of the present application.
  • the following description takes the transmission path of the above-mentioned first data stream being "processing device 201 - network device 2053 - network device 2051 - network device 2056 - processing device 204" as an example.
  • the flow information table can be generated and managed in the above communication device.
  • the flow information table may include one or more entries, and each entry may be used to store information corresponding to a data flow.
  • the information corresponding to the data flow may include identification information of the data flow and indication information of messages in the data flow that need to be retransmitted.
  • the indication information of the messages that need to be retransmitted in the above data stream may be indication information of the messages that need to be retransmitted in a certain MSG in the data stream.
  • the identification information of the above-mentioned data flow may include one or more of the following information: the source Internet protocol (IP) address of the data flow, the destination IP address, the source port number, the destination port number, the source queue pair (source queue pair, SQP) and destination queue pair (destination queue pair, DQP).
  • the source queue pair and the destination queue pair may be established based on the RoCEv2 protocol, and are used to implement the RDMA data communication between the source end and the destination end.
  • the indication information of the packets that need to be retransmitted in the data stream may be the smallest PSN among the PSNs of the packets that need to be retransmitted. That is, the smallest PSN indicates that all packets in the data flow starting from that PSN need to be retransmitted. For example, if the messages that need to be retransmitted belong to a certain MSG in the data flow, the smallest PSN indicates that all messages in that MSG starting from the smallest PSN need to be retransmitted. The messages that need to be retransmitted may or may not include the message corresponding to the smallest PSN.
  • the communication device may bind the outgoing port of the data flow in the intermediate node to which the communication device belongs to the data flow.
  • the port identifier of the egress port may be associated and stored in the flow information entry where the information corresponding to the data flow is located.
  • the network fault can be associated with the outgoing port of the data flow.
  • the network fault can be a failure of the link connected to the outgoing port or a failure of the next-hop device. Then, after a network failure occurs, the communication device can quickly find information corresponding to the data flow sent from the egress port based on the corresponding egress port identifier. Then a corresponding message indicating retransmission is generated.
  • the entry of the information corresponding to the above data stream may also include an aging period (age).
  • the aging period indicates the validity period of the entry.
  • the aging period may be 10 nanoseconds, 15 nanoseconds, or 20 nanoseconds, etc.
  • the aging period can be configured according to requirements, and the embodiments of this application do not limit this.
  • Table 1 exemplarily shows the structure of the flow information table. It can be seen that the flow information table can include multiple entries, and each entry corresponds to a data flow. Each entry includes the corresponding data flow's egress port ID, source IP address, destination IP address, source port number, destination port number, source queue pair ID, destination queue pair ID, packet sequence number, and aging period.
  • the packet sequence number is the smallest PSN among the PSNs of the packets that need to be retransmitted as described above.
  • an egress port can be used to send one or more data streams. Then, in the above flow information table, an egress port identifier can be bound to information corresponding to one or more data flows. For example, referring to Table 1 above, the egress port identifier PORT1 can be bound to the information corresponding to the two data flows. The egress port identifier PORT2 is bound to the information corresponding to a data flow.
  • the flow information table introduced above is only an example.
  • the flow information table may include one or more of the above-mentioned egress port identifier, source IP address, destination IP address, source port number, destination port number, source queue pair identifier, destination queue pair identifier, data packet sequence number, and aging period. It does not need to include all of them; for example, the destination port number of the data flow may be omitted.
  • the specific content included in the flow information table can be set according to actual needs, and is not limited in the embodiment of this application.
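  • The flow information table described above can be sketched as a small Python data structure indexed by egress port identifier. The class and field names here are hypothetical, and the sketch only illustrates that one egress port can be bound to several flow entries; it is not the layout used by the embodiments.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowEntry:
    """One flow information entry (field names are illustrative)."""
    egress_port: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    src_qp: int
    dst_qp: int
    psn: int      # smallest PSN that would need retransmission
    age: float    # aging period; the unit is configuration-dependent


class FlowInfoTable:
    """Flow entries grouped by the egress port they are bound to."""

    def __init__(self):
        self._by_port = defaultdict(list)

    def add(self, entry: FlowEntry) -> None:
        self._by_port[entry.egress_port].append(entry)

    def flows_on_port(self, port: str) -> list:
        # Return a copy so callers cannot mutate the index by accident.
        return list(self._by_port.get(port, []))
```

  • Grouping by egress port means that a fault on one port can be mapped to its affected flows with a single lookup.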
  • the flow information table in the communication device includes an entry corresponding to the first data flow.
  • the entry corresponding to the first data stream is called the target entry.
  • the target entry includes the port identifier of the outbound port of the first data flow in the intermediate node (ie, the target outbound port).
  • the target entry also includes one or more of the source IP address, destination IP address, source port number, destination port number, source queue pair identifier, and destination queue pair identifier of the first data flow.
  • the target entry may also include the aging period of the entry and the packet sequence number indicating the retransmission of the message.
  • the communication device may obtain part of the information from the message of the first data stream received by the intermediate node to which the communication device belongs and save it in the target entry.
  • the packet may be the first packet in a certain MSG in the first data flow.
  • For ease of understanding, please refer to Figure 5.
  • Figure 5 exemplarily shows the format of the above-mentioned first message. It can be seen that the message includes Ethernet header (Ethernet header), IP header (IP header), UDP header (UDP Header), basic transport header (base transport header, BTH), payload (payload), invariant cyclical redundancy check (ICRC) and frame check sequence (FCS).
  • the Ethernet header may include a source media access control (MAC) address and a destination MAC address.
  • the source MAC address may be the MAC address of the source end of the first data flow.
  • the destination MAC address may be, for example, the MAC address of the destination of the first data flow.
  • the IP packet header may include the source IP address and the destination IP address.
  • the transmission path of the above-mentioned first data flow is "processing device 201 - network device 2053 - network device 2051 - network device 2056 - processing device 204".
  • the source IP address in the IP packet header is the IP address of the network device 2053.
  • the destination IP address is the IP address of the network device 2056.
  • the UDP packet header may include the UDP source port number and the UDP destination port number.
  • the UDP port number used in RoCEv2 is fixed at 4791.
  • the base transport header is the transport-layer header of InfiniBand (IB).
  • the header may include an operation code (OpCode), a destination queue pair identifier (Dest QP), and a packet sequence number (PSN).
  • This opcode indicates the RoCEv2 message type and indicates what operating mode the message is in.
  • this operation code is mainly used to capture the first message in the MSG.
  • an MSG includes a first message, one or more middle messages (that is, intermediate messages), and a last message. The operation codes in the first message, the middle messages and the last message are different.
  • the basic transmission message header also includes other contents. For details, refer to the description in the InfiniBand standard (InfiniBandTM Architecture Specification Volume 1, Release 1.1), which will not be described here.
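  • As an illustration of how the operation code, destination queue pair identifier and PSN could be read from the base transport header, here is a minimal Python sketch. It assumes the 12-byte BTH layout of the InfiniBand specification (opcode, flags, partition key, 8 reserved bits plus 24-bit destination QP, 8 bits plus 24-bit PSN). The opcode values in FIRST_OPCODES are my reading of the reliable-connection opcode table and should be checked against the specification; the function name is hypothetical.

```python
import struct

# Assumed opcode values for "first" packets on the reliable connection
# transport; verify against the InfiniBand specification before relying on them.
FIRST_OPCODES = {0x00: "SEND First", 0x06: "RDMA WRITE First"}


def parse_bth(bth: bytes) -> dict:
    """Extract opcode, destination QP and PSN from a 12-byte BTH."""
    if len(bth) < 12:
        raise ValueError("BTH is 12 bytes long")
    opcode, _flags, _pkey, dqp_word, psn_word = struct.unpack("!BBHII", bth[:12])
    return {
        "opcode": opcode,
        "dest_qp": dqp_word & 0xFFFFFF,   # low 24 bits carry the Dest QP
        "psn": psn_word & 0xFFFFFF,       # low 24 bits carry the PSN
        "is_first": opcode in FIRST_OPCODES,
    }
```

  • A device that only needs the first message of each MSG could discard packets for which is_first is false before doing any further parsing.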
  • the above load carries specific data.
  • the above ICRC and FCS are used for message verification.
  • the part of the information that the above-mentioned communication device can obtain from the above-mentioned first message is the source IP address, destination IP address, source port number, destination queue pair identifier and data packet sequence number in the first message.
  • Since there is no source queue pair identifier in the first message above, the source queue pair identifier needs to be obtained through other methods.
  • source port number = (source queue pair ID XOR destination queue pair ID) OR 0xC000.
  • XOR represents the "XOR” logical operation
  • OR represents the "OR” logical operation.
  • accordingly, the source queue pair ID = (source port number & 0x3FFF) XOR the destination queue pair ID.
  • & represents the "AND" logical operation. Since the source port number and destination queue pair identifier have been obtained from the first message above, the source queue pair identifier can be calculated.
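  • The two formulas above can be checked with a short Python sketch. One assumption is added here and is not stated in the text: the XOR result is masked to its low 14 bits so that the port fits in 16 bits, which means the inverse recovers only the low 14 bits of the source queue pair identifier. The function names are hypothetical.

```python
QP_MASK = 0x3FFF  # low 14 bits; masking is an assumption of this sketch


def roce_src_port(src_qp: int, dst_qp: int) -> int:
    """source port = (source QP XOR destination QP) OR 0xC000."""
    return ((src_qp ^ dst_qp) & QP_MASK) | 0xC000


def recover_src_qp(src_port: int, dst_qp: int) -> int:
    """source QP = (source port AND 0x3FFF) XOR destination QP (low 14 bits)."""
    return (src_port & QP_MASK) ^ (dst_qp & QP_MASK)
```

  • With a source queue pair identifier of 0x0123 and a destination queue pair identifier of 0x0456, the port is 0xC575, and applying the inverse to 0xC575 and 0x0456 gives back 0x0123.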
  • the port identifier of the target egress port corresponding to the first data flow can be obtained from the forwarding table of the intermediate node.
  • the aging period of the above-mentioned target table entry can adopt the default configured duration, or can be customized to configure a duration, which is not limited in the embodiment of this application.
  • the content included in the above target entry can be obtained and saved for subsequent use.
  • the above communication device includes a coprocessor of the main control unit on the interface board of the intermediate node. In this case, the above-mentioned first message may be captured by the coprocessor from the ingress port or receiver of the intermediate node. The coprocessor then decapsulates the captured first message to obtain the above information. For example, the coprocessor can capture multiple packets in the first data stream, determine the first packet based on the operation code in each packet, and then obtain the above information.
  • the above communication device includes an LSW chip on the interface board of the intermediate node. After receiving the first message, the intermediate node will send the first message to the LSW chip for processing. Therefore, the LSW chip does not need to specifically capture the first message. Instead, after receiving the message and determining that it is the first message based on the above operation code, the above information can be obtained from it.
  • the source queue pair identifier in the target entry can be obtained by capturing an acknowledgment (ACK) message corresponding to the first data stream.
  • the destination queue pair identifier in the ACK message is the source queue pair identifier corresponding to the first data flow.
  • since the above communication device receives a large number of ACK messages, it also needs to identify the ACK message corresponding to the first data stream among them. For example, after the above communication device obtains an ACK message, it can obtain information such as ACK.SIP, ACK.DIP, ACK.SQP, and ACK.DQP from the ACK message.
  • the ACK.SIP is the source IP address in the ACK message, and is also the destination IP address of the data flow corresponding to the ACK message.
  • the ACK.DIP is the destination IP address in the ACK message, and is also the source IP address of the data stream corresponding to the ACK message.
  • the ACK.SQP is the source queue pair identifier in the ACK message, and is also the destination queue pair identifier of the data flow corresponding to the ACK message.
  • the ACK.DQP is the destination queue pair identifier in the ACK message, and is also the source queue pair identifier of the data flow corresponding to the ACK message.
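  • the source port number of the data flow = (source queue pair ID XOR destination queue pair ID) OR 0xC000, where the source queue pair ID is ACK.DQP and the destination queue pair ID is ACK.SQP.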
  • the source port number of the data flow is also the destination port number corresponding to the ACK message.
  • a data flow can be determined based on ACK.SIP, ACK.DIP, ACK.SQP, ACK.DQP and the calculated source port number of the data flow, and then it can be determined that the ACK message is the ACK message corresponding to the determined data flow.
  • the ACK message corresponding to the above-mentioned first data flow can be determined, and then the destination queue pair identifier in the ACK message can be obtained as the source queue pair identifier in the above-mentioned target entry.
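  • The ACK matching described above can be sketched as follows. The dictionary keys are hypothetical names for the fields called ACK.SIP, ACK.DIP, ACK.SQP and ACK.DQP in the text, the 14-bit masking of the port calculation is an assumption of this sketch, and an entry whose source queue pair identifier is still unknown is represented by None.

```python
def ack_matches_flow(entry: dict, ack: dict) -> bool:
    """Check whether an ACK belongs to the data flow stored in `entry`."""
    flow_src_qp = ack["dqp"]   # ACK destination QP is the flow's source QP
    flow_dst_qp = ack["sqp"]   # ACK source QP is the flow's destination QP
    flow_src_port = ((flow_src_qp ^ flow_dst_qp) & 0x3FFF) | 0xC000
    return (
        entry["src_ip"] == ack["dip"]
        and entry["dst_ip"] == ack["sip"]
        and entry["dst_qp"] == flow_dst_qp
        and entry["src_port"] == flow_src_port
    )


def learn_src_qp(entries: list, ack: dict):
    """Fill in the missing source QP of the first entry the ACK matches."""
    for entry in entries:
        if entry.get("src_qp") is None and ack_matches_flow(entry, ack):
            entry["src_qp"] = ack["dqp"]
            return entry
    return None
```

  • Including the recomputed source port in the comparison filters out ACKs whose queue pair identifiers are inconsistent with the port recorded for the flow.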
  • the source queue pair identifier in the target entry can be obtained through interactive communication with the source end of the first data flow.
  • a queue pair (QP) link needs to be established between the source end and the destination end. That is, the source can create a queue pair.
  • the queue pair includes a send queue and a receive queue.
  • the sending queue is used to send information to the destination end, and the receiving queue is used to receive information from the destination end.
  • the destination also needs to create a queue pair, which also includes a sending queue and a receiving queue.
  • the receiving queue is used to receive information from the source end, and the sending queue is used to send information to the source end. After the source end creates a queue pair, it will notify the destination end of the queue pair's identifier.
  • the destination after the destination creates a queue pair, it will also notify the source of the queue pair's identity.
  • the identifier of the created queue pair can also be sent to the above-mentioned intermediate node.
  • the source end may send the identifier of the created queue pair to the above-mentioned intermediate node through a link layer discovery protocol (LLDP) message.
  • the LLDP message may include the source IP address, destination IP address, source port number, destination port number, source queue pair identifier (that is, the identifier of the queue pair created by the source end) and destination queue pair identifier (that is, the identifier of the queue pair created by the destination) of the first data flow.
  • the information included in the LLDP message can be used to identify the first data flow.
  • the source end After the source end generates the LLDP packet, it can first send the LLDP packet to the network device directly connected to the source end. For example, it is sent to the above-mentioned network device 2053.
  • the network device determines the first data flow based on the information in the message, and then saves the source queue pair identifier in the message into the above target entry.
  • the above network device can also forward the LLDP message to the next-hop network device based on the destination IP address in the message. This allows the next-hop network device to also obtain the source queue pair identifier of the first data flow. Since the destination IP address is the destination IP address of the first data flow, after the LLDP message is forwarded hop by hop according to the destination IP address, all network devices on the transmission path of the first data flow can obtain the source queue pair identifier of the first data flow.
  • the first data stream may include multiple MSGs.
  • the multiple MSGs may be transmitted sequentially.
  • the above-mentioned first message may be the first message in the first transmitted MSG among the multiple MSGs. That is, the above target entry stores information obtained from the first message in the first transmitted MSG.
  • as subsequent MSGs are transmitted, the communication device can capture the first message in the second MSG, the first message in the third MSG, and so on. In this case, the communication device may update the PSN in the target entry with the PSN in the currently captured first message of the MSG. Based on the previous description, the PSN in the target entry is used to instruct the retransmission of the messages from that PSN onward. Therefore, updating the PSN in the target entry can reduce the number of retransmitted messages and save network resources.
  • the communication device may also update the data packet sequence number in the target entry based on the PSN in the ACK message corresponding to the captured first data stream.
  • the ACK message of the first data flow can be determined based on the method described above, and then the PSN in the ACK message can be obtained and updated into the above target entry. Since the PSN in the ACK is the PSN of a message that has been received, and the PSN in the target entry is used to indicate the retransmission of the messages from that PSN onward, using the PSN in the ACK to update the data packet sequence number in the target entry can reduce the number of retransmitted packets and save network resources.
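  • A minimal sketch of this PSN update is shown below. The PSN field is 24 bits wide in the base transport header; the wrap-around comparison (treating a candidate as newer when it lies less than half the sequence space ahead) is an assumption of this sketch rather than something the text specifies, and the function names are hypothetical.

```python
PSN_MOD = 1 << 24  # the PSN is a 24-bit field


def psn_is_newer(candidate: int, current: int) -> bool:
    """True if `candidate` is ahead of `current` in 24-bit sequence space."""
    return 0 < ((candidate - current) % PSN_MOD) < PSN_MOD // 2


def update_entry_psn(entry: dict, observed_psn: int) -> dict:
    """Advance the stored PSN only when the observed PSN is newer."""
    if psn_is_newer(observed_psn, entry["psn"]):
        entry["psn"] = observed_psn
    return entry
```

  • The same helper can be called with the PSN of a newly captured first message of an MSG or with the PSN carried in a matching ACK message.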
  • the intermediate node is a network device directly connected to the above-mentioned source end.
  • the intermediate node is the above-mentioned network device 2053.
  • the packets received from the source end by the intermediate node have not been encapsulated with an IP packet header, and the IP packet header is encapsulated by the intermediate node. Therefore, in the intermediate node, the IP source address in the above target entry is its own IP address.
  • the destination IP address in the destination entry is the destination IP address of the first data flow, which can be obtained by searching the routing table in the intermediate node.
  • the communication device can obtain the information corresponding to the first data flow from the flow information table, and generate the above first message based on the obtained information.
  • the network fault is associated with the target egress port of the first data flow.
  • the network failure may be a failure of the link connected to the target egress port, or a failure of the next-hop network device connected to the target egress port, etc.
  • the intermediate node can use existing network fault sensing methods to sense network faults.
  • the intermediate node can be made aware of network faults through heartbeat detection, packet internet groper (PING) or routing notification failure.
  • The methods of sensing network faults described here are only examples and do not constitute a limitation on the embodiments of the present application.
  • the communication device can use the port identifier of the target egress port as an index to find the target entry in the flow information table, and obtain the information corresponding to the first data flow in the target entry. For example, the source IP address, destination IP address, source port number, destination port number, source queue pair identifier, destination queue pair identifier, and data packet sequence number corresponding to the first data flow are obtained. Then, the communication device generates the first message based on the obtained information.
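  • The lookup step above can be sketched as a small fault handler. Here the flow information table is modeled as a plain dictionary from egress port identifier to a list of entries, and send_nak is a hypothetical callback standing in for generating and sending the first message.

```python
def on_port_failure(flow_table: dict, failed_port: str, send_nak) -> int:
    """Invoke `send_nak` for every flow bound to the failed egress port.

    Returns the number of flows that were notified.
    """
    entries = flow_table.get(failed_port, [])
    for entry in entries:
        send_nak(entry)
    return len(entries)
```

  • Because the port identifier is the index, flows bound to other egress ports are not touched when a single port fails.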
  • the first message may be a NAK message under the RoCEv2 protocol.
  • the response message can be an ACK message or a NAK message, and the specific type can be distinguished according to the flag bits in the message.
  • the NAK message includes an Ethernet header, an IP header, a UDP header, a base transport header (BTH), an ACK extended transport header (AETH) and an ICRC for verification.
  • the Ethernet packet header may include a source MAC address and a destination MAC address.
  • the source MAC address may be the MAC address of the destination of the first data flow.
  • the destination MAC address may be, for example, the MAC address of the source end of the first data flow.
  • the IP packet header may include the source IP address and the destination IP address.
  • the transmission path of the above-mentioned first data flow is "processing device 201 - network device 2053 - network device 2051 - network device 2056 - processing device 204".
  • the source IP address in the IP packet header is the IP address of the network device 2056.
  • the destination IP address is the IP address of the network device 2053.
  • the UDP packet header may include the UDP source port number and the UDP destination port number.
  • the destination port is still the default 4791.
  • the base transport header is the transport-layer header of InfiniBand.
  • for details of the header, please refer to the corresponding description of Figure 5 above, which will not be repeated here.
  • the ACK extension transport header contains additional transport fields for the ACK packet. See Figure 6B for example. It can be seen that the ACK extension transmission header may include 4 bytes (0-3 bytes), that is, a length of 32 bits. Among them, the message sequence number (MSN) occupies 0-23 bits, and the flag bit (syndrome) occupies 24-31 bits. The message sequence number indicates the total length of successfully received data. This flag bit can be used to indicate whether the message is an ACK message or a NAK message. See Table 2 for an example.
  • Table 2 exemplarily shows the specific definition of the content included in the 8 bits occupied by the flag bit.
  • If bits 5-6 of the 8 bits (that is, bits 6:5 above) are 00, the message is an ACK message. If bits 6:5 are 11, the message is a NAK message.
  • For the definition of the remaining bits, refer to Table 43 of the InfiniBand™ Architecture Specification Volume 1, Release 1.1; it is not repeated here.
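The AETH layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function names `pack_aeth`, `unpack_aeth`, and `classify_syndrome` are hypothetical, and the sketch assumes the 32-bit word is carried in network byte order with the syndrome in the most significant byte (bits 24-31) and the MSN in bits 0-23. Only the two bit patterns named in the text (00 for ACK, 11 for NAK) are classified; any other pattern is reported as "other".

```python
import struct


def pack_aeth(syndrome: int, msn: int) -> bytes:
    """Pack an 8-bit syndrome and a 24-bit MSN into a 4-byte AETH."""
    if not 0 <= syndrome <= 0xFF:
        raise ValueError("syndrome must fit in 8 bits")
    if not 0 <= msn <= 0xFFFFFF:
        raise ValueError("MSN must fit in 24 bits")
    # Syndrome goes in bits 24-31, MSN in bits 0-23 (assumed layout).
    return struct.pack("!I", (syndrome << 24) | msn)


def unpack_aeth(raw: bytes) -> tuple:
    """Return (syndrome, msn) from the first 4 bytes of raw."""
    (word,) = struct.unpack("!I", raw[:4])
    return word >> 24, word & 0xFFFFFF


def classify_syndrome(syndrome: int) -> str:
    """Classify a syndrome by its bits 6:5, as described in the text."""
    kind = (syndrome >> 5) & 0b11
    if kind == 0b00:
        return "ACK"
    if kind == 0b11:
        return "NAK"
    return "other"  # patterns not discussed in this section
```

For example, a syndrome of `0x60` has bits 6:5 equal to 11 and would be classified as a NAK under these assumptions.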
  • the source IP address of the first data stream obtained above can be used as the destination IP address in the first message.
  • the obtained destination IP address of the first data flow is used as the source IP address in the first message.
  • the obtained source queue pair identifier is used as the destination queue pair identifier in the first message.
  • the obtained destination queue pair identifier is used as the source queue pair identifier in the first message.
  • The obtained data packet sequence number, or the sequence number obtained by adding 1 to it, is used as the starting packet sequence number for retransmission indicated in the first message.
  • For ease of understanding, refer to Table 3.
  • the destination port number in the first message above is the default 4791.
  • The source port number in the first message = (source queue pair identifier XOR destination queue pair identifier) OR 0xC000.
  • the source queue pair identifier and the destination queue pair identifier are the source queue pair identifier and the destination queue pair identifier in the above-mentioned first message. Therefore, the source port number in the first packet can be calculated.
  • The communication device encapsulates the information obtained above into the first message and sets bits 6:5 of the flag bits of the first message to 11 to indicate that the message is a NAK message. This produces a complete NAK message, that is, the first message. The communication device then sends the first message to the source end of the first data stream.
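The field mapping described above (swap the addresses and queue pair identifiers, derive the UDP source port, and mark the syndrome as NAK) can be sketched as follows. This is an illustrative sketch only: the `FlowInfo` and `NakFields` types and the function names are hypothetical, the port is masked to 16 bits because a UDP port cannot be wider, the PSN is wrapped at 24 bits on the assumption that it is a 24-bit BTH field, and the lower five syndrome bits are left at zero as a placeholder.

```python
from dataclasses import dataclass

ROCEV2_UDP_PORT = 4791        # default destination port named in the text
NAK_SYNDROME = 0b0110_0000    # bits 6:5 = 11; lower bits are a placeholder


@dataclass(frozen=True)
class FlowInfo:
    """Information stored for a data flow (hypothetical structure)."""
    src_ip: str
    dst_ip: str
    sqp: int
    dqp: int
    psn: int


@dataclass(frozen=True)
class NakFields:
    """Header fields of the generated first message (hypothetical)."""
    src_ip: str
    dst_ip: str
    udp_src_port: int
    udp_dst_port: int
    sqp: int
    dqp: int
    psn: int
    syndrome: int


def nak_source_port(sqp: int, dqp: int) -> int:
    """(SQP XOR DQP) OR 0xC000, truncated to a 16-bit UDP port."""
    return ((sqp ^ dqp) | 0xC000) & 0xFFFF


def build_nak_fields(flow: FlowInfo, next_psn: bool = False) -> NakFields:
    """Derive NAK fields by reversing the direction of the stored flow."""
    psn = (flow.psn + 1) & 0xFFFFFF if next_psn else flow.psn
    nak_sqp, nak_dqp = flow.dqp, flow.sqp  # queue pairs swap roles
    return NakFields(
        src_ip=flow.dst_ip,
        dst_ip=flow.src_ip,
        udp_src_port=nak_source_port(nak_sqp, nak_dqp),
        udp_dst_port=ROCEV2_UDP_PORT,
        sqp=nak_sqp,
        dqp=nak_dqp,
        psn=psn,
        syndrome=NAK_SYNDROME,
    )
```

The `next_psn` flag reflects the two options the text allows: using the stored packet sequence number itself, or that number plus 1, as the starting point for retransmission.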
  • the source end receives the first message and retransmits the one or more messages based on the first message.
  • After receiving the first message, the source end retransmits messages as instructed by the first message. For example, it retransmits the message indicated by the PSN included in the first message and the messages that follow that PSN. This achieves rapid retransmission of lost messages after a network failure, improves the efficiency of message retransmission, and thereby improves the processing performance of the services corresponding to the messages.
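The source-side behavior described above, retransmitting the message with the indicated PSN and everything sent after it, might look like the following sketch. It assumes, purely for illustration, that the source end keeps an ordered list of the PSNs it has sent and not yet had acknowledged; the function name `psns_to_retransmit` is hypothetical.

```python
def psns_to_retransmit(sent_psns: list, nak_psn: int) -> list:
    """Return the PSNs to resend, in send order.

    sent_psns is the ordered list of outstanding PSNs (an assumption of
    this sketch); nak_psn is the PSN carried in the first message.
    """
    try:
        start = sent_psns.index(nak_psn)
    except ValueError:
        # The indicated PSN is not outstanding; nothing to resend.
        return []
    return sent_psns[start:]
```

Looking the PSN up by position rather than by numeric comparison keeps the sketch correct even when the sequence numbers wrap around.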
  • The description above mainly takes the association between the target egress port and the flow information corresponding to the first data flow as an example. It can be understood that, in another possible implementation, the above-mentioned intermediate node receives the first data stream through a certain ingress port. To facilitate subsequent description, this ingress port is referred to as the target ingress port.
  • The above-mentioned network failure may be a link failure or device failure on the path between the target ingress port and the source end of the first data flow in the transmission path of the first data flow.
  • The network failure may be a failure of a link connected to the target ingress port.
  • the network failure may be a failure of the previous hop network device connected to the target ingress port.
  • the network failure can be associated with the target ingress port.
  • The target ingress port is also stored, in association with the information corresponding to the data flows received from the target ingress port, in a flow information table (referred to as the ingress flow information table for short).
  • For the acquisition and storage of information in the ingress flow information table, refer to the foregoing description; it is not repeated here.
  • The network failure is a failure of the link or device connected to the target ingress port.
  • The network fault may be sensed by the main control board of the device. After sensing a network fault, the main control board can trigger the communication device to perform the above message generation operation. The communication device can then quickly determine the first data flow in the ingress flow information table based on the identifier of the target ingress port, obtain the information corresponding to the first data flow from the flow information table, and generate the above-mentioned first message based on the obtained information.
  • For the specific implementation of generating the first message, refer to the above description; it is not repeated here.
  • the communication device sends the first message to the source end of the first data stream.
  • the source end retransmits the message based on the instructions in the message. For example, the message indicated by the PSN included in the first message and the message following the PSN are retransmitted. This achieves rapid retransmission of lost messages after a network failure, improves the efficiency of message retransmission, and thereby improves the processing performance of the corresponding services of the messages.
  • the communication device is a device included in the intermediate node in the transmission path.
  • the message can also be quickly retransmitted when sensing the above-mentioned network failure.
  • the network fault may be associated with the ingress port of the destination device.
  • For the specific association method, refer to the above-mentioned association between network faults and egress ports; it is not described again here.
  • the ingress port is also stored in a flow information table (referred to as the ingress flow information table for short) in association with information corresponding to the data flow received from the ingress port.
  • When the destination device senses a network fault, it determines that the network fault is a fault of a link or device connected to a certain ingress port. For example, the network fault may be sensed by the main control board of the device. After sensing a network fault, the main control board can trigger the communication device to perform the above message generation operation. The communication device can then quickly determine the first data flow in the ingress flow information table based on the identifier of the ingress port, obtain the information corresponding to the first data flow from the flow information table, and generate the above-mentioned first message based on the obtained information. For the specific implementation of generating the first message, refer to the above description; it is not repeated here.
  • the communication device sends the first message to the source end of the first data flow.
  • the source end retransmits the message based on the instructions in the message. For example, the message indicated by the PSN included in the first message and the message following the PSN are retransmitted. This achieves rapid retransmission of lost messages after a network failure, improves the efficiency of message retransmission, and thereby improves the processing performance of the corresponding services of the messages.
  • each device includes a corresponding hardware structure and/or software module for executing each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving the hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.
  • Embodiments of the present application can divide the device into functional modules according to the above method examples.
  • each functional module can be divided corresponding to each function, or two or more functions can be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software function modules. It should be noted that the division of modules in the embodiment of the present application is schematic and is only a logical function division. In actual implementation, there may be other division methods.
  • FIG. 7 shows a possible logical structure diagram of the communication device 700.
  • the communication device 700 may be the communication device described in the above data stream processing method.
  • The communication device 700 includes a generating unit 701 and a sending unit 702, where:
  • the generating unit 701 is configured to generate a first message when a network failure occurs in the transmission path of the first data stream, where the first message instructs the source node of the first data flow to retransmit one or more messages in the first data flow; the communication device 700 is on the transmission path;
  • the sending unit 702 is used to send the first message.
  • the network failure is a link or device failure connected to the port that transmits the first data stream
  • the communication device 700 further includes a determining unit configured to determine the first data flow according to the port before the generating unit 701 generates the first message.
  • the communication device 700 further includes a receiving unit configured to receive the second message in the first data stream before a network failure occurs in the transmission path of the first data stream;
  • the communication device 700 also includes an acquisition unit, configured to acquire first information corresponding to the first data flow according to the second message, where the first information includes the source IP address, destination IP address, source port number, destination queue pair (DQP), and data packet sequence number in the second message, and the first information is used to generate the first message.
  • the first information also includes the source queue pair SQP of the first data flow; the SQP is calculated based on the source port number and the DQP.
  • the first information also includes the SQP of the first data stream; before a network failure occurs in the transmission path of the first data stream,
  • the receiving unit is also used to receive the first confirmation ACK message corresponding to the first data stream;
  • the acquisition unit is also used to acquire the DQP in the ACK message
  • the communication device 700 further includes a storage unit configured to save the DQP in the ACK message as the SQP of the first data stream.
  • the first information also includes the SQP of the first data stream; before a network failure occurs in the transmission path of the first data stream,
  • the receiving unit is also used to receive a third message from the source node, where the third message includes the SQP;
  • the communication device 700 also includes a saving unit for saving the SQP.
  • the third message is a link layer discovery protocol message.
  • after the acquisition unit acquires the first information from the second message,
  • the receiving unit is also used to receive the second confirmation ACK message corresponding to the first data stream;
  • the acquisition unit is also used to acquire the first data packet sequence number in the ACK message
  • the communication device 700 further includes an updating unit configured to update the data packet sequence number in the first information stored in the communication device 700 based on the first data packet sequence number.
  • the first message is a negative acknowledgment NAK message.
  • the first communication device 700 is a network device on the transmission path, or a network interface board in the network device, or a processor in the network device, or a switching chip in the network device.
  • the first data stream is a data stream transmitted based on the RoCEv2 protocol; the second message is the first message in the message MSG included in the first data stream.
  • FIG. 8 shows a possible logical structure diagram of the data stream processing device 800.
  • the data stream processing device 800 may be the source end described in the above data stream processing method, or a network interface board, processor or switching chip of the source end.
  • The data stream processing device 800 includes a receiving unit 801 and a retransmission unit 802, where:
  • Receiving unit 801 configured to receive a first message from a second communication device; the first message is generated when a network failure occurs in the transmission path of the first data stream, and the first message indicates retransmission One or more messages in the first data stream, the second communication device is on the transmission path of the first data stream, and the data stream processing device is the source node of the first data stream;
  • the retransmission unit 802 is used to retransmit the one or more messages.
  • the data stream processing device 800 further includes a sending unit, configured to send a second message to the second communication device before the receiving unit receives the first message from the second communication device,
  • the second message includes the source queue pair SQP of the first data flow, and the SQP is used to generate the first message.
  • FIG. 9 shows a possible hardware structure diagram of the communication device 900 provided by this application.
  • the communication device 900 may be the communication device described in the above data stream processing method.
  • the communication device 900 includes: a processor 901, a memory 902, and a communication interface 903.
  • the processor 901, the communication interface 903, and the memory 902 may be connected to each other directly, or connected to each other via a bus 904.
  • the memory 902 is used to store computer programs and data of the communication device 900.
  • the memory 902 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or portable read-only memory (compact disc read-only memory, CD-ROM), etc.
  • the software or program code required for the functions of all or part of the units of the communication device in the above method embodiment is stored in the memory 902 .
  • the processor 901 in addition to calling the program code in the memory 902 to implement some of the functions, can also cooperate with other components (such as communication interface 903) together to complete other functions described in the method embodiment (such as the function of receiving or sending messages).
  • the number of communication interfaces 903 may be multiple, and are used to support the communication device 900 to communicate, such as receiving or sending messages.
  • the processor 901 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • the processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and so on.
  • the processor 901 may be used to read the program stored in the memory 902 and perform the operations performed by the communication device in the method described in FIG. 4 and its possible embodiments.
  • the communication device 900 may be the interface board 320 shown in FIG. 3 .
  • the above-mentioned processor 901 may be the processor 3201 in the interface board 320.
  • the above-mentioned memory 902 may be the memory 3202 in the interface board 320 .
  • the above-mentioned communication interface 903 may be the PHY Ethernet interface 3205 in the interface board 320. This is only an example and does not constitute a limitation on the embodiments of the present application.
  • FIG. 10 shows a possible hardware structure diagram of the data stream processing device 1000 provided by this application.
  • the data stream processing device 1000 may be the source end described in the above data stream processing method, or a network interface board, processor, or switching chip of the source end, etc.
  • the data stream processing device 1000 includes: a processor 1001, a memory 1002, and a communication interface 1003.
  • the processor 1001, the communication interface 1003, and the memory 1002 may be connected to each other directly, or connected to each other via a bus 1004.
  • the memory 1002 is used to store computer programs and data of the data stream processing device 1000.
  • the memory 1002 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or portable read-only memory (compact disc read-only memory, CD-ROM), etc.
  • the software or program code required for the functions of all or part of the units at the source end in the above method embodiment is stored in the memory 1002 .
  • the processor 1001 in addition to calling the program code in the memory 1002 to implement some of the functions, can also cooperate with other components (such as communication interface 1003) together to complete other functions described in the method embodiment (such as the function of receiving or sending messages).
  • the number of communication interfaces 1003 may be multiple, and are used to support the data stream processing device 1000 to communicate, such as receiving or sending messages.
  • the processor 1001 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • the processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and so on.
  • the processor 1001 can be used to read the program stored in the above-mentioned memory 1002 and perform the operations performed by the source end in the method described in the above-mentioned Figure 4 and its possible embodiments.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • the computer program is executed by a processor to implement the operations performed by the communication device in any of the above embodiments and possible embodiments.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • the computer program is executed by a processor to implement the operations performed by the source end in any of the above embodiments and possible embodiments.
  • An embodiment of the present application also provides a computer program product.
  • when the computer program product is read and executed by a computer, the operations performed by the communication device in any of the above embodiments and possible embodiments are implemented.
  • Embodiments of the present application also provide a computer program product.
  • when the computer program product is read and executed by a computer, the operations performed by the source end in any of the above embodiments and possible embodiments are implemented.
  • an embodiment of the present application also provides a communication system 1100.
  • the communication system includes a first communication device 1101 and a second communication device 1102.
  • the first communication device 1101 may be the communication device in the above data stream processing method.
  • the second communication device 1102 may be the source end in the above data stream processing method.
  • the embodiments of the present application can improve the efficiency of message retransmission, thereby improving the processing performance of services corresponding to the messages.
  • The terms first, second, and so on are used to distinguish between identical or similar items that have basically the same effects and functions. It should be understood that there is no logical or sequential dependency among "first", "second", and "nth", and that these terms place no limit on number or execution order. It should also be understood that, although the description uses the terms first, second, and so on to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another.
  • The size of the sequence number of each process does not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation process of the embodiments of the present application.
  • References throughout this specification to "one embodiment," "an embodiment," and "a possible implementation" mean that specific features, structures, or characteristics related to the embodiment or implementation are included in at least one embodiment of the application. Therefore, "in one embodiment," "in an embodiment," and "a possible implementation" appearing in various places throughout the specification do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

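Embodiments of this application provide a data stream processing method and a related apparatus. The method includes: when a network failure occurs on the transmission path of a first data stream, a first communication device on the transmission path generates a first message, where the first message instructs the source node of the first data stream to retransmit one or more messages in the first data stream; and the first communication device sends the first message. The embodiments of this application improve the efficiency of message retransmission and thereby the processing performance of the services corresponding to the messages.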

Description

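Data stream processing method and related apparatus
This application claims priority to Chinese Patent Application No. 202211054474.5, filed with the China Patent Office on August 31, 2022 and entitled "Data stream processing method and related apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communication technologies, and in particular to a data stream processing method and a related apparatus.
Background
RDMA over converged Ethernet (RoCE) is a network protocol that allows remote direct memory access (RDMA) technology to be used over an Ethernet network. The RoCE protocol has two versions: RoCE v1 and RoCE v2. RoCE v1 is an Ethernet link-layer protocol and therefore allows communication between any two hosts in the same Ethernet broadcast domain. RoCE v2 is a network-layer protocol, so RoCE v2 packets can be routed. In addition to converged Ethernet networks, the RoCE protocol can also be applied to traditional or non-converged Ethernet networks.
The RoCEv2 protocol is built on the user datagram protocol (UDP); that is, RoCE messages are transmitted over UDP. UDP is a protocol that communicates without establishing a connection. Its drawbacks are that it does not provide packet grouping or assembly and cannot order packets; in other words, after a message is sent, there is no way to know whether it arrived safely and completely. Therefore, in existing implementations, packet-loss handling for RoCE messages relies on a retransmission mechanism based on a retry counter (RC) in the inner layer of the computer system. However, this retransmission mechanism is inefficient and affects the processing performance of specific services.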
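Summary
Embodiments of this application disclose a data stream processing method and a related apparatus, which can improve message retransmission efficiency and the processing performance of the services corresponding to the messages.
In a first aspect, an embodiment of this application provides a data stream processing method. The method includes: when a network failure occurs on the transmission path of a first data stream, a first communication device on the transmission path generates a first message, where the first message instructs the source node of the first data stream to retransmit one or more messages in the first data stream; and the first communication device sends the first message.
Optionally, the first message is a negative acknowledgment (NAK) message.
Optionally, the communication device may be a network device on the transmission path, or a network interface board in the network device, or a processor in the network device, or a switching chip in the network device.
In this solution, the communication device is a device included in any node (that is, a network device) on the transmission path of the data stream. When the network device senses a failure on the transmission path, it can generate a response message for the data stream and send it to the source device of the data stream (that is, the source node) to instruct the source device to retransmit messages, without waiting for the source end of the data stream to retransmit based on a timeout retransmission mechanism. Because the timeout retransmission mechanism requires a long wait, it affects message retransmission efficiency and the performance of specific services. This application can therefore improve the efficiency of message retransmission and, in turn, the processing performance of the services corresponding to the messages.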
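In a possible embodiment, the network failure is a failure of a link or device connected to the port that transmits the first data stream; before the first communication device generates the first message, the method further includes: the first communication device determines the first data stream according to the port.
In this solution, a network failure can usually be associated with an egress port of the network device; for example, the network failure may be a failure of the link connected to the egress port, or of the next-hop apparatus or device connected to it. Therefore, by storing the information corresponding to the data streams sent from an egress port in association with the identifier of that egress port, the data streams whose messages need to be retransmitted can be determined quickly from the egress port identifier when the link or next hop corresponding to the egress port fails. The information corresponding to those data streams can then be found quickly to generate the response message that indicates retransmission, which further improves the efficiency of message retransmission.
In a possible embodiment, before the network failure occurs on the transmission path of the first data stream, the method further includes: the first communication device receives a second message in the first data stream; the first communication device obtains, according to the second message, first information corresponding to the first data stream, where the first information includes the source IP address, destination IP address, source port number, destination queue pair (DQP), and data packet sequence number in the second message, and the first information is used to generate the first message.
In this solution, the communication device can obtain the first information from the messages of data streams passing through it and store it, so that it can later respond quickly to a network failure by generating a response message that indicates retransmission and sending it to the source end, thereby improving the efficiency of message retransmission.
In a possible embodiment, the first information further includes the source queue pair (SQP) of the first data stream; the SQP is calculated based on the source port number and the DQP.
In this solution, the second message received by the communication device does not include the SQP of the first data stream. In the RoCEv2 protocol, however, the source port number, the SQP, and the DQP satisfy the following relationship: source port number = (SQP XOR DQP) OR 0xC000. The communication device can therefore calculate the SQP of the first data stream from this relationship. This source queue pair identifier, together with the target identifier obtained from the second message, can be used to identify the first data stream and thus helps to quickly generate the NAK message corresponding to the first data stream.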
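The relationship above can be inverted to recover the SQP from the observed source port and the DQP, as in the following sketch. This is an illustration under stated assumptions, not the applicant's implementation: the function names are hypothetical, and because OR 0xC000 forces the top two bits of the 16-bit port to 1, only the low 14 bits of the SQP can be recovered. The result is therefore exact only if queue pair identifiers fit in 14 bits, which this sketch assumes.

```python
LOW_14_BITS = 0x3FFF


def source_port(sqp: int, dqp: int) -> int:
    """Forward relationship: (SQP XOR DQP) OR 0xC000, as a 16-bit port."""
    return ((sqp ^ dqp) | 0xC000) & 0xFFFF


def recover_sqp(udp_src_port: int, dqp: int) -> int:
    """Recover the low 14 bits of the SQP from the source port and DQP.

    XOR is its own inverse, so XOR-ing the port with the DQP yields the
    SQP in every bit that the OR with 0xC000 left untouched.
    """
    return (udp_src_port ^ dqp) & LOW_14_BITS
```

If queue pair identifiers can be wider than 14 bits, the other two ways of learning the SQP described in this section (from an ACK message, or from a message sent by the source end) avoid the ambiguity.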
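In a possible embodiment, the first information further includes the SQP of the first data stream; before the network failure occurs on the transmission path of the first data stream, the method further includes:
the first communication device receives a first acknowledgment (ACK) message corresponding to the first data stream;
the first communication device obtains the DQP in the ACK message;
the first communication device saves the DQP in the ACK message as the SQP of the first data stream.
In this solution, because the second message received by the communication device does not include the SQP of the first data stream, the SQP of the first data stream can be obtained from the ACK message corresponding to the first data stream. This SQP, together with the target identifier obtained from the second message, can be used to identify the first data stream and thus helps to quickly generate the NAK message corresponding to the first data stream.
In a possible embodiment, the information corresponding to the first data stream further includes the SQP of the first data stream; before the network failure occurs on the transmission path of the first data stream, the method further includes:
the communication device receives a third message from the source end, where the third message includes the SQP;
the communication device saves the SQP.
Optionally, the third message is a Link Layer Discovery Protocol message.
In this solution, because the second message received by the communication device does not include the SQP of the first data stream, the communication device can obtain the SQP of the first data stream by interacting with the source end of the first data stream. This SQP, together with the target identifier obtained from the second message, can be used to identify the first data stream and thus helps to quickly generate the NAK message corresponding to the first data stream.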
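In a possible embodiment, after the first communication device obtains the first information from the second message, the method further includes:
the first communication device receives a second acknowledgment (ACK) message corresponding to the first data stream;
the first communication device obtains a first data packet sequence number in the ACK message;
the first communication device updates, based on the first data packet sequence number, the data packet sequence number in the first information stored in the communication device.
In this solution, the communication device can update the stored PSN with the PSN in the received ACK message corresponding to the first data stream. The PSN carried in the ACK message is the PSN of a message that the destination end has confirmed receiving. Updating the stored PSN with this PSN reduces the number of messages that a subsequently generated NAK message asks to be retransmitted. This reduces the number of messages the source end retransmits and saves computing and network resources.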
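The PSN update described above can be sketched as follows. The source text only says that the stored PSN is updated with the PSN from the ACK message; the wraparound-aware comparison here is an added assumption, based on the PSN being a 24-bit field, so that a stale or reordered ACK does not move the stored value backwards. The function names are hypothetical.

```python
PSN_MODULUS = 1 << 24  # assumes a 24-bit PSN field


def psn_is_newer(candidate: int, stored: int) -> bool:
    """True if candidate is ahead of stored in 24-bit serial arithmetic."""
    diff = (candidate - stored) % PSN_MODULUS
    return 0 < diff < PSN_MODULUS // 2


def update_stored_psn(stored: int, ack_psn: int) -> int:
    """Return the PSN to keep after seeing an ACK carrying ack_psn."""
    return ack_psn if psn_is_newer(ack_psn, stored) else stored
```

With the stored value kept as far forward as the acknowledgments allow, a NAK generated later asks the source end to resend fewer messages.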
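In a possible embodiment, the information corresponding to the first data stream is stored in a flow information table. The flow information table includes the port identifiers of N egress ports in the intermediate node and the information corresponding to M data streams; each of the N port identifiers is stored in association with the information corresponding to one or more of the M data streams, where N and M are integers greater than 0 and N ≤ M.
In this solution, the information corresponding to data streams is stored in the form of a flow information table. Through the flow information table, the information corresponding to the data streams of each egress port can be found quickly, so the data streams whose messages need to be retransmitted can be determined quickly, and the information corresponding to those data streams can be found quickly to generate the response message that indicates retransmission, which further improves the efficiency of message retransmission.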
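One way to picture the flow information table is a mapping from each egress port identifier to the flows sent through that port. The sketch below is only an illustration: the class name `FlowInfoTable`, the choice of flow key, and the use of plain dictionaries for the stored information are all assumptions, not details given in the source.

```python
class FlowInfoTable:
    """Maps an egress port identifier to the flows sent through it."""

    def __init__(self) -> None:
        # port_id -> {flow_key: flow_info}
        self._by_port = {}

    def record(self, port_id: str, flow_key: tuple, info: dict) -> None:
        """Store or refresh the information for one flow on one port."""
        self._by_port.setdefault(port_id, {})[flow_key] = info

    def flows_on_port(self, port_id: str) -> list:
        """Return the information of every flow associated with port_id."""
        return list(self._by_port.get(port_id, {}).values())
```

When a failure is attributed to a port, a single lookup such as `table.flows_on_port("port-1")` yields every flow that needs a NAK, which is the quick lookup the paragraph above relies on.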
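In a possible embodiment, the first data stream is a data stream transmitted based on the RoCEv2 protocol; the second message is the first message in an MSG included in the first data stream.
In this solution, the second message received by the communication device may be the first message (the "first" message) of an MSG in the data stream. This is because, in this solution, the data packet sequence number in the received first message is treated as the error packet sequence number (ePSN) in the generated NAK message. The ePSN in the NAK message indicates retransmission of the message with that ePSN and the messages whose PSNs follow it. Therefore, obtaining the target information from the first message of the MSG makes it possible to retransmit starting from the first message of the MSG, ensuring as far as possible that all messages in the MSG are delivered to the destination end.
In a second aspect, this application provides a data stream processing method. The method includes:
a first communication device receives a first message from a second communication device; the first message is generated when a network failure occurs on the transmission path of the first data stream, the first message indicates retransmission of one or more messages in the first data stream, the second communication device is on the transmission path of the first data stream, and the first communication device is the source node of the first data stream;
the first communication device retransmits the one or more messages.
In this solution, the second communication device is a device included in any node (that is, a network device) on the transmission path of the data stream. When the network device senses a failure on the transmission path, it can generate a response message for the data stream and send it to the source device of the data stream (that is, the first communication device) to instruct the source device to retransmit messages, without waiting for the source end of the data stream to retransmit based on a timeout retransmission mechanism. Because the timeout retransmission mechanism requires a long wait, it affects message retransmission efficiency and the performance of specific services. This application can therefore improve the efficiency of message retransmission and, in turn, the processing performance of the services corresponding to the messages.
In a possible embodiment, before the first communication device receives the first message from the second communication device, the method further includes:
the first communication device sends a second message to the second communication device, where the second message includes the SQP of the first data stream, and the SQP is used to generate the first message.
In this solution, the source end of the first data stream, that is, the first communication device, can send the SQP to the second communication device after generating the SQP of the first data stream. This source queue pair identifier, together with the target identifier obtained from the second message, can be used to identify the first data stream, which helps the second communication device quickly generate the NAK message (the first message) corresponding to the first data stream and improves the efficiency of message retransmission.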
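In a third aspect, this application provides a communication device. The communication device includes:
a generating unit, configured to generate a first message when a network failure occurs on the transmission path of a first data stream, where the first message instructs the source node of the first data stream to retransmit one or more messages in the first data stream; the communication device is on the transmission path;
a sending unit, configured to send the first message.
In a possible embodiment, the network failure is a failure of a link or device connected to the port that transmits the first data stream;
the communication device further includes a determining unit, configured to determine the first data stream according to the port before the generating unit generates the first message.
In a possible embodiment, the communication device further includes a receiving unit, configured to receive a second message in the first data stream before the network failure occurs on the transmission path of the first data stream;
the communication device further includes an acquisition unit, configured to obtain, according to the second message, first information corresponding to the first data stream, where the first information includes the source IP address, destination IP address, source port number, destination queue pair (DQP), and data packet sequence number in the second message, and the first information is used to generate the first message.
In a possible embodiment, the first information further includes the source queue pair (SQP) of the first data stream; the SQP is calculated based on the source port number and the DQP.
In a possible embodiment, the first information further includes the SQP of the first data stream; before the network failure occurs on the transmission path of the first data stream,
the receiving unit is further configured to receive a first acknowledgment (ACK) message corresponding to the first data stream;
the acquisition unit is further configured to obtain the DQP in the ACK message;
the communication device further includes a saving unit, configured to save the DQP in the ACK message as the SQP of the first data stream.
In a possible embodiment, the first information further includes the SQP of the first data stream; before the network failure occurs on the transmission path of the first data stream,
the receiving unit is further configured to receive a third message from the source node, where the third message includes the SQP;
the communication device further includes a saving unit, configured to save the SQP.
In a possible embodiment, the third message is a Link Layer Discovery Protocol message.
In a possible embodiment, after the acquisition unit obtains the first information from the second message,
the receiving unit is further configured to receive a second acknowledgment (ACK) message corresponding to the first data stream;
the acquisition unit is further configured to obtain a first data packet sequence number in the ACK message;
the communication device further includes an updating unit, configured to update, based on the first data packet sequence number, the data packet sequence number in the first information stored in the communication device.
In a possible embodiment, the first message is a negative acknowledgment (NAK) message.
In a possible embodiment, the first communication device is a network device on the transmission path, or a network interface board in the network device, or a processor in the network device, or a switching chip in the network device.
In a possible embodiment, the first data stream is a data stream transmitted based on the RoCEv2 protocol; the second message is the first message in an MSG included in the first data stream.
In a fourth aspect, this application provides a data stream processing device. The data stream processing device includes:
a receiving unit, configured to receive a first message from a second communication device; the first message is generated when a network failure occurs on the transmission path of the first data stream, the first message indicates retransmission of one or more messages in the first data stream, the second communication device is on the transmission path of the first data stream, and the data stream processing device is the source node of the first data stream;
a retransmission unit, configured to retransmit the one or more messages.
In a possible embodiment, the data stream processing device further includes a sending unit, configured to send a second message to the second communication device before the receiving unit receives the first message from the second communication device, where the second message includes the source queue pair (SQP) of the first data stream, and the SQP is used to generate the first message.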
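In a fifth aspect, this application provides a communication device that includes a processor and a memory. The memory is coupled to the processor, and when the processor executes the computer program stored in the memory, the data stream processing method described in any item of the first aspect can be implemented. The communication device may further include a communication interface, which the device uses to communicate with other devices; for example, the communication interface may be a transceiver, a circuit, a bus, a module, or another type of communication interface.
In a possible implementation, the communication device may include:
a memory, configured to store a computer program;
a processor, configured to:
generate a first message when a network failure occurs on the transmission path of a first data stream, where the first message instructs the source node of the first data stream to retransmit one or more messages in the first data stream; and send the first message through the communication interface.
It should be noted that the computer program in the memory in this application may be pre-stored, or may be downloaded from the Internet and stored when the device is used; this application does not specifically limit the source of the computer program in the memory. Coupling in the embodiments of this application is an indirect coupling or connection between devices, units, or modules; it may be electrical, mechanical, or in another form, and is used for information exchange between the devices, units, or modules.
In a sixth aspect, this application provides a communication device that includes a processor and a memory. The memory is coupled to the processor, and when the processor executes the computer program stored in the memory, the data stream processing method described in any item of the second aspect can be implemented. The communication device may further include a communication interface, which the device uses to communicate with other devices; for example, the communication interface may be a transceiver, a circuit, a bus, a module, or another type of communication interface.
In a possible implementation, the communication device may include:
a memory, configured to store a computer program;
a processor, configured to:
receive a first message from a second communication device through the communication interface, where the first message is generated when a network failure occurs on the transmission path of the first data stream, the first message indicates retransmission of one or more messages in the first data stream, the second communication device is on the transmission path of the first data stream, and the first communication device is the source node of the first data stream; and retransmit the one or more messages.
It should be noted that the computer program in the memory in this application may be pre-stored, or may be downloaded from the Internet and stored when the device is used; this application does not specifically limit the source of the computer program in the memory. Coupling in the embodiments of this application is an indirect coupling or connection between devices, units, or modules; it may be electrical, mechanical, or in another form, and is used for information exchange between the devices, units, or modules.
In a seventh aspect, this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method described in any item of the first aspect.
In an eighth aspect, this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method described in any item of the second aspect.
In a ninth aspect, this application provides a computer program product. When the computer program product is read and executed by a computer, the method described in any item of the first aspect is performed.
In a tenth aspect, this application provides a computer program product. When the computer program product is read and executed by a computer, the method described in any item of the second aspect is performed.
In an eleventh aspect, this application provides a communication system. The communication system includes a first communication device and a second communication device; the first communication device is configured to perform the method described in any item of the first aspect, and the second communication device is configured to perform the method described in any item of the second aspect.
The solutions provided in the third to eleventh aspects are used to implement, or to cooperate in implementing, the methods correspondingly provided in the first or second aspect, and can therefore achieve the same or corresponding beneficial effects as the corresponding methods in the first or second aspect; details are not repeated here.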
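Brief Description of Drawings
Figures 1 and 2 are schematic diagrams of communication networks in embodiments of this application;
Figure 3 is a schematic structural diagram of a network device in an embodiment of this application;
Figure 4 is a schematic flowchart of the data stream processing method provided by an embodiment of this application;
Figure 5 is a schematic diagram of the format of a message in an embodiment of this application;
Figure 6A is a schematic diagram of the format of another message in an embodiment of this application;
Figure 6B is a schematic diagram of the format of a message header in an embodiment of this application;
Figures 7 to 10 are schematic structural diagrams of devices provided by embodiments of this application;
Figure 11 is a schematic structural diagram of the communication system provided by an embodiment of this application.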
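Detailed Description
The embodiments of this application are described below by way of example with reference to the accompanying drawings.
Refer to Figure 1, which shows an example architecture of a communication network 100 in an embodiment of this application. The communication network 100 includes a processing device 110, a processing device 120, and a forwarding network 130. The processing device 110 and the processing device 120 communicate through the forwarding network 130.
The processing device 110 and the processing device 120 support the RDMA over converged Ethernet (RoCE) protocol, and specifically may support the RoCEv2 protocol. That is, according to the RoCEv2 protocol, the processing device 110 and the processing device 120 can read data from or write data to each other's memory without the awareness of either side's central processing unit (CPU).
The forwarding network 130 includes one or more network devices, which can be used to forward communication data between the processing device 110 and the processing device 120, for example the RoCE messages exchanged between them.
For example, the RoCEv2 protocol is widely used in scenarios such as high performance computing (HPC), artificial intelligence (AI) computing, centralized storage, and distributed storage. The communication network 100 may therefore be an HPC system network, an AI computing cluster network, a data center network (which may be a data center with centralized storage or with distributed storage), and so on.
Figure 2 takes the case where the communication network 100 is a data center network as an example. The data center network may, for example, include a processing device 201, a processing device 202, a processing device 203, a processing device 204, and a forwarding network 205. The processing devices 201, 202, 203, and 204 can support the RoCEv2 protocol. For example, any two of the four processing devices may correspond to the processing device 110 and the processing device 120 shown in Figure 1.
The forwarding network 205 may correspond to the forwarding network 130 shown in Figure 1. The forwarding network 205 may include multiple network devices, for example network devices 2051 to 2057.
For example, the data center network shown in Figure 2 has a spine-leaf architecture. Network devices 2051 and 2052 may be spine switches, and network devices 2053 to 2057 may be leaf switches. A leaf switch may be, for example, a top of rack (TOR) switch. Each spine switch can be connected to and communicate with all leaf switches; for example, network devices 2051 and 2052 are both connected to network devices 2053 to 2057. Leaf switches are connected to and communicate with processing devices. For example, network device 2053 is connected to processing device 201; network device 2054 is connected to processing device 202; network device 2055 is connected to processing device 203; and network devices 2056 and 2057 are both connected to processing device 204.
For example, the processing device shown in Figure 1 or Figure 2 may be a client, and a client may include a server, a workstation, and so on. A server may include any server capable of implementing computing functions, such as a back-end computing server or a data storage server. A client may also include handheld devices (for example, mobile phones, tablet computers, and palmtop computers), vehicle-mounted devices, wearable devices (for example, smart watches, smart bands, and pedometers), notebook computers, desktop computers, smart home devices (for example, refrigerators, televisions, air conditioners, and electricity meters), intelligent robots, workshop equipment, and various forms of user equipment (UE), mobile stations (MS), terminal equipment, and so on.
For example, the network device shown in Figure 1 or Figure 2 may be a switch, a router, or the like.
It can be understood that the communication networks described above are only examples and do not limit the embodiments of this application. Any communication network that supports the RoCEv2 protocol is a communication network to which the embodiments of this application apply.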
参见图3,图3示例性地示出了上述网络设备的结构示意图。该网络设备可以包括主控板310和接口板320。主控板又可以称为主处理单元(main processing unit,MPU)。接口板又可以称为线路处理单元(line processing unit,LPU)。该主控板310提供了控制平面和管理平面。控制平面完成该网络设备中的协议处理、业务处理、路由运算、转发控制、业务调度、流量统计或系统安全等功能。管理平面完成系统的运行状态监控、环境监控、日志和告警信息处理、系统加载或系统升级等功能。接口板320提供了不同类型(例如光口或电口)和不同速率的接口,通过分布式数据平面对数据进行转发。该分布式数据平面例如可以是交换网板(switch fabric unit,SFU)。该交换网板可以连通该主控板310和接口板320,使其之间可以通信。
示例性地,上述主控板310可以包括处理器3101、存储器3102、以太网接口3103、以太网接口3104、物理层(physical,PHY)以太网接口3105和PHY以太网接口3106。其中:
处理器3101是主控板310的控制单元,即是主控板310中的信息处理和程序运行的最终执行单元。示例性地,处理器3101可以是中央处理器单元CPU、通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,数字信号处理器和微处理器的组合等等。
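The memory 3102 stores the computer programs, configured parameters, and data of the main control board 310. For example, the memory 3102 may include random access memory (RAM) and flash memory (flash). The flash memory may store computer programs and statically configured parameters. The RAM serves as working memory: the computer programs executed at run time, and their data, may be stored in the RAM.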
以太网接口3103可以是管理网口。该接口可以用于进行系统的程序加载和调试等工作,也可以连接远端的网管工作站等设备以实现系统的远程管理。该以太网接口3103可以通过PHY以太网接口3105连接到上述交换网板,以实现与远端网管工作站的通信。
以太网接口3104可以通过PHY以太网接口3106连接到上述交换网板,并通过该交换网板实现与上述接口板320之间的通信。例如,通过该交换网板与该接口板320中的以太网接口3203连接,以实现与接口板320中的处理器3201的通信。
示例性地,上述接口板320可以包括处理器3201、存储器3202、以太网接口3203、局域网交换(LAN Switch,LSW)芯片3204和PHY以太网接口3205。其中:
处理器3201是接口板320的控制单元,即是接口板320中的信息处理和程序运行的最终执行单元。示例性地,处理器3201可以是中央处理器单元CPU、通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,数字信号处理器和微处理器的组合等等。
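The memory 3202 stores the computer programs, configured parameters, and data of the interface board 320. For example, the memory 3202 may include random access memory (RAM) and flash memory (flash). The flash memory may store computer programs and statically configured parameters. The RAM serves as working memory: the computer programs executed at run time, and their data, may be stored in the RAM.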
以太网接口3203可以通过上述交换网板与上述主控板310中的PHY以太网接口3106连接,以实现与上述主控板310之间的通信。
LSW芯片3204可以由处理器3201控制以进行初始化、业务表项下发、协议报文收发或各类中断(含端口连接(link up)和断开(link down)处理)的控制等。一种可能的实现中,该接口板320还可以包括另一个存储器,该存储器可以是该LSW芯片的专用存储器。该存储器可以用于存储转发的报文的内容等。示例性地,该存储器可以是RAM。
PHY以太网接口3205为LSW芯片3204下挂的接口,可以用于网络设备之间的光口或电口的以太网接口对接。
一种可能的实现中,上述接口板320中包括处理器3201的协处理器。该协处理器可以协助处理器3201完成预设的处理工作。例如,在本申请实施例中,该协处理器可以用于实现流信息表的创建和管理功能。示例性地,该协处理器还可以用于实现报文的生成等功能。关于该协处理器实现的功能的具体介绍可以参见后续的描述,此处暂不详述。
可以理解的是,上述图3所示的网络设备的结构仅为示例,不构成对本申请实施例的限制。上述图3所示的网络设备的结构是以框式网络设备为例,在具体实现中,网络设备还可以是盒式网络设备等,本申请实施例对此不做限制。
上述介绍的通信网络支持RoCEv2协议。但是,RoCEv2协议构筑于用户数据报协议(user datagram protocol,UDP)之上。即RoCE报文是基于UDP进行传输的。而UDP是无需建立连接即可通信的协议,有不提供数据包分组、组装和不能对数据包进行排序的缺点,也就是说,当报文发送之后,是无法得知其是否安全完整到达的。现有的方案中,为了减少RoCE报文的丢包,设计了基于计算机系统内层的重发计数器(retry counter,RC)的重传机制。当前常用的两种基于RC的重传方案如下:
1、在一个消息(message,MSG)中存在部分报文丢包的情况下,目的端可以通过消极确认(negative acknowledge,NAK)报文向源端通告错误数据包序号(error packet sequence number,ePSN),以使得该源端对丢包的报文进行重传。
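The source end is the device that generates and sends the MSG, and may also be called the source node; it may be, for example, the processing device 110 shown in FIG. 1. The destination end is the device to which the MSG is addressed; it may be, for example, the processing device 120 shown in FIG. 1. The MSG may include one or more packets, each of which carries a packet sequence number (PSN). When the MSG includes multiple packets, their PSNs may be consecutive. After receiving packets of the MSG, the destination end may find that the packets following a certain PSN have all arrived but the packet with that PSN itself has not. That PSN is the ePSN. The destination end may then generate a NAK packet and send it to the source end, instructing the source end to retransmit the packet with the ePSN and the packets with subsequent PSNs.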
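The gap detection described for the first scheme can be sketched as follows. This is a minimal illustration, not the behaviour of any particular RoCEv2 NIC: the function name `find_epsn` is hypothetical, and the sketch assumes the PSNs of one MSG do not wrap around the 24-bit PSN space.

```python
from typing import Iterable, Optional


def find_epsn(first_psn: int, last_psn: int, received: Iterable[int]) -> Optional[int]:
    """Return the smallest PSN in [first_psn, last_psn] that was not received.

    Returns None when nothing is missing. Assumption: no PSN wraparound
    inside the inspected range.
    """
    got = set(received)
    for psn in range(first_psn, last_psn + 1):
        if psn not in got:
            return psn
    return None


# PSN 102 is missing although 103 and 104 arrived, so 102 is reported as the ePSN.
epsn = find_epsn(100, 104, [100, 101, 103, 104])
```

Note that this check only runs when the destination end has received at least one packet of the MSG. If every packet is lost, nothing triggers it, which is exactly the case the second scheme has to cover.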
2、在数据流中的某一个MSG中的报文均丢失,或数据流中该MSG后面的报文全部丢失的情况下,等待源端基于超时重传机制来重传报文。
在这种情况下,由于目的端没有接收到MSG中的任一个报文,所以无法按照上述第一种重传方案来应答以指示源端重传。由于源端没有收到任何应答,所以只能等待超时。直到该源端中的RoCEv2网卡的计时器超时,该源端才重新针对没有应答的报文进行重传。而在具体实现中,为了避免网络时延导致错误的重传,源端的网卡的超时时间通常比较长(例如为2s等)。这对于一些对时间比较敏感的应用场景(例如交易类存储等时延敏感场景)会带来比较差的体验,难以满足业务性能的需求。
在通信网络中,网络故障的出现通常会导致报文的丢失。出现网络故障后,路由收敛的时间通常为毫秒级的,即丢包的时间为毫秒级的。路由收敛的这段时间内,大概率会导致整个MSG丢包。这就导致无法采用上述第一种快速重传报文的方案,而只能触发上述第二种超时重传的机制。使得重传机制效率低,影响具体业务的处理性能。为了解决该问题,本申请实施例提供了一种数据流处理方法。
可以参见图4,本申请实施例提供的数据流处理方法可以包括但不限于如下步骤。
S401、在第一数据流的传输路径出现网络故障的情况下,该传输路径上的通信装置生成第一报文;该第一报文指示重传该第一数据流中的一个或多个报文。
该通信装置为该传输路径上的第一节点包括的装置。该第一节点为该传输路径中的中间节点或为该第一数据流的目的端(或者称为目的节点)。若该第一节点为该传输路径中的中间节点,该传输路径的中间节点可以是该传输路径上的网络设备。那么,该通信装置可以是该网络设备本身,或者可以是该网络设备中的接口板,或者可以是该网络设备中接口板上的处理器,或者可以是该网络设备中的交换芯片,或者可以是该处理器和该交换芯片的组合。若该第一节点为该第一数据流的目的端,那么,该通信装置可以是该目的端设备本身,或者可以是该目的端设备中的接口板,或者可以是该目的端设备中接口板上的处理器,或者可以是该目的端设备中的交换芯片,或者可以是该处理器和该交换芯片的组合。
S402、该通信装置向该第一数据流的源端发送该第一报文。
上述第一数据流中可以包括一个或多个MSG。即该第一数据流可以是由该一个或多个MSG中的报文组成的。这些报文可以是RoCEv2报文。
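For example, the source end of the first data stream may be the processing device 110 shown in FIG. 1, and its destination end may be the processing device 120 shown in FIG. 1. The transmission path of the first data stream may be the path between the processing device 110 and the processing device 120 that carries the first data stream. For example, the processor mentioned above may be the main control unit on the interface board, or a coprocessor of that main control unit. For example, it may be the processor 3201 shown in FIG. 3, or a coprocessor of the processor 3201.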
示例性地,上述第一数据流在上述中间节点中,通过该中间节点的某一个出端口转发出去。为了便于后续的描述,可以将该某一个出端口简称为目标出端口。上述网络故障可以是上述第一数据流的传输路径中,该目标出端口至该第一数据流的目的端之间的路径上的链路故障或设备故障等。例如,该网络故障可以是该目标出端口连接的链路的故障。或者,例如,该网络故障可以是该目标出端口连接的下一跳网络设备的故障。为了便于理解,以上述图2所示的通信网络为例。
在图2中,假设上述第一数据流的源端为处理设备201,目的端为处理设备204。该第一数据流的传输路径为:处理设备201-网络设备2053-网络设备2051-网络设备2056-处理设备204。假设上述中间节点为网络设备2053,该网络设备2053中用于发送该第一数据流的出端口连接的下一跳设备即为该网络设备2051。那么,上述网络故障例如可以是该网络设备2053和该网络设备2051之间的链路出现故障。或者,该网络故障例如可以是该网络设备2051出现故障。或者,该网络故障例如可以是该网络设备2051-网络设备2056-处理设备204之间的链路或设备出现故障。可以理解的是,此处仅为示例,不构成对本申请实施例的限制。
为了便于理解,后续的描述中主要以上述第一数据流的传输路径为“处理设备201-网络设备2053-网络设备2051-网络设备2056-处理设备204”为例介绍。
示例性地,在具体实现中,上述通信装置中可以生成并管理流信息表。该流信息表中可以包括一个或多个表项,每一个表项中可以用于存储一个数据流对应的信息。该数据流对应的信息可以包括该数据流的标识信息和该数据流中需要重传的报文的指示信息。
一种可能的实现中,由于一个数据流中可以包括一个或多个MSG,则上述数据流中需要重传的报文的指示信息可以是:该数据流中的某一个MSG中需要重传的报文的指示信息。
上述数据流的标识信息可以包括如下信息中的一项或多项:该数据流的源互联网协议(internet protocol,IP)地址、目的IP地址、源端口号、目的端口号、源队列对(source queue pair,SQP)和目的队列对(destination queue pair,DQP)。可以理解的是,该数据流的标识信息中包括的SQP和DQP分别为该SQP的标识和该DQP的标识。该源队列对和目的队列对可以是基于RoCEv2协议建立的,用于实现上述源端和目的端之间的RDMA数据通信。上述数据流中需要重传的报文的指示信息例如可以是该需要重传的报文的PSN中最小的PSN。即该最小的PSN用于指示从该最小的PSN开始往后的该数据流中的报文均需要重传。示例性地,若该需要重传的报文为该数据流中某一个MSG中需要重传的报文,则该最小的PSN用于指示从该最小的PSN开始往后的该MSG中的报文均需要重传。该需要重传的报文可以包括该最小的PSN对应的报文,或者可以不包括该最小的PSN对应的报文。
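In a possible implementation, the communication device may bind a data stream to the egress port through which the intermediate node that contains the communication device forwards that data stream. For example, the port identifier of the egress port may be stored in the flow information table entry that holds the information of the data stream. As described above, a network fault can be associated with the egress port of a data stream; for example, the fault may be a fault of the link connected to the egress port or of the next-hop device. After a network fault occurs, the communication device can therefore use the identifier of the affected egress port to quickly look up the information of the data streams sent through that port, and then generate the corresponding packet that indicates retransmission.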
一种可能的实现中,上述数据流对应的信息的表项还可以包括老化周期(age)。该老化周期指示该表项的有效期。示例性地,该老化周期例如可以是10纳秒、15纳秒或20纳秒等等。该老化周期可以根据需求配置,本申请实施例对此不做限制。
为了便于理解上述流信息表,可以示例性地参见表1。
表1
表1示例性示出了流信息表的结构。可以看到流信息表中可以包括多个表项,每个表项与一个数据流对应。每个表项包括对应的数据流的出端口标识、源IP地址、目的IP地址、源端口号、目的端口号、源队列对标识、目的队列对标识、数据包序列号和老化周期。该数据包序列号即为上述描述的需要重传的报文的PSN中最小的PSN。
一种可能的实现中,一个出端口可以用于发送一个或多个数据流。那么,在上述流信息表中,一个出端口标识可以与一个或多个数据流对应的信息绑定。例如,可以参见上述表1,出端口标识PORT1可以与两个数据流对应的信息绑定。出端口标识PORT2则与一个数据流对应的信息绑定。
可以理解的是,上述介绍的流信息表仅为示例。在具体应用的过程中,上述流信息表中可以包括上述出端口标识、源IP地址、目的IP地址、源端口号、目的端口号、源队列对标识、目的队列对标识、数据包序列号和老化周期中的一项或多项。可以不必包括该全部的内容,例如,可以不包括数据流的目的端口号等。该流信息表中包括的具体内容可以根据实际需要设置,本申请实施例不做限制。
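One way to picture the flow information table is as a mapping from an egress port identifier to the entries of the data streams sent through that port. The sketch below is an assumption-laden illustration: the class names `FlowEntry` and `FlowTable`, the field names, and all addresses and numbers are hypothetical; only the list of fields follows the description of Table 1.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlowEntry:
    """One flow information table entry (field names are illustrative)."""
    port_id: str    # egress port identifier
    src_ip: str     # source IP address
    dst_ip: str     # destination IP address
    src_port: int   # UDP source port number
    dst_port: int   # UDP destination port number
    sqp: int        # source queue pair identifier
    dqp: int        # destination queue pair identifier
    psn: int        # smallest PSN that would need retransmission
    age: float      # aging period of the entry


class FlowTable:
    """Flow entries indexed by egress port, so a port fault maps to its flows."""

    def __init__(self) -> None:
        self._by_port: Dict[str, List[FlowEntry]] = defaultdict(list)

    def add(self, entry: FlowEntry) -> None:
        self._by_port[entry.port_id].append(entry)

    def flows_on_port(self, port_id: str) -> List[FlowEntry]:
        # .get avoids creating empty lists for ports that were never seen
        return list(self._by_port.get(port_id, []))


table = FlowTable()
table.add(FlowEntry("PORT1", "192.0.2.1", "192.0.2.4", 0xD888, 4791, 0x1234, 0x0ABC, 100, 10.0))
table.add(FlowEntry("PORT1", "192.0.2.2", "192.0.2.4", 0xC0FF, 4791, 0x00F0, 0x000F, 7, 10.0))
table.add(FlowEntry("PORT2", "192.0.2.3", "192.0.2.4", 0xC003, 4791, 0x0001, 0x0002, 42, 10.0))
```

Indexing by port rather than by five-tuple mirrors the binding described above: one port identifier can map to one or more data streams.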
基于上述的描述,上述通信装置中的流信息表中包括上述第一数据流对应的表项。为了便于后续描述,将该第一数据流对应的表项称为目标表项。同理,该目标表项中包括该第一数据流在上述中间节点中的出端口(即上述目标出端口)的端口标识。该目标表项还包括该第一数据流的源IP地址、目的IP地址、源端口号、目的端口号、源队列对标识和目的队列对标识中的一项或多项。该目标表项还可以包括该表项的老化周期和指示重传报文的数据包序列号。
一种可能的实现中,上述通信装置可以从该通信装置所属的中间节点接收到的第一数据流的报文中获取一部分信息保存到上述目标表项中。示例性地,该报文可以是该第一数据流中某一个MSG中的第一个报文。为了便于理解,可以参见图5。
图5示例性示出了上述第一个报文的格式。可以看到,该报文中包括以太报文头(Ethernet header)、IP报文头(IP header)、UDP报文头(UDP Header)、基本传输报文头(base transport header,BTH)、负载(payload)、不变的循环冗余检测码(invariant cyclical redundancy check,ICRC)和帧校验序列(frame check sequence,FCS)。
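The Ethernet header may include a source media access control (MAC) address and a destination MAC address. For example, the source MAC address may be the MAC address of the source end of the first data stream, and the destination MAC address may be the MAC address of the destination end of the first data stream.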
该IP报文头中可以包括源IP地址和目的IP地址。示例性地,假设上述第一数据流的传输路径为“处理设备201-网络设备2053-网络设备2051-网络设备2056-处理设备204”。则该IP报文头中的源IP地址为该网络设备2053的IP地址。该目的IP地址为该网络设备2056的IP地址。
该UDP报文头可以包括UDP源端口号和UDP目的端口号。RoCEv2中使用的UDP端口号固定为4791。
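The base transport header is the transport-layer header of InfiniBand (IB). It may include an operation code (OpCode), a destination queue pair identifier (Dest QP), and a packet sequence number (PSN). The operation code indicates the RoCEv2 packet type, that is, which operation mode the packet belongs to. In the embodiments of this application, the operation code is mainly used to capture the first packet of an MSG. Specifically, an MSG includes one first packet, one or more middle packets, and one last packet. Because the first, middle, and last packets carry different operation codes, the first packet can be identified from the operation code. In a specific implementation, the base transport header also includes other fields; for details, see the InfiniBand standard (InfiniBand™ Architecture Specification Volume 1, Release 1.1), which is not repeated here.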
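Extracting the operation code, Dest QP, and PSN from a base transport header might look like the sketch below. It assumes the 12-byte BTH layout of the InfiniBand specification (OpCode, flags, P_Key, a reserved byte plus 24-bit Dest QP, and a flags byte plus 24-bit PSN). The two opcode values listed are the reliable-connection "First" opcodes as recalled from that specification and should be checked against it; the function names are hypothetical.

```python
import struct
from typing import Tuple

# Assumed RC opcodes that mark the first packet of a multi-packet message.
FIRST_OPCODES = {
    0x00: "SEND First",
    0x06: "RDMA WRITE First",
}


def parse_bth(bth: bytes) -> Tuple[int, int, int]:
    """Return (opcode, dest_qp, psn) from the first 12 bytes of a BTH."""
    if len(bth) < 12:
        raise ValueError("BTH needs at least 12 bytes")
    opcode, _flags, _pkey, qp_word, psn_word = struct.unpack("!BBHII", bth[:12])
    dest_qp = qp_word & 0xFFFFFF   # drop the reserved top byte
    psn = psn_word & 0xFFFFFF      # drop the AckReq bit and reserved bits
    return opcode, dest_qp, psn


def is_first_packet(opcode: int) -> bool:
    """True when the opcode marks the first packet of an MSG."""
    return opcode in FIRST_OPCODES


# Hypothetical header: RDMA WRITE First, Dest QP 0x12, PSN 100, AckReq bit set.
sample = struct.pack("!BBHII", 0x06, 0, 0xFFFF, 0x12, 0x80000064)
opcode, dest_qp, psn = parse_bth(sample)
```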
上述负载中则承载的是具体的数据。上述ICRC和FCS用于进行报文校验。
基于上述的描述,上述通信装置可以从上述第一个报文中获得的该一部分信息为该第一个报文中的源IP地址、目的IP地址、源端口号、目的队列对和数据包序列号。
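Because the first packet does not carry the source queue pair identifier, that identifier has to be obtained in another way. In a possible implementation, under RoCEv2 the source port number, the source queue pair identifier, and the destination queue pair identifier are related as follows: source port number = (source queue pair identifier XOR destination queue pair identifier) OR 0xC000, where XOR denotes the bitwise exclusive-or operation and OR denotes the bitwise or operation. It follows that source queue pair identifier = (source port number & 0x3FFF) XOR destination queue pair identifier, where & denotes the bitwise and operation. Because the source port number and the destination queue pair identifier have already been obtained from the first packet, the source queue pair identifier can be calculated.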
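The two formulas above translate directly into bit operations. The sketch below uses hypothetical function names and example queue pair numbers. One caveat worth stating as an assumption: because the UDP source port is 16 bits wide and its top two bits are forced to 1, only the low 14 bits of the XOR survive, so the inverse formula returns the exact source queue pair identifier only when both identifiers fit in 14 bits.

```python
def roce_source_port(sqp: int, dqp: int) -> int:
    """source port = (SQP XOR DQP) OR 0xC000, truncated to 16 bits."""
    return ((sqp ^ dqp) | 0xC000) & 0xFFFF


def recover_sqp(src_port: int, dqp: int) -> int:
    """SQP = (source port & 0x3FFF) XOR DQP, as given in the description.

    Exact only if SQP and DQP both fit in 14 bits (see the note above).
    """
    return (src_port & 0x3FFF) ^ dqp


# Hypothetical queue pair numbers, both below 2**14.
port = roce_source_port(0x1234, 0x0ABC)
sqp = recover_sqp(port, 0x0ABC)
```

With the example values, `0x1234 XOR 0x0ABC` is `0x1888`, so the source port is `0xD888`, and masking with `0x3FFF` followed by the XOR with `0x0ABC` gives back `0x1234`.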
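In addition, the port identifier of the target egress port of the first data stream can be obtained from the forwarding table of the intermediate node. The aging period of the target entry may use a default duration or a custom-configured duration; this is not limited in the embodiments of this application. In this way, the content of the target entry can be obtained and saved for later use.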
一种可能的实施例中,若上述通信装置包括该中间节点的接口板上主控单元的协处理器。那么,上述第一个报文可以是该协处理器从该中间节点的入端口处或接收器中捕获过来的。然后,该协处理器对该捕获的第一个报文进行解封装获取上述信息。示例性地,该协处理器可以捕获上述第一数据流中的多个报文,然后,基于报文中的操作码确定该第一个报文,进而获取上述信息。
另一种可能的实施例中,若上述通信装置包括该中间节点的接口板上的LSW芯片。由于该中间节点接收到第一个报文后,会将该第一个报文发送到该LSW芯片中进行处理。因此,该LSW芯片无需特意去捕获该第一个报文。而是在接收到报文后,基于上述操作码确定是上述第一个报文后,即可从中获取上述信息。
另一种可能的实现中,上述目标表项中的源队列对标识可以通过捕获上述第一数据流对应的确认字符(acknowledge,ACK)报文获得。该ACK报文中的目的队列对标识即为该第一数据流对应的源队列对标识。
在具体实现中,由于接收到的ACK报文比较多,因此还需要从这些ACK报文中识别出第一数据流对应的ACK报文。示例性地,上述通信装置获得一个ACK报文之后,可以从该ACK报文中获得ACK.SIP、ACK.DIP、ACK.SQP和ACK.DQP等信息。其中,该ACK.SIP为ACK报文中的源IP地址,也是该ACK报文对应的数据流的目的IP地址。该ACK.DIP为ACK报文中的目的IP地址,也是该ACK报文对应的数据流的源IP地址。该ACK.SQP为ACK报文中的源队列对标识,也是该ACK报文对应的数据流的目的队列对标识。该ACK.DQP为ACK报文中的目的队列对标识,也是该ACK报文对应的数据流的源队列对标识。基于上述的“源端口号=(源队列对标识XOR目的队列对标识)OR 0xC000”,已知该数据流的源队列对标识和目的队列对标识,因此,可以计算出该ACK报文对应的数据流的源端口号,也是ACK报文对应的目的端口号。基于ACK.SIP、ACK.DIP、ACK.SQP、ACK.DQP和该计算得到的数据流的源端口号可以确定一个数据流,进而可以确定该ACK报文是该确定的数据流对应的ACK报文。通过这种实现方式可以确定出上述第一数据流对应的ACK报文,进而获取该ACK报文中的目的队列对标识作为上述目标表项中的源队列对标识。
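The matching of an ACK packet to its data stream can be sketched as a key computation followed by a table lookup. This follows the description literally and assumes that ACK.SIP, ACK.DIP, ACK.SQP, and ACK.DQP have already been extracted as described; the dictionary layout, the key format, and the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (flow source IP, flow destination IP, flow UDP source port)
FlowKey = Tuple[str, str, int]


def flow_key_from_ack(ack_sip: str, ack_dip: str, ack_sqp: int, ack_dqp: int) -> FlowKey:
    """Map the ACK's fields onto the key of the data stream it acknowledges."""
    flow_sqp = ack_dqp  # the ACK's destination QP is the stream's source QP
    flow_dqp = ack_sqp  # the ACK's source QP is the stream's destination QP
    flow_src_port = ((flow_sqp ^ flow_dqp) | 0xC000) & 0xFFFF
    return (ack_dip, ack_sip, flow_src_port)


def learn_sqp_from_ack(flows: Dict[FlowKey, dict], ack: dict) -> Optional[dict]:
    """Store ACK.DQP as the stream's SQP when the ACK matches a known stream."""
    key = flow_key_from_ack(ack["sip"], ack["dip"], ack["sqp"], ack["dqp"])
    entry = flows.get(key)
    if entry is not None:
        entry["sqp"] = ack["dqp"]
    return entry


flows = {("192.0.2.1", "192.0.2.4", 0xD888): {"sqp": None, "dqp": 0x0ABC}}
ack = {"sip": "192.0.2.4", "dip": "192.0.2.1", "sqp": 0x0ABC, "dqp": 0x1234}
matched = learn_sqp_from_ack(flows, ack)
```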
另一种可能的实现中,上述目标表项中的源队列对标识可以通过与上述第一数据流的源端交互通信获得。
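For example, before the source end sends the first data stream to the destination end, a queue pair (QP) connection has to be established between them. The source end creates a queue pair that includes a send queue and a receive queue: the send queue is used to send information to the destination end, and the receive queue is used to receive information from the destination end. Likewise, the destination end creates a queue pair that also includes a send queue and a receive queue: the receive queue is used to receive information from the source end, and the send queue is used to send information to the source end. After creating its queue pair, the source end advertises the identifier of the queue pair to the destination end. Likewise, after creating its queue pair, the destination end advertises the identifier of its queue pair to the source end. In the embodiments of this application, after the source end has created the queue pair for the first data stream, it may also send the identifier of that queue pair to the intermediate node.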
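In a possible implementation, the source end may send the identifier of the created queue pair to the intermediate node in a Link Layer Discovery Protocol (LLDP) packet. For example, the LLDP packet may include the source IP address, destination IP address, source port number, and destination port number of the first data stream, together with the source queue pair identifier (the identifier of the queue pair created by the source end) and the destination queue pair identifier (the identifier of the queue pair created by the destination end). This information can be used to identify the first data stream. After generating the LLDP packet, the source end may first send it to the network device directly connected to the source end, for example the network device 2053. After receiving the LLDP packet, the network device determines the first data stream from the information in the packet and then saves the source queue pair identifier carried in the packet to the target entry.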
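In a possible implementation, after receiving the LLDP packet from the source end, the network device may also forward the LLDP packet to the next-hop network device based on the destination IP address in the packet, so that the next-hop network device can likewise obtain the source queue pair identifier of the first data stream. Because that destination IP address is the destination IP address of the first data stream, forwarding the LLDP packet hop by hop toward it allows every network device on the transmission path of the first data stream to obtain the source queue pair identifier of the first data stream.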
一种可能的实施例中,上述第一数据流中可以包括多个MSG。在上述源端,该多个MSG可以是依次传输的。那么,上述第一个报文可以是该多个MSG中第一个传输的MSG中的第一个报文。即上述目标表项中存储的是该第一个传输的MSG中的第一个报文。另一种可能的实施例中,随着该多个MSG中的第二个MSG、第三个MSG等等不断地从该源端发送出去,上述通信装置可以捕获到该第二MSG中的第一个报文和第三个MSG中的第一个报文等等。这种情况下,该通信装置可以用当前捕获到的MSG的第一个报文中PSN更新上述目标表项中PSN。基于前面的描述可知,该目标表项中的PSN用于指示重传该PSN往后的报文。因此,更新该目标表项中的PSN可以减少重传的报文数量,节省网络资源。
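In a possible embodiment, the communication device may also update the packet sequence number in the target entry based on the PSN in a captured ACK packet that corresponds to the first data stream. Specifically, the ACK packet of the first data stream may be identified in the manner described above, and the PSN in the ACK packet is then obtained and written into the target entry. The PSN in an ACK packet is the PSN of a packet that has already been received, and the PSN in the target entry indicates that the packets from that PSN onward are to be retransmitted. Updating the packet sequence number in the target entry with the PSN from the ACK packet therefore reduces the number of retransmitted packets and saves network resources.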
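The PSN update can be sketched as below. The description only says that the stored PSN is updated from the ACK; the guard that ignores stale ACKs and the wraparound-aware comparison over the 24-bit PSN space are assumptions added for illustration, and the names are hypothetical.

```python
PSN_MASK = 0xFFFFFF  # the PSN field of the BTH is 24 bits wide


def psn_is_ahead(new: int, old: int) -> bool:
    """True when `new` is later than `old`, allowing for 24-bit wraparound."""
    diff = (new - old) & PSN_MASK
    return 0 < diff < (1 << 23)


def update_entry_psn(entry: dict, acked_psn: int) -> int:
    """Advance the stored PSN to the acknowledged PSN; ignore stale ACKs."""
    if psn_is_ahead(acked_psn, entry["psn"]):
        entry["psn"] = acked_psn
    return entry["psn"]


entry = {"psn": 100}
after_ack = update_entry_psn(entry, 150)     # moves forward to 150
after_stale = update_entry_psn(entry, 120)   # stale ACK, stays at 150
```

Whether retransmission should then start at the stored PSN or at the stored PSN plus one is left open by the description, which allows both.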
一种可能的实现中,若上述中间节点为上述源端直连的网络设备。例如,该中间节点为上述网络设备2053。该中间节点接收到的来自源端的报文还没有封装有IP报文头,该IP报文头是由该中间节点来封装的。因此,在该中间节点中,上述目标表项中的IP源地址为自身的IP地址。该目标表项中的目的IP地址即为上述第一数据流的目的IP地址,可以通过查找该中间节点中的路由表获得。
基于上述的描述,上述流信息表中已经存储有上述第一数据流对应的信息。那么,在该第一数据流的传输路径出现网络故障的情况下,通信装置可以从该流信息表中获取该第一数据流对应的信息。并基于获取的信息生成上述第一报文。
示例性地,基于前面的描述可知,该网络故障与该第一数据流的目标出端口关联。例如,该网络故障为该目标出端口连接的链路故障,或为该目标出端口连接的下一跳网络设备的故障等。在上述中间节点感知到该网络故障后,可以识别出该网络故障为该目标出端口关联的网络故障。可以理解的是,该中间节点可以采用现有的网络故障感知方法来感知网络故障。例如,可以通过心跳检测、因特网包探索器(packet internet groper,PING)或路由通告故障等方式来使该中间节点感知网络故障。此处感知网络故障的方式仅为示例,不构成对本申请实施例的限制。
然后,该通信装置可以以该目标出端口的端口标识为索引,在上述流信息表中查找到上述目标表项,并获取该目标表项中的第一数据流对应的信息。例如获取该第一数据流对应的源IP地址、目的IP地址、源端口号、目的端口号、源队列对标识、目的队列对标识和数据包序列号。然后,上述通信装置基于该获取的信息生成上述第一报文。该第一报文可以是RoCEv2协议下的NAK报文。
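The lookup triggered by a fault can be sketched as a filter over the entries bound to the failed port. The table layout (a dictionary from port identifier to a list of entry dictionaries), the `updated_at` timestamp, and the interpretation of the aging period as "valid for `age` time units after the last update" are all assumptions made for this illustration.

```python
from typing import Dict, List


def flows_to_nak(table: Dict[str, List[dict]], failed_port: str, now: float) -> List[dict]:
    """Return the unexpired entries bound to the port on which the fault was detected."""
    live = []
    for entry in table.get(failed_port, []):
        # Skip entries whose aging period has elapsed.
        if now - entry["updated_at"] <= entry["age"]:
            live.append(entry)
    return live


table = {
    "PORT1": [
        {"flow": "A", "psn": 100, "updated_at": 9.0, "age": 5.0},
        {"flow": "B", "psn": 7, "updated_at": 1.0, "age": 5.0},  # expired at now=10
    ],
    "PORT2": [{"flow": "C", "psn": 42, "updated_at": 9.5, "age": 5.0}],
}
affected = flows_to_nak(table, "PORT1", now=10.0)
```

Each returned entry would then be turned into one NAK packet addressed to the source end of the corresponding data stream.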
可以参见图6A,示例性地示出了上述数据流的响应报文的格式结构。该响应报文可以是ACK报文和 NAK报文,具体类型可以根据报文中的标志位来区分。在图6A中可以看到,该NAK报文包括以太报文头(Ethernet header)、IP报文头(IP header)、UDP报文头(UDP Header)、基本传输报文头(base transport header,BTH)、ACK扩展传输头(ACK extended transport header,AETH)和用于校验的ICRC。
其中,该以太报文头中可以包括源MAC地址和目的MAC地址。示例性地,该源MAC地址例如可以是上述第一数据流的目的端的MAC地址。该目的MAC地址例如可以是上述第一数据流的源端的MAC地址。
该IP报文头中可以包括源IP地址和目的IP地址。示例性地,假设上述第一数据流的传输路径为“处理设备201-网络设备2053-网络设备2051-网络设备2056-处理设备204”。则该IP报文头中的源IP地址为该网络设备2056的IP地址。该目的IP地址为该网络设备2053的IP地址。
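The UDP header may include a UDP source port number and a UDP destination port number. The destination port number is still the default 4791. The source port number may be calculated from the relationship given above: source port number = (source queue pair identifier XOR destination queue pair identifier) OR 0xC000.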
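The base transport header is the transport-layer header of InfiniBand. For details of this header, see the corresponding description of FIG. 5, which is not repeated here.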
该ACK扩展传输头包含ACK数据包的附加传输字段。可以示例性地参见图6B。可以看到,该ACK扩展传输头可以包括4个字节(0-3字节),即32比特的长度。其中,消息序列号(message sequence number,MSN)占用0-23比特,标志位(syndrome)占用24-31比特。该消息序列号可以标示已成功接收的数据的总长度。该标志位可以用于标示该报文是ACK报文还是NAK报文。可以示例性地参见表2。
表2
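Table 2 shows, as an example, the definition of the 8 bits occupied by the syndrome field. When bits 5 and 6 of the 8 bits (that is, bits 6:5) are 00, the packet is an ACK packet. When bits 6:5 are 11, the packet is a NAK packet. For the definition of the remaining bits, see Table 43 of the InfiniBand standard (InfiniBand™ Architecture Specification Volume 1, Release 1.1), which is not repeated here.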
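The 32-bit ACK extended transport header can be packed and classified as in the sketch below. It assumes the syndrome occupies the most significant byte (bits 24 to 31) and the MSN the low 24 bits, sent in network byte order. The value 0x60 sets bits 6:5 to 11 and leaves the remaining bits at zero; leaving them at zero is an assumption for this example, since their meaning is defined in the InfiniBand specification rather than here. Function names are hypothetical.

```python
import struct

NAK_SYNDROME = 0x60  # bits 6:5 = 11 (NAK); low five bits left at 0 (assumption)


def build_aeth(syndrome: int, msn: int) -> bytes:
    """Pack an 8-bit syndrome and a 24-bit MSN into the 4-byte AETH."""
    return struct.pack("!I", ((syndrome & 0xFF) << 24) | (msn & 0xFFFFFF))


def aeth_kind(aeth: bytes) -> str:
    """Classify a response by bits 6:5 of the syndrome byte."""
    bits_6_5 = (aeth[0] >> 5) & 0b11
    if bits_6_5 == 0b00:
        return "ACK"
    if bits_6_5 == 0b11:
        return "NAK"
    return "other"


nak_aeth = build_aeth(NAK_SYNDROME, 7)
kind = aeth_kind(nak_aeth)
```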
基于上述的描述,上述通信装置生成上述第一报文的过程中,可以将上述获得的第一数据流的源IP地址作为该第一报文中的目的IP地址。将获得的第一数据流的目的IP地址作为该第一报文中的源IP地址。将获得的源队列对标识作为该第一报文中的目的队列对标识。将获得的目的队列对标识作为该第一报文中的源队列对标识。将获得的数据包序列号,或获得的数据包序列号加1后获得的序列号作为该第一报文中指示重传的起始报文序列号。为了便于理解可以参考表3。
表3
另外,上述第一报文中的目的端口号是默认的4791。该第一报文中的源端口号=(源队列对标识XOR目的队列对标识)OR 0xC000。该源队列对标识和目的队列对标识即为上述的第一报文中的源队列对标识和目的队列对标识。因此,可以计算出该第一报文中的源端口号。
通信装置将上述获得的第一报文中的各项信息封装到该第一报文中,并将该第一报文的标志位中的bits 6:5设置为11,以指示该报文为NAK报文。进而生成一个完整的NAK报文,即生成了该第一报文。然后,该通信装置向上述第一数据流的源端发送该第一报文。
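Deriving the fields of the first packet from a stored entry amounts to swapping the address and queue pair roles and recomputing the UDP source port. The sketch below returns the fields as a dictionary instead of serializing a full packet; the entry layout, the function name, the `include_stored_psn` switch (covering the "PSN or PSN plus 1" choice), and the example values are assumptions.

```python
ROCEV2_UDP_PORT = 4791  # default RoCEv2 destination port, as stated above


def build_nak_fields(entry: dict, include_stored_psn: bool = True) -> dict:
    """Derive the header fields of the NAK from a flow information table entry."""
    nak_sqp = entry["dqp"]  # the stream's destination QP becomes the NAK's source QP
    nak_dqp = entry["sqp"]  # the stream's source QP becomes the NAK's destination QP
    psn = entry["psn"] if include_stored_psn else (entry["psn"] + 1) & 0xFFFFFF
    return {
        "src_ip": entry["dst_ip"],
        "dst_ip": entry["src_ip"],
        "udp_src_port": ((nak_sqp ^ nak_dqp) | 0xC000) & 0xFFFF,
        "udp_dst_port": ROCEV2_UDP_PORT,
        "dest_qp": nak_dqp,
        "psn": psn,
        "syndrome": 0x60,  # bits 6:5 = 11 marks the packet as a NAK
    }


entry = {"src_ip": "192.0.2.1", "dst_ip": "192.0.2.4",
         "sqp": 0x1234, "dqp": 0x0ABC, "psn": 100}
nak = build_nak_fields(entry)
```

Because XOR is symmetric, the recomputed source port equals the source port of the original data stream, so no separate port needs to be stored for the reverse direction.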
S403、上述源端接收第一报文,并基于第一报文重传上述一个或多个报文。
该源端接收到该第一报文之后,基于该报文中的指示重传报文。例如,重传该第一报文中包括的PSN指示的报文以及该PSN后面的报文。从而实现了在发生网络故障后,快速重传丢失的报文,提高报文重传的效率,进而提高报文对应业务的处理性能。
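On the source end, handling the first packet reduces to resending everything from the indicated PSN onward. The sketch below models the unacknowledged packets as a dictionary from PSN to payload; that buffer, the function name, and the absence of PSN wraparound handling are simplifying assumptions.

```python
from typing import Dict, List, Tuple


def packets_to_retransmit(send_buffer: Dict[int, bytes], nak_psn: int) -> List[Tuple[int, bytes]]:
    """Return (PSN, payload) pairs for the indicated PSN and all later PSNs, in order."""
    return [(psn, send_buffer[psn]) for psn in sorted(send_buffer) if psn >= nak_psn]


# Hypothetical unacknowledged packets held by the source end.
send_buffer = {100: b"a", 101: b"b", 102: b"c"}
resend = packets_to_retransmit(send_buffer, 101)
```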
上述实施例中主要是以上述目标出端口与第一数据流对应的流信息关联为例描述。可以理解的是,另一种可能的实现中,上述中间节点通过某一个入端口接收该第一数据流。为了便于后续的描述,可以将该某一个入端口简称为目标入端口。上述网络故障可以是该第一数据流的传输路径中,该目标入端口至该第一数据流的源端之间的路径上的链路故障或设备故障等。例如,该网络故障可以是该目标入端口连接的链路的故障。或者,例如,该网络故障可以是该目标入端口连接的上一跳网络设备的故障。这种情况下,该网络故障可以与该目标入端口关联。具体关联方式可以参考上述网络故障与出端口的关联方式,此处不赘述。并且该目标入端口还与从该目标入端口接收的数据流对应的信息关联存储到流信息表(简称为入口流信息表)中。例如可以参考上述表1,把上述表1中的出端口标识改为入端口标识即可获得该入口流信息表的一个示例。该入口流信息表中的信息的获取和存储可以参考前述的描述,此处不赘述。
当出现网络故障后,并确定该网络故障为上述目标入端口连接的链路或装置的故障。示例性地,该网络故障可以是由设备的主控板感知。该主控板感知到网络故障后,可以触发上述通信装置执行上述的报文生成的操作。那么,该通信装置可以基于该目标入端口的标识在上述入口流信息表中快速确定出上述第一数据流,并获取该流信息表中该第一数据流对应的信息。进而基于获取的信息生成上述第一报文。具体生成第一报文的实现可以参考上述的描述,此处不赘述。
然后,该通信装置向上述第一数据流的源端发送该第一报文。该源端接收到该第一报文之后,基于该报文中的指示重传报文。例如,重传该第一报文中包括的PSN指示的报文以及该PSN后面的报文。从而实现了在发生网络故障后,快速重传丢失的报文,提高报文重传的效率,进而提高报文对应业务的处理性能。
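The foregoing embodiments mainly use the case in which the communication device is a device included in an intermediate node on the transmission path. In another possible embodiment, if the communication device is a device included in the destination end of the first data stream, packets can likewise be retransmitted quickly once the network fault is detected. For example, in a specific implementation, the network fault may be associated with an ingress port of the destination device. For the manner of association, see the association between a network fault and an egress port described above, which is not repeated here. In addition, the ingress port is stored, in association with the information of the data streams received through it, in a flow information table (referred to as the ingress flow information table for short). For example, an instance of the ingress flow information table can be obtained by replacing the egress port identifier in Table 1 with an ingress port identifier. For how the information in the ingress flow information table is obtained and stored, see the foregoing description, which is not repeated here.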
当该目的端设备感知到网络故障后,并确定该网络故障为某一个入端口连接的链路或装置的故障。示例性地,该网络故障可以是由设备的主控板感知。该主控板感知到网络故障后,可以触发上述通信装置执行上述的报文生成的操作。那么,该通信装置可以基于该入端口的标识在上述入口流信息表中快速确定出上述第一数据流,并获取该流信息表中该第一数据流对应的信息。进而基于获取的信息生成上述第一报文。具体生成第一报文的实现可以参考上述的描述,此处不赘述。
然后,该通信装置向上述第一数据流的源端发送该第一报文。该源端接收到该第一报文之后,基于该报文中的指示重传报文。例如,重传该第一报文中包括的PSN指示的报文以及该PSN后面的报文。从而实现了在发生网络故障后,快速重传丢失的报文,提高报文重传的效率,进而提高报文对应业务的处理性能。
上述主要对本申请实施例提供的数据流处理方法进行了介绍。可以理解的是,各个装置为了实现上述对应的功能,其包含了执行各个功能相应的硬件结构和/或软件模块。结合本文中所公开的实施例描述的各示例的单元及步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用使用不同方法来实现所描述的功能,但这种实现不应认为超出本申请的范围。
本申请实施例可以根据上述方法示例对设备进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
在采用对应各个功能划分各个功能模块的情况下,图7示出了通信装置700的一种可能的逻辑结构示意图。该通信装置700可以是上述数据流处理方法中所述的通信装置。该通信装置700包括生成单元701和发送单元702。其中:
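a generating unit 701, configured to generate a first packet when a network fault occurs on the transmission path of a first data stream, where the first packet instructs the source node of the first data stream to retransmit one or more packets of the first data stream, and the communication device 700 is on the transmission path;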
发送单元702,用于发送该第一报文。
一种可能的实施例中,该网络故障为传输该第一数据流的端口连接的链路或装置故障;
该通信装置700还包括确定单元,用于在该生成单元701生成该第一报文之前,根据该端口确定该第一数据流。
一种可能的实施例中,该通信装置700还包括接收单元,用于在该第一数据流的传输路径出现网络故障之前,接收该第一数据流中的第二报文;
该通信装置700还包括获取单元,用于根据该第二报文获取与该第一数据流对应的第一信息,该第一信息包括该第二报文中的源IP地址、目的IP地址、源端口号、目的队列对DQP和数据包序列号,该第一信息用于生成该第一报文。
一种可能的实施例中,该第一信息还包括该第一数据流的源队列对SQP;该SQP基于该源端口号和该DQP计算得到。
一种可能的实施例中,该第一信息还包括该第一数据流的SQP;在该第一数据流的传输路径出现网络故障之前,
该接收单元,还用于接收该第一数据流对应的第一确认ACK报文;
该获取单元,还用于获取该ACK报文中的DQP;
该通信装置700还包括保存单元,用于将该ACK报文中的DQP作为该第一数据流的SQP保存。
一种可能的实施例中,该第一信息还包括该第一数据流的SQP;在该第一数据流的传输路径出现网络故障之前,
该接收单元,还用于接收来自该源节点的第三报文,该第三报文中包括该SQP;
该通信装置700还包括保存单元,用于保存该SQP。
一种可能的实施例中,该第三报文为链路层发现协议报文。
一种可能的实施例中,该获取单元从该第二报文中获取第一信息之后,
该接收单元,还用于接收该第一数据流对应的第二确认ACK报文;
该获取单元,还用于获取该ACK报文中的第一数据包序列号;
该通信装置700还包括更新单元,用于基于该第一数据包序列号更新该通信装置700中存储的该第一信息中的数据包序列号。
一种可能的实施例中,该第一报文为消极确认NAK报文。
一种可能的实施例中,该第一通信装置700为该传输路径上的网络设备,或者为该网络设备中的网络接口板,或者为该网络设备中的处理器,或者为该网络设备中的交换芯片。
一种可能的实施例中,该第一数据流为基于RoCEv2协议传输的数据流;该第二报文为该第一数据流包括的消息MSG中的第一个报文。
图7所示通信装置700中各个单元的具体操作以及有益效果可以参见上述图4及其可能的实施例中对应的描述,此处不再赘述。
在采用对应各个功能划分各个功能模块的情况下,图8示出了数据流处理装置800的一种可能的逻辑结构示意图。该数据流处理装置800可以是上述数据流处理方法中所述的源端,或为该源端的网络接口板、处理器或交换芯片等。该数据流处理装置800包括接收单元801和重传单元802。其中:
接收单元801,用于接收来自第二通信装置的第一报文;该第一报文是在该第一数据流的传输路径出现网络故障的情况下生成的,该第一报文指示重传该第一数据流中的一个或多个报文,该第二通信装置在该第一数据流的传输路径上,该数据流处理装置为该第一数据流的源节点;
重传单元802,用于重传该一个或多个报文。
一种可能的实施例中,该数据流处理装置800还包括发送单元,用于在该接收单元接收来自第二通信装置的第一报文之前,向该第二通信装置发送第二报文,该第二报文中包括该第一数据流的源队列对SQP,该SQP用于生成该第一报文。
图8所示数据流处理装置800中各个单元的具体操作以及有益效果可以参见上述图4及其可能的实施例中对应的描述,此处不再赘述。
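FIG. 9 is a schematic diagram of a possible hardware structure of the communication device 900 provided in this application. The communication device 900 may be the communication device described in the foregoing data stream processing method. The communication device 900 includes a processor 901, a memory 902, and a communication interface 903. The processor 901, the communication interface 903, and the memory 902 may be connected to one another directly or through a bus 904.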
示例性的,存储器902用于存储通信装置900的计算机程序和数据,存储器902可以包括但不限于是随机存储记忆体(random access memory,RAM)、只读存储器(read-only memory,ROM)、可擦除可编程只读存储器(erasable programmable read only memory,EPROM)或便携式只读存储器(compact disc read-only memory,CD-ROM)等。
上述方法实施例中通信装置的全部或部分单元的功能所需的软件或程序代码存储在存储器902中。
一种可能的实施方式中,如果是部分单元的功能所需的软件或程序代码存储在存储器902中,则处理器901除了调用存储器902中的程序代码实现部分功能外,还可以配合其他部件(如通信接口903)共同完成方法实施例描述的其他功能(如接收或发送报文的功能)。
通信接口903的个数可以为多个,用于支持通信装置900进行通信,例如接收或发送报文等。
示例性的,处理器901可以是中央处理器单元、通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,数字信号处理器和微处理器的组合等等。处理器901可以用于读取上述存储器902中存储的程序,执行上述图4及其可能的实施例所述的方法中通信装置执行的操作。
一种可能的实现中,上述通信装置900可以是上述图3所示的接口板320。上述处理器901可以是该接口板320中的处理器3201。上述存储器902可以是该接口板320中的存储器3202。上述通信接口903可以是该接口板320中的PHY以太网接口3205。此处仅为示例,不构成对本申请实施例的限制。
图9所示通信装置900中各个单元的具体操作以及有益效果可以参见上述方法实施例中对应的描述,此处不再赘述。
图10所示为本申请提供的数据流处理装置1000的一种可能的硬件结构示意图,该数据流处理装置1000可以是上述数据流处理方法中所述的源端,或为该源端的网络接口板、处理器或交换芯片等。该数据流处理装置1000包括:处理器1001、存储器1002和通信接口1003。处理器1001、通信接口1003以及存储器1002可以相互连接或者通过总线1004相互连接。
示例性的,存储器1002用于存储数据流处理装置1000的计算机程序和数据,存储器1002可以包括但不限于是随机存储记忆体(random access memory,RAM)、只读存储器(read-only memory,ROM)、可擦除可编程只读存储器(erasable programmable read only memory,EPROM)或便携式只读存储器(compact disc read-only memory,CD-ROM)等。
上述方法实施例中源端的全部或部分单元的功能所需的软件或程序代码存储在存储器1002中。
一种可能的实施方式中,如果是部分单元的功能所需的软件或程序代码存储在存储器1002中,则处理器1001除了调用存储器1002中的程序代码实现部分功能外,还可以配合其他部件(如通信接口1003)共同完成方法实施例描述的其他功能(如接收或发送报文的功能)。
通信接口1003的个数可以为多个,用于支持数据流处理装置1000进行通信,例如接收或发送报文等。
示例性的,处理器1001可以是中央处理器单元、通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,数字信号处理器和微处理器的组合等等。处理器1001可以用于读取上述存储器1002中存储的程序,执行上述图4及其可能的实施例所述的方法中源端执行的操作。
图10所示数据流处理装置1000中各个单元的具体操作以及有益效果可以参见上述方法实施例中对应的描述,此处不再赘述。
本申请实施例还提供一种计算机可读存储介质,该计算机可读存储介质存储有计算机程序,该计算机程序被处理器执行以实现上述各个实施例及其可能的实施例中任意一个实施例中的通信装置所做的操作。
本申请实施例还提供一种计算机可读存储介质,该计算机可读存储介质存储有计算机程序,该计算机程序被处理器执行以实现上述各个实施例及其可能的实施例中任意一个实施例中的源端所做的操作。
本申请实施例还提供一种计算机程序产品,当该计算机程序产品被计算机读取并执行时,上述各个实施例及其可能的实施例中任意一个实施例中的通信装置所做的操作将被执行。
本申请实施例还提供一种计算机程序产品,当该计算机程序产品被计算机读取并执行时,上述各个实施例及其可能的实施例中任意一个实施例中的源端所做的操作将被执行。
参见图11,本申请实施例还提供一种通信系统1100。该通信系统包括第一通信装置1101和第二通信装置1102。该第一通信装置1101可以是上述数据流处理方法中的通信装置。该第二通信装置1102可以是上述数据流处理方法中的源端。
综上所述,采用本申请实施例能够提高报文重传的效率,进而提高报文对应业务的处理性能。
本申请中术语“第一”“第二”等字样用于对作用和功能基本相同的相同项或相似项进行区分,应理解,“第一”、“第二”、“第n”之间不具有逻辑或时序上的依赖关系,也不对数量和执行顺序进行限定。还应理解,尽管以下描述使用术语第一、第二等来描述各种元素,但这些元素不应受术语的限制。这些术语只是用于将一元素与另一元素区别分开。
还应理解,在本申请的各个实施例中,各个过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
还应理解,术语“包括”(也称“includes”、“including”、“comprises”和/或“comprising”)当在本说明书中使用时指定存在所陈述的特征、整数、步骤、操作、元素、和/或部件,但是并不排除存在或添加一个或多个其他特征、整数、步骤、操作、元素、部件、和/或其分组。
还应理解,说明书通篇中提到的“一个实施例”、“一实施例”、“一种可能的实现方式”意味着与实施例或实现方式有关的特定特征、结构或特性包括在本申请的至少一个实施例中。因此,在整个说明书各处出现的“在一个实施例中”或“在一实施例中”、“一种可能的实现方式”未必一定指相同的实施例。此外,这些特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。
最后应说明的是:以上各实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述各实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分或者全部技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。

Claims (31)

  1. 一种数据流处理方法,其特征在于,所述方法包括:
    在第一数据流的传输路径出现网络故障的情况下,所述传输路径上的第一通信装置生成第一报文,所述第一报文指示所述第一数据流的源节点重传所述第一数据流中的一个或多个报文;
    所述第一通信装置发送所述第一报文。
  2. 根据权利要求1所述的方法,其特征在于,所述网络故障为传输所述第一数据流的端口连接的链路或装置故障;
    所述第一通信装置生成第一报文之前,所述方法还包括:
    所述第一通信装置根据所述端口确定所述第一数据流。
  3. 根据权利要求1或2所述的方法,其特征在于,在所述第一数据流的传输路径出现网络故障之前,所述方法还包括:
    所述第一通信装置接收所述第一数据流中的第二报文;
    所述第一通信装置根据所述第二报文获取与所述第一数据流对应的第一信息,所述第一信息包括所述第二报文中的源IP地址、目的IP地址、源端口号、目的队列对DQP和数据包序列号,所述第一信息用于生成所述第一报文。
  4. 根据权利要求3所述的方法,其特征在于,所述第一信息还包括所述第一数据流的源队列对SQP;所述SQP基于所述源端口号和所述DQP计算得到。
  5. 根据权利要求3所述的方法,其特征在于,所述第一信息还包括所述第一数据流的SQP;在所述第一数据流的传输路径出现网络故障之前,还包括:
    所述第一通信装置接收所述第一数据流对应的第一确认ACK报文;
    所述第一通信装置获取所述ACK报文中的DQP;
    所述第一通信装置将所述ACK报文中的DQP作为所述第一数据流的SQP保存。
  6. 根据权利要求3所述的方法,其特征在于,所述第一信息还包括所述第一数据流的SQP;在所述第一数据流的传输路径出现网络故障之前,还包括:
    所述第一通信装置接收来自所述源节点的第三报文,所述第三报文中包括所述SQP;
    所述第一通信装置保存所述SQP。
  7. 根据权利要求6所述的方法,其特征在于,所述第三报文为链路层发现协议报文。
  8. 根据权利要求3-7任一项所述的方法,其特征在于,所述第一通信装置从所述第二报文中获取第一信息之后,还包括:
    所述第一通信装置接收所述第一数据流对应的第二确认ACK报文;
    所述第一通信装置获取所述ACK报文中的第一数据包序列号;
    所述第一通信装置基于所述第一数据包序列号更新所述通信装置中存储的所述第一信息中的数据包序列号。
  9. 根据权利要求1-8任一项所述的方法,其特征在于,所述第一报文为消极确认NAK报文。
  10. 根据权利要求1-9任一项所述的方法,其特征在于,所述第一通信装置为所述传输路径上的网络设备,或者为所述网络设备中的网络接口板,或者为所述网络设备中的处理器,或者为所述网络设备中的交换芯片。
  11. 根据权利要求3-8任一项所述的方法,其特征在于,所述第一数据流为基于RoCEv2协议传输的数据流;
    所述第二报文为所述第一数据流包括的消息MSG中的第一个报文。
  12. 一种数据流处理方法,其特征在于,所述方法包括:
    第一通信装置接收来自第二通信装置的第一报文;所述第一报文是在所述第一数据流的传输路径出现网络故障的情况下生成的,所述第一报文指示重传所述第一数据流中的一个或多个报文,所述第二通信装置在所述第一数据流的传输路径上,所述第一通信装置为所述第一数据流的源节点;
    所述第一通信装置重传所述一个或多个报文。
  13. 根据权利要求12所述的方法,其特征在于,第一通信装置接收来自第二通信装置的第一报文之前,还包括:
    所述第一通信装置向所述第二通信装置发送第二报文,所述第二报文中包括所述第一数据流的源队列对SQP,所述SQP用于生成所述第一报文。
  14. 一种通信装置,其特征在于,所述通信装置包括:
    生成单元,用于在第一数据流的传输路径出现网络故障的情况下,生成第一报文,所述第一报文指示所述第一数据流的源节点重传所述第一数据流中的一个或多个报文;所述通信装置在所述传输路径上;
    发送单元,用于发送所述第一报文。
  15. 根据权利要求14所述的通信装置,其特征在于,所述网络故障为传输所述第一数据流的端口连接的链路或装置故障;
    所述通信装置还包括确定单元,用于在所述生成单元生成所述第一报文之前,根据所述端口确定所述第一数据流。
  16. 根据权利要求14或15所述的通信装置,其特征在于,所述通信装置还包括接收单元,用于在所述第一数据流的传输路径出现网络故障之前,接收所述第一数据流中的第二报文;
    所述通信装置还包括获取单元,用于根据所述第二报文获取与所述第一数据流对应的第一信息,所述第一信息包括所述第二报文中的源IP地址、目的IP地址、源端口号、目的队列对DQP和数据包序列号,所述第一信息用于生成所述第一报文。
  17. 根据权利要求16所述的通信装置,其特征在于,所述第一信息还包括所述第一数据流的源队列对SQP;所述SQP基于所述源端口号和所述DQP计算得到。
  18. 根据权利要求16所述的通信装置,其特征在于,所述第一信息还包括所述第一数据流的SQP;在所述第一数据流的传输路径出现网络故障之前,
    所述接收单元,还用于接收所述第一数据流对应的第一确认ACK报文;
    所述获取单元,还用于获取所述ACK报文中的DQP;
    所述通信装置还包括保存单元,用于将所述ACK报文中的DQP作为所述第一数据流的SQP保存。
  19. 根据权利要求16所述的通信装置,其特征在于,所述第一信息还包括所述第一数据流的SQP;在所述第一数据流的传输路径出现网络故障之前,
    所述接收单元,还用于接收来自所述源节点的第三报文,所述第三报文中包括所述SQP;
    所述通信装置还包括保存单元,用于保存所述SQP。
  20. 根据权利要求19所述的通信装置,其特征在于,所述第三报文为链路层发现协议报文。
  21. 根据权利要求16-20任一项所述的通信装置,其特征在于,所述获取单元从所述第二报文中获取第一信息之后,
    所述接收单元,还用于接收所述第一数据流对应的第二确认ACK报文;
    所述获取单元,还用于获取所述ACK报文中的第一数据包序列号;
    所述通信装置还包括更新单元,用于基于所述第一数据包序列号更新所述通信装置中存储的所述第一信息中的数据包序列号。
  22. 根据权利要求14-21任一项所述的通信装置,其特征在于,所述第一报文为消极确认NAK报文。
  23. 根据权利要求14-22任一项所述的通信装置,其特征在于,所述第一通信装置为所述传输路径上的网络设备,或者为所述网络设备中的网络接口板,或者为所述网络设备中的处理器,或者为所述网络设备中的交换芯片。
  24. 根据权利要求14-21任一项所述的通信装置,其特征在于,所述第一数据流为基于RoCEv2协议传输的数据流;
    所述第二报文为所述第一数据流包括的消息MSG中的第一个报文。
  25. 一种数据流处理装置,其特征在于,所述数据流处理装置包括:
    接收单元,用于接收来自第二通信装置的第一报文;所述第一报文是在所述第一数据流的传输路径出现网络故障的情况下生成的,所述第一报文指示重传所述第一数据流中的一个或多个报文,所述第二通信装置在所述第一数据流的传输路径上,所述数据流处理装置为所述第一数据流的源节点;
    重传单元,用于重传所述一个或多个报文。
  26. 根据权利要求25所述的数据流处理装置,其特征在于,所述数据流处理装置还包括发送单元,用于在所述接收单元接收来自第二通信装置的第一报文之前,
    向所述第二通信装置发送第二报文,所述第二报文中包括所述第一数据流的源队列对SQP,所述SQP用于生成所述第一报文。
  27. 一种通信装置,其特征在于,包括处理器和存储器,其中,所述存储器用于存储计算机程序,所述处理器用于执行所述存储器中存储的计算机程序,使得所述通信装置执行如权利要求1至11任一项所述的方法。
  28. 一种数据流处理装置,其特征在于,包括处理器和存储器,其中,所述存储器用于存储计算机程序,所述处理器用于执行所述存储器中存储的计算机程序,使得所述通信装置执行如权利要求12或13所述的方法。
  29. 一种通信系统,其特征在于,所述通信系统包括第一通信装置和第二通信装置,所述第一通信装置为权利要求14至24任一项所述的通信装置,所述第二通信装置为权利要求25或26所述的数据流处理装置。
  30. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行以实现权利要求1至11任意一项所述的方法;或者,
    所述计算机程序被处理器执行以实现权利要求12或13所述的方法。
  31. 一种计算机程序产品,其特征在于,当该计算机程序产品被计算机读取并执行时,权利要求1至11任意一项所述的方法将被执行;或者,
    当该计算机程序产品被计算机读取并执行时,权利要求12或13所述的方法将被执行。
PCT/CN2023/113931 2022-08-31 2023-08-21 数据流处理方法及相关装置 WO2024046151A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211054474.5 2022-08-31
CN202211054474.5A CN117714011A (zh) 2022-08-31 2022-08-31 数据流处理方法及相关装置

Publications (1)

Publication Number Publication Date
WO2024046151A1 true WO2024046151A1 (zh) 2024-03-07

Family

ID=90100403

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/113931 WO2024046151A1 (zh) 2022-08-31 2023-08-21 数据流处理方法及相关装置

Country Status (2)

Country Link
CN (1) CN117714011A (zh)
WO (1) WO2024046151A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030172335A1 (en) * 2002-03-08 2003-09-11 Debendra Das Sharma Dynamic end to end retransmit apparatus and method
US20100165830A1 (en) * 2008-12-22 2010-07-01 LiveTimeNet, Inc. System and method for recovery of packets in overlay networks
CN103546917A (zh) * 2013-11-07 2014-01-29 华为技术有限公司 数据传输方法和装置
CN104038364A (zh) * 2013-12-31 2014-09-10 华为技术有限公司 分布式流处理系统的容错方法、节点及系统

Also Published As

Publication number Publication date
CN117714011A (zh) 2024-03-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23859185

Country of ref document: EP

Kind code of ref document: A1