WO2019029318A1 - Method for advertising network congestion, proxy node, and computer device - Google Patents
- Publication number
- WO2019029318A1 (PCT/CN2018/095602)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- queue pair
- pair number
- data packet
- destination
- congestion
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/35—Switches specially adapted for specific applications
- H04L49/356—Switches specially adapted for specific applications for storage area networks
- H04L49/358—Infiniband Switches
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/11—Identifying congestion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/11—Identifying congestion
- H04L47/115—Identifying congestion using a dedicated packet
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/12—Avoiding congestion; Recovering from congestion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/12—Avoiding congestion; Recovering from congestion
- H04L47/129—Avoiding congestion; Recovering from congestion at the destination endpoint, e.g. reservation of terminal resources or buffer space
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/26—Flow control; Congestion control using explicit feedback to the source, e.g. choke packets
- H04L47/263—Rate modification at the source after receiving feedback
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/30—Flow control; Congestion control in combination with information about buffer occupancy at either end or at transit nodes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/50—Queue scheduling
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/70—Admission control; Resource allocation
- H04L47/80—Actions related to the user profile or the type of traffic
- H04L47/803—Application aware
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/565—Conversion or adaptation of application format or content
- H04L67/5651—Reducing the amount or size of exchanged application data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
- H04L69/161—Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields
Definitions
- the present disclosure relates to the field of data communication technologies, and in particular, to a method for advertising network congestion, a proxy node, and a computer device.
- Remote direct memory access is a data transmission technology that can read data directly from the memory of other computers through a network without using a computer's processor, cache, or operating system. This reduces the data processing delay in network transmission.
- the RDMA data center usually adopts a Clos (CLOS) architecture for networking.
- the bandwidth of the uplink and downlink interfaces of the switch in the network is asymmetric.
- network congestion often occurs, which affects the communication performance of the RDMA.
- network congestion needs to be advertised in a timely manner.
- the sender and the receiver each obtain the queue pair number of the peer. When data needs to be sent to the receiver, the sender generates a first data packet based on the link layer transmission protocol; the destination queue pair number of the first data packet is the queue pair number of the receiving end, and an Explicit Congestion Notification (ECN) bit is set in the Internet Protocol (IP) header of the first data packet.
- when a network node receives the first data packet and detects network congestion, it changes the value of the ECN bit in the IP header of the first data packet to the value corresponding to the congestion state, obtaining a processed first data packet, and sends the processed first data packet to the receiving end. When the receiving end receives the processed first data packet and reads that the value of the ECN bit is the value corresponding to the congestion state, it obtains the source (sender) queue pair number and generates a first congestion notification message.
- the destination queue pair number of the first congestion notification message is the source (sender) queue pair number.
- the receiving end sends the first congestion notification message to the sending end through the network. The first congestion notification message notifies the sending end to obtain the destination queue pair number of the first congestion notification message and, when that destination queue pair number is the same as the queue pair number of the sending end, to reduce the sending rate of the data stream to which the first data packet belongs.
- however, the network nodes on the transmission link between the sending end and the receiving end cannot send the first congestion notification message to the sending end; only the receiving end can. As a result, the sending end cannot reduce the sending rate of the data stream to which the first data packet belongs in time, and subsequent packets of that data stream are lost during transmission.
- embodiments of the present disclosure provide a method for advertising network congestion, a proxy node, and a computer device.
- the technical solution is as follows:
- in a first aspect, a method for advertising network congestion is provided, the method including:
- the proxy node receives the first data packet of the sender, and the first data packet carries the destination queue pair number.
- the proxy node obtains the source queue pair number of the first data packet according to the destination queue pair number in the first data packet, and adds the source queue pair number to the first data packet to obtain the second data packet.
- the proxy node sends the second data packet to the receiving end through the network node.
- while forwarding the second data packet, when the network node detects network congestion, it generates a first congestion notification message according to the second data packet and sends the first congestion notification message to the proxy node.
- the destination queue pair number of the first congestion notification message is actually the source queue pair number of the second data packet.
- when the proxy node receives the first congestion notification message, it sends the first congestion notification message to the sending end, so that the sending end reduces the sending rate of the data stream to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the queue pair number of the sending end.
- the proxy node adds the source queue pair number to the first data packet during transmission of the first data packet. When network congestion is detected, the network node generates the first congestion notification message according to the data packet carrying the source queue pair number, and the first congestion notification message is forwarded to the sending end by the proxy node, so that the sending end can reduce the sending rate of the data stream to which the first data packet belongs in time and avoid losing subsequent packets of that data stream.
- the proxy node maintains a queue pair tracking table, each entry of which stores a correspondence between a destination queue pair number and a source queue pair number. Thus, on receiving the first data packet, the proxy node can look up the queue pair tracking table by the destination queue pair number to find the corresponding source queue pair number; the source queue pair number found is the source queue pair number of the first data packet.
- the solution shown in the embodiment of the present disclosure provides a way to obtain the source queue pair number.
- the proxy node may track the connection request message sent by the sending end and take the destination queue pair number extracted from the base transport header (BTH) of the connection request message as the destination queue pair number. The proxy node may also track the connection response message sent by the receiving end and take the destination queue pair number extracted from the BTH of the connection response message as the source queue pair number. The queue pair tracking table is obtained by recording the correspondence between the source queue pair number and the destination queue pair number.
- the proxy node establishes the queue pair tracking table by tracking the messages exchanged by the sending end and the receiving end during connection establishment, so that on receiving the first data packet it can find the corresponding source queue pair number according to the destination queue pair number of the first data packet.
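- the tracking described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the class name `QueuePairTracker`, its methods, and the simplification of tracking a single pending connection at a time are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class QueuePairTracker:
    """Hypothetical queue pair tracking table kept by the proxy node."""

    # Maps destination queue pair number -> source queue pair number.
    table: Dict[int, int] = field(default_factory=dict)
    # Destination QPN seen in a connection request that awaits its response.
    # Assumption: only one connection is being established at a time.
    pending_dest_qpn: Optional[int] = None

    def on_connection_request(self, bth_dest_qpn: int) -> None:
        # QPN extracted from the BTH of the sender's connection request.
        self.pending_dest_qpn = bth_dest_qpn

    def on_connection_response(self, bth_dest_qpn: int) -> None:
        # QPN extracted from the BTH of the receiver's connection response;
        # it addresses the sender, so it is recorded as the source QPN.
        if self.pending_dest_qpn is None:
            return  # no tracked request to pair this response with
        self.table[self.pending_dest_qpn] = bth_dest_qpn
        self.pending_dest_qpn = None

    def source_qpn_for(self, dest_qpn: int) -> Optional[int]:
        # Lookup performed when a data packet arrives at the proxy node.
        return self.table.get(dest_qpn)
```

- a real proxy node would presumably key pending requests by the addresses of the two endpoints so that concurrent connection setups do not interfere; that detail is omitted here.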
- the proxy node may split the source queue pair number into a first part and a second part, add the first part to the checksum field of the User Datagram Protocol (UDP) header of the first data packet, and add the second part to the reserved field of the base transport header (BTH) of the first data packet, to obtain the second data packet.
- by adding the source queue pair number to the first data packet, the proxy node ensures that the data packet carries the source queue pair number, so that when the network is congested, the first congestion notification message can be sent to the sending end quickly according to the source queue pair number, which speeds up delivery of the first congestion notification message.
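- one way to picture the split is shown below. The text does not say how the bits are divided; this sketch assumes a 24-bit queue pair number whose upper 16 bits go into the 16-bit UDP checksum field and whose lower 8 bits go into an 8-bit BTH reserved field. The function names are hypothetical.

```python
from typing import Tuple

QPN_BITS = 24  # InfiniBand queue pair numbers are 24 bits wide


def split_source_qpn(qpn: int) -> Tuple[int, int]:
    """Split a source QPN into (first_part, second_part).

    Assumed layout: first_part is the upper 16 bits (for the UDP checksum
    field), second_part is the lower 8 bits (for the BTH reserved field).
    """
    if not 0 <= qpn < (1 << QPN_BITS):
        raise ValueError("queue pair number must fit in 24 bits")
    first_part = qpn >> 8
    second_part = qpn & 0xFF
    return first_part, second_part


def join_source_qpn(first_part: int, second_part: int) -> int:
    """Rebuild the source QPN at the network node from the two parts."""
    return (first_part << 8) | second_part
```

- for example, `split_source_qpn(0xABCDEF)` yields `(0xABCD, 0xEF)`, and `join_source_qpn(0xABCD, 0xEF)` recovers `0xABCDEF`.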
- when the network node receives the second data packet and detects network congestion, it parses the second data packet to obtain the source media access control (MAC) address, the destination MAC address, the source Internet Protocol (IP) address, the destination IP address, and the source queue pair number of the second data packet.
- based on the parsing result, the network node uses the source MAC address of the second data packet as the destination MAC address of the first congestion notification message, the destination MAC address of the second data packet as the source MAC address of the first congestion notification message, the source IP address of the second data packet as the destination IP address of the first congestion notification message, the destination IP address of the second data packet as the source IP address of the first congestion notification message, and the source queue pair number of the second data packet as the destination queue pair number of the first congestion notification message, thereby obtaining the first congestion notification message.
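- the address swap described above can be sketched as follows. The `Headers` type and the function name are hypothetical, and only the fields named in the text are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Headers:
    """Hypothetical view of the header fields named in the text."""
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    dest_qpn: int


def build_first_cnp(second_packet: Headers, source_qpn: int) -> Headers:
    """Build the first congestion notification message from the second data packet.

    source_qpn is the source queue pair number that the proxy node added
    to the packet; it becomes the destination QPN of the notification.
    """
    return Headers(
        src_mac=second_packet.dst_mac,  # destination MAC becomes source MAC
        dst_mac=second_packet.src_mac,  # source MAC becomes destination MAC
        src_ip=second_packet.dst_ip,    # destination IP becomes source IP
        dst_ip=second_packet.src_ip,    # source IP becomes destination IP
        dest_qpn=source_qpn,
    )
```

- because every address is mirrored, the notification retraces the path toward the sending end without the network node keeping any per-flow state.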
- the first congestion notification message includes the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs. Based on the queue depth, the proxy node can determine a sending period for the first congestion notification message and send subsequent first congestion notification messages according to that period.
- from the queue depth of the queue to which the data stream belongs in the network node, the proxy node learns the current degree of network congestion and sends the first congestion notification message to the sending end at the period determined by that degree, so that the sending end can set the sending rate of the data stream to which the first data packet belongs according to the degree of congestion. The first data packet is thus sent at the maximum sending rate that avoids network congestion, which reduces delay in data transmission.
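- this passage does not specify how the sending period is derived from the queue depth. Purely as an illustration, the sketch below assumes a linear mapping in which a deeper queue yields a shorter period; the function name, the depth limit, and the period bounds are all hypothetical values chosen for the example.

```python
def cnp_send_period_us(
    queue_depth: int,
    max_depth: int = 1000,
    min_period_us: float = 10.0,
    max_period_us: float = 100.0,
) -> float:
    """Map a reported queue depth to a notification sending period.

    Assumed policy (not taken from the text): the period shrinks linearly
    from max_period_us at an empty queue to min_period_us at max_depth.
    """
    depth = min(max(queue_depth, 0), max_depth)  # clamp to [0, max_depth]
    fraction = depth / max_depth
    return max_period_us - fraction * (max_period_us - min_period_us)
```

- under these assumed defaults, a half-full queue (`queue_depth=500`) gives a period of 55 microseconds.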
- in a second aspect, a method for advertising network congestion is provided, the method including:
- the proxy node receives the first data packet that is sent by the sending end and carries the destination queue pair number, and sends the first data packet to the receiving end through the network node.
- while forwarding the first data packet, when the network node detects network congestion, it generates a second congestion notification message according to the first data packet and sends the second congestion notification message to the proxy node.
- the destination queue pair number of the second congestion notification message is actually the destination queue pair number of the first data packet.
- the proxy node obtains the source queue pair number corresponding to the destination queue pair number of the second congestion notification message, replaces the destination queue pair number in the second congestion notification message with the source queue pair number to obtain the first congestion notification message, and sends the first congestion notification message to the sending end, so that the sending end reduces the sending rate of the data stream to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the queue pair number of the sending end.
- during transmission of the first data packet, when network congestion is detected, the network node generates a second congestion notification message and sends it to the proxy node. The proxy node replaces the destination queue pair number of the second congestion notification message with the source queue pair number of the first data packet to obtain the first congestion notification message, and then sends the first congestion notification message to the sending end, so that the sending end can reduce the sending rate of the data stream to which the first data packet belongs in time and avoid losing subsequent packets of that data stream.
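- the replacement step performed by the proxy node in this second approach can be sketched as follows. The `CongestionNotification` type, its fields, and the plain dictionary standing in for the queue pair tracking table are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class CongestionNotification:
    """Hypothetical minimal view of a congestion notification message."""
    dest_qpn: int
    queue_depth: int


def to_first_cnp(
    second_cnp: CongestionNotification,
    tracking_table: Dict[int, int],
) -> Optional[CongestionNotification]:
    """Turn the second congestion notification message into the first one.

    tracking_table maps destination QPN -> source QPN. Returns None when
    the destination QPN is unknown, in which case nothing can be rewritten.
    """
    source_qpn = tracking_table.get(second_cnp.dest_qpn)
    if source_qpn is None:
        return None
    # Only the destination QPN changes; other fields are carried over.
    return replace(second_cnp, dest_qpn=source_qpn)
```

- compared with the first approach, the data packet itself is left unmodified; the lookup is deferred until a notification actually comes back.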
- when the network node receives the first data packet and detects network congestion, it parses the first data packet to obtain the source media access control (MAC) address, the destination MAC address, the source Internet Protocol (IP) address, the destination IP address, and the destination queue pair number of the first data packet.
- based on the parsing result, the network node uses the source MAC address of the first data packet as the destination MAC address of the second congestion notification message, the destination MAC address of the first data packet as the source MAC address of the second congestion notification message, the source IP address of the first data packet as the destination IP address of the second congestion notification message, the destination IP address of the first data packet as the source IP address of the second congestion notification message, and the destination queue pair number of the first data packet as the destination queue pair number of the second congestion notification message, thereby obtaining the second congestion notification message.
- the proxy node maintains a queue pair tracking table, each entry of which stores a correspondence between a destination queue pair number and a source queue pair number. Thus, on receiving the first data packet, the proxy node can look up the queue pair tracking table by the destination queue pair number to find the corresponding source queue pair number; the source queue pair number found is the source queue pair number of the first data packet.
- the solution shown in the embodiment of the present disclosure provides a way to obtain the source queue pair number.
- the proxy node may track the connection request message sent by the sending end and take the destination queue pair number extracted from the base transport header (BTH) of the connection request message as the destination queue pair number. The proxy node may also track the connection response message sent by the receiving end and take the destination queue pair number extracted from the BTH of the connection response message as the source queue pair number. By establishing the correspondence between the source queue pair number and the destination queue pair number, the queue pair tracking table is obtained.
- the proxy node establishes the queue pair tracking table by tracking the messages exchanged by the sending end and the receiving end during connection establishment, so that on receiving the first data packet it can find the corresponding source queue pair number according to the destination queue pair number of the first data packet.
- the first congestion notification message includes the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs. Based on the queue depth, the proxy node can determine a sending period for the first congestion notification message and send subsequent first congestion notification messages according to that period.
- from the queue depth of the queue to which the data stream belongs in the network node, the proxy node learns the current degree of network congestion and sends the first congestion notification message to the sending end at the period determined by that degree, so that the sending end can set the sending rate of the data stream to which the first data packet belongs according to the degree of congestion. The first data packet is thus sent at the maximum sending rate that avoids network congestion, which reduces delay in data transmission.
- a third aspect provides a proxy node for advertising network congestion, the proxy node comprising units for implementing the method for advertising network congestion according to the first aspect, for example, a message receiving unit, a number obtaining unit, a number adding unit, and a message sending unit.
- a fourth aspect provides a proxy node for advertising network congestion, the proxy node comprising units for implementing the method for advertising network congestion according to the second aspect, for example, a message receiving unit, a message sending unit, a number obtaining unit, and a number replacing unit.
- a fifth aspect provides a computer device, including: a memory, a processor, a communication interface, and a bus;
- the memory, the processor, and the communication interface are connected by the bus. The memory is used to store computer instructions, and the processor is used to execute the computer instructions stored in the memory. When the computer device is running, the processor runs the computer instructions so that the computer device performs the method for advertising network congestion according to the first aspect.
- a sixth aspect provides a computer device, including: a memory, a processor, a communication interface, and a bus;
- the memory, the processor, and the communication interface are connected by the bus. The memory is used to store computer instructions, and the processor is used to execute the computer instructions stored in the memory. When the computer device is running, the processor runs the computer instructions so that the computer device performs the method for advertising network congestion according to the second aspect.
- a seventh aspect provides a storage medium in which at least one instruction is stored, the at least one instruction being loaded and executed by a processor to implement the method for advertising network congestion according to the first aspect.
- an eighth aspect provides a storage medium in which at least one instruction is stored, the at least one instruction being loaded and executed by a processor to implement the method for advertising network congestion according to the second aspect.
- FIG. 1 is a schematic diagram of an RDMA communication flow shown in an embodiment of the present disclosure
- FIG. 2 is a schematic diagram of an RDMA protocol stack according to an embodiment of the present disclosure
- FIG. 3 is a schematic diagram of a CLOS networking structure according to an embodiment of the present disclosure.
- FIG. 4 is a schematic diagram of an RDMA congestion scenario according to an embodiment of the present disclosure.
- FIG. 5 is an implementation environment involved in a method for advertising network congestion provided according to an embodiment of the present disclosure
- FIG. 6 is a schematic diagram showing the location of a proxy node in a system according to an embodiment of the present disclosure
- FIG. 7 is a schematic diagram showing the location of a proxy node in a system according to an embodiment of the present disclosure.
- FIG. 8 is a flowchart of a method for advertising network congestion according to an embodiment of the present disclosure.
- FIG. 9 is a flowchart of a method for advertising network congestion according to an embodiment of the present disclosure.
- FIG. 10 is a schematic structural diagram of a proxy node for advertising network congestion according to an embodiment of the present disclosure.
- FIG. 11 is a schematic structural diagram of a proxy node for advertising network congestion according to an embodiment of the present disclosure
- FIG. 12 is a schematic structural diagram of a computing device provided by an embodiment of the present disclosure.
- the embodiment of the present disclosure adopts the RDMA technology for data transmission. Because RDMA technology can transfer data directly to the computer's storage area over the network without the need for a computer's processor, system performance is improved.
- RDMA communication involves two computer systems: one includes server A and channel adapter (CA) A, and the other includes server B and channel adapter B.
- Each channel adapter card is divided into a transport layer, a network layer, a link layer, and a physical layer.
- the server's central processing unit communicates with the channel adapter card through a queue pair (QP).
- the queue pair includes a send queue and a receive queue.
- the send queue is used by the CPU to send commands to the channel adapter; such a command is called a work queue element (WQE).
- the receive queue is used by the CPU to receive commands from the channel adapter; such a command is called a completion queue element (CQE).
- server A establishes a physical connection with a port of a network node through its own port and communicates with the network node over the established connection;
- server B likewise establishes a physical connection with a port of a network node through its own port and communicates with the network node over the established connection.
- Each node in the network can establish a connection with each other through the port and forward the packet based on the established connection.
- channel adapter A and channel adapter B communicate by sending each other InfiniBand (IB) messages that carry the destination queue pair number.
- the RDMA uses an application layer protocol
- the transport layer protocol may be an Infiniband transport layer protocol, etc.
- the network layer protocol may be an Infiniband network layer protocol or a UDP/IP protocol.
- the underlying link layer protocol may be an Infiniband link layer protocol, an Ethernet based RoCE protocol, and an Ethernet based RoCEv2 protocol.
- the Infiniband protocol is a layered protocol (similar to the TCP/IP protocol): each layer is responsible for different functions, each lower layer serves the layer above it, and the layers are independent of each other.
- RDMA protocol messages of the Infiniband protocol, the Ethernet-based RoCE protocol, and the Ethernet-based RoCEv2 protocol all include a base transport header (BTH) and an IB payload.
- the BTH is 12 bytes long and mainly includes a packet sequence number (PSN), a destination queue pair (destination QP), and a packet opcode.
- the destination queue pair is used to indicate the QP number of the receiving end.
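- to make the BTH fields concrete, the sketch below extracts the opcode, destination QP, and PSN from a 12-byte BTH. It assumes the standard InfiniBand field layout (1-byte opcode, 1 byte of flags, 2-byte partition key, 1 reserved byte plus 24-bit destination QP, 1 byte of flags plus 24-bit PSN); the type and function names are hypothetical.

```python
import struct
from typing import NamedTuple

BTH_LENGTH = 12  # bytes


class BaseTransportHeader(NamedTuple):
    opcode: int
    dest_qpn: int
    psn: int


def parse_bth(data: bytes) -> BaseTransportHeader:
    """Parse the fields named in the text from the first 12 bytes of data."""
    if len(data) < BTH_LENGTH:
        raise ValueError("BTH requires at least 12 bytes")
    # Big-endian: opcode, flags, partition key, then two 32-bit words.
    opcode, _flags, _pkey, word1, word2 = struct.unpack(
        ">BBHII", data[:BTH_LENGTH]
    )
    dest_qpn = word1 & 0xFFFFFF  # low 24 bits; the top byte is reserved
    psn = word2 & 0xFFFFFF       # low 24 bits; the top byte holds flags
    return BaseTransportHeader(opcode, dest_qpn, psn)
```

- the reserved byte masked off in `word1` is the kind of spare field into which part of a source queue pair number could be placed.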
- RDMA uses the CLOS architecture for networking.
- the data center network of the CLOS architecture is set up by using the spine-leaf method.
- the bandwidth of the uplink and downlink interfaces of a switch is asymmetric, and the number of uplink interfaces (the interfaces that connect the switch to a router or an upper-level switch) is smaller than the number of downlink interfaces (the interfaces that connect the switch to hosts). Network congestion can therefore occur during data transmission.
- Figure 4(a) and Figure 4(b) show the scenario of the RDMA network congestion.
- the bandwidth of each uplink interface of the switch is 40GE, and the bandwidth of each downlink interface is 10GE. Referring to FIG.
- the embodiment of the present disclosure provides a method for advertising network congestion, in which the network node sends a congestion notification message to the sender through the proxy node when receiving the data packet and detecting network congestion.
- the message is used to notify the sender of the current network state, so that the sender reduces the transmission rate of the data packet to alleviate the network congestion state, thereby achieving the purpose of avoiding data loss.
- FIG. 5 shows a system involved in a method for advertising network congestion according to an embodiment of the present disclosure.
- the system includes: a transmitting end, a proxy node, a network node, and a receiving end.
- the sender and the receiver may be computers in the network, and their roles are not fixed: the sender in one data transmission may be the receiver in the next, and the receiver in one data transmission may be the sender in the next.
- the network node may be a switch, a router, or the like in the network, and the embodiment of the present disclosure does not specifically limit the type of the network node.
- a proxy node is a hardware unit in the network that can be internal to the sender or internal to the network node.
- the deployment form may be a hardware interface on a network interface card (NIC) application-specific integrated circuit (ASIC), microcode on the NIC, or logic code of a NIC field-programmable gate array (FPGA) chip.
- the deployment form may be hardware logic on the switch ASIC, microcode on a network processor (NP) attached to the switch chip, or logic code of an FPGA chip attached to the switch chip.
- the dotted line in FIG. 7 shows the working process of the proxy node: after obtaining a data packet from the receiving end, the proxy node sends it to the sending end through the network; after obtaining a data packet from the sending end, the proxy node sends it to the receiving end through the network.
- the embodiment of the present disclosure provides a method for advertising network congestion.
- the method process provided by the embodiment of the present disclosure includes:
- the proxy node receives the first data packet of the sending end.
- When the sender needs to send data to the receiver, the sender obtains the media access control (MAC) address of the sender, the IP address of the sender, the MAC address of the receiver, the IP address of the receiver, and the destination queue pair number of the receiver, and generates a standard RDMA data message based on the RDMA message format. To observe the current network status, the sender also sets the ECN bits in the IP header of the RDMA data message. The ECN bits identify that the RDMA sender is ECN-capable, and different values of the ECN bits indicate different network states: for example, a value of 10 indicates that the network is in a normal state, and a value of 11 indicates that the network is in a congested state.
- the default network state is normal, that is, the value on the ECN bit is 10.
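- As a minimal sketch of the ECN values described above, the following Python snippet reads and sets the two ECN bits, which occupy the two least-significant bits of the IPv4 TOS (or IPv6 Traffic Class) byte. The helper names `get_ecn`, `set_ecn`, and `is_congested` are hypothetical and are not taken from the disclosure.

```python
# Assumption: the ECN field is the two least-significant bits of the
# IPv4 TOS / IPv6 Traffic Class byte. Helper names are illustrative only.
ECN_MASK = 0b11
ECN_NORMAL = 0b10     # "10" in the text: ECN-capable sender, network normal
ECN_CONGESTED = 0b11  # "11" in the text: network congested


def get_ecn(tos: int) -> int:
    """Return the 2-bit ECN value carried in a TOS byte."""
    return tos & ECN_MASK


def set_ecn(tos: int, ecn: int) -> int:
    """Return the TOS byte with its ECN bits replaced, other bits untouched."""
    return (tos & ~ECN_MASK & 0xFF) | (ecn & ECN_MASK)


def is_congested(tos: int) -> bool:
    """True when the ECN bits carry the congested value."""
    return get_ecn(tos) == ECN_CONGESTED
```

- Under these assumptions, a sender would call `set_ecn(tos, ECN_NORMAL)` when generating the data message, and a congested network node would use `ECN_CONGESTED` instead.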
- Table 1 below shows the message format of the generated standard RDMA data message:
- the MAC header is the MAC header field; the IP header is the IP header field, and the value of the ECN bits of the IP header is 10; the IB transport header is the IB transport header field; Data is the data to be transmitted; iCRC is the data transmission error detection field; the Eth frame check sequence (FCS) is a check field for storing the iCRC check value.
- the RDMA data message generated by the sender may be referred to as a first data message.
- the new data packet obtained by the proxy node according to the first data packet is called a second data packet.
- the first data packet carries the source MAC address (the MAC address of the sender), the source IP address (the IP address of the sender), the destination MAC address (the MAC address of the receiver), the destination IP address (the IP address of the receiver), the destination queue pair number, and so on.
- Because the first data packet passes through the sending end and each network node in the network, and the proxy node is located inside the sending end or inside a network node, the proxy node can receive the first data packet and then process it.
- the proxy node obtains a source queue pair number of the first data packet according to the destination queue pair number.
- When the first data packet is received, the proxy node parses the first data packet to obtain its destination queue pair number, and then obtains the source queue pair number of the first data packet according to the destination queue pair number. The ways of doing so include, but are not limited to, the following: the proxy node looks up, in a pre-established queue pair tracking table, the source queue pair number corresponding to the destination queue pair number. Each entry of the queue pair tracking table stores a correspondence between a destination queue pair number and a source queue pair number.
- Because the queue pair tracking table is the key to obtaining the source queue pair number of the first data packet, the proxy node needs to establish the queue pair tracking table before obtaining the source queue pair number of the first data packet.
- the process of establishing a queue pair tracking table is as follows:
- the proxy node keeps track of the connection request message and the connection response message sent by each pair of the sender and the receiver during the connection establishment process.
- Before sending a data packet to a receiving end, the transmitting end first establishes a connection and acquires related information of the opposite end during connection establishment. Specifically, the sending end sends a connection request message to the receiving end, where the connection request message carries the MAC address of the sending end, the IP address of the sending end, and the queue pair number of the sending end.
- By parsing the connection request message, the receiving end obtains the MAC address of the sender, the IP address of the sender, and the queue pair number of the sender.
- the receiver also sends a connection response message to the sender, where the connection response message carries the MAC address of the receiver, the IP address of the receiver, and the queue pair number of the receiver.
- By parsing the connection response message, the sending end obtains the MAC address of the receiving end, the IP address of the receiving end, and the queue pair number of the receiving end.
- Because the connection request message and the connection response message pass through the sending end, each network node in the network, and the receiving end, and the proxy node is located inside the sending end or inside a network node, the proxy node can track the connection request message and the connection response message exchanged by the sender and the receiver during connection establishment, and then establish the queue pair tracking table.
- the proxy node extracts the destination queue pair number in the BTH of the connection request message, and extracts the destination queue pair number in the BTH of the connection response message.
- In the embodiment of the present disclosure, the connection request message and the connection response message both have a base transport header (BTH), and the BTH carries the destination queue pair number. Therefore, the proxy node can obtain the destination queue pair number of the connection request message by parsing the BTH of the connection request message, and obtain the destination queue pair number of the connection response message by parsing the BTH of the connection response message.
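- The extraction of the destination queue pair number from the BTH can be sketched as follows. This is an illustration only: it assumes the usual 12-byte InfiniBand BTH layout in which the 24-bit destination queue pair number occupies bytes 5 to 7, and the function name `dest_qpn_from_bth` is hypothetical.

```python
BTH_LEN = 12  # assumed length of the base transport header in bytes


def dest_qpn_from_bth(bth: bytes) -> int:
    """Extract the 24-bit destination queue pair number from a raw BTH.

    Assumed layout: byte 0 opcode; byte 1 SE/M/Pad/TVer; bytes 2-3 P_Key;
    byte 4 reserved; bytes 5-7 destination QP; byte 8 AckReq + reserved;
    bytes 9-11 packet sequence number.
    """
    if len(bth) < BTH_LEN:
        raise ValueError("BTH must be at least 12 bytes long")
    return int.from_bytes(bth[5:8], "big")
```

- A proxy node following the text would apply such a function to both the connection request message and the connection response message.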
- the proxy node uses the destination queue pair number extracted from the connection request message as the destination queue pair number, uses the destination queue pair number extracted from the connection response message as the source queue pair number, and records the correspondence between the source queue pair number and the destination queue pair number to obtain the queue pair tracking table.
- For example, if the destination queue pair number extracted from the connection request message is 0X1010A1 and the destination queue pair number extracted from the connection response message is 0X1010F1, the proxy node may use 0X1010A1 as the destination queue pair number and 0X1010F1 as the source queue pair number, and record the correspondence between the source queue pair number 0X1010F1 and the destination queue pair number 0X1010A1.
- Similarly, the proxy node may use 0X1010A2 as the destination queue pair number and 0X1010F2 as the source queue pair number, and record the correspondence between the source queue pair number 0X1010F2 and the destination queue pair number 0X1010A2.
- the queue pair tracking table shown in Table 2 can be established.
- Source queue pair number (24 bit) | Destination queue pair number (24 bit)
- 0X1010F1 | 0X1010A1
- 0X1010F2 | 0X1010A2
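- The table-building procedure above can be sketched with a small in-memory mapping. The class name `QueuePairTrackingTable` and its methods are hypothetical; the sketch only mirrors the rule that the destination queue pair number of the connection request becomes the destination entry and that of the connection response becomes the source entry.

```python
from typing import Dict, Optional


class QueuePairTrackingTable:
    """Maps a destination queue pair number to its source queue pair number."""

    def __init__(self) -> None:
        self._src_by_dst: Dict[int, int] = {}

    def record(self, request_dest_qpn: int, response_dest_qpn: int) -> None:
        # Destination QPN of the connection request -> destination queue pair number.
        # Destination QPN of the connection response -> source queue pair number.
        self._src_by_dst[request_dest_qpn] = response_dest_qpn

    def lookup_source(self, dest_qpn: int) -> Optional[int]:
        # Returns None when no connection has been tracked for this QPN.
        return self._src_by_dst.get(dest_qpn)


table = QueuePairTrackingTable()
table.record(0x1010A1, 0x1010F1)
table.record(0x1010A2, 0x1010F2)
```

- With the two example connections recorded, `table.lookup_source(0x1010A1)` yields the source queue pair number `0x1010F1`.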
- the proxy node may also obtain the MAC address of the sender and the MAC address of the receiver when establishing the queue pair tracking table.
- Based on the queue pair number of the sender, the queue pair number of the receiver, the MAC address of the sender, the MAC address of the receiver, the IP address of the sender, the IP address of the receiver, and the transport layer protocol, the queue pair tracking table is established.
- When receiving the first data packet, the proxy node can obtain the destination queue pair number, the source MAC address, the destination MAC address, the source IP address, the destination IP address, the transport layer protocol, and so on of the first data packet by parsing the first data packet. It then looks up, in the queue pair tracking table, the source queue pair number corresponding to that destination queue pair number, source MAC address, destination MAC address, source IP address, destination IP address, and transport layer protocol, and uses the source queue pair number found as the source queue pair number of the first data packet.
- In this way, the corresponding source queue pair number can be found accurately in scenarios where the network contains one sender and one receiver, one sender and at least two receivers, or two senders and one receiver.
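- A sketch of the extended lookup follows, assuming the table is keyed by the full tuple of fields named above. The type `FlowKey`, the function `lookup_source_qpn`, and all addresses are made up for illustration; the example shows two senders sharing one receiver queue pair being told apart by their addresses.

```python
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    """Hypothetical composite key for the extended queue pair tracking table."""
    dest_qpn: int
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    protocol: str


def lookup_source_qpn(table: Dict[FlowKey, int], key: FlowKey) -> Optional[int]:
    """Return the source queue pair number for the flow, or None if untracked."""
    return table.get(key)


# Two senders reach the same receiver queue pair 0x1010A1 (illustrative values).
RECEIVER_MAC, RECEIVER_IP = "02:00:00:00:00:09", "10.0.0.9"
key_a = FlowKey(0x1010A1, "02:00:00:00:00:01", RECEIVER_MAC, "10.0.0.1", RECEIVER_IP, "UDP")
key_b = FlowKey(0x1010A1, "02:00:00:00:00:02", RECEIVER_MAC, "10.0.0.2", RECEIVER_IP, "UDP")
table = {key_a: 0x1010F1, key_b: 0x1010F2}
```

- Keying on the destination queue pair number alone would make the two entries collide, which is why the additional fields are part of the key in this sketch.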
- the proxy node adds the source queue pair number to the first data packet, obtains the second data packet, and sends the second data packet to the receiving end by using the network node.
- the proxy node obtains the second data packet by adding the source queue pair number to the first data packet.
- the source queue pair number is also carried in the second data packet, so that when the network node detects network congestion, it can quickly determine the sender according to the source queue pair number carried in the second data packet.
- the proxy node splits the source queue pair number into the first part and the second part.
- the length of the first portion and the length of the second portion may be the same or different, and it is only required to ensure that the sum of the length of the first portion and the length of the second portion is equal to the length of the source queue pair number.
- the proxy node adds the first part to the checksum field of the UDP header of the first data packet, and adds the second part to the reserved field of the base transport header (BTH) of the first data packet, to obtain the second data packet.
- For example, if the length of the source queue pair number is 16 bits, the proxy node splits the source queue pair number into a first part of 8 bits and a second part of 8 bits, adds the 8-bit first part to the checksum field of the UDP header of the first data packet, and adds the 8-bit second part to the reserved field of the BTH of the first data packet, to obtain the second data packet.
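- The split can be sketched with plain bit operations. The sketch assumes that the first part holds the high-order bits and the second part the low-order bits, which the disclosure does not specify; `split_qpn` and `join_qpn` are hypothetical names.

```python
from typing import Tuple


def split_qpn(qpn: int, total_bits: int = 16, first_bits: int = 8) -> Tuple[int, int]:
    """Split a queue pair number into (first part, second part).

    Assumption: the first part is the high-order `first_bits` bits and the
    second part is the remaining low-order bits.
    """
    if not 0 < first_bits < total_bits:
        raise ValueError("first_bits must be between 1 and total_bits - 1")
    if not 0 <= qpn < (1 << total_bits):
        raise ValueError("qpn does not fit in total_bits")
    second_bits = total_bits - first_bits
    return qpn >> second_bits, qpn & ((1 << second_bits) - 1)


def join_qpn(first: int, second: int, total_bits: int = 16, first_bits: int = 8) -> int:
    """Reassemble the queue pair number from its two parts."""
    second_bits = total_bits - first_bits
    return (first << second_bits) | second
```

- The two part lengths only need to sum to the length of the queue pair number, so an uneven split such as `split_qpn(0x1010F1, total_bits=24, first_bits=16)` works the same way.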
- When receiving the second data packet and detecting network congestion, the network node generates a first congestion notification message according to the second data packet, and sends the first congestion notification message to the proxy node.
- When the second data packet reaches a network node in the network and that network node detects network congestion, the network node generates a first congestion notification message according to the second data packet, in which the value of the ECN bits is the value corresponding to the congested state. The network node then sends the first congestion notification message to the proxy node, and the proxy node forwards it to the sender to notify the sender of the current network state, so that the sender lowers the sending rate of the data flow to which the first data packet belongs. To ensure normal transmission of the data packet, the network node also continues to forward the second data packet to the receiving end.
- First step: the network node parses the second data packet and obtains the source MAC address, the destination MAC address, the source IP address, the destination IP address, and the source queue pair number of the second data packet.
- Because the second data packet carries its source MAC address, destination MAC address, source IP address, destination IP address, and source queue pair number, the network node can obtain these fields by parsing the second data packet.
- Second step: the network node uses the source MAC address of the second data packet as the destination MAC address of the first congestion notification message, the destination MAC address of the second data packet as the source MAC address of the first congestion notification message, the source IP address of the second data packet as the destination IP address of the first congestion notification message, the destination IP address of the second data packet as the source IP address of the first congestion notification message, and the source queue pair number of the second data packet as the destination queue pair number of the first congestion notification message, to obtain the first congestion notification message.
- the reserved field of the first congestion notification message may also carry the identifier of the network node and the queue depth, at the network node, of the queue to which the data flow of the second data packet belongs.
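- The field mapping used to generate the first congestion notification message can be sketched as follows. The record types `SecondDataPacket` and `CongestionNotification` and the function name are hypothetical simplifications; real messages would be encoded headers rather than Python objects.

```python
from dataclasses import dataclass

ECN_CONGESTED = 0b11  # ECN value corresponding to the congested state


@dataclass(frozen=True)
class SecondDataPacket:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    src_qpn: int  # source queue pair number added by the proxy node


@dataclass(frozen=True)
class CongestionNotification:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    dest_qpn: int
    ecn: int
    node_id: str      # carried in the reserved field per the text
    queue_depth: int  # carried in the reserved field per the text


def build_first_congestion_notification(
    pkt: SecondDataPacket, node_id: str, queue_depth: int
) -> CongestionNotification:
    """Swap source and destination addresses and target the sender's queue pair."""
    return CongestionNotification(
        src_mac=pkt.dst_mac,
        dst_mac=pkt.src_mac,
        src_ip=pkt.dst_ip,
        dst_ip=pkt.src_ip,
        dest_qpn=pkt.src_qpn,
        ecn=ECN_CONGESTED,
        node_id=node_id,
        queue_depth=queue_depth,
    )
```

- Swapping the addresses routes the notification back toward the sender, and using the source queue pair number as the destination lets the sender match the notification to its own queue pair.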
- the proxy node sends the first congestion notification message to the sending end.
- When receiving the first congestion notification message, the proxy node determines the sending period of the first congestion notification message according to the queue depth, at the network node, of the queue to which the data flow of the second data packet (that is, the data flow of the first data packet) belongs, and sends the received first congestion notification message to the sending end according to the determined sending period. Because network congestion may involve at least one network node, and each congested network node returns a first congestion notification message to the proxy node when it receives the second data packet, the proxy node selects the deepest queue depth from the at least one first congestion notification message and determines the sending period according to that deepest queue depth.
- the sending period of the first congestion notification message determined by the proxy node is inversely related to the selected queue depth. If the selected queue depth is greater than a preset length, indicating that the network congestion is severe, the sending period of the first congestion notification message may be determined to be a first period. If the selected queue depth is less than the preset length, indicating that the network congestion is light, the sending period of the first congestion notification message may be determined to be a second period. The preset length may be determined according to an empirical value, and the first period is smaller than the second period.
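- The period selection can be sketched as a two-level threshold rule. The function names and the numeric values in the usage note are illustrative assumptions; the text does not say which period applies when the queue depth equals the preset length, and this sketch arbitrarily uses the second period in that case.

```python
from typing import Iterable


def select_queue_depth(depths: Iterable[int]) -> int:
    """Pick the deepest queue depth reported by the congested network nodes."""
    return max(depths)


def sending_period(depth: int, preset_length: int,
                   first_period: float, second_period: float) -> float:
    """Return the sending period of the first congestion notification message.

    A depth above the preset length (severe congestion) selects the shorter
    first period; otherwise the longer second period is used.
    Assumption: a depth equal to the preset length is treated as light.
    """
    if first_period >= second_period:
        raise ValueError("first_period must be smaller than second_period")
    return first_period if depth > preset_length else second_period
```

- For instance, with a preset length of 100, a first period of 10 and a second period of 50 (arbitrary units), reported depths of 30, 180 and 90 would select the first period because the deepest queue exceeds the preset length.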
- When the sending end receives the first congestion notification message, the sending end decreases the sending rate of the data flow to which the first data packet belongs.
- the number of first congestion notification messages and their sending period reflect the current network state. Therefore, when receiving at least one first congestion notification message, the sending end can use a congestion control algorithm to determine, from the number of first congestion notification messages received and their sending period, the sending rate of the data flow to which the first data packet belongs, and then send subsequent packets of the data flow at the determined rate to alleviate network congestion.
- In the method provided by the embodiment of the present disclosure, when receiving the first data packet, the proxy node adds the source queue pair number to the first data packet to obtain the second data packet, and sends the second data packet to the receiving end through the network node. During forwarding of the second data packet, if network congestion is detected, the network node generates the first congestion notification message according to the second data packet and sends it to the proxy node, and the proxy node sends the first congestion notification message to the sending end, so that the sending end reduces the sending rate of the data flow to which the first data packet belongs. Because the network node in the present disclosure sends the first congestion notification message toward the sending end as soon as it detects network congestion, the sending end can reduce the sending rate of the data flow to which the first data packet belongs in time, thereby avoiding loss of subsequent packets of the data flow.
- the embodiment of the present disclosure provides a method for congestion notification.
- the method process provided by the embodiment of the present disclosure includes:
- the proxy node receives the first data packet of the sending end.
- the specific implementation of this step is the same as step 801 above. For details, refer to step 801; they are not described herein again.
- the proxy node sends the first data packet to the receiving end by using the network node.
- When receiving the first data packet, the proxy node sends the first data packet to the receiving end through forwarding by the network node.
- When the network node receives the first data packet and detects network congestion, the network node generates a second congestion notification message according to the first data packet, and sends the second congestion notification message to the proxy node.
- When the first data packet reaches a network node in the network and that network node detects network congestion, the network node generates a second congestion notification message according to the first data packet, in which the value of the ECN bits is the value corresponding to the congested state. The network node then sends the second congestion notification message to the proxy node, and the proxy node notifies the sender of the current network state, so that the sender lowers the sending rate of the data flow to which the first data packet belongs. To ensure normal transmission of the data packet, the network node also continues to send the first data packet to the receiving end.
- First step: the network node parses the first data packet and obtains the source media access control (MAC) address, the destination MAC address, the source Internet Protocol (IP) address, the destination IP address, and the destination queue pair number of the first data packet.
- Because the first data packet carries its source MAC address, destination MAC address, source IP address, destination IP address, and destination queue pair number, the network node can obtain these fields by parsing the first data packet.
- Second step: the network node uses the source MAC address of the first data packet as the destination MAC address of the second congestion notification message, the destination MAC address of the first data packet as the source MAC address of the second congestion notification message, the source IP address of the first data packet as the destination IP address of the second congestion notification message, the destination IP address of the first data packet as the source IP address of the second congestion notification message, and the destination queue pair number of the first data packet as the destination queue pair number of the second congestion notification message, to obtain the second congestion notification message.
- the reserved field of the second congestion notification message may further carry the identifier of the network node and the queue depth, at the network node, of the queue to which the data flow of the first data packet belongs.
- the proxy node obtains the source queue pair number corresponding to the destination queue pair number according to the destination queue pair number.
- When receiving the second congestion notification message, the proxy node looks up, according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number in the pre-established queue pair tracking table, where each entry of the queue pair tracking table stores the correspondence between a destination queue pair number and a source queue pair number.
- the process of establishing the queue pair tracking table is the same as that described in step 802 above, and is not described herein again.
- the proxy node replaces the destination queue pair number in the second congestion notification message with the source queue pair number to obtain the first congestion notification message.
- the proxy node may directly replace the destination queue pair number in the second congestion notification message with the source queue pair number to obtain the first congestion notification message.
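- The replacement step can be sketched as follows, assuming the tracking table is a plain mapping from destination to source queue pair number. The record type `CongestionNotification`, the function `to_first_notification`, and the addresses are hypothetical; returning `None` for an untracked queue pair is a choice of this sketch, not of the disclosure.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class CongestionNotification:
    src_ip: str
    dst_ip: str
    dest_qpn: int


def to_first_notification(
    second: CongestionNotification, tracking_table: Dict[int, int]
) -> Optional[CongestionNotification]:
    """Rewrite the destination QPN of the second notification to the source QPN."""
    src_qpn = tracking_table.get(second.dest_qpn)
    if src_qpn is None:
        return None  # no tracked connection for this destination queue pair number
    # All other fields are kept unchanged; only the queue pair number is replaced.
    return replace(second, dest_qpn=src_qpn)


tracking_table = {0x1010A1: 0x1010F1}
second = CongestionNotification(src_ip="10.0.0.9", dst_ip="10.0.0.1", dest_qpn=0x1010A1)
first = to_first_notification(second, tracking_table)
```

- In this variant the data packet itself is left untouched, and the proxy node does the translation only when a congestion notification actually comes back.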
- the proxy node sends the first congestion notification message to the sending end.
- the specific implementation of this step is the same as step 805 above. For details, refer to step 805; they are not described herein again.
- When the sending end receives the first congestion notification message, the sending end decreases the sending rate of the data flow to which the first data packet belongs.
- the specific implementation of this step is the same as step 806 above. For details, refer to step 806; they are not described herein again.
- In the method provided by the embodiment of the present disclosure, when receiving the first data packet from the proxy node and detecting network congestion, the network node sends the generated second congestion notification message to the proxy node. The proxy node generates, according to the second congestion notification message, the first congestion notification message that carries the source queue pair number of the first data packet, and sends the first congestion notification message to the sending end, so that the sending end reduces the sending rate of the data flow to which the first data packet belongs. Because the network node in the present disclosure sends a congestion notification toward the sending end as soon as it detects network congestion, the sending end can reduce the sending rate of the data flow to which the first data packet belongs in time, thereby avoiding loss of subsequent packets of the data flow.
- An embodiment of the present disclosure provides a proxy node for advertising network congestion.
- the proxy node includes:
- the message receiving unit 1001 is configured to receive a first data packet of the sending end, where the first data packet carries a destination queue pair number;
- the number obtaining unit 1002 is configured to obtain a source queue pair number of the first data packet according to the destination queue pair number;
- the number adding unit 1003 is configured to add the source queue pair number to the first data packet, obtain the second data packet, and send the second data packet to the receiving end by using the network node;
- the message receiving unit 1001 is further configured to receive a first congestion notification message, where the first congestion notification message is generated by the network node when receiving the second data packet and detecting network congestion, and the destination queue pair number of the first congestion notification message is the source queue pair number;
- the message sending unit 1004 is configured to send the first congestion notification message to the sending end, where the first congestion notification message is used to notify the sending end to reduce the sending rate of the data flow to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the queue pair number of the sending end.
- the number obtaining unit 1002 is further configured to look up, according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number in the pre-established queue pair tracking table, where each entry of the queue pair tracking table stores the correspondence between a destination queue pair number and a source queue pair number.
- the number obtaining unit 1002 is further configured to: track the connection request message and the connection response message sent by the sending end and the receiving end during connection establishment; extract the destination queue pair number in the BTH of the connection request message, and extract the destination queue pair number in the BTH of the connection response message; use the destination queue pair number extracted from the connection request message as the destination queue pair number, and use the destination queue pair number extracted from the connection response message as the source queue pair number; and record the correspondence between the source queue pair number and the destination queue pair number to obtain the queue pair tracking table.
- the number adding unit 1003 is further configured to split the source queue pair number into a first part and a second part, add the first part to the checksum field of the UDP header of the first data packet, and add the second part to the reserved field of the BTH of the first data packet, to obtain the second data packet.
- the first congestion notification message includes the queue depth, at the network node, of the queue to which the data flow belongs at the time of congestion;
- the message sending unit 1004 is further configured to determine a sending period of the first congestion notification message according to the queue depth, and send the first congestion notification message to the sending end according to the sending period.
- Embodiments of the present disclosure provide a proxy node for advertising network congestion.
- the proxy node includes:
- the message receiving unit 1101 is configured to receive a first data packet of the sending end, where the first data packet carries a destination queue pair number;
- the message sending unit 1102 is configured to send, by using the network node, the first data packet to the receiving end;
- the message receiving unit 1101 is further configured to receive a second congestion notification message that carries the destination queue pair number, where the second congestion notification message is generated by the network node when receiving the first data packet and detecting network congestion;
- the number obtaining unit 1103 is configured to obtain a source queue pair number corresponding to the destination queue pair number according to the destination queue pair number;
- the number substitution unit 1104 is configured to replace the destination queue pair number in the second congestion notification message with the source queue pair number to obtain the first congestion notification message.
- the message sending unit 1102 is further configured to send the first congestion notification message to the sending end, where the first congestion notification message is used to notify the sending end to reduce the sending rate of the data flow to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the queue pair number of the sending end.
- the number obtaining unit 1103 is configured to look up, according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number in the pre-established queue pair tracking table, where each entry of the queue pair tracking table stores the correspondence between a destination queue pair number and a source queue pair number.
- the number obtaining unit 1103 is further configured to: track the connection request message and the connection response message sent by the sending end and the receiving end during connection establishment; extract the destination queue pair number in the base transport header (BTH) of the connection request message, and extract the destination queue pair number in the BTH of the connection response message; use the destination queue pair number extracted from the connection request message as the destination queue pair number, and use the destination queue pair number extracted from the connection response message as the source queue pair number; and record the correspondence between the source queue pair number and the destination queue pair number to obtain the queue pair tracking table.
- the first congestion notification message includes the queue depth, at the network node, of the queue to which the data flow belongs at the time of congestion;
- the message sending unit 1102 is further configured to determine a sending period of the first congestion notification message according to the queue depth, and send the first congestion notification message to the sending end according to the sending period.
- the computing device 1200 includes a processor 1201, a memory 1202, a communication interface 1203, and a bus 1204.
- the processor 1201, the memory 1202, and the communication interface 1203 are directly connected by a bus 1204.
- the computing device 1200 can be a proxy node.
- the computing device 1200 is configured to perform the method for advertising network congestion performed by the proxy node in FIG. 8 above. Specifically:
- the memory 1202 is configured to store computer instructions
- the processor 1201 calls the computer instructions stored in the memory 1202 via the bus 1204 for performing the following operations:
- the destination queue pair number of the first congestion notification message is the source queue pair number, and the first congestion notification message is used to notify the sender to reduce the sending rate of the data flow to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the queue pair number of the sender.
- the processor 1201 invokes the computer instructions stored in the memory 1202 via the bus 1204 for performing the following operations:
- according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number is looked up in the pre-established queue pair tracking table, each entry of which stores the correspondence between a destination queue pair number and a source queue pair number.
- the processor 1201 invokes the computer instructions stored in the memory 1202 via the bus 1204 for performing the following operations:
- the destination queue pair number extracted from the connection request packet is used as the destination queue pair number
- the destination queue pair number extracted from the connection response packet is used as the source queue pair number
- the correspondence between the extracted source queue pair number and destination queue pair number is established.
- the queue pair tracking table is obtained.
- the processor 1201 invokes the computer instructions stored in the memory 1202 via the bus 1204 for performing the following operations:
- the first part is added to the checksum field of the UDP header of the first data packet, and the second part is added to the reserved field of the BTH of the first data packet to obtain a second data packet.
- the first congestion notification message includes the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs, and the processor 1201 invokes the computer instructions stored in the memory 1202 via the bus 1204 for performing the following operations: a sending period of the first congestion notification message is determined according to the queue depth;
- the first congestion notification message is sent to the sender according to the sending period.
- the memory 1202 includes a computer storage medium.
- Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
- Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices.
- RAM random access memory
- ROM read only memory
- EPROM Erasable programmable read-only memory
- EEPROM electrically erasable programmable read-only memory
- flash memory or other solid state storage technologies
- the computing device 1200 may also be connected to a remote computer on a network via a network such as the Internet. That is, computing device 1200 can be connected to the network via network interface unit 1205 coupled to said bus 1204, or network interface unit 1205 can be used to connect to other types of networks or remote computer systems (not shown).
- when the computing device 1200 is a proxy node, it can perform not only the network congestion notification method performed by the proxy node in FIG. 8, but also the network congestion notification method performed by the proxy node in FIG. 9. Details are not described again in the embodiments of the present disclosure.
- Embodiments of the present disclosure provide a storage medium having at least one instruction stored therein, the at least one instruction being loaded and executed by a processor to implement the network congestion notification method shown in FIG. 8 or FIG. 9.
- the network congestion notification system provided by the foregoing embodiments is described, for the notification of network congestion, only by way of example using the above division into functional modules. In actual applications, the foregoing functions may be allocated to different functional modules as required,
- that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
- the network congestion notification system provided by the foregoing embodiments belongs to the same concept as the network congestion notification method embodiments; the specific implementation process is described in detail in the method embodiments, and details are not described herein again.
- a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium.
- the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
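The present disclosure provides a network congestion notification method, a proxy node, and a computer device, and belongs to the field of data communication. On receiving a first data packet, the proxy node adds a source queue pair number to the first data packet to obtain a second data packet, and sends the second data packet to the receiver through a network node. While the second data packet is being forwarded, if network congestion is detected, the network node generates a first congestion notification message carrying the source queue pair number and sends it to the proxy node, which in turn sends it to the sender, so that the sender reduces the sending rate of the data stream to which the first data packet belongs. Because the network node sends the first congestion notification message toward the sender as soon as it detects network congestion, the sender can reduce the sending rate of that data stream in time, thereby avoiding loss of subsequent packets of the data stream.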
Description
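This application claims priority to Chinese Patent Application No. 201710687388.0, filed with the Chinese Patent Office on August 11, 2017 and entitled "Network congestion notification method, agent node, and computer device", which is incorporated herein by reference in its entirety.
The present disclosure relates to the field of data communication technologies, and in particular, to a network congestion notification method, a proxy node, and a computer device.
Remote direct memory access (RDMA) is a data transmission technology that reads data directly from the memory of another computer over the network, without involving that computer's processor, cache, or operating system, and therefore reduces data processing latency in network transmission. However, RDMA data centers are usually networked in a Clos topology, in which the uplink and downlink interface bandwidths of a switch are asymmetric. When data is transmitted in such a network, congestion occurs frequently and degrades RDMA communication performance. To avoid network congestion and improve RDMA communication performance, network congestion needs to be notified in a timely manner.
Currently, network congestion is mainly notified as follows. During connection management (CM), the sender and the receiver each obtain the queue pair number of the peer. When data needs to be sent to the receiver, the sender generates a first data packet based on the link layer transport protocol. The destination queue pair number of the first data packet is the queue pair number of the receiver, and the Internet Protocol (IP) header of the first data packet carries explicit congestion notification (ECN) bits, which identify the first data packet as ECN-capable; different values of the ECN bits indicate different network states. The sender sends the first data packet to a network node. When the network node receives the first data packet and detects network congestion, it changes the value of the ECN bits in the IP header to the value corresponding to the congested state and sends the processed first data packet to the receiver. When the receiver receives the processed first data packet and reads the congested-state value from the ECN bits, it obtains the queue pair number of the source (the sender) and generates a first congestion notification message whose destination queue pair number is that source queue pair number. The receiver sends the first congestion notification message to the sender over the network. The first congestion notification message instructs the sender to obtain the destination queue pair number of the message and, when that number is the same as the sender's own queue pair number, to reduce the sending rate of the data stream to which the first data packet belongs.
In the foregoing solution, when network congestion is detected, the network nodes on the transmission link between the sender and the receiver cannot themselves send the first congestion notification message to the sender; only the receiver can send it. As a result, the sender cannot reduce the sending rate of the data stream to which the first data packet belongs in time, and subsequent packets of that data stream are lost in transmission.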
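Summary
To solve the problems in the prior art, embodiments of the present disclosure provide a network congestion notification method, a proxy node, and a computer device. The technical solutions are as follows:
According to a first aspect, a network congestion notification method is provided, and the method includes:
During data transmission, a proxy node receives a first data packet from a sender, where the first data packet carries a destination queue pair number. The proxy node obtains the source queue pair number of the first data packet according to the destination queue pair number, adds the source queue pair number to the first data packet to obtain a second data packet, and sends the second data packet to a receiver through a network node. While the second data packet is being forwarded, when network congestion is detected, the network node generates a first congestion notification message from the second data packet and sends it to the proxy node; the destination queue pair number of the first congestion notification message is in fact the source queue pair number of the second data packet. On receiving the first congestion notification message, the proxy node sends it to the sender, so that the sender reduces the sending rate of the data stream to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the sender's queue pair number.
In this solution, the proxy node adds the source queue pair number to the first data packet during transmission to obtain the second data packet. When network congestion is detected, the network node generates the first congestion notification message from the second data packet, and the message reaches the sender through the proxy node, so that the sender can reduce the sending rate of the data stream to which the first data packet belongs in time and avoid losing subsequent packets of that data stream.
In a first possible implementation of the present disclosure, the proxy node maintains a queue pair tracking table, each entry of which stores the correspondence between a destination queue pair number and a source queue pair number. On receiving the first data packet, the proxy node can therefore look up, in the queue pair tracking table, the source queue pair number corresponding to the destination queue pair number; the source queue pair number found is the source queue pair number of the first data packet.
This solution provides one way of obtaining the source queue pair number.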
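In a second possible implementation of the present disclosure, the sender and the receiver need to establish a connection in advance in order to transmit data. During connection establishment, the proxy node can track the connection request message sent by the sender and use the destination queue pair number extracted from the base transport header (BTH) of the connection request message as the destination queue pair number. The proxy node can also track the connection response message sent by the receiver and use the destination queue pair number extracted from the BTH of the connection response message as the source queue pair number. By recording the correspondence between the source queue pair number and the destination queue pair number, the proxy node obtains the queue pair tracking table.
In this solution, the proxy node builds the queue pair tracking table by tracking the messages that the sender and the receiver exchange during connection establishment, so that on receiving the first data packet it can find the corresponding source queue pair number from the destination queue pair number of the first data packet.
In a third possible implementation of the present disclosure, after obtaining the source queue pair number, the proxy node can split it into a first part and a second part, add the first part to the checksum field of the User Datagram Protocol (UDP) header of the first data packet, and add the second part to the reserved field of the BTH of the first data packet, thereby obtaining the second data packet.
In this solution, the proxy node embeds the source queue pair number in the first data packet, so that the packet carries the source queue pair number. When network congestion occurs, the first congestion notification message can be delivered to the sender quickly on the basis of that number, which speeds up delivery of the first congestion notification message.
In a fourth possible implementation of the present disclosure, when the network node receives the second data packet and detects network congestion, it parses the second data packet to obtain its source media access control (MAC) address, destination MAC address, source Internet Protocol (IP) address, destination IP address, and source queue pair number. Based on the parsing result, the network node obtains the first congestion notification message by using the source MAC address of the second data packet as the destination MAC address of the message, the destination MAC address of the second data packet as the source MAC address of the message, the source IP address of the second data packet as the destination IP address of the message, the destination IP address of the second data packet as the source IP address of the message, and the source queue pair number of the second data packet as the destination queue pair number of the message.
In a fifth possible implementation of the present disclosure, the first congestion notification message includes the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs. Based on this queue depth, the proxy node can determine a sending period for the first congestion notification message and then send the message according to the determined period.
In this solution, the proxy node learns the current degree of network congestion from the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs, and sends the first congestion notification message to the sender at a period determined by that degree of congestion. The sender can then set the sending rate of the data stream to which the first data packet belongs according to the degree of congestion, sending at the highest rate that still avoids network congestion and thereby reducing latency in data transmission.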
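According to a second aspect, a network congestion notification method is provided, and the method includes:
During data transmission, a proxy node receives a first data packet that is sent by a sender and carries a destination queue pair number, and sends the first data packet to a receiver through a network node. While the first data packet is being forwarded, when network congestion is detected, the network node generates a second congestion notification message from the first data packet and sends it to the proxy node; the destination queue pair number of the second congestion notification message is in fact the destination queue pair number of the first data packet. On receiving the second congestion notification message, the proxy node obtains the source queue pair number corresponding to the destination queue pair number of the message, replaces the destination queue pair number in the second congestion notification message with that source queue pair number to obtain a first congestion notification message, and sends the first congestion notification message to the sender, so that the sender reduces the sending rate of the data stream to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the sender's queue pair number.
In this solution, when network congestion is detected during transmission of the first data packet, the network node generates the second congestion notification message and sends it to the proxy node. The proxy node replaces the destination queue pair number of the second congestion notification message with the source queue pair number of the first data packet to obtain the first congestion notification message and sends it to the sender, so that the sender can reduce the sending rate of the data stream to which the first data packet belongs in time and avoid losing subsequent packets of that data stream.
In a first possible implementation of the present disclosure, when the network node receives the first data packet and detects network congestion, it parses the first data packet to obtain its source MAC address, destination MAC address, source IP address, destination IP address, and destination queue pair number. Based on the parsing result, the network node obtains the second congestion notification message by using the source MAC address of the first data packet as the destination MAC address of the message, the destination MAC address of the first data packet as the source MAC address of the message, the source IP address of the first data packet as the destination IP address of the message, the destination IP address of the first data packet as the source IP address of the message, and the destination queue pair number of the first data packet as the destination queue pair number of the message.
In a second possible implementation of the present disclosure, the proxy node maintains a queue pair tracking table, each entry of which stores the correspondence between a destination queue pair number and a source queue pair number. The proxy node can therefore look up, in the queue pair tracking table, the source queue pair number corresponding to the destination queue pair number; the source queue pair number found is the source queue pair number of the first data packet.
This solution provides one way of obtaining the source queue pair number.
In a third possible implementation of the present disclosure, the sender and the receiver need to establish a connection in advance in order to transmit data. During connection establishment, the proxy node can track the connection request message sent by the sender and use the destination queue pair number extracted from the BTH of the connection request message as the destination queue pair number. The proxy node can also track the connection response message sent by the receiver and use the destination queue pair number extracted from the BTH of the connection response message as the source queue pair number. By establishing the correspondence between the source queue pair number and the destination queue pair number, the proxy node obtains the queue pair tracking table.
In this solution, the proxy node builds the queue pair tracking table by tracking the messages that the sender and the receiver exchange during connection establishment, so that it can find the source queue pair number corresponding to the destination queue pair number of the first data packet.
In a fourth possible implementation of the present disclosure, the first congestion notification message includes the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs. Based on this queue depth, the proxy node can determine a sending period for the first congestion notification message and then send the message according to the determined period.
In this solution, the proxy node learns the current degree of network congestion from the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs, and sends the first congestion notification message to the sender at a period determined by that degree of congestion. The sender can then set the sending rate of the data stream to which the first data packet belongs according to the degree of congestion, sending at the highest rate that still avoids network congestion and thereby reducing latency in data transmission.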
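According to a third aspect, a proxy node for notifying network congestion is provided. The proxy node includes units for implementing the network congestion notification method according to the first aspect, for example, a message receiving unit, a number obtaining unit, a number adding unit, and a message sending unit.
According to a fourth aspect, a proxy node for notifying network congestion is provided. The proxy node includes units for implementing the network congestion notification method according to the second aspect, for example, a message receiving unit, a message sending unit, a number obtaining unit, and a number replacing unit.
According to a fifth aspect, a computer device is provided, including a memory, a processor, a communication interface, and a bus;
where the memory, the processor, and the communication interface are connected by the bus, the memory is configured to store computer instructions, and the processor is configured to execute the computer instructions stored in the memory; when the computer device runs, the processor runs the computer instructions so that the computer device performs the network congestion notification method according to the first aspect.
According to a sixth aspect, a computer device is provided, including a memory, a processor, a communication interface, and a bus;
where the memory, the processor, and the communication interface are connected by the bus, the memory is configured to store computer instructions, and the processor is configured to execute the computer instructions stored in the memory; when the computer device runs, the processor runs the computer instructions so that the computer device performs the network congestion notification method according to the second aspect.
According to a seventh aspect, a storage medium is provided. The storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the network congestion notification method according to the first aspect.
According to an eighth aspect, a storage medium is provided. The storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the network congestion notification method according to the second aspect.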
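FIG. 1 is a schematic diagram of an RDMA communication procedure according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an RDMA protocol stack according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a Clos networking structure according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of RDMA congestion scenarios according to an embodiment of the present disclosure;
FIG. 5 shows the implementation environment of the network congestion notification method provided in an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the position of the proxy node in the system according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of the position of the proxy node in the system according to an embodiment of the present disclosure;
FIG. 8 is a flowchart of a network congestion notification method according to an embodiment of the present disclosure;
FIG. 9 is a flowchart of a network congestion notification method according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of a proxy node for notifying network congestion according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a proxy node for notifying network congestion according to an embodiment of the present disclosure;
FIG. 12 is a schematic structural diagram of a computing device according to an embodiment of the present disclosure.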
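To make the objectives, technical solutions, and advantages of the present disclosure clearer, the following further describes the implementations of the present disclosure in detail with reference to the accompanying drawings.
To reduce the data processing latency on the server side during network transmission, the embodiments of the present disclosure use RDMA for data transmission. Because RDMA can place data directly into a computer's memory over the network without involving the computer's processor, it improves system performance.
FIG. 1 is a schematic diagram of the RDMA communication procedure. Referring to FIG. 1, RDMA communication involves two computer systems: one includes server A and channel adapter (CA) A, and the other includes server B and channel adapter B. Each channel adapter is divided into a transport layer, a network layer, a link layer, a physical layer, and so on.
Within each computer system, the central processing unit (CPU) of the server communicates with the channel adapter through a queue pair (QP). The queue pair includes a send queue and a receive queue. The send queue is used by the CPU to send commands to the channel adapter; such a command is called a work queue element (WQE). The receive queue is used by the CPU to receive commands from the channel adapter; such a command is called a completion queue element (CQE). After executing a WQE sent by the CPU, the channel adapter reports completion by sending a CQE to the CPU.
Between the two computer systems, server A establishes a physical connection through one of its ports to a port of a network node and communicates with the network node over that connection; server B likewise establishes a physical connection through one of its ports to a port of a network node and communicates with the network node over that connection. The nodes in the network can establish connections with one another through their ports and forward packets over those connections. Over the network connections between server A and a network node, between network nodes, and between a network node and server B, channel adapter A and channel adapter B communicate by sending each other InfiniBand (IB) packets that carry a destination queue pair number.
FIG. 2 is a schematic diagram of the RDMA protocol stack. As FIG. 2 shows, RDMA is an application layer protocol. Its transport layer protocol may be the InfiniBand transport layer protocol or the like; its network layer protocol may be the InfiniBand network layer protocol or UDP/IP; and its underlying link layer protocol may be the InfiniBand link layer protocol, Ethernet-based RoCE, Ethernet-based RoCEv2, or the like.
The InfiniBand protocol is a layered protocol (similar to TCP/IP): each layer is responsible for a different function, lower layers serve upper layers, and the layers are independent of one another. RDMA protocol packets of the InfiniBand protocol, of Ethernet-based RoCE, and of Ethernet-based RoCEv2 all include a base transport header (BTH) and an IB payload.
The BTH is 12 bytes long and mainly includes a packet sequence number (PSN), a destination queue pair (destination QP), and a packet opcode. The destination QP field indicates the QP number of the receiver.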
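The three BTH fields named above can be read from fixed offsets. The following is a minimal sketch, assuming the commonly documented InfiniBand layout (opcode in byte 0, a 24-bit destination QP in bytes 5 to 7, a 24-bit PSN in bytes 9 to 11); the `BTH` tuple and `parse_bth` function are illustrative names, not part of the disclosure.

```python
from typing import NamedTuple


class BTH(NamedTuple):
    opcode: int
    dest_qp: int
    psn: int


def parse_bth(raw: bytes) -> BTH:
    """Extract opcode, destination QP and PSN from a 12-byte BTH."""
    if len(raw) < 12:
        raise ValueError("a BTH is 12 bytes long")
    opcode = raw[0]
    # Byte 4 is reserved; bytes 5..7 hold the 24-bit destination QP.
    dest_qp = int.from_bytes(raw[5:8], "big")
    # Byte 8 holds AckReq plus reserved bits; bytes 9..11 hold the 24-bit PSN.
    psn = int.from_bytes(raw[9:12], "big")
    return BTH(opcode, dest_qp, psn)


# Hypothetical header bytes for illustration only.
example = bytes([0x04, 0x00, 0xFF, 0xFF, 0x00, 0x10, 0x10, 0xA1,
                 0x00, 0x00, 0x00, 0x2A])
header = parse_bth(example)
```

A proxy node that tracks connection messages or data packets would only need the `dest_qp` value from this header.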
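Currently, RDMA networks are built with a Clos architecture. FIG. 3 shows a data center network with a Clos architecture built in spine-leaf fashion. In such a network, the uplink and downlink interface bandwidths of a switch are asymmetric, and the number of uplink interfaces (interfaces that connect the switch to a router or an upper-level switch) is smaller than the number of downlink interfaces (interfaces that connect the switch to hosts), so network congestion can occur during data transmission. FIG. 4(a) and FIG. 4(b) show RDMA network congestion scenarios in which each uplink interface of the switch has a bandwidth of 40GE and each downlink interface has a bandwidth of 10GE. Referring to FIG. 4(a), in the scenario in which the receiver sends data to the sender, network congestion occurs when the data received through five downlink interfaces is all sent to one uplink interface. Referring to FIG. 4(b), in the scenario in which the sender sends data to the receiver, network congestion also occurs when data is sent from one uplink interface to one downlink interface. When network congestion occurs, RDMA communication performance drops sharply. Therefore, to improve RDMA communication performance, network congestion needs to be notified when RDMA is used for data transmission. In existing notification methods, when network congestion is detected, the data packet is sent on to the receiver, the receiver generates a congestion notification message, and the generated message is then sent to the sender. Because the sender cannot reduce the sending rate of data packets in time when the congestion is detected, the buffers of network devices overflow, and data is lost in transmission.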
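Both scenarios in FIG. 4 come down to comparing the offered load with the capacity of the egress interface. A minimal sketch of that arithmetic, using the 40GE and 10GE figures from the text and assuming every ingress interface sends at line rate; the function name `is_oversubscribed` is illustrative.

```python
def is_oversubscribed(ingress_gbps, egress_gbps):
    """Return True when the combined ingress rate exceeds the egress rate."""
    return sum(ingress_gbps) > egress_gbps


# FIG. 4(a): five 10GE downlink interfaces feed one 40GE uplink interface.
fig_4a = is_oversubscribed([10] * 5, 40)
# FIG. 4(b): one 40GE uplink interface feeds one 10GE downlink interface.
fig_4b = is_oversubscribed([40], 10)
```

With four downlink interfaces instead of five, the 40GE uplink would be exactly matched and the check would no longer report congestion.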
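To inform the sender of the network state in time, the embodiments of the present disclosure provide a network congestion notification method. In this method, when a network node receives a data packet and detects network congestion, it sends a congestion notification message to the sender through a proxy node to inform the sender of the current network state, so that the sender reduces the sending rate of data packets to relieve the congestion and thereby avoid data loss.
FIG. 5 shows the system involved in the network congestion notification method provided in the embodiments of the present disclosure. Referring to FIG. 5, the system includes a sender, a proxy node, a network node, and a receiver.
The sender and the receiver may be computers in the network, and their roles are not fixed: the sender may be the receiver in the next data transmission, and the receiver may be the sender in the next data transmission.
The network node may be a switch, a router, or the like in the network. The embodiments of the present disclosure do not specifically limit the type of the network node.
The proxy node is a hardware unit in the network and may be located inside the sender or inside a network node. Referring to FIG. 6, when the proxy node is located inside the sender, it may be deployed as hardware logic on an application-specific integrated circuit (ASIC) of a network interface card (NIC), as microcode on the NIC, or as logic code of a field-programmable gate array (FPGA) chip on the NIC. The dashed lines in FIG. 6 show how the proxy node works: inside the sender, after a data packet from the receiver is obtained, it is passed to the RDMA application (APP) of the sender; after a data packet from the sender is obtained, it is sent to the receiver over the network. Referring to FIG. 7, when the proxy node is located inside a network node, it may be deployed as hardware logic on the switch ASIC, as microcode on a notification point (NP) attached to the switch chip, or as logic code of an FPGA chip attached to the switch chip. The dashed lines in FIG. 7 show how the proxy node works: after a data packet from the receiver is obtained, it is sent to the sender over the network; after a data packet from the sender is obtained, it is sent to the receiver over the network.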
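Based on the system shown in FIG. 5, an embodiment of the present disclosure provides a network congestion notification method. Referring to FIG. 8, the method includes the following steps:
801. While the sender is sending a first data packet to the receiver, the proxy node receives the first data packet from the sender.
When the sender needs to send data to the receiver, the sender obtains its own media access control (MAC) address and IP address, the MAC address and IP address of the receiver, and the destination queue pair number of the receiver, and generates a standard RDMA data packet based on the RDMA packet format. To make the current network state visible, the sender also sets ECN bits in the IP header of the RDMA data packet. The ECN bits identify the RDMA sender as ECN-capable, and different values of the ECN bits indicate different network states; for example, the value 10 may indicate that the network is in the normal state, and the value 11 may indicate that the network is in the congested state. Initially, the network state defaults to normal, that is, the value of the ECN bits is 10. Table 1 below shows the format of the generated standard RDMA data packet:
Table 1
MAC header is the MAC header field; IP header is the IP header field, in which the value of the ECN bits is 10; IB transport header is the IB transport header field; Data is the data to be sent; iCRC is the data transmission error detection field; and the Eth frame check sequence (FCS) is a check field used to store the iCRC check value.
It should be noted that the present disclosure involves several RDMA data packets. To distinguish them, the RDMA data packet generated by the sender is called the first data packet, and the new data packet that the proxy node derives from the first data packet is called the second data packet. The first data packet carries a source MAC address (the MAC address of the sender), a source IP address (the IP address of the sender), a destination MAC address (the MAC address of the receiver), a destination IP address (the IP address of the receiver), a destination queue pair number, and so on.
While the sender is sending the first data packet to the receiver over the network, the first data packet passes through the sender and the network nodes in the network. Because the proxy node is located inside the sender or inside a network node, it can receive the first data packet and then process it.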
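The ECN handling described in step 801 can be sketched as bit operations on the IP header. This is a minimal sketch that assumes the two ECN bits are the low-order bits of the IPv4 type-of-service byte and uses the values 10 (normal) and 11 (congested) given in the text; the constant and function names are illustrative.

```python
ECN_MASK = 0b11
ECN_NORMAL = 0b10      # value the text uses for the normal state
ECN_CONGESTED = 0b11   # value the text uses for the congested state


def set_ecn(tos: int, ecn: int) -> int:
    """Return the ToS byte with its two ECN bits replaced by `ecn`."""
    return (tos & ~ECN_MASK & 0xFF) | (ecn & ECN_MASK)


def is_congested(tos: int) -> bool:
    """Return True when the ECN bits carry the congested-state value."""
    return (tos & ECN_MASK) == ECN_CONGESTED


# Sender marks the packet as ECN-capable; 0xB8 is a hypothetical ToS value.
tos_at_sender = set_ecn(0xB8, ECN_NORMAL)
# A congested network node rewrites only the ECN bits.
tos_after_node = set_ecn(tos_at_sender, ECN_CONGESTED)
```

The upper six bits of the byte are left untouched, so any traffic class marking on the packet survives the rewrite.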
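802. The proxy node obtains the source queue pair number of the first data packet according to the destination queue pair number.
On receiving the first data packet, the proxy node parses it to obtain the destination queue pair number of the first data packet, and then obtains the source queue pair number of the first data packet according to that destination queue pair number. Ways of doing so include, but are not limited to, the following: the proxy node looks up, in a pre-established queue pair tracking table, the source queue pair number corresponding to the destination queue pair number. Each entry of the queue pair tracking table stores the correspondence between a destination queue pair number and a source queue pair number.
Because the queue pair tracking table is what allows the proxy node to obtain the source queue pair number of the first data packet, the proxy node needs to build the table before obtaining that number. The queue pair tracking table is built as follows:
Step 1: The proxy node tracks the connection request message and the connection response message that each sender and receiver pair exchanges during connection establishment.
In the field of Internet technologies, before sending data packets to the receiver, the sender first establishes a connection and, while doing so, obtains information about the peer. Specifically, the sender sends the receiver a connection request message that carries the sender's MAC address, IP address, queue pair number, and so on. On receiving the connection request message, the receiver parses it to obtain the sender's MAC address, IP address, queue pair number, and so on. The receiver in turn sends the sender a connection response message that carries the receiver's MAC address, IP address, queue pair number, and so on. On receiving the connection response message, the sender parses it to obtain the receiver's MAC address, IP address, queue pair number, and so on.
While the sender and the receiver are establishing the connection, the connection request message and the connection response message pass through the sender, the network nodes in the network, and the receiver. Because the proxy node is located inside the sender or inside a network node, it can track the connection request message and the connection response message that the sender and the receiver send while probing each other, and then build the queue pair tracking table.
Step 2: The proxy node extracts the destination queue pair number from the BTH of the connection request message and the destination queue pair number from the BTH of the connection response message.
In the embodiments of the present disclosure, both the connection request message and the connection response message have a base transport header (BTH) that carries a destination queue pair number. The proxy node can therefore obtain the destination queue pair number of the connection request message by parsing its BTH, and the destination queue pair number of the connection response message by parsing its BTH.
Step 3: The proxy node uses the destination queue pair number extracted from the connection request message as the destination queue pair number, uses the destination queue pair number extracted from the connection response message as the source queue pair number, and records the correspondence between the source queue pair number and the destination queue pair number to obtain the queue pair tracking table.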
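For example, during connection establishment between sender 1 and receiver 1, if the destination queue pair number that the proxy node extracts from the BTH of sender 1's connection request message is 0X1010A1, and the destination queue pair number that it extracts from the BTH of receiver 1's connection response message is 0X1010F1, the proxy node uses 0X1010A1 as the destination queue pair number and 0X1010F1 as the source queue pair number, and records the correspondence between source queue pair number 0X1010F1 and destination queue pair number 0X1010A1. During connection establishment between sender 2 and receiver 2, if the destination queue pair number extracted from the BTH of sender 2's connection request message is 0X1010A2, and the destination queue pair number extracted from the BTH of receiver 2's connection response message is 0X1010F2, the proxy node uses 0X1010A2 as the destination queue pair number and 0X1010F2 as the source queue pair number, and records the correspondence between source queue pair number 0X1010F2 and destination queue pair number 0X1010A2. By tracking connection establishment between different senders and receivers, the proxy node can build the queue pair tracking table shown in Table 2.
Table 2
Source queue pair number (24 bits) | Destination queue pair number (24 bits) |
---|---|
0X1010F1 | 0X1010A1 |
0X1010F2 | 0X1010A2 |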
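The three steps used to build the queue pair tracking table can be sketched as follows. This is a minimal sketch, not the disclosed implementation: the text does not say how the proxy node matches a connection request with its response, so the `conn_id` key is a hypothetical stand-in for whatever identifies a connection, and `QPTracker` is an illustrative name.

```python
class QPTracker:
    """Maps a destination queue pair number to its source queue pair number."""

    def __init__(self):
        self._pending = {}  # conn_id -> destination QPN seen in the request
        self.table = {}     # destination QPN -> source QPN

    def on_connect_request(self, conn_id, bth_dest_qpn):
        # The destination QPN in the request BTH is kept as the destination QPN.
        self._pending[conn_id] = bth_dest_qpn

    def on_connect_response(self, conn_id, bth_dest_qpn):
        # The destination QPN in the response BTH becomes the source QPN.
        dest_qpn = self._pending.pop(conn_id, None)
        if dest_qpn is not None:
            self.table[dest_qpn] = bth_dest_qpn

    def source_qpn(self, dest_qpn):
        """Return the source QPN for a data packet, or None if untracked."""
        return self.table.get(dest_qpn)


tracker = QPTracker()
tracker.on_connect_request("sender1-receiver1", 0x1010A1)
tracker.on_connect_response("sender1-receiver1", 0x1010F1)
tracker.on_connect_request("sender2-receiver2", 0x1010A2)
tracker.on_connect_response("sender2-receiver2", 0x1010F2)
```

After these four calls the table holds the two rows of Table 2, and `tracker.source_qpn(0x1010A1)` yields the source queue pair number for a data packet addressed to 0X1010A1.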
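In another embodiment of the present disclosure, to obtain the source queue pair number of the first data packet more precisely, the proxy node may, when building the queue pair tracking table, also obtain the MAC address of the sender, the MAC address of the receiver, the IP address of the sender, the IP address of the receiver, the transport layer protocol between the sender and the receiver, and so on, and build the table from the queue pair number of the sender, the queue pair number of the receiver, the two MAC addresses, the two IP addresses, and the transport layer protocol. On receiving the first data packet, the proxy node parses it to obtain its destination queue pair number, source MAC address, destination MAC address, source IP address, destination IP address, transport layer protocol, and so on, looks up in the queue pair tracking table the source queue pair number that corresponds to all of these values, and uses the number found as the source queue pair number of the first data packet. With this approach, the corresponding source queue pair number can be found accurately whether the network contains one sender and one receiver, one sender and at least two receivers, or two senders and one receiver.
803. The proxy node adds the source queue pair number to the first data packet to obtain a second data packet, and sends the second data packet to the receiver through the network node.
After obtaining the source queue pair number, the proxy node adds it to the first data packet to obtain the second data packet. The second data packet therefore carries the source queue pair number, so that when a network node detects network congestion while the second data packet is in transit, it can quickly identify the sender from the source queue pair number carried in the packet.
The proxy node may add the source queue pair number to the first data packet to obtain the second data packet in the following steps:
Step 1: The proxy node splits the source queue pair number into a first part and a second part.
The lengths of the first part and the second part may be the same or different, provided that their sum equals the length of the source queue pair number.
Step 2: The proxy node adds the first part to the checksum field of the UDP header of the first data packet, and adds the second part to the reserved field of the BTH of the first data packet, to obtain the second data packet.
For example, if the source queue pair number is 16 bits long, the proxy node splits it into an 8-bit first part and an 8-bit second part, adds the 8-bit first part to the checksum field (UDP Checksum) of the UDP header of the first data packet, and adds the 8-bit second part to the reserved field (Reserved) of the BTH of the first data packet, to obtain the second data packet.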
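The split in step 803 can be sketched as simple bit arithmetic. This is a sketch under stated assumptions: the text does not say which part holds the high-order bits, so the first part is assumed to be the high-order bits, and `split_qpn` and `join_qpn` are illustrative names.

```python
def split_qpn(qpn: int, total_bits: int = 16, first_bits: int = 8):
    """Split a queue pair number into (first part, second part).

    Assumption: the first part holds the high-order bits.
    """
    if not 0 < first_bits < total_bits:
        raise ValueError("first_bits must be between 1 and total_bits - 1")
    if not 0 <= qpn < (1 << total_bits):
        raise ValueError("qpn does not fit in total_bits")
    second_bits = total_bits - first_bits
    first = qpn >> second_bits
    second = qpn & ((1 << second_bits) - 1)
    return first, second


def join_qpn(first: int, second: int, second_bits: int = 8) -> int:
    """Rebuild the queue pair number, as a network node would on parsing."""
    return (first << second_bits) | second


# The 16-bit example from the text: 8 bits for the UDP checksum field,
# 8 bits for the BTH reserved field.
udp_part, bth_part = split_qpn(0xABCD)
# A 24-bit number such as those in Table 2, split 16 + 8 (an assumption).
wide_udp_part, wide_bth_part = split_qpn(0x1010F1, total_bits=24, first_bits=16)
```

Because the two parts are disjoint bit ranges, joining them in the same order always recovers the original number.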
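804. When the network node receives the second data packet and detects network congestion, it generates a first congestion notification message from the second data packet and sends the first congestion notification message to the proxy node.
When the second data packet reaches a network node in the network and that node detects network congestion, the node generates a first congestion notification message from the second data packet. The value of the ECN bits of the first congestion notification message is the value corresponding to the congested state. The node sends the first congestion notification message to the proxy node, which sends it on to the sender to inform the sender of the current network state, so that the sender reduces the sending rate of the data stream to which the first data packet belongs. To keep data transmission going, the network node also continues to forward the second data packet to the receiver.
The network node may generate the first congestion notification message from the second data packet in the following steps:
Step 1: The network node parses the second data packet to obtain its source MAC address, destination MAC address, source IP address, destination IP address, and source queue pair number.
Because the second data packet carries its source MAC address, destination MAC address, source IP address, destination IP address, and source queue pair number, the network node can obtain all of these values by parsing the second data packet.
Step 2: The network node uses the source MAC address of the second data packet as the destination MAC address of the first congestion notification message, the destination MAC address of the second data packet as the source MAC address of the message, the source IP address of the second data packet as the destination IP address of the message, the destination IP address of the second data packet as the source IP address of the message, and the source queue pair number of the second data packet as the destination queue pair number of the message, to obtain the first congestion notification message.
The first congestion notification message is built in this way on the basis of the RDMA packet format. Its reserved field may also carry the identifier of the network node and the queue depth, at the time of congestion, of the queue in that network node to which the data stream of the second data packet belongs.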
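The field mapping in step 804 amounts to swapping the address pairs and moving the source queue pair number into the destination slot. A minimal sketch with simplified packet records; the class names, the string address format, and the `node_id` value are illustrative assumptions, not the on-wire format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataPacket:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    dest_qpn: int
    src_qpn: Optional[int] = None  # present only in the second data packet


@dataclass(frozen=True)
class CongestionNotification:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    dest_qpn: int
    node_id: str
    queue_depth: int


def build_first_cnm(pkt: DataPacket, node_id: str, queue_depth: int):
    """Build the first congestion notification message from a second data packet."""
    if pkt.src_qpn is None:
        raise ValueError("the second data packet must carry a source queue pair number")
    return CongestionNotification(
        src_mac=pkt.dst_mac, dst_mac=pkt.src_mac,   # MAC addresses swapped
        src_ip=pkt.dst_ip, dst_ip=pkt.src_ip,       # IP addresses swapped
        dest_qpn=pkt.src_qpn,                       # source QPN becomes destination QPN
        node_id=node_id, queue_depth=queue_depth,
    )


second_packet = DataPacket("aa:aa", "bb:bb", "10.0.0.1", "10.0.0.2",
                           dest_qpn=0x1010A1, src_qpn=0x1010F1)
cnm = build_first_cnm(second_packet, node_id="leaf-1", queue_depth=64)
```

Swapping the addresses lets ordinary forwarding carry the message back toward the sender without any per-flow state in the network node.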
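805. When the proxy node receives the first congestion notification message, it sends the first congestion notification message to the sender.
On receiving the first congestion notification message, the proxy node determines a sending period for the message according to the queue depth of the queue in the network node to which the data stream of the second data packet (that is, the data stream of the first data packet) belongs, and then sends the received first congestion notification message to the sender according to that period. At least one network node may be congested, and every congested network node that receives the second data packet returns a first congestion notification message to the proxy node. From the at least one first congestion notification message, the proxy node selects the largest queue depth and determines the sending period from it. Specifically, the sending period determined by the proxy node is inversely related to the selected queue depth: if the selected queue depth is greater than a preset length, the congestion is relatively severe, and the sending period is set to a first period; if the queue depth is less than the preset length, the congestion is relatively light, and the sending period is set to a second period. The preset length may be determined empirically, and the first period is shorter than the second period.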
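The period selection in step 805 can be sketched as a threshold test on the deepest reported queue. This is a minimal sketch: the text does not say what happens when the depth equals the preset length, so that case is assumed to use the longer second period, and the numeric values below are hypothetical.

```python
def cnm_sending_period(queue_depths, preset_length, first_period, second_period):
    """Pick the sending period from the deepest reported queue.

    Assumption: a depth equal to preset_length is treated as light congestion.
    """
    if not queue_depths:
        raise ValueError("at least one queue depth is required")
    if first_period >= second_period:
        raise ValueError("the first period must be shorter than the second")
    deepest = max(queue_depths)
    return first_period if deepest > preset_length else second_period


# Hypothetical values: depths in packets, periods in microseconds.
severe = cnm_sending_period([10, 80, 30], preset_length=50,
                            first_period=10, second_period=100)
light = cnm_sending_period([10, 20], preset_length=50,
                           first_period=10, second_period=100)
```

A shorter period means the sender hears about congestion more often, which matches the inverse relation between queue depth and period described above.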
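806. When the sender receives the first congestion notification message, it reduces the sending rate of the data stream to which the first data packet belongs.
Both the number of first congestion notification messages and their sending period reflect the current network state. Therefore, on receiving at least one first congestion notification message, the sender can use a congestion control algorithm to determine the sending rate of the data stream to which the first data packet belongs from the number of messages received and their sending period, and then send subsequent packets of the data stream at that rate to relieve the congestion.
In the method provided in this embodiment of the present disclosure, on receiving the first data packet, the proxy node adds the source queue pair number to it to obtain the second data packet and sends the second data packet to the receiver through the network node. While the second data packet is being forwarded, if network congestion is detected, the network node generates the first congestion notification message from the second data packet and sends it to the proxy node, which sends it to the sender so that the sender reduces the sending rate of the data stream to which the first data packet belongs. Because the network node sends the first congestion notification message toward the sender as soon as it detects network congestion, the sender can reduce the sending rate of that data stream in time, thereby avoiding loss of subsequent packets of the data stream.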
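An embodiment of the present disclosure provides a congestion notification method. Referring to FIG. 9, the method includes the following steps:
901. While the sender is sending a first data packet to the receiver, the proxy node receives the first data packet from the sender.
This step is implemented in the same way as step 801; for details, see step 801. Details are not repeated here.
902. The proxy node sends the first data packet to the receiver through the network node.
On receiving the first data packet, the proxy node sends it to the receiver by way of forwarding through the network node.
903. When the network node receives the first data packet and detects network congestion, it generates a second congestion notification message from the first data packet and sends the second congestion notification message to the proxy node.
When the first data packet reaches a network node in the network and that node detects network congestion, the node generates a second congestion notification message from the first data packet. The value of the ECN bits of the second congestion notification message is the value corresponding to the congested state. The node sends the second congestion notification message to the proxy node, which sends it on to the sender to inform the sender of the current network state, so that the sender reduces the sending rate of the data stream to which the first data packet belongs. To keep data transmission going, the network node also continues to send the first data packet to the receiver.
The network node may generate the second congestion notification message from the first data packet in the following steps:
Step 1: The network node parses the first data packet to obtain its source MAC address, destination MAC address, source IP address, destination IP address, and destination queue pair number.
Because the first data packet carries its source MAC address, destination MAC address, source IP address, destination IP address, and destination queue pair number, the network node can obtain these values by parsing the first data packet.
Step 2: The network node uses the source MAC address of the first data packet as the destination MAC address of the second congestion notification message, the destination MAC address of the first data packet as the source MAC address of the message, the source IP address of the first data packet as the destination IP address of the message, the destination IP address of the first data packet as the source IP address of the message, and the destination queue pair number of the first data packet as the destination queue pair number of the message, to obtain the second congestion notification message.
The second congestion notification message is built in this way on the basis of the RDMA packet format. Its reserved field may also carry the identifier of the network node and the queue depth, at the time of congestion, of the queue in that network node to which the data stream of the first data packet belongs.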
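904. When the proxy node receives the second congestion notification message, it obtains the source queue pair number corresponding to the destination queue pair number.
On receiving the second congestion notification message, the proxy node looks up, in the pre-established queue pair tracking table, the source queue pair number corresponding to the destination queue pair number. Each entry of the queue pair tracking table stores the correspondence between a destination queue pair number and a source queue pair number.
The queue pair tracking table is built in the same way as described in step 802; for details, see step 802. Details are not repeated here.
905. The proxy node replaces the destination queue pair number in the second congestion notification message with the source queue pair number to obtain the first congestion notification message.
Once it has obtained the source queue pair number of the first data packet, the proxy node can directly replace the destination queue pair number in the second congestion notification message with that source queue pair number to obtain the first congestion notification message.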
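Steps 904 and 905 together are a table lookup followed by a field replacement. A minimal sketch that models a congestion notification message as a plain dictionary; the key names and the choice to return None for an untracked queue pair number are assumptions made for illustration.

```python
from typing import Dict, Optional


def to_first_cnm(second_cnm: Dict[str, int],
                 tracking_table: Dict[int, int]) -> Optional[Dict[str, int]]:
    """Replace the destination QPN of the second message with the source QPN.

    Returns None when the tracking table has no entry (an assumption; the
    text does not describe this case).
    """
    src_qpn = tracking_table.get(second_cnm["dest_qpn"])
    if src_qpn is None:
        return None
    first_cnm = dict(second_cnm)     # leave the received message untouched
    first_cnm["dest_qpn"] = src_qpn
    return first_cnm


# Tracking table keyed by destination QPN, with the values from Table 2.
table = {0x1010A1: 0x1010F1, 0x1010A2: 0x1010F2}
second = {"dest_qpn": 0x1010A1, "queue_depth": 64}
first = to_first_cnm(second, table)
```

All other fields, such as the queue depth, are carried over unchanged, so the proxy node can still use them when choosing the sending period.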
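906. The proxy node sends the first congestion notification message to the sender.
This step is implemented in the same way as step 805; for details, see step 805. Details are not repeated here.
907. When the sender receives the first congestion notification message, it reduces the sending rate of the data stream to which the first data packet belongs.
This step is implemented in the same way as step 806; for details, see step 806. Details are not repeated here.
In the method provided in this embodiment of the present disclosure, when the network node receives the first data packet from the proxy node and detects network congestion, it sends the generated second congestion notification message to the proxy node. From the second congestion notification message, the proxy node generates a first congestion notification message that carries the source queue pair number of the first data packet, and sends it to the sender so that the sender reduces the sending rate of the data stream to which the first data packet belongs. Because the network node sends a congestion notification message toward the sender as soon as it detects network congestion, the sender can reduce the sending rate of that data stream in time, thereby avoiding loss of subsequent packets of the data stream.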
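An embodiment of the present disclosure provides a proxy node for notifying network congestion. Referring to FIG. 10, the proxy node includes:
a message receiving unit 1001, configured to receive a first data packet from a sender, where the first data packet carries a destination queue pair number;
a number obtaining unit 1002, configured to obtain the source queue pair number of the first data packet according to the destination queue pair number;
a number adding unit 1003, configured to add the source queue pair number to the first data packet to obtain a second data packet, and to send the second data packet to a receiver through a network node;
where the message receiving unit 1001 is further configured to receive a first congestion notification message, the first congestion notification message is generated by the network node when it receives the second data packet and detects network congestion, and the destination queue pair number of the first congestion notification message is the source queue pair number; and
a message sending unit 1004, configured to send the first congestion notification message to the sender, where the first congestion notification message is used to notify the sender to reduce the sending rate of the data stream to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the sender's queue pair number.
In another embodiment of the present disclosure, the number obtaining unit 1002 is further configured to look up, in a pre-established queue pair tracking table, the source queue pair number corresponding to the destination queue pair number, where each entry of the queue pair tracking table stores the correspondence between a destination queue pair number and a source queue pair number.
In another embodiment of the present disclosure, the number obtaining unit 1002 is further configured to: track the connection request message and the connection response message that the sender and the receiver exchange during connection establishment; extract the destination queue pair number from the BTH of the connection request message and the destination queue pair number from the BTH of the connection response message; and use the destination queue pair number extracted from the connection request message as the destination queue pair number, use the destination queue pair number extracted from the connection response message as the source queue pair number, and record the correspondence between the source queue pair number and the destination queue pair number to obtain the queue pair tracking table.
In another embodiment of the present disclosure, the number adding unit 1003 is further configured to: split the source queue pair number into a first part and a second part; and add the first part to the checksum field of the UDP header of the first data packet and the second part to the reserved field of the BTH of the first data packet, to obtain the second data packet.
In another embodiment of the present disclosure, the first congestion notification message includes the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs;
the message sending unit 1004 is further configured to determine a sending period of the first congestion notification message according to the queue depth, and to send the first congestion notification message to the sender according to the sending period.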
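An embodiment of the present disclosure provides a proxy node for notifying network congestion. Referring to FIG. 11, the proxy node includes:
a message receiving unit 1101, configured to receive a first data packet from a sender, where the first data packet carries a destination queue pair number;
a message sending unit 1102, configured to send the first data packet to a receiver through a network node;
where the message receiving unit 1101 is further configured to receive a second congestion notification message that carries the destination queue pair number, and the second congestion notification message is generated by the network node when it receives the first data packet and detects network congestion;
a number obtaining unit 1103, configured to obtain the source queue pair number corresponding to the destination queue pair number;
a number replacing unit 1104, configured to replace the destination queue pair number in the second congestion notification message with the source queue pair number to obtain a first congestion notification message; and
where the message sending unit 1102 is further configured to send the first congestion notification message to the sender, and the first congestion notification message is used to notify the sender to reduce the sending rate of the data stream to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the sender's queue pair number.
In another embodiment of the present disclosure, the number obtaining unit 1103 is configured to look up, in a pre-established queue pair tracking table, the source queue pair number corresponding to the destination queue pair number, where each entry of the queue pair tracking table stores the correspondence between a destination queue pair number and a source queue pair number.
In another embodiment of the present disclosure, the number obtaining unit 1103 is further configured to: track the connection request message and the connection response message that the sender and the receiver exchange during connection establishment; extract the destination queue pair number from the BTH of the connection request message and the destination queue pair number from the BTH of the connection response message; and use the destination queue pair number extracted from the connection request message as the destination queue pair number, use the destination queue pair number extracted from the connection response message as the source queue pair number, and record the correspondence between the source queue pair number and the destination queue pair number to obtain the queue pair tracking table.
In another embodiment of the present disclosure, the first congestion notification message includes the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs;
the message sending unit 1102 is further configured to determine a sending period of the first congestion notification message according to the queue depth, and to send the first congestion notification message to the sender according to the sending period.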
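FIG. 12 shows a computing device 1200 used in an embodiment of the present disclosure. The computing device 1200 includes a processor 1201, a memory 1202, a communication interface 1203, and a bus 1204. The processor 1201, the memory 1202, and the communication interface 1203 are directly connected by the bus 1204. The computing device 1200 may be a proxy node; when it is, the computing device 1200 is configured to perform the network congestion notification method performed by the proxy node in FIG. 8. Specifically:
the memory 1202 is configured to store computer instructions;
the processor 1201 invokes, via the bus 1204, the computer instructions stored in the memory 1202 to perform the following operations:
receiving a first data packet from a sender through the communication interface 1203, where the first data packet carries a destination queue pair number;
obtaining the source queue pair number of the first data packet according to the destination queue pair number;
adding the source queue pair number to the first data packet to obtain a second data packet, and sending the second data packet to a receiver through a network node; and
receiving a first congestion notification message through the communication interface 1203 and sending the first congestion notification message to the sender, where the first congestion notification message is generated by the network node when it receives the second data packet and detects network congestion, the destination queue pair number of the first congestion notification message is the source queue pair number, and the first congestion notification message is used to notify the sender to reduce the sending rate of the data stream to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the sender's queue pair number.
In another embodiment of the present disclosure, the processor 1201 invokes, via the bus 1204, the computer instructions stored in the memory 1202 to perform the following operation:
looking up, in a pre-established queue pair tracking table, the source queue pair number corresponding to the destination queue pair number, where each entry of the queue pair tracking table stores the correspondence between a destination queue pair number and a source queue pair number.
In another embodiment of the present disclosure, the processor 1201 invokes, via the bus 1204, the computer instructions stored in the memory 1202 to perform the following operations:
tracking the connection request message and the connection response message that the sender and the receiver exchange during connection establishment;
extracting the destination queue pair number from the BTH of the connection request message and the destination queue pair number from the BTH of the connection response message; and
using the destination queue pair number extracted from the connection request message as the destination queue pair number, using the destination queue pair number extracted from the connection response message as the source queue pair number, and establishing the correspondence between the extracted source queue pair number and destination queue pair number to obtain the queue pair tracking table.
In another embodiment of the present disclosure, the processor 1201 invokes, via the bus 1204, the computer instructions stored in the memory 1202 to perform the following operations:
splitting the source queue pair number into a first part and a second part; and
adding the first part to the checksum field of the UDP header of the first data packet and the second part to the reserved field of the BTH of the first data packet, to obtain the second data packet.
In another embodiment of the present disclosure, the first congestion notification message includes the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs, and the processor 1201 invokes, via the bus 1204, the computer instructions stored in the memory 1202 to perform the following operations:
determining a sending period of the first congestion notification message according to the queue depth; and
sending the first congestion notification message to the sender according to the sending period.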
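The memory 1202 includes a computer storage medium. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices. A person skilled in the art will of course understand that computer storage media are not limited to these.
According to various embodiments of the present disclosure, the computing device 1200 may also run while connected, over a network such as the Internet, to a remote computer on the network. That is, the computing device 1200 may be connected to the network through a network interface unit 1205 connected to the bus 1204; the network interface unit 1205 may also be used to connect to other types of networks or remote computer systems (not shown).
It should be noted that, when the computing device 1200 is a proxy node, it can perform not only the network congestion notification method performed by the proxy node in FIG. 8 but also the network congestion notification method performed by the proxy node in FIG. 9. Details are not described again in the embodiments of the present disclosure.
An embodiment of the present disclosure provides a storage medium. The storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the network congestion notification method shown in FIG. 8 or FIG. 9.
It should be noted that the network congestion notification system provided in the foregoing embodiments is described, for the notification of network congestion, only by way of example using the above division into functional modules. In actual applications, the foregoing functions may be allocated to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the network congestion notification system provided in the foregoing embodiments belongs to the same concept as the network congestion notification method embodiments; for its specific implementation process, see the method embodiments. Details are not repeated here.
A person of ordinary skill in the art will understand that all or part of the steps of the foregoing embodiments may be completed by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely optional embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure shall fall within the protection scope of the present disclosure.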
Claims (20)
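- A network congestion notification method, wherein the method is applied to a proxy node and comprises: receiving a first data packet from a sender, the first data packet carrying a destination queue pair number; obtaining a source queue pair number of the first data packet according to the destination queue pair number; adding the source queue pair number to the first data packet to obtain a second data packet, and sending the second data packet to a receiver through a network node; and receiving a first congestion notification message and sending the first congestion notification message to the sender, wherein the first congestion notification message is generated by the network node when it receives the second data packet and detects network congestion, the destination queue pair number of the first congestion notification message is the source queue pair number, and the first congestion notification message is used to notify the sender to reduce the sending rate of the data stream to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the queue pair number of the sender.
- The method according to claim 1, wherein obtaining the source queue pair number of the first data packet according to the destination queue pair number comprises: looking up, in a pre-established queue pair tracking table according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number, wherein each entry of the queue pair tracking table stores a correspondence between a destination queue pair number and a source queue pair number.
- The method according to claim 1 or 2, wherein before looking up, in the pre-established queue pair tracking table according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number, the method further comprises: tracking a connection request message and a connection response message sent by the sender and the receiver during connection establishment; extracting the destination queue pair number from the base transport header (BTH) of the connection request message, and extracting the destination queue pair number from the BTH of the connection response message; and using the destination queue pair number extracted from the connection request message as the destination queue pair number, using the destination queue pair number extracted from the connection response message as the source queue pair number, and recording the correspondence between the source queue pair number and the destination queue pair number to obtain the queue pair tracking table.
- The method according to any one of claims 1 to 3, wherein adding the source queue pair number to the first data packet to obtain the second data packet comprises: splitting the source queue pair number into a first part and a second part; and adding the first part to the checksum field of the User Datagram Protocol (UDP) header of the first data packet, and adding the second part to the reserved field of the base transport header (BTH) of the first data packet, to obtain the second data packet.
- The method according to claim 1, wherein the first congestion notification message comprises the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs, and sending the first congestion notification message to the sender comprises: determining a sending period of the first congestion notification message according to the queue depth; and sending the first congestion notification message to the sender according to the sending period.
- A network congestion notification method, wherein the method is applied to a proxy node and comprises: receiving a first data packet from a sender, the first data packet carrying a destination queue pair number; sending the first data packet to a receiver through a network node; receiving a second congestion notification message carrying the destination queue pair number, and obtaining, according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number, wherein the second congestion notification message is generated by the network node when it receives the first data packet and detects network congestion; replacing the destination queue pair number in the second congestion notification message with the source queue pair number to obtain a first congestion notification message; and sending the first congestion notification message to the sender, wherein the first congestion notification message is used to notify the sender to reduce the sending rate of the data stream to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the queue pair number of the sender.
- The method according to claim 6, wherein obtaining, according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number comprises: looking up, in a pre-established queue pair tracking table according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number, wherein each entry of the queue pair tracking table stores a correspondence between a destination queue pair number and a source queue pair number.
- The method according to claim 6 or 7, wherein before looking up, in the pre-established queue pair tracking table according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number, the method further comprises: tracking a connection request message and a connection response message sent by the sender and the receiver during connection establishment; extracting the destination queue pair number from the base transport header (BTH) of the connection request message, and extracting the destination queue pair number from the BTH of the connection response message; and using the destination queue pair number extracted from the connection request message as the destination queue pair number, using the destination queue pair number extracted from the connection response message as the source queue pair number, and recording the correspondence between the source queue pair number and the destination queue pair number to obtain the queue pair tracking table.
- The method according to any one of claims 6 to 8, wherein the first congestion notification message comprises the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs, and sending the first congestion notification message to the sender comprises: determining a sending period of the first congestion notification message according to the queue depth; and sending the first congestion notification message to the sender according to the sending period.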
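- A proxy node for notifying network congestion, wherein the proxy node comprises: a message receiving unit, configured to receive a first data packet from a sender, the first data packet carrying a destination queue pair number; a number obtaining unit, configured to obtain a source queue pair number of the first data packet according to the destination queue pair number; a number adding unit, configured to add the source queue pair number to the first data packet to obtain a second data packet, and to send the second data packet to a receiver through a network node; wherein the message receiving unit is configured to receive a first congestion notification message, the first congestion notification message is generated by the network node when it receives the second data packet and detects network congestion, and the destination queue pair number of the first congestion notification message is the source queue pair number; and a message sending unit, configured to send the first congestion notification message to the sender, wherein the first congestion notification message is used to notify the sender to reduce the sending rate of the data stream to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the queue pair number of the sender.
- The proxy node according to claim 10, wherein the number obtaining unit is configured to look up, in a pre-established queue pair tracking table according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number, wherein each entry of the queue pair tracking table stores a correspondence between a destination queue pair number and a source queue pair number.
- The proxy node according to claim 10 or 11, wherein the number obtaining unit is configured to: track a connection request message and a connection response message sent by the sender and the receiver during connection establishment; extract the destination queue pair number from the base transport header (BTH) of the connection request message, and extract the destination queue pair number from the BTH of the connection response message; and use the destination queue pair number extracted from the connection request message as the destination queue pair number, use the destination queue pair number extracted from the connection response message as the source queue pair number, and record the correspondence between the source queue pair number and the destination queue pair number to obtain the queue pair tracking table.
- The proxy node according to any one of claims 10 to 12, wherein the number adding unit is configured to: split the source queue pair number into a first part and a second part; and add the first part to the checksum field of the User Datagram Protocol (UDP) header of the first data packet, and add the second part to the reserved field of the base transport header (BTH) of the first data packet, to obtain the second data packet.
- The proxy node according to claim 10, wherein the first congestion notification message comprises the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs, and the message sending unit is configured to: determine a sending period of the first congestion notification message according to the queue depth; and send the first congestion notification message to the sender according to the sending period.
- A proxy node for notifying network congestion, wherein the proxy node comprises: a message receiving unit, configured to receive a first data packet from a sender, the first data packet carrying a destination queue pair number; a message sending unit, configured to send the first data packet to a receiver through a network node; wherein the message receiving unit is further configured to receive a second congestion notification message carrying the destination queue pair number, and the second congestion notification message is generated by the network node when it receives the first data packet and detects network congestion; a number obtaining unit, configured to obtain, according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number; and a number replacing unit, configured to replace the destination queue pair number in the second congestion notification message with the source queue pair number to obtain a first congestion notification message; wherein the message sending unit is further configured to send the first congestion notification message to the sender, and the first congestion notification message is used to notify the sender to reduce the sending rate of the data stream to which the first data packet belongs when the destination queue pair number of the first congestion notification message is the same as the queue pair number of the sender.
- The proxy node according to claim 15, wherein the number obtaining unit is configured to look up, in a pre-established queue pair tracking table according to the destination queue pair number, the source queue pair number corresponding to the destination queue pair number, wherein each entry of the queue pair tracking table stores a correspondence between a destination queue pair number and a source queue pair number.
- The proxy node according to claim 15 or 16, wherein the number obtaining unit is further configured to: track a connection request message and a connection response message sent by the sender and the receiver during connection establishment; extract the destination queue pair number from the base transport header (BTH) of the connection request message, and extract the destination queue pair number from the BTH of the connection response message; and use the destination queue pair number extracted from the connection request message as the destination queue pair number, use the destination queue pair number extracted from the connection response message as the source queue pair number, and record the correspondence between the source queue pair number and the destination queue pair number to obtain the queue pair tracking table.
- The proxy node according to any one of claims 15 to 17, wherein the first congestion notification message comprises the queue depth, at the time of congestion, of the queue in the network node to which the data stream belongs; and the message sending unit is further configured to: determine a sending period of the first congestion notification message according to the queue depth; and send the first congestion notification message to the sender according to the sending period.
- A computer device, comprising a memory, a processor, a communication interface, and a bus, wherein the memory, the processor, and the communication interface are connected by the bus, the memory is configured to store computer instructions, and when the computer device runs, the processor runs the computer instructions so that the computer device performs the network congestion notification method according to any one of claims 1 to 9.
- A storage medium, wherein the storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the network congestion notification method according to any one of claims 1 to 9.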
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21193312.2A EP3979600A1 (en) | 2017-08-11 | 2018-07-13 | Network congestion notification method, agent node, and computer device |
EP18843231.4A EP3657743B1 (en) | 2017-08-11 | 2018-07-13 | Network congestion notification methods and agent nodes |
US16/786,461 US11374870B2 (en) | 2017-08-11 | 2020-02-10 | Network congestion notification method, agent node, and computer device |
US17/734,627 US20220263767A1 (en) | 2017-08-11 | 2022-05-02 | Network Congestion Notification Method, Agent Node, and Computer Device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710687388.0A CN109391560B (zh) | 2017-08-11 | 2017-08-11 | 网络拥塞的通告方法、代理节点及计算机设备 |
CN201710687388.0 | 2017-08-11 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/786,461 Continuation US11374870B2 (en) | 2017-08-11 | 2020-02-10 | Network congestion notification method, agent node, and computer device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019029318A1 true WO2019029318A1 (zh) | 2019-02-14 |
Family
ID=65270824
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/095602 WO2019029318A1 (zh) | 2017-08-11 | 2018-07-13 | 网络拥塞的通告方法、代理节点及计算机设备 |
Country Status (4)
Country | Link |
---|---|
US (2) | US11374870B2 (zh) |
EP (2) | EP3657743B1 (zh) |
CN (2) | CN113709057B (zh) |
WO (1) | WO2019029318A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4087199A4 (en) * | 2020-01-23 | 2023-02-01 | Huawei Technologies Co., Ltd. | METHOD AND DEVICE FOR OVERLOAD CONTROL |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11418446B2 (en) * | 2018-09-26 | 2022-08-16 | Intel Corporation | Technologies for congestion control for IP-routable RDMA over converged ethernet |
US11005770B2 (en) * | 2019-06-16 | 2021-05-11 | Mellanox Technologies Tlv Ltd. | Listing congestion notification packet generation by switch |
CN112104562B (zh) * | 2019-06-17 | 2023-04-18 | 华为技术有限公司 | 拥塞控制方法及装置、通信网络、计算机存储介质 |
CN112242956B (zh) * | 2019-07-18 | 2024-04-26 | 华为技术有限公司 | 流速控制方法和装置 |
CN112311685A (zh) * | 2019-07-24 | 2021-02-02 | 华为技术有限公司 | 一种处理网络拥塞的方法以及相关装置 |
CN112532535B (zh) * | 2019-09-17 | 2022-10-04 | 华为技术有限公司 | 一种用于优化网络拥塞的方法和装置 |
CN112751776B (zh) * | 2019-10-30 | 2024-07-19 | 华为技术有限公司 | 拥塞控制方法和相关装置 |
CN114095448A (zh) * | 2020-08-05 | 2022-02-25 | 华为技术有限公司 | 一种拥塞流的处理方法及设备 |
CN112383450A (zh) * | 2020-11-30 | 2021-02-19 | 盛科网络(苏州)有限公司 | 一种网络拥塞检测方法及装置 |
CN112637015B (zh) * | 2020-12-23 | 2022-08-26 | 苏州盛科通信股份有限公司 | 一种基于psn实现rdma网络的丢包检测方法及装置 |
EP4260195A1 (en) | 2021-01-06 | 2023-10-18 | Enfabrica Corporation | Server fabric adapter for i/o scaling of heterogeneous and accelerated compute systems |
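CN113411263B (zh) * | 2021-06-18 | 2023-03-14 | 中国工商银行股份有限公司 | Data transmission method, apparatus, device, and storage medium |
CN113411264B (zh) * | 2021-06-30 | 2023-03-14 | 中国工商银行股份有限公司 | Network queue monitoring method, apparatus, computer device, and storage medium |
CN118318429A (zh) * | 2021-08-11 | 2024-07-09 | 安法布里卡公司 | System and method for congestion control using a flow-level transmission mechanism |
WO2023135674A1 (ja) * | 2022-01-12 | 2023-07-20 | 日本電信電話株式会社 | Processing system, processing device, processing method, and program |
CN114745331B (zh) * | 2022-03-23 | 2023-11-07 | 新华三技术有限公司合肥分公司 | Congestion notification method and device |
CN114866477A (zh) * | 2022-04-21 | 2022-08-05 | 浪潮思科网络科技有限公司 | Method, system, and device for testing the congestion control mechanism of a network device |
CN114979001B (zh) * | 2022-05-20 | 2023-06-13 | 北京百度网讯科技有限公司 | Data transmission method, apparatus, and device based on remote direct memory access |
CN115314442B (zh) * | 2022-08-08 | 2023-09-12 | 北京云脉芯联科技有限公司 | Congestion control and Group-based rate-limiting and window-limiting apparatus and method, and rate-limiting and window-limiting method |
CN115484217B (zh) * | 2022-09-06 | 2024-01-05 | 燕山大学 | Method and system for implementing an efficient dynamic convergence mechanism based on an integrated orthogonal architecture |
CN117785762A (zh) * | 2023-11-30 | 2024-03-29 | 中科驭数(北京)科技有限公司 | Information storage method, apparatus, device, and storage medium |
CN117938776A (zh) * | 2024-02-19 | 2024-04-26 | 北京光润通科技发展有限公司 | Reverse network congestion avoidance method and reverse network congestion avoidance network interface card |
CN118524065B (zh) * | 2024-07-24 | 2024-10-01 | 苏州元脑智能科技有限公司 | Congestion control method and apparatus, storage medium, and electronic device |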
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
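CN101166140A (zh) * | 2006-10-18 | 2008-04-23 | 中国科学院自动化研究所 | Network congestion control system and method for the Internet |
CN101227495A (zh) * | 2008-02-20 | 2008-07-23 | 中兴通讯股份有限公司 | Public telecommunication packet data network system and congestion control method thereof |
CN102868671A (zh) * | 2011-07-08 | 2013-01-09 | 华为技术有限公司 | Network congestion control method, device, and system |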
WO2016041580A1 (en) * | 2014-09-16 | 2016-03-24 | Huawei Technologies Co.,Ltd | Scheduler, sender, receiver, network node and methods thereof |
CN105897605A (zh) * | 2016-04-08 | 2016-08-24 | 重庆邮电大学 | IPv6-based congestion control method for power line carrier communication networks |
Family Cites Families (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1240753A1 (en) * | 1999-12-13 | 2002-09-18 | Nokia Corporation | Congestion control method for a packet-switched network |
US20060203730A1 (en) * | 2005-03-14 | 2006-09-14 | Zur Uri E | Method and system for reducing end station latency in response to network congestion |
CN101098301B (zh) * | 2006-06-27 | 2011-04-20 | 华为技术有限公司 | Layer-2 congestion control method for a wireless network |
US7817634B2 (en) * | 2006-06-30 | 2010-10-19 | Intel Corporation | Network with a constrained usage model supporting remote direct memory access |
CN101188611A (zh) * | 2007-11-21 | 2008-05-28 | 华为技术有限公司 | Congestion notification method, system, and node device |
US9473596B2 (en) * | 2011-09-27 | 2016-10-18 | International Business Machines Corporation | Using transmission control protocol/internet protocol (TCP/IP) to setup high speed out of band data communication connections |
CN104272680B (zh) * | 2012-03-09 | 2017-05-17 | 英国电讯有限公司 | Signalling congestion |
US9270489B1 (en) * | 2012-03-19 | 2016-02-23 | Google Inc. | Explicit congestion notification in mixed fabric networks |
CN102594713B (zh) * | 2012-03-29 | 2015-09-09 | 杭州华三通信技术有限公司 | Method and device for implementing explicit congestion notification |
CN102891803B (zh) * | 2012-10-10 | 2015-05-13 | 华为技术有限公司 | Congestion processing method and network device |
US9336158B2 (en) * | 2013-02-26 | 2016-05-10 | Oracle International Corporation | Method and system for simplified address translation support for static infiniband host channel adaptor structures |
CN103647807B (zh) * | 2013-11-27 | 2017-12-15 | 华为技术有限公司 | Information caching method, apparatus, and communication device |
US9621471B2 (en) * | 2014-06-30 | 2017-04-11 | Vmware, Inc. | Framework for early congestion notification and recovery in a virtualized environment |
CN104394093B (zh) * | 2014-11-28 | 2017-12-01 | 广州杰赛科技股份有限公司 | Congestion control method and wireless mesh network system |
CN104753816A (zh) * | 2015-03-27 | 2015-07-01 | 华为技术有限公司 | Packet processing method for an RDMA connection and related apparatus |
US9674090B2 (en) * | 2015-06-26 | 2017-06-06 | Microsoft Technology Licensing, Llc | In-line network accelerator |
US10257273B2 (en) * | 2015-07-31 | 2019-04-09 | Netapp, Inc. | Systems, methods and devices for RDMA read/write operations |
US9813338B2 (en) * | 2015-12-10 | 2017-11-07 | Cisco Technology, Inc. | Co-existence of routable and non-routable RDMA solutions on the same network interface |
CN106027412B (zh) * | 2016-05-30 | 2019-07-12 | 南京理工大学 | TCP congestion control method based on congestion queue length |
CN107493238A (zh) * | 2016-06-13 | 2017-12-19 | 华为技术有限公司 | Network congestion control method, device, and system |
US10833998B2 (en) * | 2017-01-12 | 2020-11-10 | Marvell Israel (M.I.S.L) Ltd. | Method and apparatus for flow control |
US11005770B2 (en) * | 2019-06-16 | 2021-05-11 | Mellanox Technologies Tlv Ltd. | Listing congestion notification packet generation by switch |
2017
- 2017-08-11: CN application CN202111036600.XA, publication CN113709057B, status Active
- 2017-08-11: CN application CN201710687388.0A, publication CN109391560B, status Active
2018
- 2018-07-13: EP application EP18843231.4A, publication EP3657743B1, status Active
- 2018-07-13: WO application PCT/CN2018/095602, publication WO2019029318A1, status unknown
- 2018-07-13: EP application EP21193312.2A, publication EP3979600A1, status Pending
2020
- 2020-02-10: US application US16/786,461, publication US11374870B2, status Active
2022
- 2022-05-02: US application US17/734,627, publication US20220263767A1, status Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP3657743A4 |
Also Published As
Publication number | Publication date |
---|---|
EP3657743A4 (en) | 2020-07-08 |
EP3657743A1 (en) | 2020-05-27 |
CN113709057B (zh) | 2023-05-05 |
EP3979600A1 (en) | 2022-04-06 |
US20220263767A1 (en) | 2022-08-18 |
CN109391560A (zh) | 2019-02-26 |
EP3657743B1 (en) | 2021-09-15 |
US20200177513A1 (en) | 2020-06-04 |
US11374870B2 (en) | 2022-06-28 |
CN109391560B (zh) | 2021-10-22 |
CN113709057A (zh) | 2021-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019029318A1 (zh) | Network congestion notification method, proxy node, and computer device | |
US11882041B2 (en) | Congestion notification packet indicating specific packet flow experiencing congestion to facilitate individual packet flow based transmission rate control | |
US20200358886A1 (en) | Data Transmission Method, Apparatus, And System | |
US20220078114A1 (en) | Method and Apparatus for Providing Service for Traffic Flow | |
CN107948076B (zh) | Method and apparatus for forwarding packets | |
US11949576B2 (en) | Technologies for out-of-order network packet management and selective data flow splitting | |
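WO2020001393A1 (zh) | Method for sending network performance parameters, method for calculating network performance, and network node | |
CN110022264B (zh) | Method for controlling network congestion, access device, and computer-readable storage medium | |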
US9413652B2 (en) | Systems and methods for path maximum transmission unit discovery | |
WO2023005773A1 (zh) | Packet forwarding method, apparatus, network interface card, and device based on remote direct memory access | |
US9455916B2 (en) | Method and system for changing path and controller thereof | |
US9559960B2 (en) | Network congestion management | |
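WO2018113425A1 (zh) | Method, apparatus, and system for detecting delay | |
WO2022067791A1 (zh) | Data processing and transmission method and related device | |
TWI721103B (zh) | Cluster precise rate-limiting method and apparatus | |
WO2018121535A1 (zh) | Load balancing processing method and apparatus | |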
US9106546B1 (en) | Explicit congestion notification in mixed fabric network communications | |
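WO2022028342A1 (zh) | Method and device for processing congested flows | |
CN117354253A (zh) | Network congestion notification method, apparatus, and storage medium | |
CN107231316B (zh) | Packet transmission method and apparatus | |
WO2023005927A1 (zh) | SRv6-based tunnel quality detection method and related apparatus | |
WO2017041569A1 (zh) | Service data transmission method and apparatus |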
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18843231; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | ENP | Entry into the national phase | Ref document number: 2018843231; Country of ref document: EP; Effective date: 20200218 |