WO2017215392A1 - Network congestion control method, device, and system - Google Patents

Network congestion control method, device, and system

Info

Publication number
WO2017215392A1
Authority
WO
WIPO (PCT)
Prior art keywords
congestion
control message
packet
congestion control
source node
Prior art date
Application number
PCT/CN2017/084627
Other languages
English (en)
French (fr)
Inventor
沈利
申伟
孔维庆
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP17812503.5A (patent EP3461082B1)
Publication of WO2017215392A1
Priority to US16/217,821 (patent US11115339B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/72 Routing based on the source address
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/11 Identifying congestion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/26 Flow control; Congestion control using explicit feedback to the source, e.g. choke packets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/26 Flow control; Congestion control using explicit feedback to the source, e.g. choke packets
    • H04L47/263 Rate modification at the source after receiving feedback
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling
    • H04L47/62 Queue scheduling characterised by scheduling criteria
    • H04L47/625 Queue scheduling characterised by scheduling criteria for service slots or service orders
    • H04L47/626 Queue scheduling characterised by scheduling criteria for service slots or service orders channel conditions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2212/00 Encapsulation of packets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/26 Flow control; Congestion control using explicit feedback to the source, e.g. choke packets
    • H04L47/265 Flow control; Congestion control using explicit feedback to the source, e.g. choke packets sent by intermediate network nodes

Definitions

  • The present invention relates to the field of network communication technologies, and in particular to a network congestion control method, device, and system.
  • When a node in the network detects congestion, it sends a Congestion Notification Message (CNM) to the source node that sent the packet, requesting that the source node reduce its sending rate.
  • The source node reduces its sending rate and periodically attempts to increase it again. If the congestion has cleared by then, increasing the sending rate does not cause congestion, and the source node receives no further CNMs. The sending rate can eventually recover to its value before the congestion.
  • The node that detects the congestion must encapsulate the flow identifier (Flow-ID) carried in the congestion-causing packet into the CNM, and must set the destination Media Access Control (MAC) address of the CNM to the source MAC address of that packet.
  • The source node uses the Flow-ID to determine which packets to rate-limit, namely those carrying that Flow-ID.
  • This CNM mechanism works only in a Layer 2 Ethernet network. In a Layer 3 Internet Protocol (IP) network, the packet header is replaced at every routing hop, so the Flow-ID and the source node's MAC address are lost. A downstream node therefore cannot construct a CNM and send it to the source node, which means the CNM mechanism cannot support Layer 3 IP networks.
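The back-off and probing behaviour of the source node described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the rate-adjustment algorithm of the patent or of any standard: the class name `SourceRateLimiter`, the halving factor, and the fixed recovery step are all hypothetical values chosen for readability.

```python
class SourceRateLimiter:
    """Hypothetical per-flow rate limiter at the source node."""

    def __init__(self, line_rate_mbps: float,
                 decrease_factor: float = 0.5,        # assumed value
                 recovery_step_mbps: float = 100.0):  # assumed value
        self.target_mbps = line_rate_mbps  # sending rate before congestion
        self.rate_mbps = line_rate_mbps    # current sending rate
        self.decrease_factor = decrease_factor
        self.recovery_step_mbps = recovery_step_mbps

    def on_cnm(self) -> None:
        """A CNM arrived: reduce the sending rate."""
        self.rate_mbps *= self.decrease_factor

    def on_timer(self) -> None:
        """Periodic probe: raise the rate, never beyond the pre-congestion value."""
        self.rate_mbps = min(self.target_mbps,
                             self.rate_mbps + self.recovery_step_mbps)
```

If no further CNM arrives, repeated `on_timer()` calls bring the rate back to its pre-congestion value; a new CNM during recovery simply cuts the rate again.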
  • The first aspect provides a network congestion control method, including: when network congestion is detected, acquiring the quintuple of the packet causing the congestion; generating a congestion control message that carries the quintuple of the packet; and sending the congestion control message to the source node of the packet or to the access forwarding device connected to that source node.
  • the method further includes determining a type of the source node of the message.
  • the type of the source node of the message is determined by one of the following fields: a device type field, a priority field, or a source IP address field.
  • the quintuple of the message is encapsulated in the payload domain of the congestion control message.
  • the destination address of the congestion control message is an Internet Protocol IP address of a source node of the packet, or an IP address of an access forwarding device to which the source node of the packet is connected.
  • the congestion control message also carries information indicating the degree of congestion.
  • The information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently available in the send queue of the congestion point, and the difference in the number of available bytes in the send queue of the congestion point.
  • the quintuple is a source IP address, a destination IP address, a source port number, a destination port number, and a protocol type carried in the packet.
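The first aspect can be illustrated with a short sketch: the congested node copies the quintuple into the payload of a congestion control message and addresses it either to the source node or to the access forwarding device. The names `FiveTuple` and `build_congestion_control_message`, the dictionary layout, and the JSON payload encoding are assumptions made for illustration; the text does not define a wire format at this point.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP


def build_congestion_control_message(pkt: FiveTuple,
                                     access_device_ip: Optional[str] = None,
                                     quantized_feedback: int = 0) -> dict:
    """Build a hypothetical congestion control message for a congested packet."""
    # Destination: the access forwarding device if one is given,
    # otherwise the source node of the packet itself.
    dst_ip = access_device_ip if access_device_ip is not None else pkt.src_ip
    payload = json.dumps({
        "five_tuple": asdict(pkt),                 # quintuple carried in the payload
        "quantized_feedback": quantized_feedback,  # optional congestion-degree info
    }).encode("utf-8")
    return {"dst_ip": dst_ip, "payload": payload}
```

Passing `access_device_ip` models the case where the source node cannot interpret the message itself and the access forwarding device must translate it.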
  • A network congestion control method, including: receiving a first congestion control message, where the first congestion control message carries the quintuple of the packet that causes the congestion; obtaining the flow identifier of the packet according to that quintuple; generating a second congestion control message, where the second congestion control message carries the flow identifier of the packet; and sending the second congestion control message to the source node of the packet.
  • the method further includes receiving one or more messages from the source node; establishing a mapping relationship between the quintuple of the one or more messages and the flow identifier.
  • the message is a data frame, and the data frame carries a CN-TAG field, and the flow identifier is carried in an RPID field of the CN-TAG field.
  • the second congestion control message is a congestion notification message
  • the congestion notification message carries a CN-TAG field
  • the flow identifier is carried in an RPID field of the CN-TAG field.
  • the first congestion control message also carries information indicating a degree of congestion.
  • The information indicating the degree of congestion is one or more of the following: the quantized feedback value of the congestion control message, the congestion point identifier, the number of bytes currently available in the send queue of the congestion point, and the difference in the number of available bytes in the send queue of the congestion point.
  • the quintuple is a source IP address, a destination IP address, a source port number, a destination port number, and a protocol type carried in the packet.
  • A network congestion control method, including: receiving one or more packets from a source node; establishing a mapping relationship between the quintuple and the flow identifier of the one or more packets; when network congestion is detected, obtaining the quintuple of the congested packet; obtaining the flow identifier of the congested packet according to the mapping relationship; and generating a congestion control message that carries the flow identifier and sending it to the source node of the packet.
  • the method further includes determining a type of the source node of the message.
  • the type of the source node of the message is determined by one of the following fields: a device type field, a priority field, or a source IP address field.
  • the message is a data frame, and the data frame carries a CN-TAG field, and the flow identifier is carried in an RPID field of the CN-TAG field.
  • the congestion control message is a congestion notification message
  • the congestion notification message carries a CN-TAG field
  • the flow identifier is carried in an RPID field of the CN-TAG field.
  • the congestion control message also carries information indicating the degree of congestion.
  • The information indicating the degree of congestion is one or more of the following: the quantized feedback value of the congestion control message, the congestion point identifier, the number of bytes currently available in the send queue of the congestion point, and the difference in the number of available bytes in the send queue of the congestion point.
  • the quintuple is a source IP address, a destination IP address, a source port number, a destination port number, and a protocol type carried in the packet.
  • A network congestion control method, including: receiving a plurality of packets from a source node; generating a CNP message when network congestion is detected; and sending the CNP message to the source node to instruct the source node to reduce its data rate.
  • the message is a RoCE message.
  • the message carries a flow identifier, and the flow identifier is encapsulated in a BTH field of the message.
  • the flow identifier is encapsulated in a DestQP field of the message.
  • the CNP message carries a flow identifier, and the flow identifier is encapsulated in the BTH field of the CNP message.
  • the CNP message carries a flow identifier, the flow identifier being encapsulated in a DestQP field of the CNP message.
  • the method further includes determining a type of the source node.
  • the type of the source node is determined by one of the following fields: a device type field, a priority field, or a source IP address field.
  • the type of the source node is a RoCEv2 server.
  • the destination address of the CNP message is the address of the RoCEv2 server.
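As an illustration of carrying the flow identifier in the DestQP field of the BTH, the sketch below packs and parses a 12-byte BTH. The field layout (8-bit opcode, 8 flag bits, 16-bit partition key, 8 reserved bits plus a 24-bit DestQP, 8 bits plus a 24-bit PSN) and the CNP opcode value `0x81` reflect my understanding of the RoCEv2 format and should be treated as assumptions to check against the specification; the function names are hypothetical.

```python
import struct

CNP_OPCODE = 0x81  # assumed opcode value for a RoCEv2 CNP


def pack_bth(opcode: int, dest_qp: int, psn: int = 0, pkey: int = 0xFFFF) -> bytes:
    """Pack a 12-byte BTH; the flow identifier is carried in DestQP (24 bits)."""
    if not 0 <= dest_qp < (1 << 24):
        raise ValueError("DestQP must fit in 24 bits")
    # Layout assumed here: opcode, flags byte (left as 0), P_Key,
    # then 8 reserved bits + DestQP, then 8 bits + PSN.
    return struct.pack("!BBHII", opcode, 0, pkey, dest_qp, psn & 0xFFFFFF)


def unpack_dest_qp(bth: bytes) -> int:
    """Extract the 24-bit DestQP (flow identifier) from a BTH."""
    (word,) = struct.unpack("!I", bth[4:8])
    return word & 0xFFFFFF
```

A source node receiving such a CNP could call `unpack_dest_qp` to find which flow to slow down.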
  • A network device, including: a processing unit, configured to, when network congestion is detected, acquire the quintuple of the packet that causes the congestion and generate a congestion control message that carries the quintuple of the packet; and a sending unit, configured to send the congestion control message to the source node of the packet or to the access forwarding device connected to that source node.
  • the processing unit is further configured to determine a type of a source node of the message.
  • the type of the source node of the message is determined by one of the following fields: a device type field, a priority field, or a source IP address field.
  • the quintuple of the message is encapsulated in the payload domain of the congestion control message.
  • the destination address of the congestion control message is an Internet Protocol IP address of a source node of the packet, or an IP address of an access forwarding device to which the source node of the packet is connected.
  • the congestion control message also carries information indicating the degree of congestion.
  • The information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently available in the send queue of the congestion point, and the difference in the number of available bytes in the send queue of the congestion point.
  • A network device, including: a receiving unit, configured to receive a first congestion control message, where the first congestion control message carries the quintuple of the packet that causes congestion; a processing unit, configured to obtain the flow identifier of the packet according to that quintuple and to generate a second congestion control message that carries the flow identifier; and a sending unit, configured to send the second congestion control message to the source node of the packet.
  • the receiving unit is further configured to receive one or more messages from the source node.
  • the processing unit is further configured to establish a mapping relationship between the quintuple of the one or more messages and the flow identifier.
  • the network device further includes a storage unit, configured to store a mapping relationship table between the quintuple and the flow identifier of the message.
  • the message is a data frame, and the data frame carries a CN-TAG field, and the flow identifier is carried in an RPID field of the CN-TAG field.
  • the second congestion control message is a congestion notification message
  • the congestion notification message carries a CN-TAG field
  • the flow identifier is carried in an RPID field of the CN-TAG field.
  • the first congestion control message also carries information indicating a degree of congestion.
  • The information indicating the degree of congestion is one or more of the following: the quantized feedback value of the congestion control message, the congestion point identifier, the number of bytes currently available in the send queue of the congestion point, and the difference in the number of available bytes in the send queue of the congestion point.
  • A network device, including: a receiving unit, configured to receive one or more packets from a source node; a storage unit, configured to store a mapping relationship between the quintuple and the flow identifier of the one or more packets; a processing unit, configured to, when network congestion occurs, acquire the quintuple of the congested packet, obtain the corresponding flow identifier according to the mapping relationship, and generate a congestion control message that carries the flow identifier; and a sending unit, configured to send the congestion control message to the source node of the congested packet, so that the source node reduces the transmission rate of the packet according to the congestion control message.
  • the processing unit is further configured to determine a type of a source node of the message.
  • the type of the source node of the message is determined by one of the following fields: a device type field, a priority field, or a source IP address field.
  • the message is a data frame, and the data frame carries a CN-TAG field, and the flow identifier is carried in an RPID field of the CN-TAG field.
  • the congestion control message is a congestion notification message
  • the congestion notification message carries a CN-TAG field
  • the flow identifier is carried in an RPID field of the CN-TAG field.
  • the congestion control message also carries information indicating the degree of congestion.
  • A network device, including: a receiving unit, configured to receive one or more packets from a source node; a processing unit, configured to generate a CNP message when network congestion is detected; and a sending unit, configured to send the CNP message to the source node to instruct the source node to reduce its data rate.
  • the processing unit is further configured to determine the type of the source node.
  • the type of the source node is determined by one of the following fields: a device type field, a priority field, or a source IP address field.
  • the type of the source node is a RoCEv2 server.
  • the CNP message carries a flow identifier.
  • the destination address of the CNP message is an IP address of the RoCEv2 server.
  • A network system, including a source node, an access forwarding device, an aggregation device, and a destination node, where the source node is connected to one port of the access forwarding device, another port of the access forwarding device is connected to the aggregation device, and another port of the aggregation device is indirectly connected to the destination node. The source node is configured to send multiple packets to the access forwarding device.
  • The access forwarding device is configured to receive the multiple packets, obtain the quintuple and the flow identifier of each packet, establish a mapping relationship between the quintuple and the flow identifier, and forward the multiple packets.
  • The aggregation device is configured to, when network congestion is detected, acquire the quintuple of the congested packet, generate a first congestion control message that carries the quintuple, and send the first congestion control message to the access forwarding device. The access forwarding device is further configured to receive the first congestion control message, obtain the quintuple, obtain the flow identifier according to the quintuple, generate a second congestion control message that carries the flow identifier, and send the second congestion control message to the source node. The source node is configured to receive the second congestion control message and, as the message indicates, reduce the rate of packets carrying that flow identifier.
  • A network system, including a source node, an access forwarding device, an aggregation device, and a destination node, where the source node is connected to one port of the access forwarding device, another port of the access forwarding device is connected to the aggregation device, and another port of the aggregation device is indirectly connected to the destination node. The source node is configured to send multiple packets to the access forwarding device. The access forwarding device is configured to receive the multiple packets, obtain the quintuple and the flow identifier of each packet, and establish a mapping relationship between the quintuple and the flow identifier; and, when network congestion is detected, to acquire the quintuple of the congested packet, obtain the flow identifier of the congested packet according to that quintuple, and generate a congestion control message, where the congestion control message carries the flow identifier of the packet in
  • A network system, including a source node, an intermediate node device, and a destination node, where the source node is connected to the destination node through the intermediate node device, and the source node is configured to send multiple packets.
  • The intermediate node device is configured to, when network congestion is detected, generate a CNP message that carries the flow identifier of the congested packet, and send the CNP message to the source node.
  • The source node is further configured to receive the CNP message and, as the CNP message indicates, reduce the rate of packets carrying that flow identifier.
  • the quintuple referred to in various embodiments of the present invention refers to a source IP address, a source port, a destination IP address, a destination port, and a transport layer protocol.
  • In the technical solution provided by the embodiments of the present invention, when network congestion is detected, the quintuple of the congested packet is acquired and the flow identifier of that packet is obtained from the quintuple; a congestion control message is then generated and sent to the source node of the packet, instructing it to reduce the rate of packets carrying that flow identifier. The solution can therefore be used in a Layer 3 network.
  • FIG. 1 is a schematic diagram of a network architecture according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of interaction of a network congestion control method according to an embodiment of the present invention.
  • FIG. 3A is a schematic structural diagram of a packet (data packet) according to an embodiment of the present invention.
  • FIG. 3B is a schematic diagram of a method for detecting network congestion according to an embodiment of the present invention.
  • FIG. 4A is a schematic structural diagram of a CNM frame format according to an embodiment of the present invention.
  • FIG. 4B is a schematic structural diagram of another CNM frame format provided by an example of the present invention.
  • FIG. 5 is a schematic diagram of a network congestion control interaction method according to another embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a network congestion control interaction method according to another embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a network congestion control interaction method according to another embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a frame format of a message (data packet) according to an embodiment of the present invention.
  • FIG. 8B is a schematic structural diagram of a CNP message format according to an embodiment of the present disclosure.
  • FIG. 8C is a schematic diagram of an internal structure of a BTH field according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a network congestion control interaction method according to another embodiment of the present invention.
  • FIG. 10 is a schematic diagram of an internal structure of a network device according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of an internal structure of a network device according to another embodiment of the present invention.
  • FIG. 12 is a schematic diagram of an internal structure of a network device according to another embodiment of the present invention.
  • FIG. 13 is a schematic block diagram of a network device according to another embodiment of the present invention.
  • FIG. 14 is a schematic diagram of an internal structure of a network device according to another implementation of the present invention.
  • FIG. 15 is a schematic block diagram of a network device according to another embodiment of the present invention.
  • the technical solution of the embodiment of the present invention can be applied to the network architecture shown in FIG. 1 , where the network architecture includes an access layer, an aggregation layer, and a core layer.
  • The source nodes that send packets (hosts 1 to 4 in FIG. 1) are connected to the access layer devices through copper twisted-pair cable or optical fiber.
  • The four devices in FIG. 1 are only an example; an actual network may contain more or fewer devices.
  • The access layer devices, also called access forwarding devices, are connected to the aggregation layer devices through copper twisted-pair cable or optical fiber (the four devices in FIG. 1 are only an example, and an actual network may contain more or fewer).
  • The aggregation layer devices are connected to the core layer devices through copper twisted-pair cable or optical fiber (the two devices in FIG. 1 are only an example, and an actual network may contain more or fewer).
  • A node that sends a packet is referred to as the source node (such as host 1), the node that the data finally reaches is called the destination node (such as host 3), and a node on the intermediate path is called an access forwarding device (see FIG. 1).
  • When host 1, as the source node, sends a packet to host 2, the destination node, the path may traverse node T1, node A1, and node T2 before reaching host 2.
  • When host 1, as the source node, sends a packet to host 3, the destination node, host 1 sends the packet to node T1, which forwards it to node A1, then to node C1, then to node A3, then to node T3, and finally to host 3.
  • the routing algorithm is optimized to calculate the shortest path between two nodes, that is, if there are multiple paths between two nodes, the shortest path is generally selected.
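Shortest-path selection can be sketched with a breadth-first search over an unweighted graph. The link list below is an assumption: it contains only the links implied by the two example paths in the text, not the full topology of FIG. 1, and the function name `shortest_path` is hypothetical.

```python
from collections import deque

# Assumed subset of the FIG. 1 topology, taken from the example paths only.
LINKS = [
    ("host1", "T1"), ("T1", "A1"), ("A1", "T2"), ("T2", "host2"),
    ("A1", "C1"), ("C1", "A3"), ("A3", "T3"), ("T3", "host3"),
]


def shortest_path(links, src, dst):
    """Return the hop-count shortest path from src to dst, or None if unreachable."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    previous = {src: None}  # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None
```

Breadth-first search minimises hop count only; a real routing protocol would typically also weigh link cost.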
  • In a data center, there are three types of data streams: (1) storage data streams, which require no packet loss; (2) high-performance computing streams, which require low latency; and (3) Ethernet streams, which tolerate some packet loss and delay. Because these requirements differ, traditional data centers use three different types of networks to carry the three kinds of streams. This approach is acceptable in small data centers, but as data centers grow, running three different types of networks is not economical. An alternative is to carry all three types of data streams over Ethernet and define additional mechanisms that let Ethernet meet the requirements of the three types of networks.
  • ETS Enhanced Transmission Selection
  • PFC Priority-based Flow Control
  • QCN Quantized Congestion Notification
  • There are at least two types of source nodes, referred to as first-type and second-type servers.
  • The following embodiments address three scenarios: in the first scenario, hosts 1 to 4 are all of the first type; in the second scenario, hosts 1 to 4 are all of the second type;
  • in the third scenario, the hosts in FIG. 1 include both the first type and the second type.
  • The source node, host 1, wants to send a packet to the destination node, host 2; the nodes on the intermediate path are the access forwarding device T1, the aggregation layer device A1, and the access layer device T2, after which the packet reaches host 2.
  • Figure 2 provides a schematic diagram of the interaction method of the network congestion control, including:
  • Step 201 The source node host 1 sends multiple packets to the access forwarding device.
  • The packet may also be referred to as a data packet, a frame, and so on.
  • The format of the packet is shown in FIG. 3A. It includes a destination MAC address, a source MAC address, a service tag (S-TAG) field, a customer tag (Customer VLAN Tag, C-TAG) field, a Congestion Notification Tag (CN-TAG) field, and a Media Access Control Service Data Unit (MSDU) field. The quintuple information of the packet is encapsulated in the MSDU field, and the CN-TAG field in turn includes an EtherType field and a Reaction Point Identifier (RPID) field.
  • The RPID field encapsulates the flow identifier (Flow-ID) of the packet, which uniquely identifies the type of data flow that each sent packet belongs to. For example, a Flow-ID of 1 indicates that the packet belongs to the storage data flow, a Flow-ID of 2 indicates the high-performance computing flow, and a Flow-ID of 3 indicates the Ethernet flow.
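The CN-TAG described above (an EtherType followed by an RPID that carries the Flow-ID) can be sketched as a 4-byte structure. The EtherType value `0x22E9` and the 16-bit RPID width are my understanding of the IEEE congestion notification tag and are labelled assumptions here; the helper names are hypothetical. The `FLOW_TYPES` mapping repeats the example Flow-ID values given in the text.

```python
import struct

CN_TAG_ETHERTYPE = 0x22E9  # assumed EtherType value for the CN-TAG

# Example Flow-ID meanings taken from the text.
FLOW_TYPES = {1: "storage", 2: "high-performance computing", 3: "Ethernet"}


def pack_cn_tag(flow_id: int) -> bytes:
    """Encode a CN-TAG: 2-byte EtherType followed by a 2-byte RPID (Flow-ID)."""
    return struct.pack("!HH", CN_TAG_ETHERTYPE, flow_id)


def unpack_cn_tag(tag: bytes) -> int:
    """Return the Flow-ID stored in the RPID field of a CN-TAG."""
    ethertype, rpid = struct.unpack("!HH", tag[:4])
    if ethertype != CN_TAG_ETHERTYPE:
        raise ValueError("not a CN-TAG")
    return rpid
```

An access forwarding device would read the RPID this way when it learns the quintuple-to-flow mapping from received packets.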
  • quintuple refers to the source IP address, destination IP address, source port number, destination port number, and protocol type carried in the packet.
  • The source MAC address is the MAC address of host 1, the destination MAC address is the MAC address of the access forwarding device, the source IP address is the IP address of host 1, and the destination IP address is the IP address of host 2.
  • Step 202 After receiving the multiple packets, the access forwarding device obtains the quintuple and the flow identifier of each packet and stores the mapping relationship between the quintuple and the flow identifier locally, as shown in Table 1.
  • A value of 6 in the protocol column of Table 1 indicates the Transmission Control Protocol (TCP), and a value of 17 indicates the User Datagram Protocol (UDP).
  • TCP Transmission Control Protocol
  • UDP User Datagram Protocol
  • Table 1 can also store other information, such as a virtual local area network (VLAN) and a source MAC address, as shown in Table 2.
  • VLAN virtual local area network
  • source MAC address a source MAC address
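The mapping table kept by the access forwarding device in step 202 can be sketched as a dictionary keyed by the quintuple. The class name `FlowTable`, its methods, and the optional `vlan` and `src_mac` columns are illustrative assumptions; the addresses used in any example are documentation addresses, not values from Table 1 or Table 2.

```python
from typing import Dict, NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # 6 = TCP, 17 = UDP


PROTOCOL_NAMES = {6: "TCP", 17: "UDP"}


class FlowTable:
    """Hypothetical quintuple-to-flow-identifier table (in the spirit of Tables 1 and 2)."""

    def __init__(self) -> None:
        self._entries: Dict[FiveTuple, dict] = {}

    def learn(self, key: FiveTuple, flow_id: int,
              vlan: Optional[int] = None, src_mac: Optional[str] = None) -> None:
        """Record the mapping when a packet from the source node is received."""
        self._entries[key] = {"flow_id": flow_id, "vlan": vlan, "src_mac": src_mac}

    def flow_id_for(self, key: FiveTuple) -> Optional[int]:
        """Look up the flow identifier for a quintuple, or None if unknown."""
        entry = self._entries.get(key)
        return None if entry is None else entry["flow_id"]
```

Storing the source MAC address alongside the flow identifier lets the device address a Layer 2 message back to the source node later.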
  • Step 203 The access forwarding device forwards the multiple packets to the aggregation layer device.
  • The access forwarding device looks up the routing table to obtain the outgoing port and the next hop, finds the MAC address of the next hop by using the Address Resolution Protocol (ARP), and encapsulates the Ethernet header accordingly.
  • ARP Address Resolution Protocol
  • the IP address and port number of the next device can be encapsulated in the header of the packet so that the packet can be routed to the next device.
  • Step 204 When the aggregation layer device detects network congestion, it obtains the quintuple of the congested packet, generates a first congestion control message that carries that quintuple, and sends the first congestion control message to the access forwarding device.
  • On a network device, packets received on one port are temporarily stored in an in-memory queue, then taken out of the queue in order and forwarded to another port.
  • On the network device, network congestion shows up as the number of packets in the queue exceeding a threshold.
  • In step 204 of this embodiment, the quintuple of the congested packet is obtained as follows: when the packets in the queue exceed the threshold, the quintuple of the most recently received packet is taken.
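The threshold check described above can be sketched as a queue that reports the quintuple of the newly received packet once its length exceeds a threshold. The class name `EgressQueue`, the packet dictionary layout, and counting in packets rather than bytes are assumptions made for this sketch.

```python
from collections import deque
from typing import Optional, Tuple

FiveTuple = Tuple[str, str, int, int, int]  # (src_ip, dst_ip, src_port, dst_port, protocol)


class EgressQueue:
    """Hypothetical forwarding queue with a congestion threshold in packets."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self._queue = deque()

    def enqueue(self, packet: dict) -> Optional[FiveTuple]:
        """Store a packet; return its quintuple if the queue is now over the threshold."""
        self._queue.append(packet)
        if len(self._queue) > self.threshold:
            return packet["five_tuple"]  # newly received packet is treated as congested
        return None

    def dequeue(self) -> Optional[dict]:
        """Take the oldest packet out of the queue for forwarding."""
        return self._queue.popleft() if self._queue else None
```

A non-`None` return value from `enqueue` is the trigger for generating the first congestion control message.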
  • Step 205 After receiving the first congestion control message, the access forwarding device acquires the quintuple of the congested packet carried in the message, obtains the flow identifier of the congested packet according to the mapping relationship between the quintuple and the flow identifier, and generates a second congestion control message, where the second congestion control message carries the flow identifier.
  • Step 206 Send a second congestion control message to the host 1 to instruct the host 1 to reduce the rate of the packets belonging to the flow identifier.
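  • Steps 202 and 205 amount to learning a quintuple-to-flow-identifier table and looking it up later. The following Python sketch shows one possible shape of that logic; the class `AccessForwardingDevice`, its method names, and the dictionary used to stand in for the second congestion control message are illustrative assumptions, not the format defined by the embodiment.

```python
from typing import Dict, Optional, Tuple

# (source IP, destination IP, protocol, source port, destination port)
Quintuple = Tuple[str, str, int, int, int]


class AccessForwardingDevice:
    """Hypothetical model of the mapping kept by the access forwarding device."""

    def __init__(self) -> None:
        self.flow_table: Dict[Quintuple, int] = {}

    def learn(self, quintuple: Quintuple, flow_id: int) -> None:
        """Step 202: store the mapping between quintuple and flow identifier locally."""
        self.flow_table[quintuple] = flow_id

    def on_first_congestion_message(self, quintuple: Quintuple) -> Optional[dict]:
        """Step 205: translate the quintuple into a message carrying the flow identifier."""
        flow_id = self.flow_table.get(quintuple)
        if flow_id is None:
            return None  # unknown flow: nothing to notify
        return {"type": "CNM", "flow_id": flow_id}


device = AccessForwardingDevice()
congested = ("10.0.0.1", "10.0.1.9", 6, 40000, 80)
device.learn(congested, flow_id=7)
second_message = device.on_first_congestion_message(congested)
unknown = device.on_first_congestion_message(("10.0.0.2", "10.0.1.9", 17, 5000, 53))
```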
  • the first congestion control message may be a custom message, and the format of the message is as shown in Table 3:
  • the customized first congestion control message includes at least a 20-byte IP header, an 8-byte UDP header, and a payload field.
  • the structure of the 20-byte IP header includes: a Version field, 4 bits, which specifies the version number of the IP protocol; an IP Header Length (IHL) field, 4 bits, which gives the length of the IP header; a Type of Service field, which defines how the IP packet is to be handled; a Total Length field, which gives the total length of the IP packet; an Identification field; a Flags field; and a Fragment Offset field, which, when a data packet is fragmented, works together with the More Fragments (MF) flag to help the destination host reassemble the fragments.
  • the Time to Live (TTL) field indicates how long the packet may remain on the network; its value is decremented by one at each router, and when it reaches 0 the packet is discarded by the router.
  • the Protocol field, 8 bits, indicates the protocol type used by the data part of the IP packet; the Header Checksum field, 16 bits, is the checksum of the IPv4 packet header; the remaining fields are the Source Address field and the Destination Address field. For the meaning of these fields, refer to the existing Request for Comments (RFC) 791 standard; details are not described herein.
  • the structure of the 8-byte UDP header includes a Source Port field, a Destination Port field, a Length field, and a Checksum field. For the meaning of these fields, refer to the existing Request for Comments (RFC) 768 standard, which defines UDP; details are not described herein.
  • the source IP address of the IP header is the IP address of the aggregation layer device, and the destination IP address is the IP address of the access forwarding device; the destination port number of the UDP header is used to identify the first congestion control message, for example, it is defined as 0x2000.
  • the access forwarding device reads the destination port number after receiving the first congestion control message. If the value is 0x2000, the received message is a customized congestion control message, and the next step is to convert it to a standard CNM.
  • the payload portion is encapsulated with a quintuple of the congested message.
  • the payload portion is also encapsulated with a field for indicating the degree of congestion.
  • For example, the payload includes a QntzFb field, a Congestion Point Identifier (CPID) field, a queue offset (Qoffset) field, and a Qdelta field.
  • the first congestion control message may be as shown in FIG. 4A. The QntzFb field carries the quantized feedback value of the first congestion control message; it is 6 bits long and occupies the lower 6 bits of the first 2 bytes. The CPID field carries the congestion point identifier and is 8 bytes long:
  • the MAC address of the congestion point device is used as the upper 6 bytes, and the lower 2 bytes are used to identify different ports or different priority queues of the same device;
  • the Qoffset field occupies 2 bytes and indicates the number of bytes currently available in the send queue of the congestion point (CP);
  • the Qdelta field occupies 2 bytes and indicates the difference between the numbers of available bytes in the CP send queue obtained at two sampling times.
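  • As an illustration of the payload fields described above, the following Python sketch packs and unpacks them with the standard `struct` module. The field order, the treatment of Qdelta as a signed value, and the 13-byte encoding of the quintuple are assumptions made only for this example, since Table 3 and FIG. 4A are not reproduced here; the function names are likewise hypothetical.

```python
import ipaddress
import struct

CONGESTION_UDP_PORT = 0x2000  # example destination port given in the text

# Assumed layout: 2 bytes holding QntzFb in the lower 6 bits, 8-byte CPID,
# 2-byte Qoffset, 2-byte signed Qdelta, then the quintuple
# (source IP, destination IP, protocol, source port, destination port).
PAYLOAD_FORMAT = "!H8sHh4s4sBHH"


def make_cpid(mac: bytes, port_id: int) -> bytes:
    """CPID = 6-byte MAC of the congestion point + 2 bytes identifying port or queue."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return mac + struct.pack("!H", port_id)


def pack_payload(qntz_fb, cpid, qoffset, qdelta, quintuple) -> bytes:
    if not 0 <= qntz_fb <= 0x3F:
        raise ValueError("QntzFb is a 6-bit value")
    if len(cpid) != 8:
        raise ValueError("CPID must be 8 bytes")
    src_ip, dst_ip, protocol, src_port, dst_port = quintuple
    return struct.pack(
        PAYLOAD_FORMAT, qntz_fb, cpid, qoffset, qdelta,
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
        protocol, src_port, dst_port,
    )


def unpack_payload(data: bytes):
    first, cpid, qoffset, qdelta, src, dst, protocol, sport, dport = struct.unpack(
        PAYLOAD_FORMAT, data
    )
    quintuple = (str(ipaddress.IPv4Address(src)), str(ipaddress.IPv4Address(dst)),
                 protocol, sport, dport)
    return first & 0x3F, cpid, qoffset, qdelta, quintuple


cpid = make_cpid(bytes.fromhex("001122334455"), port_id=3)
flow = ("10.0.0.1", "10.0.1.9", 6, 40000, 80)
payload = pack_payload(21, cpid, 1500, -300, flow)
decoded = unpack_payload(payload)
```

  • A receiver would first check that the UDP destination port equals `CONGESTION_UDP_PORT` before decoding the payload in this way.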
  • the second congestion control message may refer to an existing CNM format, as shown in FIG. 4B, including a destination IP address, a source IP address, an S-TAG, a C-TAG, a CN-TAG, and a payload.
  • the payload further includes: Version, Reserved, QntzFb, CPID, Qoffset, Qdelta, Encapsulated Priority, Encapsulated Destination MAC address, Encapsulated MSDU length, Encapsulated MSDU, and the like.
  • in the CPID, the MAC address of the congestion point device is used as the upper 6 bytes, and the lower 2 bytes are used to identify different ports or different priority queues of the same device;
  • Qoffset occupies 2 bytes and indicates the number of bytes currently available in the send queue of the congestion point (CP);
  • Qdelta occupies 2 bytes and indicates the difference between the numbers of available bytes in the CP send queue obtained at two sampling times;
  • Encapsulated priority occupies 2 bytes; 3 bits of the first byte carry the priority of the data frame that triggered the CNM message, and the other bits are padded with 0;
  • Encapsulated destination MAC address occupies 6 bytes and carries the destination MAC address of the data frame that triggered the CNM message;
  • Encapsulated MSDU length occupies 2 bytes and indicates the length of the Encapsulated MSDU field;
  • Encapsulated MSDU occupies at most 64 bytes and carries the contents of the data frame that triggered the CNM message.
  • the flow identifier is encapsulated in an RPID field included in a CN-TAG field of the CNM message.
  • Before the quintuple of the packet in which the congestion occurs is acquired, the method 200 further includes:
  • Step A When the aggregation layer device detects that network congestion occurs, it determines the type of the source node from which the congested packet originates;
  • Step B1 When the source node of the packet belongs to the first type, step 204 is performed;
  • Step B2 When the source node of the packet belongs to the second type, perform the corresponding steps in Embodiment 4 or Embodiment 5.
  • the first type of source node is a Data Center Quantized Congestion Notification (DCQCN) server.
  • the packet type sent by the QCN server is a data packet carrying the CN-TAG, as shown in FIG. 3A.
  • the difference between the data packet and the normal packet is that the CN-TAG field is added to the Ethernet header, and the RPID is a Flow-ID, which is used to uniquely identify each packet (data stream) sent by the server.
  • the second type of source node is an RDMA over Converged Ethernet version 2.0 (RoCEv2) server, where the RDMA is a Remote Direct Memory Access (RDMA).
  • the format of the packet sent by this RoCEv2 server is different from that of the CNP server. For details, refer to FIG. 8A and the description of the fourth or fifth embodiment.
  • The type of the source node from which the congested packet originates can be determined in any of the following ways.
  • the first implementation manner is: a new field is extended in the packet, and the name of the field is a source node type, which occupies 1 bit. When the value is 1, it indicates the first type; When the value is 0, it is represented as the second type.
  • the second implementation manner is: determining the type of the source node host #1 according to the priority field of the message. For example, when the priority is level 1, the host type is the first type. When the priority is level 2, the host type is the second type.
  • the third implementation manner is: determining the type of the source node host #1 according to other fields of the message. For example, it is judged according to the source IP address field. For example, the IP address in one network segment is used for the first type of host, and the IP address of the other network segment is used for the second type of host.
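  • The three implementation manners above can be sketched in Python as three small classifiers. The function names and the two example network segments (`10.1.0.0/16` and `10.2.0.0/16`) are hypothetical; the embodiment only states that one segment is used for each type of host.

```python
import ipaddress
from enum import Enum
from typing import Optional


class SourceNodeType(Enum):
    FIRST = 1
    SECOND = 2


def type_from_flag_bit(bit: int) -> SourceNodeType:
    """First manner: a 1-bit source node type field, 1 = first type, 0 = second type."""
    return SourceNodeType.FIRST if bit == 1 else SourceNodeType.SECOND


def type_from_priority(priority: int) -> Optional[SourceNodeType]:
    """Second manner: priority level 1 = first type, level 2 = second type."""
    return {1: SourceNodeType.FIRST, 2: SourceNodeType.SECOND}.get(priority)


def type_from_source_ip(
    src_ip: str,
    first_segment: str = "10.1.0.0/16",   # assumed segment for first-type hosts
    second_segment: str = "10.2.0.0/16",  # assumed segment for second-type hosts
) -> Optional[SourceNodeType]:
    """Third manner: classify by the network segment of the source IP address."""
    address = ipaddress.ip_address(src_ip)
    if address in ipaddress.ip_network(first_segment):
        return SourceNodeType.FIRST
    if address in ipaddress.ip_network(second_segment):
        return SourceNodeType.SECOND
    return None
```

  • A device would then branch on the result: the first type leads to the quintuple-based handling of step 204, and the second type leads to the handling described in Embodiment 4 or Embodiment 5.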
  • According to the received second congestion control message, the host 1 limits or lowers the sending rate of the packets belonging to the flow identifier, and periodically attempts to increase the sending rate again. If the congestion has been eliminated, increasing the sending rate no longer causes congestion, so no further CNM is received, and the sending rate can eventually be restored to its value before the congestion.
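  • The host behaviour just described (reduce the rate on receiving a congestion message, then periodically probe upwards until the original rate is restored) can be sketched as follows. The halving factor, the 10 Mbit/s increase step, and the class name `FlowRate` are assumptions chosen only for illustration; the embodiment does not specify these values.

```python
class FlowRate:
    """Per-flow sending rate at the host; all numeric defaults are illustrative."""

    def __init__(self, rate_mbps: float, decrease_factor: float = 0.5,
                 increase_step_mbps: float = 10.0) -> None:
        self.target_mbps = rate_mbps        # rate before the congestion
        self.rate_mbps = rate_mbps          # current sending rate
        self.decrease_factor = decrease_factor
        self.increase_step_mbps = increase_step_mbps

    def on_congestion_message(self) -> None:
        """Lower the sending rate for the flow named in the congestion message."""
        self.rate_mbps *= self.decrease_factor

    def on_timer(self) -> None:
        """Periodic attempt to increase the rate, capped at the pre-congestion value."""
        self.rate_mbps = min(self.target_mbps, self.rate_mbps + self.increase_step_mbps)


flow_rate = FlowRate(100.0)
flow_rate.on_congestion_message()
reduced = flow_rate.rate_mbps
for _ in range(10):                         # no further congestion messages arrive
    flow_rate.on_timer()
restored = flow_rate.rate_mbps
```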
  • In the technical solution provided by the embodiment of the present invention, when network congestion is detected, the quintuple of the congested packet is acquired, the flow identifier of the congested packet is obtained according to the quintuple, and a congestion control message is generated and sent to the source node of the packet, instructing it to reduce the rate of the packets with the flow identifier. The solution can therefore be used in a Layer 3 network.
  • FIG. 5 is an interaction diagram of a network congestion control method according to another embodiment of the present invention, specifically:
  • Step 501 Host 1 sends multiple packets to the access forwarding device.
  • Step 502 The access forwarding device receives the multiple packets, obtains the quintuple and the flow identifier in the multiple packets, establishes a mapping relationship between the quintuple and the flow identifier, and saves the mapping relationship locally. For the mapping relationship table, refer to Table 1 or Table 2 in Embodiment 1; details are not described herein again.
  • Step 503 When detecting that the network is congested, the access forwarding device acquires the quintuple of the congested packet, obtains the flow identifier of the congested packet according to the mapping table of the quintuple and the flow identifier, and generates a network congestion control message, where the message carries the flow identifier.
  • Step 504 Send a network congestion control message to the host 1 to instruct the host 1 to reduce the sending rate of the packets belonging to the flow identifier.
  • the format of the network congestion control message may refer to an existing CNM format, as shown in FIG. 4B, including a destination IP address, a source IP address, an S-TAG, a C-TAG, a CN-TAG, and a payload.
  • the acquired stream identifier is encapsulated in the RPID field below the CN-TAG field of the CNM message.
  • the method 500 further includes: before acquiring the quintuple of the packet in which the network congestion occurs:
  • Step A When it is detected that network congestion occurs, it is determined which type of source node the packet in which congestion occurs is derived;
  • Step B When the message is originated from the source node of the first type, proceed to step 503;
  • the source node of the first type is a DCQCN server.
  • the source node of the second type is a RoCEv2 server.
  • The type of the source node from which the congested packet originates can be determined in any of the following ways.
  • the first implementation manner is: a new field is extended in the packet, and the name of the field is a source node type, which occupies 1 bit. When the value is 1, it indicates the first type; When the value is 0, it is represented as the second type.
  • the second implementation manner is: determining the type of the source node host #1 according to the priority field of the message. For example, when the priority is level 1, the host type is the first type. When the priority is level 2, the host type is the second type.
  • the third implementation manner is: determining the type of the source node host #1 according to other fields of the message. For example, it is judged according to the source IP address field. For example, the IP address in one network segment is used for the first type of host, and the IP address of the other network segment is used for the second type of host.
  • In the technical solution provided by the embodiment of the present invention, when network congestion is detected, the quintuple of the congested packet is acquired, the flow identifier of the congested packet is obtained according to the quintuple, and a congestion control message is generated and sent to the source node of the packet, instructing it to reduce the rate of the packets with the flow identifier. The solution can therefore be used in a Layer 3 network.
  • FIG. 6 shows an interaction diagram of the network congestion control method according to another embodiment of the present invention, which is specifically:
  • Step 601 Host 1 sends multiple packets to the access forwarding device.
  • Step 602 After receiving the multiple packets, the access forwarding device obtains the quintuple and the flow identifier information in the multiple packets, and establishes a mapping relationship between the quintuple and the flow identifier, and saves the mapping relationship table. Please refer to Table 1 or Table 2 in Embodiment 1 for details.
  • Step 603 Forward the multiple packets to the first intermediate node device (such as an aggregation layer device);
  • Step 604 The first intermediate node device receives the multiple packets, and forwards the multiple packets to the second intermediate node device (such as a core network device, a backbone network device, etc.);
  • Step 605 When the second intermediate node device detects that the network is congested, it acquires the quintuple of the packet in which the congestion occurs and generates a first congestion control message, where the message carries the quintuple of the congested packet.
  • Step 606 Send the first congestion control message back to the first intermediate node device.
  • Step 607 After receiving the first congestion control message, the first intermediate node device forwards the first congestion control message to the access forwarding device.
  • Step 608 The access forwarding device receives the first congestion control message, obtains the flow identifier of the congested packet according to the mapping relationship between the quintuple and the flow identifier, and generates a second congestion control message, where the second congestion control message carries the flow identifier;
  • Step 609 Send the second congestion control message to the host 1 to instruct the host 1 to perform a deceleration process on the packet having the flow identifier.
  • Steps 601-609 above apply to the scenario in which the hosts shown in the figure are of the same type.
  • the method 600 should include: before acquiring the quintuple of the packet in which the network is congested:
  • Step A When it is detected that network congestion occurs, it is determined which type of source node the packet in which congestion occurs is derived;
  • Step B When the message is originated from the source node of the first type, proceed to step 605;
  • the source node of the first type is a CNP server.
  • The type of the source node from which the congested packet originates can be determined in any of the following ways.
  • the first implementation manner is: a new field is extended in the packet, and the name of the field is a source node type, which occupies 1 bit. When the value is 1, it indicates the first type; When the value is 0, it is represented as the second type.
  • the second implementation manner is: determining the type of the source node host #1 according to the priority field of the message. For example, when the priority is level 1, the host type is the first type. When the priority is level 2, the host type is the second type.
  • the third implementation manner is: determining the type of the source node host #1 according to other fields of the message. For example, it is judged according to the source IP address field. For example, the IP address in one network segment is used for the first type of host, and the IP address of the other network segment is used for the second type of host.
  • In the technical solution provided by the embodiment of the present invention, when network congestion is detected, the quintuple of the congested packet is acquired, the flow identifier of the congested packet is obtained according to the quintuple, and a congestion control message is generated and sent to the source node of the packet, instructing it to reduce the rate of the packets with the flow identifier. The solution can therefore be used in a Layer 3 network.
  • FIG. 7 shows an interaction diagram of network congestion control according to another embodiment of the present invention.
  • the type of the host is the same as that in the first embodiment.
  • the host type of the third type is different from that of the RoCEv2 server mentioned in the first embodiment; the format of the sent message is different from the first embodiment to the sixth embodiment.
  • the method provided by the embodiment specifically includes:
  • Step 701 Host 1 sends one or more packets to the access forwarding device.
  • FIG. 8A shows the message structure. As shown in FIG. 8A, the message includes a Layer 2 Ethernet header (Eth L2 Header), a network layer header (IP Header), a User Datagram Protocol header (UDP Header), an InfiniBand basic transport header (IB BTH), an InfiniBand payload (IB Payload), a cyclic redundancy code (ICRC), and a Frame Check Sequence (FCS).
  • the BTH field further includes a plurality of subfields, as shown in FIG. 8C.
  • the flow identifier is encapsulated in the BTH field.
  • the flow identifier is encapsulated in the DestQP field.
  • Step 702 The access forwarding device receives the one or more packets and forwards the packet to the first intermediate node device.
  • Step 703 The first intermediate node device receives the one or more packets and forwards the packet to the second intermediate node device.
  • Step 704 When the second intermediate node device detects that the network is congested, generates a CNP message, where the message carries the flow identifier of the packet in which the network is congested.
  • FIG. 8B shows a schematic diagram of a frame format of a CNP message.
  • The CNP message includes a MAC header (MAC Header), an IP header (IP Header), a UDP header (UDP Header), a basic transport header (BTH), a reserved field (Reserved), a cyclic redundancy code (ICRC) check field, and a Frame Check Sequence (FCS) field.
  • the BTH field further includes a plurality of subfields, as shown in FIG. 8C, such as the destination queue pair (DestQP) and the opcode; the flow identifier is encapsulated in the DestQP subfield of the BTH field.
  • Step 705 The second intermediate node device sends the CNP message to the first intermediate node device.
  • Step 706 The first intermediate node device receives the CNP message and forwards the message to the access forwarding device.
  • Step 707 The access forwarding device receives the CNP message, and forwards the CNP message to the source node, instructing the source node to reduce the transmission rate of the packet with the flow identifier.
  • the data flow mentioned in the embodiment of the present invention can be uniquely determined by the 7-tuple consisting of the source IP address, the destination IP address, the protocol number, the source port number, the destination port number, the SrcQP, and the DestQP.
  • In step 704, when the second intermediate node device detects that the network is congested, it identifies the packet that caused the congestion and parses out the destination queue pair (DestQP) carried in the packet.
  • In the CNP, the DestQP is set to the source queue pair (Source Queue Pair, SrcQP).
  • the source IP address and destination IP address of the CNP are the destination IP address and source IP address of the congested packet, respectively.
  • the source port number and destination port number of the UDP packet are the source port number and destination port number of the congestion packet respectively.
  • the opcode is 0x81.
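  • The way the CNP header fields are derived from the congested packet can be sketched in Python as follows. The dataclass `RocePacketFields` and the function `build_cnp` are hypothetical names, and passing `src_qp` in as a parameter is an assumption: the text states that the DestQP of the CNP is set to the SrcQP but does not say how the device obtains that value.

```python
from dataclasses import dataclass

CNP_OPCODE = 0x81  # opcode value given in the text


@dataclass(frozen=True)
class RocePacketFields:
    """Only the header fields discussed above; not a full RoCEv2 header."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    dest_qp: int
    opcode: int


def build_cnp(congested: RocePacketFields, src_qp: int) -> RocePacketFields:
    """Derive the CNP fields from the packet that caused the congestion."""
    return RocePacketFields(
        src_ip=congested.dst_ip,        # IP addresses are swapped
        dst_ip=congested.src_ip,
        src_port=congested.src_port,    # UDP ports are copied as stated in the text
        dst_port=congested.dst_port,
        dest_qp=src_qp,                 # DestQP of the CNP is set to the SrcQP
        opcode=CNP_OPCODE,
    )


data_packet = RocePacketFields("10.2.0.5", "10.2.9.7", 49152, 4791, dest_qp=0x12, opcode=0x0A)
cnp = build_cnp(data_packet, src_qp=0x34)
```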
  • Steps 701-707 above apply to the scenario in which the hosts shown in the figure are of the same type.
  • the method 700 should also include:
  • Step A When it is detected that network congestion occurs, it is determined which type of source node the packet in which congestion occurs is derived;
  • Step B When the message is originated from the source node of the second type, proceed to step 704;
  • the source node of the second type is a RoCEv2 server.
  • The type of the source node from which the congested packet originates can be determined in any of the following ways.
  • the first implementation manner is: a new field is extended in the packet, and the name of the field is a source node type, which occupies 1 bit. When the value is 1, it indicates the first type; When the value is 0, it is represented as the second type.
  • the second implementation manner is: determining the type of the source node host #1 according to the priority field of the message. For example, when the priority is level 1, the host type is the first type. When the priority is level 2, the host type is the second type.
  • the third implementation manner is: determining the type of the source node host #1 according to other fields of the message. For example, it is judged according to the source IP address field. For example, the IP address in one network segment is used for the first type of host, and the IP address of the other network segment is used for the second type of host.
  • In the embodiment of the present invention, when network congestion is detected, the CNP is generated directly once the congestion reaches the threshold, which greatly shortens the control loop delay (CLD), thereby reducing queue depth and packet delay and improving service performance.
  • FIG. 9 shows an interaction diagram of network congestion control according to another embodiment of the present invention.
  • the type of the host is consistent with the type described in the fourth embodiment. Specifically:
  • Step 901 Host 1 sends multiple packets to the access forwarding device.
  • the packet carries a flow identifier, where the flow identifier is encapsulated in a BTH field of the packet.
  • the packet carries a flow identifier, where the flow identifier is encapsulated in a DestQP field of the packet.
  • Step 902 The access forwarding device receives the multiple packets and forwards the packets to the aggregation layer device.
  • Step 903 When the aggregation layer device detects that the network is congested, generates a CNP message, where the message carries the flow identifier of the packet in which the network is congested.
  • the flow identifier is encapsulated in a BTH field of the CNP message.
  • the flow identifier is encapsulated in a DestQP field of the CNP message.
  • Step 904 The aggregation layer device sends the CNP message to the access forwarding device.
  • Step 905 The access forwarding device receives the CNP message and forwards it to the source node host 1, instructing the source node host 1 to reduce the transmission rate of the packets with the flow identifier;
  • the format of the CNP message is shown in Embodiment 4 and FIG. 8B, and details are not described herein again.
  • the data packet mentioned in the embodiment of the present invention may be uniquely determined by the 7-tuple consisting of the source IP address, the destination IP address, the protocol number, the source port number, the destination port number, the SrcQP, and the DestQP.
  • When the aggregation layer device detects that the network is congested, it identifies the packet that caused the congestion and parses out the DestQP carried in the packet.
  • In the CNP message, the DestQP is set to the SrcQP.
  • the source IP address and destination IP address of the CNP message are the destination IP address and source IP address of the congested packet, respectively.
  • the source port number and destination port number of the UDP packet are the source port number and destination port number of the congested packet, respectively.
  • the opcode can be set to 0x81.
  • Steps 901-905 above apply to the scenario in which the hosts shown in the figure are of the same type.
  • the method 900 should also include: before generating the CNP message:
  • Step A When it is detected that network congestion occurs, it is determined which type of source node the packet in which congestion occurs is derived;
  • Step B When the message is originated from the source node of the second type, proceed to step 903;
  • When the packet originates from a source node of the first type, refer to the descriptions of Embodiment 1 to Embodiment 3.
  • the first type of source node is a DCQCN server.
  • the source node of the second type is a RoCEv2 server.
  • The type of the source node from which the congested packet originates can be determined in any of the following ways.
  • the first implementation manner is: a new field is extended in the packet, and the name of the field is a source node type, which occupies 1 bit. When the value is 1, it indicates the first type; When the value is 0, it is represented as the second type.
  • the second implementation manner is: determining the type of the source node host #1 according to the priority field of the message. For example, when the priority is level 1, the host type is the first type. When the priority is level 2, the host type is the second type.
  • the third implementation manner is: determining the type of the source node host #1 according to other fields of the message. For example, it is judged according to the source IP address field. For example, the IP address in one network segment is used for the first type of host, and the IP address of the other network segment is used for the second type of host.
  • In the technical solution provided by the embodiment of the present invention, when the network is congested, the CNP is generated directly once the congestion reaches the threshold, which reduces the queue delay and the packet delay and improves service performance.
  • FIG. 10 is a schematic diagram showing the internal structure of a network device according to another embodiment of the present invention.
  • the network device 1000 includes a processing unit 1010 and a sending unit 1020, where the processing unit 1010 is configured to: when detecting network congestion, acquire a quintuple of a packet that causes network congestion; generate a congestion control message, The congestion control message carries the quintuple of the packet, and the sending unit 1020 is configured to send the congestion control message to the access forwarding device of the source node of the packet.
  • quintuple refers to the source IP address, destination IP address, source port number, destination port number, and protocol type carried in the packet.
  • the processing unit 1010 is further configured to determine a type of a source node of the packet.
  • the type of the source node of the packet is determined by using priority information, a source node type field, or an IP address carried in the packet.
  • The type of the source node from which the congested packet originates can be determined in any of the following ways.
  • the first implementation manner is: a new field is extended in the packet, and the name of the field is a source node type, which occupies 1 bit. When the value is 1, it indicates the first type; when the value is 0, it indicates the second type.
  • the second implementation manner is: determining the type of the source node host #1 according to the priority field of the message. For example, when the priority is level 1, the host type is the first type. When the priority is level 2, the host type is the second type.
  • the third implementation manner is: determining the type of the source node host #1 according to other fields of the message. For example, it is judged according to the source IP address field. For example, the IP address in one network segment is used for the first type of host, and the IP address of the other network segment is used for the second type of host.
  • the format of the congestion control message may refer to the descriptions in Embodiment 1, Table 3, and FIG. 4A, and details are not described herein again.
  • the congestion control message also carries information indicating the degree of congestion.
  • the information used to indicate the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently available in the send queue of the congestion point, and the difference between the numbers of available bytes in the send queue of the congestion point (CP) obtained at two sampling times.
  • FIG. 11 is a schematic diagram showing the internal structure of a network device according to another embodiment of the present invention.
  • the network device 1100 includes a receiving unit 1110, a processing unit 1120, and a sending unit 1130, where the receiving unit 1110 is configured to receive a first congestion control message, where the first congestion control message carries congestion. a quintuple of the packet; the processing unit 1120 is configured to obtain a flow identifier of the packet according to the quintuple of the packet; generate a second congestion control message, where the second congestion control message carries the flow identifier
  • the sending unit 1130 is configured to send the second congestion control message to a source node of the message.
  • the receiving unit 1110 is further configured to receive one or more packets sent by the source node of the packet.
  • the network device 1100 further includes a storage unit 1140, configured to store a mapping relationship table between the quintuple and the flow identifier of the one or more packets.
  • the mapping relationship table between the quintuple and the flow identifier may be as shown in Table 1 or Table 2, and details are not described herein again.
  • the flow identifier is encapsulated in an RPID field in a CN-TAG field of the message, the five-tuple being encapsulated in an MSDU field of the message.
  • the format of the first congestion control message and the second congestion control message may refer to the description of Embodiment 1.
  • the second congestion control message is a congestion notification message, where the congestion notification message carries a CN-TAG field, where the flow identifier is carried in an RPID field of the CN-TAG field.
  • the first congestion control message further carries information for indicating a degree of congestion.
  • the information used to indicate the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently available in the send queue of the congestion point, and the difference between the numbers of available bytes in the send queue of the congestion point obtained at two sampling times.
  • the processing unit 1120 is further configured to determine a type of the source node of the packet.
  • In the technical solution provided by the embodiment of the present invention, when network congestion is detected, the quintuple of the congested packet is acquired, the flow identifier of the congested packet is obtained according to the quintuple, and a congestion control message is generated and sent to the source node of the packet, instructing it to reduce the rate of the packets with the flow identifier. The solution can therefore be used in a Layer 3 network.
  • FIG. 12 is a schematic diagram showing the internal structure of a network device according to another embodiment of the present invention.
  • the network device 1200 includes a receiving unit 1210, a storage unit 1220, a processing unit 1230, and a transmitting unit 1240.
  • the receiving unit 1210 is configured to receive a plurality of packets from the source node.
  • the storage unit 1220 is configured to store a mapping relationship between the quintuple and the flow identifier of the plurality of packets.
  • the processing unit 1230 is configured to: when network congestion is detected, obtain the quintuple of the congested packet; obtain the flow identifier corresponding to the packet according to the quintuple; and generate a congestion control message, where the congestion control message carries the flow identifier.
  • the sending unit 1240 is configured to send the congestion control message to the source node of the congested packet, so that the source node reduces the transmission rate of the packet according to the congestion control message.
  • the processing unit 1230 is further configured to determine a type of a source node of the message.
  • there are several ways to determine which type of source node the congested packet came from.
  • in a first implementation, a new field named source node type is added to the packet; it occupies 1 bit, where the value 1 indicates the first type and the value 0 indicates the second type.
  • in a second implementation, the type of source node host 1 is determined from the priority field of the packet. For example, priority level 1 indicates that the host is of the first type, and priority level 2 indicates that it is of the second type.
  • in a third implementation, the type of source node host 1 is determined from other fields of the packet, for example the source IP address field: IP addresses in one network segment are used for first-type hosts, and IP addresses in another network segment for second-type hosts.
  • the format of the congestion control message may be referred to the description in Embodiment 1 or FIG. 4A, and details are not described herein again.
  • the congestion control message further carries information indicating a degree of congestion.
  • the information used to indicate the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently free in the congestion point's transmit queue, and the difference between two successive readings of the number of available bytes in the congestion point's transmit queue.
  • in the technical solution provided by the embodiment of the present invention, when network congestion is detected, the quintuple of the congested packet is obtained, the flow identifier of that packet is obtained from the quintuple, and a congestion control message is generated and sent to the source node of the packet, instructing it to slow down packets having that flow identifier; the solution can therefore be used in Layer 3 networks.
  • a "unit" may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a (shared, dedicated, or group) processor and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
  • FIG. 13 is a schematic block diagram of a network device according to another embodiment of the present invention.
  • the network device 1300 includes a processor 1310, a memory 1320, a bus 1330, a user interface 1340, and a network interface 1350.
  • the processor 1310 controls the operation of the network device 1300, and may be a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array, or another programmable logic device.
  • the user interface 1340 is configured to connect to a lower layer network device.
  • the network interface 1350 is configured to connect to an upper layer network device.
  • the bus 1330 includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
  • the various buses are labeled as the bus system 1330 in the figure. It should be noted that the foregoing description of the structure of the network element can be applied to the embodiments of the present invention.
  • the memory 1320 may include a Read Only Memory (ROM) and a Random Access Memory (RAM), or other types of dynamic storage devices that can store information and instructions, or may be a disk storage.
  • the memory 1320 can be used to save instructions that implement the related methods provided by embodiments of the present invention. It will be appreciated that executable instructions are programmed or loaded into at least one of the processor 1310, the cache, and the long-term storage of the network element 1300.
  • the memory is configured to store computer-executable program code. The program code includes instructions that, when executed by the processor, cause the network element to perform the following operations:
  • the source node is instructed to reduce the transmission rate of the packet with the flow identifier.
  • FIG. 14 is a schematic diagram showing the internal structure of a network device according to another embodiment of the present invention.
  • the network device 1400 includes a receiving unit 1410, a processing unit 1420, and a transmitting unit 1430.
  • the receiving unit 1410 is configured to receive a plurality of packets from the source node.
  • the processing unit 1420 is configured to generate a CNP message when network congestion is detected, where the CNP message carries the flow identifier of the congested packet.
  • the sending unit 1430 is configured to send the CNP message to the source node, to instruct the source node to reduce the transmission rate of packets having the flow identifier.
  • the packet is a standard RoCE packet or a standard RoCEv2 packet.
  • the packet carries a flow identifier, where the flow identifier is encapsulated in a BTH field of the packet.
  • the packet carries a flow identifier, where the flow identifier is encapsulated in a DestQP field of the packet.
  • the CNP message carries a flow identifier, and the flow identifier is encapsulated in a BTH field of the CNP message.
  • the CNP message carries a flow identifier, where the flow identifier is encapsulated in a DestQP field of the CNP message.
  • the processing unit 1420 is further configured to determine a type of the source node.
  • the type of the source node is determined by a priority field of the message.
  • there are several ways to determine the type of the source node of the packet.
  • in a first implementation, a new field named source node type is added to the packet. It occupies 1 bit: the value 1 indicates the first type, and the value 0 indicates the second type.
  • in a second implementation, the type of the source node is determined from the priority field of the packet. For example, priority level 1 indicates that the host is of the first type, and priority level 2 indicates that it is of the second type.
  • in a third implementation, the type of the source node is determined from other fields of the packet, for example the port number or the protocol type. For example, when the protocol type is 6, the host is of the first type; when the protocol type is 7, the host is of the second type.
  • the type of the source node is a RoCEv2 server.
  • the destination address of the CNP message is the IP address of the RoCEv2 server.
  • in the technical solution provided by the embodiment of the present invention, the CNP is generated directly when network congestion is detected for packets sent by the RoCEv2 server, which reduces queue depth and packet delay and improves service performance.
  • FIG. 15 is a schematic block diagram of a network device according to another embodiment of the present invention.
  • the network device 1500 includes a processor 1510, a memory 1520, a bus 1530, a user interface 1540, and a network interface 1550.
  • the processor 1510 controls the operation of the network device 1500, and may be a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array, or another programmable logic device.
  • the user interface 1540 is configured to connect to a lower layer network device.
  • the network interface 1550 is configured to connect to an upper layer network device.
  • the bus 1530 includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
  • the various buses are labeled as the bus system 1530 in the figure. It should be noted that the foregoing description of the structure of the network element can be applied to the embodiments of the present invention.
  • the memory 1520 may include a Read Only Memory (ROM) and a Random Access Memory (RAM), or other types of dynamic storage devices that may store information and instructions, or may be a disk storage.
  • the memory 1520 can be used to save instructions that implement the related methods provided by embodiments of the present invention. It will be appreciated that executable instructions are programmed or loaded into at least one of the processor 1510, the cache, and the long-term storage of the network element 1500.
  • the memory is configured to store computer-executable program code. The program code includes instructions that, when executed by the processor, cause the network element to perform the following operations:
  • Embodiments of the present invention also provide a computer storage medium for storing computer software instructions for use by a user equipment, including a program designed to perform the above aspects.
  • the embodiment of the invention further provides a computer storage medium for storing computer software instructions used by the network device, which comprises a program designed to execute the above aspects.
  • the embodiment of the present invention further provides a communication network system, including a source node, a first intermediate node, a second intermediate node, and a destination node device, where
  • a source node configured to send one or more packets to the first intermediate node device
  • a first intermediate node configured to store a mapping relationship table of the quintuple and the flow identifier of the one or more messages, and forward the one or more messages to the second intermediate node;
  • a second intermediate node configured to: when network congestion is detected, obtain the quintuple of the congested packet; generate a first congestion control message, where the first congestion control message carries the quintuple of the congested packet; and send the first congestion control message to the first intermediate node;
  • the first intermediate node is configured to acquire, according to the quintuple carried by the first congestion control message, a flow identifier of the packet in which the congestion occurs, and generate a second congestion control message, where the second congestion control message carries The flow identifier; sending the second congestion control message to the source node;
  • the source node is configured to perform rate limiting processing on the packet with the flow identifier according to the second congestion control message.
  • in the technical solution provided by the embodiment of the present invention, when network congestion is detected, the quintuple of the congested packet is obtained, the flow identifier of that packet is obtained from the quintuple, and a congestion control message is generated and sent to the source node of the packet, instructing it to slow down packets having that flow identifier; the solution can therefore be used in Layer 3 networks.
  • a person skilled in the art may understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

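Embodiments of the present invention provide a network congestion control method, device, and system. The method includes: receiving a first congestion control message, where the first congestion control message carries the quintuple of a packet that caused congestion; obtaining the flow identifier of the packet according to that quintuple; generating a second congestion control message, where the second congestion control message carries the flow identifier corresponding to the quintuple; and sending the second congestion control message to the source node of the packet. Embodiments of the present invention can solve the network congestion problem in Layer 3 IP networks. In the technical solution provided by the embodiments, when network congestion is detected, the quintuple of the congested packet is obtained, the flow identifier of that packet is obtained from the quintuple, and a congestion control message is generated and sent back to the source node of the packet, instructing it to slow down packets having that flow identifier; the solution can therefore be used in Layer 3 networks.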

Description

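Network congestion control method, device, and system
Technical Field
The present invention relates to the field of network communication technologies, and in particular to a network congestion control method, device, and system.
Background
In the prior art, when a node in the network detects network congestion, it sends a congestion notification message (Congestion Notification Message, CNM) to the source node that sent the packets, requesting the source node to reduce its packet sending rate. After receiving the CNM, the source node reduces the sending rate and periodically tries to increase it again. If the congestion has cleared by then, increasing the rate does not cause congestion, so no further CNM is received, and the sending rate can eventually return to its pre-congestion value.
However, this solution has a technical problem: the node that detects congestion must encapsulate the flow identifier (Flow Identifier, Flow-ID) carried in the congestion-causing packet into the CNM, and must set the destination media access control (Media Access Control, MAC) address of the CNM to the source MAC address of the congestion-causing packet; only then can the CNM reach the source node. The source node uses the Flow-ID to decide which packets to rate-limit. This CNM mechanism works only in Layer 2 Ethernet networks: in a Layer 3 Internet Protocol (IP) network, the packet header is replaced at every routing hop, so the Flow-ID and the source MAC address of the source node are lost. Downstream nodes therefore cannot construct a CNM and send it to the source node, and the CNM mechanism cannot support Layer 3 IP networks.
Summary
In view of this, it is necessary to provide a network congestion control method, device, and system that solve the network congestion problem in Layer 3 IP networks.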
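According to a first aspect, a network congestion control method is provided, including: when network congestion is detected, obtaining the quintuple of the packet that caused the congestion; generating a congestion control message that carries the quintuple of the packet; and sending the congestion control message to the source node of the packet or to the access forwarding device to which the source node is connected.
In a possible design, the method further includes determining the type of the source node of the packet.
In another possible design, the type of the source node of the packet is determined from one of the following fields: a device type field, a priority field, or a source IP address field.
In a possible design, the quintuple of the packet is encapsulated in the payload field of the congestion control message.
In another possible design, the destination address of the congestion control message is the Internet Protocol (IP) address of the source node of the packet, or the IP address of the access forwarding device to which the source node is connected.
In another possible design, the congestion control message further carries information indicating the degree of congestion.
In another possible design, the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently free in the congestion point's transmit queue, and the difference between two successive readings of the number of available bytes in the congestion point's transmit queue.
In another possible design, the quintuple consists of the source IP address, destination IP address, source port number, destination port number, and protocol type carried in the packet.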
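According to another aspect, a network congestion control method is provided, including: receiving a first congestion control message, where the first congestion control message carries the quintuple of the packet that caused congestion; obtaining the flow identifier of the packet according to that quintuple; generating a second congestion control message, where the second congestion control message carries the flow identifier of the packet; and sending the second congestion control message to the source node of the packet.
In a possible design, the method further includes receiving one or more packets from the source node, and establishing a mapping between the quintuples and flow identifiers of the one or more packets.
In another possible design, the packet is a data frame that carries a CN-TAG field, and the flow identifier is carried in the RPID field of the CN-TAG field.
In another possible design, the second congestion control message is a congestion notification message that carries a CN-TAG field, and the flow identifier is carried in the RPID field of the CN-TAG field.
In another possible design, the first congestion control message further carries information indicating the degree of congestion.
In another possible design, the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently free in the congestion point's transmit queue, and the difference between two successive readings of the number of available bytes in the congestion point's transmit queue.
In another possible design, the quintuple consists of the source IP address, destination IP address, source port number, destination port number, and protocol type carried in the packet.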
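According to another aspect, a network congestion control method is provided, including: receiving one or more packets from a source node; establishing a mapping between the quintuples of the one or more packets and their flow identifiers; when network congestion is detected, obtaining the quintuple of the congested packet; obtaining the flow identifier of the congested packet according to the mapping; and generating a congestion control message and sending it to the source node of the packet, where the congestion control message carries the flow identifier.
In a possible design, the method further includes determining the type of the source node of the packet.
In another possible design, the type of the source node of the packet is determined from one of the following fields: a device type field, a priority field, or a source IP address field.
In another possible design, the packet is a data frame that carries a CN-TAG field, and the flow identifier is carried in the RPID field of the CN-TAG field.
In another possible design, the congestion control message is a congestion notification message that carries a CN-TAG field, and the flow identifier is carried in the RPID field of the CN-TAG field.
In another possible design, the congestion control message further carries information indicating the degree of congestion.
In another possible design, the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently free in the congestion point's transmit queue, and the difference between two successive readings of the number of available bytes in the congestion point's transmit queue.
In another possible design, the quintuple consists of the source IP address, destination IP address, source port number, destination port number, and protocol type carried in the packet.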
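According to another aspect, a network congestion control method is provided, including: receiving multiple packets from a source node; generating a CNP message when network congestion is detected; and sending the CNP message to the source node to instruct the source node to reduce its data rate.
In a possible design, the packet is a RoCE packet.
In another possible design, the packet carries a flow identifier, and the flow identifier is encapsulated in the BTH field of the packet.
In another possible option, the flow identifier is encapsulated in the DestQP field of the packet.
In another possible design, the CNP message carries a flow identifier, and the flow identifier is encapsulated in the BTH field of the CNP message.
In another possible design, the CNP message carries a flow identifier, and the flow identifier is encapsulated in the DestQP field of the CNP message.
In another possible design, the method further includes determining the type of the source node.
In another possible design, the type of the source node is determined from one of the following fields: a device type field, a priority field, or a source IP address field.
In another possible design, the type of the source node is a RoCEv2 server.
In another possible design, the destination address of the CNP message is the address of the RoCEv2 server.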
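According to another aspect, a network device is provided, including: a processing unit, configured to, when network congestion is detected, obtain the quintuple of the packet that caused the congestion and generate a congestion control message that carries the quintuple of the packet; and a sending unit, configured to send the congestion control message to the source node of the packet or to the access forwarding device to which the source node is connected.
In a possible design, the processing unit is further configured to determine the type of the source node of the packet.
In another possible design, the type of the source node of the packet is determined from one of the following fields: a device type field, a priority field, or a source IP address field.
In another possible design, the quintuple of the packet is encapsulated in the payload field of the congestion control message.
In another possible design, the destination address of the congestion control message is the Internet Protocol (IP) address of the source node of the packet, or the IP address of the access forwarding device to which the source node is connected.
In another possible design, the congestion control message further carries information indicating the degree of congestion.
In another possible design, the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently free in the congestion point's transmit queue, and the difference between two successive readings of the number of available bytes in the congestion point's transmit queue.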
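According to another aspect, a network device is provided, including: a receiving unit, configured to receive a first congestion control message, where the first congestion control message carries the quintuple of the packet that caused congestion; a processing unit, configured to obtain the flow identifier of the packet according to that quintuple and generate a second congestion control message that carries the flow identifier; and a sending unit, configured to send the second congestion control message to the source node of the packet.
In a possible design, the receiving unit is further configured to receive one or more packets from the source node.
In another possible design, the processing unit is further configured to establish a mapping between the quintuples and flow identifiers of the one or more packets.
In another possible design, the network device further includes a storage unit, configured to store the mapping table between packet quintuples and flow identifiers.
In another possible design, the packet is a data frame that carries a CN-TAG field, and the flow identifier is carried in the RPID field of the CN-TAG field.
In another possible design, the second congestion control message is a congestion notification message that carries a CN-TAG field, and the flow identifier is carried in the RPID field of the CN-TAG field.
In another possible design, the first congestion control message further carries information indicating the degree of congestion.
In another possible design, the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently free in the congestion point's transmit queue, and the difference between two successive readings of the number of available bytes in the congestion point's transmit queue.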
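According to another aspect, a network device is provided, including: a receiving unit, configured to receive one or more packets from a source node; a storage unit, configured to store a mapping between the quintuples of the one or more packets and their flow identifiers; a processing unit, configured to, when network congestion is detected, obtain the quintuple of the congested packet, obtain the flow identifier corresponding to the packet according to the mapping, and generate a congestion control message that carries the flow identifier; and a sending unit, configured to send the congestion control message to the source node of the congested packet, so that the source node reduces the transmission rate of the packet according to the congestion control message.
In a possible design, the processing unit is further configured to determine the type of the source node of the packet.
In another possible design, the type of the source node of the packet is determined from one of the following fields: a device type field, a priority field, or a source IP address field.
In another possible design, the packet is a data frame that carries a CN-TAG field, and the flow identifier is carried in the RPID field of the CN-TAG field.
In another possible design, the congestion control message is a congestion notification message that carries a CN-TAG field, and the flow identifier is carried in the RPID field of the CN-TAG field.
In another possible design, the congestion control message further carries information indicating the degree of congestion.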
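According to another aspect, a network device is provided, including: a receiving unit, configured to receive one or more packets from a source node; a processing unit, configured to generate a CNP message when network congestion is detected; and a sending unit, configured to send the CNP message to the source node to instruct the source node to reduce its data rate.
In a possible design, the processing unit is further configured to determine the type of the source node.
In another possible design, the type of the source node is determined from one of the following fields: a device type field, a priority field, or a source IP address field.
In another possible design, the type of the source node is a RoCEv2 server.
In another possible design, the CNP message carries a flow identifier.
In another possible design, the destination address of the CNP message is the IP address of the RoCEv2 server.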
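According to another aspect, a network system is provided, including a source node, an access forwarding device, an aggregation device, and a destination node, where the source node is connected to one port of the access forwarding device, another port of the access forwarding device is connected to the aggregation device, and another port of the aggregation device is indirectly connected to the destination node. The source node is configured to send multiple packets to the access forwarding device. The access forwarding device is configured to receive the multiple packets, obtain the quintuple and flow identifier from each packet, establish a mapping between quintuples and flow identifiers, and forward the packets to the aggregation device. The aggregation device is configured to, when network congestion is detected, obtain the quintuple of the congested packet, generate a first congestion control message carrying the quintuple, and send the first congestion control message to the access forwarding device. The access forwarding device is further configured to receive the first congestion control message, obtain the quintuple, obtain the flow identifier according to the quintuple, generate a second congestion control message carrying the flow identifier, and send the second congestion control message to the source node. The source node is configured to receive the second congestion control message and, as instructed by it, slow down packets having the flow identifier.
According to another aspect, a network system is provided, including a source node, an access forwarding device, an aggregation device, and a destination node, where the source node is connected to one port of the access forwarding device, another port of the access forwarding device is connected to the aggregation device, and another port of the aggregation device is indirectly connected to the destination node. The source node is configured to send multiple packets to the access forwarding device. The access forwarding device is configured to receive the multiple packets, obtain the quintuple and flow identifier from each packet, and establish a mapping between quintuples and flow identifiers; when network congestion is detected, obtain the quintuple of the congested packet and, according to that quintuple, obtain the flow identifier of the congested packet; generate a congestion control message carrying that flow identifier; and send the congestion control message to the source node. The source node is further configured to receive the congestion control message and, as instructed by it, slow down packets having the flow identifier.
According to another aspect, a network system is provided, including a source node, an intermediate node device, and a destination node, where the source node is connected to the destination node through the intermediate node device. The source node is configured to send multiple packets to the intermediate node device. The intermediate node device is configured to, when network congestion is detected, generate a CNP message that carries the flow identifier of the congested packet, and send the CNP message to the source node. The source node is further configured to receive the CNP message and, as instructed by it, slow down packets having the flow identifier.
The quintuple mentioned in the embodiments of the present invention refers to the source IP address, source port, destination IP address, destination port, and transport layer protocol.
In the technical solution provided by the embodiments of the present invention, when network congestion is detected, the quintuple of the congested packet is obtained, the flow identifier of that packet is obtained from the quintuple, and a congestion control message is generated and sent back to the source node of the packet, instructing it to slow down packets having that flow identifier; the solution can therefore be used in Layer 3 networks.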
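Brief Description of Drawings
FIG. 1 is a schematic diagram of a network architecture according to an embodiment of the present invention;
FIG. 2 is a schematic interaction diagram of a network congestion control method according to an embodiment of the present invention;
FIG. 3A is a schematic structural diagram of a packet (data packet) according to an embodiment of the present invention;
FIG. 3B is a schematic diagram of a method for detecting network congestion according to an embodiment of the present invention;
FIG. 4A is a schematic structural diagram of a CNM frame format according to an embodiment of the present invention;
FIG. 4B is a schematic structural diagram of another CNM frame format according to an embodiment of the present invention;
FIG. 5 is a schematic interaction diagram of a network congestion control method according to another embodiment of the present invention;
FIG. 6 is a schematic interaction diagram of a network congestion control method according to another embodiment of the present invention;
FIG. 7 is a schematic interaction diagram of a network congestion control method according to another embodiment of the present invention;
FIG. 8A is a schematic diagram of the frame format of a packet (data packet) according to an embodiment of the present invention;
FIG. 8B is a schematic structural diagram of a CNP message format according to an embodiment of the present invention;
FIG. 8C is a schematic diagram of the internal structure of a BTH field according to an embodiment of the present invention;
FIG. 9 is a schematic interaction diagram of a network congestion control method according to another embodiment of the present invention;
FIG. 10 is a schematic diagram of the internal structure of a network device according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of the internal structure of a network device according to another embodiment of the present invention;
FIG. 12 is a schematic diagram of the internal structure of a network device according to another embodiment of the present invention;
FIG. 13 is a schematic block diagram of a network device according to another embodiment of the present invention;
FIG. 14 is a schematic diagram of the internal structure of a network device according to another embodiment of the present invention;
FIG. 15 is a schematic block diagram of a network device according to another embodiment of the present invention.
Detailed Description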
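To make the objectives, features, and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention fall within the protection scope of the present invention.
The terms "first", "second", and so on in the specification, claims, and drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. Terms used in this way are interchangeable where appropriate; this is merely the way objects with the same attributes are distinguished in the description of the embodiments. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that contains a series of units is not necessarily limited to those units, but may include other units that are not clearly listed or that are inherent to the process, method, product, or device.
The technical solutions of the embodiments can be applied to the network architecture shown in FIG. 1, which includes an access layer, an aggregation layer, and a core layer. The source nodes that send packets (hosts 1 to 4 in FIG. 1) are connected by copper twisted pair or optical fiber to access layer devices (the four shown in FIG. 1 are only an example; a real deployment may have more or fewer). The access layer devices (also called access forwarding devices) are connected by copper twisted pair or optical fiber to aggregation layer devices (again, the four shown are only an example), and the aggregation layer devices are connected by copper twisted pair or optical fiber to core layer devices (the two shown are only an example).
For ease of description, a node that sends packets is called a source node (for example, host 1), the node that the data finally reaches is called the destination node (for example, host 3), and the nodes on the path in between are called access forwarding devices (for example, T1 to T4 in FIG. 1), aggregation devices (for example, A1 to A4 in FIG. 1), and core network devices (for example, C1 and C2 in FIG. 1). For example, when host 1, as the source node, sends packets to host 2, as the destination node, the packets may pass through node T1, node A1, and node T2 in turn before reaching host 2. As another example, when host 1, as the source node, sends packets to host 3, as the destination node, host 1 sends the packets to node T1, which forwards them to node A1, then to node C1, then to node A3, then to node T3, and they finally reach host 3. It should be understood that the routing algorithm is optimized to compute the shortest path between two nodes; that is, if multiple paths exist between two nodes, the shortest path is generally selected.
Generally, a data center (Data Center, DC) network carries three types of data flows: (1) storage data flows, which require zero packet loss; (2) high-performance computing flows, which require low latency; and (3) Ethernet flows, which tolerate some packet loss and delay. Because the requirements differ, traditional data centers use three different types of networks to carry the different flows. This is acceptable in small data centers, but as data centers grow, running three different types of networks is not an economical solution. An alternative is to carry all three types of data flows over Ethernet and to define additional mechanisms so that Ethernet can meet the requirements of all three network types. To this end, the Institute of Electrical and Electronics Engineers (IEEE) has defined the following specifications. (1) Enhanced Transmission Selection (ETS): prevents a large traffic burst of one data flow type from affecting other data flow types, and provides a minimum bandwidth guarantee for each data flow type. A data flow type can use bandwidth beyond its allocation only when other data flow types are not using theirs. This lets multiple data flow types coexist in the same network. (2) Priority-based Flow Control (PFC): ensures that, when the three types of traffic coexist on Ethernet, storage traffic suffers no packet loss and the other two types of traffic are not affected. (3) Quantized Congestion Notification (QCN): reduces the packet sending rate of the source node that causes congestion, avoiding congestion at its root, keeping the network flowing, and solving the increase in packet delay caused by retransmission or flow control that congestion triggers. Working principle: QCN is one of the "additional mechanisms" mentioned above. It is used to avoid network congestion, so as to reduce packet loss and network delay (congestion causes packet loss, and retransmission after loss increases packet delay). To avoid network congestion, both the Ethernet switches and the source nodes (in a data center, usually servers) must support QCN: when an Ethernet switch detects congestion, it sends a congestion notification message to the data source node, asking it to reduce its packet sending rate, and the data source node reduces its sending rate after receiving the message.
Note that there are at least two types of source node, called first-type and second-type servers. The embodiments below address three scenarios: in the first scenario, all of hosts 1 to 4 in FIG. 1 are of the first type; in the second scenario, all of hosts 1 to 4 in FIG. 1 are of the second type; in the third scenario, the hosts in FIG. 1 include both the first type and the second type.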
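Embodiment 1
In the network architecture shown in FIG. 1, as an example, source node host 1 wants to send packets to destination node host 2; the intermediate nodes on the path are access forwarding device T1, aggregation layer device A1, and access layer device T2, after which the packets reach host 2. Assuming that network congestion occurs at intermediate node A1, FIG. 2 is a schematic interaction diagram of a network congestion control method, which includes:
Step 201: Source node host 1 sends multiple packets to the access forwarding device.
Specifically, a packet may also be called a data packet, a frame, and so on. The packet format is shown in FIG. 3A and includes a destination MAC address, a source MAC address, a service tag (Service VLAN Tag, S-TAG) field, a customer tag (Customer VLAN Tag, C-TAG) field, a congestion notification tag (Congestion Notification Tag, CN-TAG) field, and a MAC service data unit (MAC Service Data Unit, MSDU) field. The quintuple of the packet is encapsulated in the MSDU field, and the CN-TAG field further includes an EtherType field and a reaction point identifier (Reaction Point Identifier, RPID) field.
The RPID field carries the flow identifier (Flow-ID) of the packet, which uniquely identifies the data flow type of each packet sent. For example, a Flow-ID of 1 indicates that the packet belongs to a storage data flow, a Flow-ID of 2 indicates a high-performance computing flow, and a Flow-ID of 3 indicates an Ethernet flow.
It should be understood that the quintuple consists of the source IP address, destination IP address, source port number, destination port number, and protocol type carried in the packet.
When a packet is sent from host 1 to the access forwarding device, the source MAC address is the MAC address of host 1, the destination MAC address is the MAC address of the access forwarding device, the source IP address is the IP address of host 1, and the destination IP address is the IP address of host 2.
Step 202: After receiving the multiple packets, the access forwarding device obtains the quintuple and flow identifier of each packet and stores locally a mapping table between packet quintuples and flow identifiers, for example as shown in Table 1.
Table 1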
Figure PCTCN2017084627-appb-000001
Figure PCTCN2017084627-appb-000002
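In the protocol column of Table 1, the value 6 denotes the Transmission Control Protocol (TCP) and the value 17 denotes the User Datagram Protocol (UDP).
Optionally, Table 1 may also store other information, such as the virtual local area network (Virtual Local Area Network, VLAN) tag and the source MAC address, as shown in Table 2. The VLAN tag and source MAC address must be filled into the corresponding fields of the congestion notification message (Congestion Notification Message, CNM) when it is constructed later.
Table 2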
Figure PCTCN2017084627-appb-000003
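The mapping tables of step 202 can be pictured as a dictionary keyed by quintuple. The following is a minimal sketch under stated assumptions, not the patent's implementation: the class names `Quintuple`, `FlowEntry`, and `FlowTable`, the addresses, and the port numbers are all hypothetical, and the optional VLAN and source MAC columns follow the description of Table 2.

```python
from typing import Dict, NamedTuple, Optional


class Quintuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # 6 = TCP, 17 = UDP, as in Table 1


class FlowEntry(NamedTuple):
    flow_id: int
    vlan: Optional[int] = None     # optional column described for Table 2
    src_mac: Optional[str] = None  # optional column described for Table 2


class FlowTable:
    """Quintuple -> flow identifier mapping kept by the access forwarding device."""

    def __init__(self) -> None:
        self._entries: Dict[Quintuple, FlowEntry] = {}

    def learn(self, key: Quintuple, entry: FlowEntry) -> None:
        # Called for each packet received from the source node (step 202).
        self._entries[key] = entry

    def lookup(self, key: Quintuple) -> Optional[FlowEntry]:
        # Returns None when no mapping has been learned for this quintuple.
        return self._entries.get(key)


# Hypothetical example values.
table = FlowTable()
key = Quintuple("10.0.0.1", "10.0.1.2", 40000, 5000, 6)
table.learn(key, FlowEntry(flow_id=1, vlan=100, src_mac="00:11:22:33:44:55"))
```

A hash map is used here only because lookups by exact quintuple are the sole access pattern the text describes; a hardware device would more likely use a match table.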
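Step 203: The access forwarding device forwards the multiple packets to the aggregation layer device.
It should be understood that forwarding a packet to the next device can follow existing practice: for example, look up the routing table to obtain the egress port and next hop, look up the MAC address of the next hop in the Address Resolution Protocol (ARP) table, and encapsulate the Ethernet header. The IP address and port number of the next device may be encapsulated in the packet header so that the packet can be routed to the next device.
Step 204: When the aggregation layer device detects network congestion, it obtains the quintuple of the congested packet, generates a first congestion control message that carries the quintuple of the congested packet, and sends the first congestion control message to the access forwarding device.
Note that on a network device, packets received on one port are held temporarily in an in-memory queue, then taken from the queue in order and forwarded to another port. On the device, network congestion shows up as the number of packets in the queue exceeding a threshold. Obtaining the quintuple of the congested packet in step 204 therefore means obtaining the quintuple of the newly received packet when the queue exceeds the threshold.
Step 205: After receiving the first congestion control message, the access forwarding device obtains the quintuple of the congested packet carried in it; obtains the flow identifier of the congested packet from the mapping table between quintuples and flow identifiers; and generates a second congestion control message that carries the flow identifier.
Step 206: The second congestion control message is sent to host 1 to instruct host 1 to reduce the rate of packets belonging to that flow identifier.
Optionally, the first congestion control message may be a custom message whose format is shown in Table 3:
Table 3
IP Header | UDP Header | Payload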
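The threshold check described above can be sketched as follows. This is an illustrative model only: the class name `EgressQueue`, the packet-count threshold, and the example addresses are assumptions, and a real device would typically measure queue occupancy in bytes in hardware.

```python
from collections import deque
from typing import Deque, Optional, Tuple

# (src_ip, dst_ip, src_port, dst_port, protocol)
Quintuple = Tuple[str, str, int, int, int]


class EgressQueue:
    """Packets wait here between the ingress port and the egress port."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.packets: Deque[Quintuple] = deque()

    def enqueue(self, quintuple: Quintuple) -> Optional[Quintuple]:
        """Queue a packet; return its quintuple if the queue now exceeds the threshold."""
        self.packets.append(quintuple)
        if len(self.packets) > self.threshold:
            # Congestion: report the newly received packet, as in step 204.
            return quintuple
        return None

    def dequeue(self) -> Optional[Quintuple]:
        """Forward the oldest queued packet, if any."""
        return self.packets.popleft() if self.packets else None


queue = EgressQueue(threshold=2)
first = queue.enqueue(("10.0.0.1", "10.0.1.2", 40000, 5000, 6))
second = queue.enqueue(("10.0.0.1", "10.0.1.2", 40001, 5000, 6))
third = queue.enqueue(("10.0.0.3", "10.0.1.2", 40002, 5000, 17))
```

With a threshold of 2, only the third `enqueue` call reports congestion, and the quintuple it returns is the one that would be placed in the first congestion control message.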
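Steps 205 and 206 amount to a lookup followed by a re-addressed message. The sketch below illustrates that translation under stated assumptions: messages are modelled as plain dictionaries, the key names (`quintuple`, `flow_id`, `to`) are hypothetical, and the source node is addressed by the source IP of the congested packet purely for illustration.

```python
from typing import Dict, Optional, Tuple

# (src_ip, dst_ip, src_port, dst_port, protocol)
Quintuple = Tuple[str, str, int, int, int]


def to_second_message(first_msg: Dict, flow_table: Dict[Quintuple, int]) -> Optional[Dict]:
    """Turn a first congestion control message into a second one (steps 205 and 206)."""
    quintuple = first_msg["quintuple"]
    flow_id = flow_table.get(quintuple)
    if flow_id is None:
        # No mapping was learned for this flow, so no flow identifier can be supplied.
        return None
    return {
        "kind": "second_congestion_control",
        "to": quintuple[0],   # source node of the congested packet
        "flow_id": flow_id,   # tells the source node which flow to slow down
    }


flow_table = {("10.0.0.1", "10.0.1.2", 40000, 5000, 6): 1}
first_msg = {"kind": "first_congestion_control",
             "quintuple": ("10.0.0.1", "10.0.1.2", 40000, 5000, 6)}
second_msg = to_second_message(first_msg, flow_table)
```

Returning `None` for an unknown quintuple is a design choice of this sketch; the text does not say what the access forwarding device does in that case.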
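As shown in Table 3, the custom first congestion control message contains at least a 20-byte IP header, an 8-byte UDP header, and a payload field. The 20-byte IP header comprises: a Version field (4 bits), giving the IP version number; an IP Header Length (IHL) field (4 bits), giving the length of the IP header; a Type of Service field, defining how the IP packet is handled; a Total Length field; an Identification field; a Flags field; a Fragment Offset field, which, when a packet is fragmented, works together with the More Fragments (MF) bit to help the destination host reassemble the fragments; a Time to Live (TTL) field, indicating how long the packet may remain in the network (it is decremented by one at each router, and the packet is discarded when it reaches 0); a Protocol field (8 bits), indicating the protocol used in the data portion of the IP packet; a Header Checksum field (16 bits), the checksum of the IPv4 header; a Source Address field; and a Destination Address field. For the meaning of these fields, see the existing Request for Comments (RFC) 791 standard; details are not repeated here. The 8-byte UDP header comprises a Source Port field, a Destination Port field, a Length field, and a Checksum field. For the meaning of these fields, see RFC 768; details are not repeated here.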
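The source IP address in the IP header is the IP address of the congested aggregation device, and the destination IP address is the IP address of the access forwarding device. The destination port number in the UDP header identifies the first congestion control message. For example, 0x2000 is defined as the destination port number: after receiving a message, the access forwarding device reads the destination port field, and if the value is 0x2000, the message is a custom congestion control message that must next be converted into a standard CNM. The payload carries the quintuple of the congested packet.
Optionally, the payload also carries fields indicating the degree of congestion, for example a QntzFb field, a congestion point identifier (Congestion Point Identifier, CPID) field, a queue offset (Qoffset) field, and a Qdelta field. For example, the first congestion control message may be as shown in FIG. 4A. The QntzFb field is the quantized feedback value of the first congestion control message; it is 6 bits long and occupies the low 6 bits of the first 2 bytes. The CPID field is the congestion point identifier and is 8 bytes long; to keep the identifier unique, the MAC address of the congestion point device is used as the high 6 bytes, and the low 2 bytes identify different ports or priority queues of the same device. The Qoffset field occupies 2 bytes and gives the number of bytes currently free in the transmit queue of the congestion point (CP). The Qdelta field occupies 2 bytes and gives the difference between two successive readings of the number of available bytes in the CP transmit queue.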
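As a rough illustration of the payload handling described above, the sketch below packs and unpacks a quintuple and checks the UDP destination port. The text gives 0x2000 only as an example port and does not specify a byte layout for the payload, so the 13-byte IPv4 layout and the function names used here are assumptions.

```python
import ipaddress
import struct

CONGESTION_CTRL_PORT = 0x2000  # example destination port from the text
# Assumed layout: src IP (4 bytes), dst IP (4 bytes), src port, dst port, protocol.
_QUINTUPLE_FMT = "!4s4sHHB"


def encode_quintuple(src_ip: str, dst_ip: str, src_port: int,
                     dst_port: int, protocol: int) -> bytes:
    """Pack a quintuple into the payload of the first congestion control message."""
    return struct.pack(
        _QUINTUPLE_FMT,
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
        src_port,
        dst_port,
        protocol,
    )


def decode_quintuple(payload: bytes):
    """Recover the quintuple at the access forwarding device."""
    size = struct.calcsize(_QUINTUPLE_FMT)
    src, dst, sport, dport, proto = struct.unpack(_QUINTUPLE_FMT, payload[:size])
    return (str(ipaddress.IPv4Address(src)), str(ipaddress.IPv4Address(dst)),
            sport, dport, proto)


def is_congestion_control(udp_dst_port: int) -> bool:
    """A message is treated as a custom congestion control message by its UDP port."""
    return udp_dst_port == CONGESTION_CTRL_PORT


payload = encode_quintuple("10.0.0.1", "10.0.1.2", 40000, 5000, 6)
```

Network byte order (`!`) is used so the layout has no padding and is the same on every platform.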
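The CPID construction described above (device MAC address in the high 6 bytes, a port or queue number in the low 2 bytes) can be sketched as a small helper. The function name `make_cpid` and the example MAC address are hypothetical.

```python
def make_cpid(mac: str, port_or_queue: int) -> bytes:
    """Build an 8-byte congestion point identifier.

    High 6 bytes: MAC address of the congestion point device.
    Low 2 bytes: port or priority queue within that device.
    """
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    if not 0 <= port_or_queue <= 0xFFFF:
        raise ValueError("port/queue identifier must fit in 2 bytes")
    return mac_bytes + port_or_queue.to_bytes(2, "big")


cpid = make_cpid("00:11:22:33:44:55", 3)
```

Because the MAC address is globally unique and the low 2 bytes are assigned locally, two queues on different devices can never produce the same identifier.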
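Optionally, the second congestion control message may follow the existing CNM format, as shown in FIG. 4B, and includes a destination IP address, a source IP address, S-TAG, C-TAG, CN-TAG, and a payload. The payload further includes the fields Version, Reserved, QntzFb, CPID, Qoffset, Qdelta, Encapsulated Priority, Encapsulated Destination MAC address, Encapsulated MSDU length, and Encapsulated MSDU. Version is the version number of the CNM message; it is 4 bits long, occupies the high 4 bits of the first byte, and defaults to 0. Reserved is the reserved field of the CNM message; it is 6 bits long, occupies the low 4 bits of the first byte and the high 2 bits of the second byte, and defaults to 0. QntzFb is the quantized feedback value of the CNM message; it is 6 bits long and occupies the low 6 bits of the first 2 bytes. Congestion Point Identifier (CPID) is the congestion point identifier and is 8 bytes long; to keep the identifier unique, the MAC address of the congestion point device is used as the high 6 bytes, and the low 2 bytes identify different ports or priority queues of the same device. QOffset occupies 2 bytes and gives the number of bytes currently free in the CP transmit queue. QDelta occupies 2 bytes and gives the difference between two successive readings of the number of available bytes in the CP transmit queue. Encapsulated priority occupies 2 bytes; the high 3 bits of the first byte hold the priority of the data frame that triggered the CNM message, and the other bits are 0. Encapsulated destination MAC address occupies 6 bytes and holds the destination MAC address of the data frame that triggered the CNM message. Encapsulated MSDU length occupies 2 bytes and gives the length of the Encapsulated MSDU field. Encapsulated MSDU occupies at most 64 bytes and holds the content of the data frame that triggered the CNM message.
The flow identifier is encapsulated in the RPID field contained in the CN-TAG field of the CNM message.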
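The first two bytes of the CNM payload pack three fields into 16 bits: Version (high 4 bits), Reserved (next 6 bits), and QntzFb (low 6 bits). A minimal sketch of that bit packing follows; the function names are hypothetical, and only these two bytes of the payload are covered.

```python
import struct


def pack_cnm_prefix(version: int, qntz_fb: int) -> bytes:
    """Pack Version (4 bits), Reserved (6 bits, zero), and QntzFb (6 bits)."""
    if not 0 <= version < 16:
        raise ValueError("version must fit in 4 bits")
    if not 0 <= qntz_fb < 64:
        raise ValueError("QntzFb must fit in 6 bits")
    word = (version << 12) | qntz_fb  # reserved bits 6..11 stay zero
    return struct.pack("!H", word)


def unpack_cnm_prefix(data: bytes):
    """Return (version, reserved, qntz_fb) from the first two payload bytes."""
    (word,) = struct.unpack("!H", data[:2])
    return word >> 12, (word >> 6) & 0x3F, word & 0x3F


prefix = pack_cnm_prefix(version=0, qntz_fb=42)
```

Treating the two bytes as one big-endian 16-bit word keeps the field boundaries simple, since the Reserved field straddles the byte boundary.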
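It should be understood that steps 201 to 206 above address the scenario in which all hosts in FIG. 1 are of the same type. When the hosts in FIG. 1 are of multiple types, the method 200 further includes, before obtaining the quintuple of the congested packet:
Step A: When the aggregation layer device detects network congestion, it determines which type of source node the congested packet came from.
Step B1: When the source node of the packet is of the first type, perform step 204.
Step B2: When the source node of the packet is of the second type, perform the corresponding steps in Embodiment 4 or Embodiment 5.
Optionally, the first-type source node is a Data Center Quantized Congestion Notification (DCQCN) server.
For a detailed description of DCQCN, see the IEEE 802.1Qau standard. A QCN server sends data packets that carry a CN-TAG, as shown in FIG. 3A. Such a data packet differs from an ordinary packet in that a CN-TAG field is added to the Ethernet header; the RPID it carries is the Flow-ID, which uniquely identifies each packet (data flow) sent by the server.
Optionally, the second-type source node is an RDMA over Converged Ethernet version 2.0 (RoCEv2) server, where RDMA stands for Remote Direct Memory Access. The packet format sent by a RoCEv2 server differs from that of a DCQCN server; see FIG. 8A and the description of Embodiment 4 or 5 for details, which are not expanded here.
Optionally, there are several ways to determine which type of source node the congested packet came from. For example, in a first implementation, a new field named source node type is added to the packet; it occupies 1 bit, where the value 1 indicates the first type and the value 0 indicates the second type.
In a second implementation, the type of source node host 1 is determined from the priority field of the packet. For example, priority level 1 indicates that the host is of the first type, and priority level 2 indicates that it is of the second type.
In a third implementation, the type of source node host 1 is determined from other fields of the packet, for example the source IP address field: IP addresses in one network segment are used for first-type hosts, and IP addresses in another network segment for second-type hosts.
It should be understood that the type of the source node can be determined in many ways, and the embodiments of the present invention place no limitation on this.
It should also be understood that, according to the received second congestion control message, host 1 rate-limits or reduces the sending rate of packets belonging to the flow identifier and periodically tries to increase the sending rate again. If the congestion has cleared by then, increasing the rate does not cause congestion, so no further CNM is received, and the sending rate can eventually return to its pre-congestion value.
In summary, in the technical solution provided by the embodiments of the present invention, when network congestion is detected, the quintuple of the congested packet is obtained, the flow identifier of that packet is obtained from the quintuple, and a congestion control message is generated and sent back to the source node of the packet, instructing it to slow down packets having that flow identifier; the solution can therefore be used in Layer 3 networks.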
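The three ways of classifying the source node can be sketched side by side. This is an illustration under stated assumptions: the enum and function names are hypothetical, and the two network segments are made-up example values, since the text does not name any.

```python
import ipaddress
from enum import Enum


class SourceType(Enum):
    FIRST = 1   # for example, a DCQCN server
    SECOND = 2  # for example, a RoCEv2 server


def by_type_bit(bit: int) -> SourceType:
    """First implementation: a 1-bit source node type field (1 = first, 0 = second)."""
    return SourceType.FIRST if bit == 1 else SourceType.SECOND


def by_priority(priority: int) -> SourceType:
    """Second implementation: priority level 1 = first type, level 2 = second type."""
    mapping = {1: SourceType.FIRST, 2: SourceType.SECOND}
    if priority not in mapping:
        raise ValueError(f"no source type defined for priority {priority}")
    return mapping[priority]


def by_source_ip(src_ip: str,
                 first_net: str = "10.1.0.0/16",    # assumed segment for first-type hosts
                 second_net: str = "10.2.0.0/16"    # assumed segment for second-type hosts
                 ) -> SourceType:
    """Third implementation: classify by the network segment of the source IP."""
    addr = ipaddress.ip_address(src_ip)
    if addr in ipaddress.ip_network(first_net):
        return SourceType.FIRST
    if addr in ipaddress.ip_network(second_net):
        return SourceType.SECOND
    raise ValueError(f"{src_ip} is in neither configured segment")
```

The congestion point would call one of these before deciding whether to generate a quintuple-carrying congestion control message (first type) or a CNP (second type).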
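The source node's behaviour (cut the rate on a congestion message, then probe upward periodically until the original rate is restored) can be sketched as below. The halving factor and the fixed recovery step are assumptions chosen for illustration; the text does not specify them, and this is not the rate algorithm of any standard.

```python
class FlowRate:
    """Sending rate of one flow at the source node."""

    def __init__(self, rate: float, decrease: float = 0.5, step: float = 10.0) -> None:
        self.initial = rate       # rate before congestion
        self.rate = rate          # current sending rate
        self.decrease = decrease  # assumed multiplicative cut per congestion message
        self.step = step          # assumed additive increase per timer tick

    def on_congestion_message(self) -> None:
        """Reduce the rate when a congestion control message names this flow."""
        self.rate *= self.decrease

    def on_timer(self) -> None:
        """Periodically try a higher rate, never exceeding the pre-congestion value."""
        self.rate = min(self.initial, self.rate + self.step)


flow = FlowRate(rate=100.0)
flow.on_congestion_message()
after_cut = flow.rate
for _ in range(10):
    flow.on_timer()
after_recovery = flow.rate
```

If congestion persists, further messages arrive and `on_congestion_message` cuts the rate again; if it has cleared, the timer ticks alone bring the rate back to its starting value.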
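Embodiment 2
In the network architecture shown in FIG. 1, as an example, source node host 1 wants to send packets to destination node host 2; the intermediate nodes on the path are access forwarding device T1, aggregation layer device A1, and access layer device T2, after which the packets reach host 2. Assume that network congestion occurs at intermediate node T1. FIG. 5 is an interaction diagram of a network congestion control method according to another embodiment of the present invention, specifically:
Step 501: Host 1 sends multiple packets to the access forwarding device.
Step 502: The access forwarding device receives the multiple packets, obtains the quintuple and flow identifier from each packet, establishes a mapping table between quintuples and flow identifiers, and stores it locally. For the mapping table, see Table 1 or Table 2 in Embodiment 1; details are not repeated here.
Step 503: When network congestion is detected, obtain the quintuple of the congested packet; obtain the flow identifier of the congested packet from the mapping table between quintuples and flow identifiers; and generate a network congestion control message that carries the flow identifier.
Step 504: Send the network congestion control message to host 1 to instruct host 1 to reduce the sending rate of packets belonging to the flow identifier.
Note that the descriptions in Embodiment 1 of technical details such as the packet format and the format of the mapping table between quintuples and flow identifiers also apply to Embodiment 2; they are not repeated here.
Optionally, the network congestion control message may follow the existing CNM format, as shown in FIG. 4B, and includes a destination IP address, a source IP address, S-TAG, C-TAG, CN-TAG, and a payload. For the format of the network congestion control message, see also the description of the second congestion control message in Embodiment 1; details are not repeated here.
It should be understood that the obtained flow identifier is encapsulated in the RPID field under the CN-TAG field of the CNM message.
It should also be understood that steps 501 to 504 above address the scenario in which all hosts in FIG. 1 are of the same type. When the hosts in FIG. 1 are of multiple types, the method 500 further includes, before obtaining the quintuple of the congested packet:
Step A: When network congestion is detected, determine which type of source node the congested packet came from.
Step B: When the packet came from a first-type source node, continue with step 503.
Optionally, the first-type source node is a DCQCN server.
Optionally, the second-type source node is a RoCEv2 server.
Optionally, there are several ways to determine which type of source node the congested packet came from. For example, in a first implementation, a new field named source node type is added to the packet; it occupies 1 bit, where the value 1 indicates the first type and the value 0 indicates the second type.
In a second implementation, the type of source node host 1 is determined from the priority field of the packet. For example, priority level 1 indicates that the host is of the first type, and priority level 2 indicates that it is of the second type.
In a third implementation, the type of source node host 1 is determined from other fields of the packet, for example the source IP address field: IP addresses in one network segment are used for first-type hosts, and IP addresses in another network segment for second-type hosts.
It should be understood that the type of the source node can be determined in many ways, and the embodiments of the present invention place no limitation on this.
In summary, in the technical solution provided by the embodiments of the present invention, when network congestion is detected, the quintuple of the congested packet is obtained, the flow identifier of that packet is obtained from the quintuple, and a congestion control message is generated and sent back to the source node of the packet, instructing it to slow down packets having that flow identifier; the solution can therefore be used in Layer 3 networks.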
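Embodiment 3
In the network architecture shown in FIG. 1, as an example, source node host 1 wants to send packets to destination node host 2; the intermediate nodes on the path are access forwarding device T1, aggregation layer device A1, and access layer device T2, after which the packets reach host 2. Assume that network congestion occurs at intermediate node T2. FIG. 6 shows an interaction diagram of a network congestion control method according to another embodiment of the present invention, specifically:
Step 601: Host 1 sends multiple packets to the access forwarding device.
Step 602: After receiving the multiple packets, the access forwarding device obtains the quintuple and flow identifier from each packet, establishes a mapping table between quintuples and flow identifiers, and stores it. For the mapping table, see Table 1 or Table 2 in Embodiment 1; details are not repeated here.
Step 603: Forward the multiple packets to a first intermediate node device (for example, an aggregation layer device).
Step 604: The first intermediate node device receives the multiple packets and forwards them to a second intermediate node device (for example, a core network device or a backbone network device).
Step 605: When the second intermediate node device detects network congestion, it obtains the quintuple of the congested packet and generates a first congestion control message that carries the quintuple of the congested packet.
Step 606: The first congestion control message is sent back to the first intermediate node device.
Step 607: After receiving the first congestion control message, the first intermediate node device forwards it to the access forwarding device.
Step 608: The access forwarding device receives the first congestion control message, obtains the flow identifier of the congested packet from the mapping table between quintuples and flow identifiers, and generates a second congestion control message that carries the flow identifier.
Step 609: The second congestion control message is sent to host 1 to instruct host 1 to slow down packets having the flow identifier.
Note that the descriptions in Embodiment 1 of technical details such as the packet format, the mapping table between quintuples and flow identifiers, and the formats of the first and second congestion control messages also apply to Embodiment 3; they are not repeated here.
It should be understood that steps 601 to 609 above address the scenario in which all hosts in FIG. 1 are of the same type. When the hosts in FIG. 1 are of multiple types, the method 600 further includes, before obtaining the quintuple of the congested packet:
Step A: When network congestion is detected, determine which type of source node the congested packet came from.
Step B: When the packet came from a first-type source node, continue with step 605.
Optionally, the first-type source node is a DCQCN server.
Optionally, there are several ways to determine which type of source node the congested packet came from. For example, in a first implementation, a new field named source node type is added to the packet; it occupies 1 bit, where the value 1 indicates the first type and the value 0 indicates the second type.
In a second implementation, the type of source node host 1 is determined from the priority field of the packet. For example, priority level 1 indicates that the host is of the first type, and priority level 2 indicates that it is of the second type.
In a third implementation, the type of source node host 1 is determined from other fields of the packet, for example the source IP address field: IP addresses in one network segment are used for first-type hosts, and IP addresses in another network segment for second-type hosts.
It should be understood that the type of the source node can be determined in many ways, and the embodiments of the present invention place no limitation on this.
In summary, in the technical solution provided by the embodiments of the present invention, when network congestion is detected, the quintuple of the congested packet is obtained, the flow identifier of that packet is obtained from the quintuple, and a congestion control message is generated and sent back to the source node of the packet, instructing it to slow down packets having that flow identifier; the solution can therefore be used in Layer 3 networks.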
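Embodiment 4
In the network architecture shown in FIG. 1, as an example, source node host 1 wants to send packets to destination node host 2; the intermediate nodes on the path are access forwarding device T1, aggregation layer device A1, and access layer device T2, after which the packets reach host 2. Assume that network congestion occurs at intermediate node A1. FIG. 7 shows an interaction diagram of network congestion control according to another embodiment of the present invention. In this embodiment, the host type differs from that in Embodiments 1 to 3 and is the RoCEv2 server mentioned in Embodiment 1; the packet format also differs from that in Embodiments 1 to 3. The method provided by this embodiment is as follows:
Step 701: Host 1 sends one or more packets to the access forwarding device.
FIG. 8A shows a packet structure. As shown in FIG. 8A, it includes a Layer 2 Ethernet header (Eth L2 Header), a network layer header (IP Header), a UDP header (UDP Header), an InfiniBand base transport header (InfiniBand Base Transport Header, IB BTH), an InfiniBand payload (IB Payload), a cyclic redundancy code (ICRC), and a frame check sequence (Frame Check Sequence, FCS). For the meaning and details of each field of a RoCEv2 packet, see the Supplement to InfiniBand Architecture Specification Volume 1 Release 1.2.1; details are not repeated here.
The BTH field further includes multiple subfields, as shown in FIG. 8C.
Optionally, the flow identifier is encapsulated in the BTH field.
Optionally, the flow identifier is encapsulated in the DestQP field.
Step 702: The access forwarding device receives the one or more packets and forwards them to the first intermediate node device.
Step 703: The first intermediate node device receives the one or more packets and forwards them to the second intermediate node device.
Step 704: When the second intermediate node device detects network congestion, it generates a CNP message that carries the flow identifier of the congested packet.
FIG. 8B is a schematic diagram of the frame format of the CNP message. It includes a MAC header (MAC Header), an IP header (IP Header), a UDP header (UDP header), a base transport header (BTH), a reserved field (reserved), a cyclic redundancy code (ICRC) check, and a frame check sequence (Frame Check Sequence, FCS) field. For the meaning and details of each field of the CNP message, see the Supplement to InfiniBand Architecture Specification Volume 1 Release 1.2.1; details are not repeated here.
The BTH field further includes multiple subfields, as shown in FIG. 8C, such as the destination queue pair (DestQP) and the operation code (opcode). The flow identifier is encapsulated in the DestQP field within the BTH field.
Step 705: The second intermediate node device sends the CNP message to the first intermediate node device.
Step 706: The first intermediate node device receives the CNP message and forwards it to the access forwarding device.
Step 707: The access forwarding device receives the CNP message and forwards it to the source node, instructing the source node to reduce the transmission rate of packets having the flow identifier.
It should be understood that the data flow to which a packet belongs can be uniquely determined by a 7-tuple: source IP, destination IP, protocol number, source port number, destination port number, SrcQP, and DestQP. In step 704, when the second intermediate node detects network congestion, it identifies the packet that caused the congestion, parses out the destination queue pair (Destination Queue Pair, DestQP) carried in that packet, and, when constructing the CNP, sets this DestQP as the source queue pair (Source Queue Pair, SrcQP).
The source IP and destination IP of the CNP are the destination IP and source IP of the congested packet, respectively; the UDP source port number and destination port number are the source port number and destination port number of the congested packet, respectively; and the opcode is 0x81.
It should be understood that steps 701 to 707 above address the scenario in which all hosts in FIG. 1 are of the same type. When the hosts in FIG. 1 are of multiple types, the method 700 further includes, before generating the CNP message:
Step A: When network congestion is detected, determine which type of source node the congested packet came from.
Step B: When the packet came from a second-type source node, continue with step 704.
Optionally, the second-type source node is a RoCEv2 server.
Optionally, there are several ways to determine which type of source node the congested packet came from. For example, in a first implementation, a new field named source node type is added to the packet; it occupies 1 bit, where the value 1 indicates the first type and the value 0 indicates the second type.
In a second implementation, the type of source node host 1 is determined from the priority field of the packet. For example, priority level 1 indicates that the host is of the first type, and priority level 2 indicates that it is of the second type.
In a third implementation, the type of source node host 1 is determined from other fields of the packet, for example the source IP address field: IP addresses in one network segment are used for first-type hosts, and IP addresses in another network segment for second-type hosts.
It should be understood that the type of the source node can be determined in many ways, and the embodiments of the present invention place no limitation on this. In summary, in the technical solution provided by the embodiments of the present invention, the CNP is generated directly when congestion reaches the threshold, which greatly shortens the control loop delay (Control Loop Delay, CLD) and therefore reduces queue depth and packet delay and improves service performance.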
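The addressing rules for the CNP stated above can be sketched as a small constructor. This is a simplified model under stated assumptions: the dataclass names are hypothetical, only the fields mentioned in the text are modelled, and the field that carries the congested packet's DestQP value is called `flow_qp` here because the sketch does not commit to where in the BTH that value is placed.

```python
from dataclasses import dataclass

CNP_OPCODE = 0x81  # opcode value given in the text


@dataclass(frozen=True)
class RocePacket:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    dest_qp: int


@dataclass(frozen=True)
class Cnp:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    flow_qp: int  # value copied from the congested packet's DestQP
    opcode: int


def build_cnp(congested: RocePacket) -> Cnp:
    """Build a CNP for the packet that caused congestion."""
    return Cnp(
        src_ip=congested.dst_ip,      # IP addresses are swapped
        dst_ip=congested.src_ip,
        src_port=congested.src_port,  # UDP ports are kept as in the congested packet
        dst_port=congested.dst_port,
        flow_qp=congested.dest_qp,
        opcode=CNP_OPCODE,
    )


# Hypothetical example values.
data_packet = RocePacket("10.0.0.1", "10.0.1.2", 49152, 4791, dest_qp=0x12)
cnp = build_cnp(data_packet)
```

Swapping only the IP addresses is what routes the CNP back toward the sender, while the copied queue pair value lets the sender identify which flow to slow down.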
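Embodiment 5
In the network architecture shown in FIG. 1, by way of example, source node host 1 intends to send packets to destination node host 2. The nodes along the path are, in order, access forwarding device T1, aggregation layer device A1, and access layer device T2, and finally host 2. Suppose network congestion occurs at intermediate node A1. FIG. 9 is an interaction diagram of network congestion control according to yet another embodiment of the present invention. In this embodiment, the host types are the same as those described in Embodiment 4. Specifically:
Step 901: Host 1 sends multiple packets to the access forwarding device.
For the structure of the packets, refer to FIG. 8A and the description of Embodiment 4; details are not repeated here.
Optionally, the packet carries a flow identifier, and the flow identifier is encapsulated in the BTH field of the packet.
Optionally, the packet carries a flow identifier, and the flow identifier is encapsulated in the DestQP field of the packet.
Step 902: The access forwarding device receives the multiple packets and forwards them to the aggregation layer device.
Step 903: When the aggregation layer device detects network congestion, it generates a CNP message that carries the flow identifier of the packet experiencing the congestion.
Optionally, the flow identifier is encapsulated in the BTH field of the CNP message.
Optionally, the flow identifier is encapsulated in the DestQP field of the CNP message.
Step 904: The aggregation layer device sends the CNP message to the access forwarding device.
Step 905: The access forwarding device receives the CNP message and forwards it to source node host 1, instructing source node host 1 to reduce the transmission rate of packets having the flow identifier.
Optionally, for the format of the CNP message, refer to Embodiment 4 and FIG. 8B; details are not repeated here.
It should be understood that the data flow to which a packet belongs, as referred to in this embodiment of the present invention, can be uniquely determined by a seven-tuple: source IP, destination IP, protocol number, source port number, destination port number, SrcQP, and DestQP. In step 903, when the aggregation layer device detects network congestion, it identifies the packet that caused the congestion, parses out the DestQP carried in that packet, and sets this DestQP as the SrcQP when constructing the CNP.
The source IP and destination IP of the CNP message are the destination IP and source IP of the congested packet, respectively; the UDP source port number and destination port number are the source port number and destination port number of the congested packet, respectively; and the operation code (opcode) may be set to 0x81.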
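To make the seven-tuple concrete, the sketch below models it as an immutable key that can index per-flow state. The type name, field names, and the packet counter are illustrative assumptions only.

```python
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """The seven fields that together identify one data flow."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int
    src_qp: int
    dest_qp: int


# Per-flow packet counters keyed by the seven-tuple (example state only).
counters: Dict[FlowKey, int] = {}


def count_packet(key: FlowKey) -> None:
    counters[key] = counters.get(key, 0) + 1
```

Two packets that differ only in DestQP produce different keys and are therefore treated as different flows.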
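It should be understood that steps 901 to 905 above apply to the scenario in which all hosts in FIG. 1 are of the same type. When the hosts in FIG. 1 are of multiple types, the method 900 should further include, before the CNP message is generated:
Step A: When network congestion is detected, determine which type of source node the congested packet comes from.
Step B: When the packet comes from a second-type source node, continue with step 903.
Optionally, when the packet comes from a first-type source node, refer to the description of Embodiments 1 to 3.
Optionally, the first-type source node is a DCQCN server.
The second-type source node is a RoCEv2 server.
Optionally, there are multiple ways to determine which type of source node the congested packet comes from. For example, a first implementation is to extend the packet with a new 1-bit field named source node type: a value of 1 indicates the first type, and a value of 0 indicates the second type.
A second implementation is to determine the type of the source node, host #1, from the priority field of the packet. For example, priority level 1 indicates that the host is of the first type, and priority level 2 indicates that the host is of the second type.
A third implementation is to determine the type of the source node, host #1, from other fields of the packet, for example the source IP address field: IP addresses in one network segment are used for first-type hosts, and IP addresses in another network segment are used for second-type hosts.
It should be understood that the type of the source node can be determined in many ways, and this embodiment of the present invention imposes no limitation on this. In summary, in the technical solution provided by this embodiment, the CNP is generated directly as soon as congestion reaches the threshold, which greatly shortens the control loop delay (CLD); queue depth and packet latency can therefore be reduced and service performance improved.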
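Embodiment 6
FIG. 10 is a schematic diagram of the internal structure of a network device according to yet another embodiment of the present invention. As shown in FIG. 10, the network device 1000 includes a processing unit 1010 and a sending unit 1020. The processing unit 1010 is configured to: when network congestion is detected, obtain the five-tuple of the packet that caused the network congestion; and generate a congestion control message that carries the five-tuple of the packet. The sending unit 1020 is configured to send the congestion control message to the access forwarding device of the source node of the packet.
Specifically, for the format of the packet, refer to Embodiment 1 and the description of FIG. 3A; details are not repeated here.
It should be understood that the five-tuple refers to the source IP address, destination IP address, source port number, destination port number, and protocol type carried in the packet.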
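One way to carry the five-tuple in the payload of a congestion control message is a fixed binary layout. The 13-byte IPv4 layout below is an assumption made for illustration; the source does not define a wire format, and the names are hypothetical.

```python
import socket
import struct
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


# Assumed layout: two IPv4 addresses, two 16-bit ports, one 8-bit protocol.
_FMT = "!4s4sHHB"


def encode_five_tuple(ft: FiveTuple) -> bytes:
    """Serialize the five-tuple into the message payload."""
    return struct.pack(
        _FMT,
        socket.inet_aton(ft.src_ip),
        socket.inet_aton(ft.dst_ip),
        ft.src_port,
        ft.dst_port,
        ft.protocol,
    )


def decode_five_tuple(payload: bytes) -> FiveTuple:
    """Recover the five-tuple at the access forwarding device."""
    src, dst, sport, dport, proto = struct.unpack(
        _FMT, payload[: struct.calcsize(_FMT)]
    )
    return FiveTuple(socket.inet_ntoa(src), socket.inet_ntoa(dst), sport, dport, proto)
```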
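Optionally, the processing unit 1010 is further configured to determine the type of the source node of the packet.
Optionally, the type of the source node of the packet is determined from the priority information, the source node type field, or the IP address carried in the packet.
Optionally, there are multiple ways to determine which type of source node the congested packet comes from. For example, a first implementation is to extend the packet with a new 1-bit field named source node type: a value of 1 indicates the first type, and a value of 0 indicates the second type.
A second implementation is to determine the type of the source node, host #1, from the priority field of the packet. For example, priority level 1 indicates that the host is of the first type, and priority level 2 indicates that the host is of the second type.
A third implementation is to determine the type of the source node, host #1, from other fields of the packet, for example the source IP address field: IP addresses in one network segment are used for first-type hosts, and IP addresses in another network segment are used for second-type hosts.
It should be understood that the type of the source node can be determined in many ways, and this embodiment of the present invention imposes no limitation on this. Optionally, for the format of the congestion control message, refer to Embodiment 1, Table 3, and the description of FIG. 4A; details are not repeated here.
Optionally, the congestion control message further carries information indicating the degree of congestion.
Optionally, the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently available in the transmit queue of the congestion point, and the difference between two successive readings of the number of available bytes in the transmit queue of the congestion point (CP).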
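The four congestion-degree items can be grouped into one record, as in the sketch below. The way the feedback value is derived here (queue occupancy scaled to a 6-bit range) is purely an assumption for illustration; the source does not give a formula, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CongestionInfo:
    cpid: int      # congestion point identifier
    q_offset: int  # bytes currently available in the CP transmit queue
    q_delta: int   # change in available bytes between two readings
    feedback: int  # quantized feedback value


def quantize(value: int, bits: int = 6) -> int:
    """Clamp a raw value into the range representable with `bits` bits."""
    return max(0, min(value, (1 << bits) - 1))


def make_info(cpid: int, prev_free: int, curr_free: int,
              capacity: int, bits: int = 6) -> CongestionInfo:
    q_delta = curr_free - prev_free
    used = capacity - curr_free
    # Assumption: feedback grows linearly with queue occupancy.
    raw = (used * ((1 << bits) - 1)) // capacity
    return CongestionInfo(cpid, curr_free, q_delta, quantize(raw, bits))
```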
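Embodiment 7
FIG. 11 is a schematic diagram of the internal structure of a network device according to yet another embodiment of the present invention. As shown in FIG. 11, the network device 1100 includes a receiving unit 1110, a processing unit 1120, and a sending unit 1130. The receiving unit 1110 is configured to receive a first congestion control message that carries the five-tuple of the packet that caused the congestion. The processing unit 1120 is configured to obtain the flow identifier of the packet according to the five-tuple of the packet, and to generate a second congestion control message that carries the flow identifier. The sending unit 1130 is configured to send the second congestion control message to the source node of the packet.
It should be understood that, for the format of the packet, refer to Embodiment 1 and the description of FIG. 3A; details are not repeated here.
Optionally, the receiving unit 1110 is further configured to receive one or more packets sent by the source node of the packet.
Optionally, the network device 1100 further includes a storage unit 1140, configured to store a mapping table between the five-tuples and the flow identifiers of the one or more packets. It should be understood that the mapping table between five-tuples and flow identifiers may be as shown in Table 2 or Table 3; details are not repeated here.
It should be understood that the flow identifier is encapsulated in the RPID field within the CN-TAG field of the packet, and the five-tuple is encapsulated in the MSDU field of the packet.
Optionally, for the formats of the first congestion control message and the second congestion control message, refer to the description of Embodiment 1.
Optionally, the second congestion control message is a congestion notification message, the congestion notification message carries a CN-TAG field, and the flow identifier is carried in the RPID field of the CN-TAG field.
Optionally, the first congestion control message further carries information indicating the degree of congestion.
Optionally, the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently available in the transmit queue of the congestion point, and the difference between two successive readings of the number of available bytes in the transmit queue of the congestion point.
Optionally, the processing unit 1120 is further configured to determine the type of the source node of the packet.
It should be understood that this embodiment is the apparatus embodiment corresponding to method Embodiment 2; the description of the method embodiment also applies to this embodiment and is not repeated here.
In summary, in the technical solution provided by this embodiment, when network congestion is detected, the five-tuple of the congested packet is obtained, the flow identifier of the congested packet is obtained from the five-tuple, and a congestion control message is then generated and sent back to the source node of the packet, instructing it to reduce the rate of packets having that flow identifier. The solution can be used in Layer 3 networks.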
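The role of the device in this embodiment (learn the mapping from five-tuple to flow identifier, then translate a first congestion control message into a second one) can be sketched as follows. The class and method names are hypothetical, and returning `None` for an unknown five-tuple is an assumption; the source does not say how that case is handled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (src_ip, dst_ip, src_port, dst_port, protocol)
FiveTuple = Tuple[str, str, int, int, int]


@dataclass(frozen=True)
class SecondMessage:
    rpid: int  # flow identifier carried in the RPID field of the CN-TAG


class AccessForwarder:
    def __init__(self) -> None:
        # Mapping table kept by the storage unit.
        self._flow_ids: Dict[FiveTuple, int] = {}

    def learn(self, five_tuple: FiveTuple, flow_id: int) -> None:
        """Record the mapping when a packet from the source node passes through."""
        self._flow_ids[five_tuple] = flow_id

    def on_first_message(self, five_tuple: FiveTuple) -> Optional[SecondMessage]:
        """Translate the five-tuple from the first message into a flow identifier."""
        flow_id = self._flow_ids.get(five_tuple)
        if flow_id is None:
            return None  # unknown flow: nothing to send (assumed behaviour)
        return SecondMessage(rpid=flow_id)
```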
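Embodiment 8
FIG. 12 is a schematic diagram of the internal structure of a network device according to yet another embodiment of the present invention. As shown in FIG. 12, the network device 1200 includes a receiving unit 1210, a storage unit 1220, a processing unit 1230, and a sending unit 1240. The receiving unit 1210 is configured to receive multiple packets from a source node. The storage unit 1220 is configured to store the mapping between the five-tuples and the flow identifiers in the multiple packets. The processing unit 1230 is configured to: when network congestion is detected, obtain the five-tuple of the congested packet; obtain the flow identifier corresponding to the packet according to the five-tuple; and generate a congestion control message that carries the flow identifier. The sending unit 1240 is configured to send the congestion control message to the source node device of the congested packet, so that the source node device reduces the packet transmission rate according to the congestion control message.
Optionally, the processing unit 1230 is further configured to determine the type of the source node of the packet.
Optionally, there are multiple ways to determine which type of source node the congested packet comes from. For example, a first implementation is to extend the packet with a new 1-bit field named source node type: a value of 1 indicates the first type, and a value of 0 indicates the second type.
A second implementation is to determine the type of the source node, host #1, from the priority field of the packet. For example, priority level 1 indicates that the host is of the first type, and priority level 2 indicates that the host is of the second type.
A third implementation is to determine the type of the source node, host #1, from other fields of the packet, for example the source IP address field: IP addresses in one network segment are used for first-type hosts, and IP addresses in another network segment are used for second-type hosts.
It should be understood that the type of the source node can be determined in many ways, and this embodiment of the present invention imposes no limitation on this. Optionally, for the format of the congestion control message, refer to Embodiment 1 or the description of FIG. 4A; details are not repeated here.
Optionally, the congestion control message further carries information indicating the degree of congestion.
Optionally, the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, the number of bytes currently available in the transmit queue of the congestion point, and the difference between two successive readings of the number of available bytes in the transmit queue of the congestion point.
It should be understood that this embodiment is the apparatus embodiment corresponding to method Embodiment 3; the description of the method embodiment also applies to this embodiment and is not repeated here.
In summary, in the technical solution provided by this embodiment, when network congestion is detected, the five-tuple of the congested packet is obtained, the flow identifier of the congested packet is obtained from the five-tuple, and a congestion control message is then generated and sent back to the source node of the packet, instructing it to reduce the rate of packets having that flow identifier. The solution can be used in Layer 3 networks.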
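A single device that both stores the mapping and detects congestion might look like the sketch below. Detecting congestion by comparing queued bytes against a fixed threshold is an assumption chosen for this example, and the class, method, and message shapes are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (src_ip, dst_ip, src_port, dst_port, protocol)
FiveTuple = Tuple[str, str, int, int, int]


class CongestionPoint:
    def __init__(self, threshold_bytes: int) -> None:
        self.threshold = threshold_bytes
        self.queued_bytes = 0
        self.flow_ids: Dict[FiveTuple, int] = {}

    def receive(self, five_tuple: FiveTuple, flow_id: int,
                length: int) -> Optional[dict]:
        """Enqueue a packet; return a congestion control message if congested."""
        self.flow_ids[five_tuple] = flow_id  # storage unit: remember the mapping
        self.queued_bytes += length
        if self.queued_bytes > self.threshold:
            # Processing unit: look the flow identifier up via the five-tuple.
            return {"flow_id": self.flow_ids[five_tuple]}
        return None

    def transmit(self, length: int) -> None:
        """Dequeue `length` bytes once they have been sent."""
        self.queued_bytes = max(0, self.queued_bytes - length)
```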
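It should be understood that the network devices in FIGS. 10 to 12 are presented in the form of functional units. Without limitation, the term "unit" as used herein may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a (shared, dedicated, or group) processor and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functions.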
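Embodiment 9
FIG. 13 is a schematic block diagram of a network device according to yet another embodiment of the present invention. The network device 1300 includes a processor 1310, a memory 1320, a bus 1330, a user interface 1340, and a network interface 1350.
Specifically, the processor 1310 controls the operation of the network device 1300. The processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array, or another programmable logic device.
The user interface 1340 is configured to connect to lower-layer network devices.
The network interface 1350 is configured to connect to upper-layer network devices.
The components of the network device 1300 are coupled together through the bus 1330. In addition to a data bus, the bus system 1330 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 1330 in the figure. It should be noted that the above description of the network element structure is applicable to the embodiments of the present invention.
The memory 1320 may include a read-only memory (ROM) and a random access memory (RAM), or another type of dynamic storage device capable of storing information and instructions, or may be a disk memory. The memory 1320 may be used to store instructions that implement the related methods provided by the embodiments of the present invention. It can be understood that executable instructions may be programmed or loaded into at least one of the processor 1310, the cache, and the long-term storage of the network element 1300. In a specific embodiment, the memory is configured to store computer-executable program code, where the program code includes instructions, and when the processor executes the instructions, the instructions cause the network element to perform the following operations: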
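when network congestion is detected, obtaining the five-tuple of the packet that caused the network congestion;
generating a congestion control message, where the congestion control message carries the five-tuple of the packet; and
sending the congestion control message to the access forwarding device of the source node of the packet;
or, the program code includes instructions, and when the processor executes the instructions, the instructions cause the network element to perform the following operations:
receiving a first congestion control message, where the first congestion control message carries the five-tuple of the packet that caused the congestion;
obtaining the flow identifier of the packet according to the five-tuple of the packet that caused the congestion;
generating a second congestion control message, where the second congestion control message carries the flow identifier of the packet; and
sending the second congestion control message to the source node of the packet;
or, the program code includes instructions, and when the processor executes the instructions, the instructions cause the network element to perform the following operations:
receiving one or more packets from a source node;
establishing a mapping between the five-tuples and the flow identifiers in the one or more packets;
when network congestion is detected, obtaining the five-tuple of the congested packet;
obtaining the flow identifier of the congested packet according to the mapping;
generating a congestion control message, where the congestion control message carries the flow identifier; and
sending the congestion control message to the source node, to instruct the source node to reduce the transmission rate of packets having the flow identifier.
For the specific implementation of the operations performed by the processor of the network element serving as the network device, refer to the corresponding steps performed by the network device in Embodiments 1 to 3; details are not repeated here.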
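Embodiment 10
FIG. 14 is a schematic diagram of the internal structure of a network device according to yet another embodiment of the present invention. As shown in FIG. 14, the network device 1400 includes a receiving unit 1410, a processing unit 1420, and a sending unit 1430. The receiving unit 1410 is configured to receive multiple packets from a source node. The processing unit 1420 is configured to generate a CNP message when network congestion is detected, where the CNP message carries the flow identifier of the congested packet. The sending unit 1430 is configured to send the CNP message to the source node, to instruct the source node to reduce the transmission rate of packets belonging to the flow identifier.
Optionally, the packet is a standard RoCE packet or a standard RoCEv2 packet.
For details of the RoCE or RoCEv2 packet, refer to FIG. 8A or the Supplement to InfiniBand Architecture Specification Volume 1 Release 1.2.1; details are not repeated here.
Optionally, the packet carries a flow identifier, and the flow identifier is encapsulated in the BTH field of the packet.
Optionally, the packet carries a flow identifier, and the flow identifier is encapsulated in the DestQP field of the packet.
Optionally, the CNP message carries a flow identifier, and the flow identifier is encapsulated in the BTH field of the CNP message.
Optionally, the CNP message carries a flow identifier, and the flow identifier is encapsulated in the DestQP field of the CNP message.
Optionally, the processing unit 1420 is further configured to determine the type of the source node.
Optionally, the type of the source node is determined from the priority field of the packet.
Specifically, there are multiple ways to determine the type of the source node of the packet. A first implementation is to extend the packet with a new 1-bit field named source node type: a value of 1 indicates the first type, and a value of 0 indicates the second type.
A second implementation is to determine the type of the source node from the priority field of the packet. For example, priority level 1 indicates that the host is of the first type, and priority level 2 indicates that the host is of the second type.
A third implementation is to determine the type of the source node from other fields of the packet, for example the port number or the protocol type. For example, a protocol type of 6 indicates that the host is of the first type, and a protocol type of 7 indicates that the host is of the second type.
Optionally, the type of the source node is a RoCEv2 server.
Optionally, the destination address of the CNP message is the IP address of the RoCEv2 server.
For the specific implementation of the operations performed by the processing unit of the network element serving as the network device, refer to the corresponding steps performed by the network device in Embodiments 4 and 5; details are not repeated here.
In summary, in the technical solution provided by this embodiment, for packets sent by a RoCEv2 server, the CNP is generated directly when network congestion occurs, which greatly shortens the control loop delay; queue depth and packet latency can therefore be reduced and service performance improved.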
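Embodiment 11
FIG. 15 is a schematic block diagram of a network device according to yet another embodiment of the present invention. The network device 1500 includes a processor 1510, a memory 1520, a bus 1530, a user interface 1540, and a network interface 1550.
Specifically, the processor 1510 controls the operation of the network device 1500. The processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array, or another programmable logic device.
The user interface 1540 is configured to connect to lower-layer network devices.
The network interface 1550 is configured to connect to upper-layer network devices.
The components of the network device 1500 are coupled together through the bus 1530. In addition to a data bus, the bus system 1530 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 1530 in the figure. It should be noted that the above description of the network element structure is applicable to the embodiments of the present invention.
The memory 1520 may include a read-only memory (ROM) and a random access memory (RAM), or another type of dynamic storage device capable of storing information and instructions, or may be a disk memory. The memory 1520 may be used to store instructions that implement the related methods provided by the embodiments of the present invention. It can be understood that executable instructions may be programmed or loaded into at least one of the processor 1510, the cache, and the long-term storage of the network element 1500. In a specific embodiment, the memory is configured to store computer-executable program code, where the program code includes instructions, and when the processor executes the instructions, the instructions cause the network element to perform the following operations: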
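receiving multiple packets from a source node;
generating a CNP message when network congestion is detected; and
sending the CNP message to the source node, to instruct the source node to reduce its data rate.
For the specific implementation of the operations performed by the processor of the network element serving as the network device, refer to the corresponding steps performed by the network device in Embodiments 4 and 5; details are not repeated here.
An embodiment of the present invention further provides a computer storage medium, configured to store computer software instructions used by the user equipment, including a program designed to perform the foregoing aspects.
An embodiment of the present invention further provides a computer storage medium, configured to store computer software instructions used by the foregoing network device, including a program designed to perform the foregoing aspects.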
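An embodiment of the present invention further provides a communication network system, including a source node, a first intermediate node, a second intermediate node, and a destination node device, where:
the source node is configured to send one or more packets to the first intermediate node device;
the first intermediate node is configured to store a mapping table between the five-tuples and the flow identifiers of the one or more packets, and to forward the one or more packets to the second intermediate node;
the second intermediate node is configured to: when network congestion is detected, obtain the five-tuple of the congested packet; generate a first congestion control message that carries the five-tuple of the congested packet; and send the first congestion control message to the first intermediate node;
the first intermediate node is configured to: obtain the flow identifier of the congested packet according to the five-tuple carried in the first congestion control message; generate a second congestion control message that carries the flow identifier; and send the second congestion control message to the source node; and
the source node is configured to rate-limit packets having the flow identifier according to the second congestion control message.
For the specific implementation of the operations performed by each network element in the network system, refer to the corresponding steps performed by the corresponding network devices in Embodiments 1 to 5; details are not repeated here.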
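The message exchange in the system described above can be traced end to end with a small simulation. Halving the rate on each notification is an assumption made only for this sketch (the source says the rate is reduced, not by how much), and all names are hypothetical.

```python
from typing import Dict, Tuple

# (src_ip, dst_ip, src_port, dst_port, protocol)
FiveTuple = Tuple[str, str, int, int, int]


class Source:
    def __init__(self, rates: Dict[int, float]) -> None:
        self.rates = dict(rates)  # flow identifier -> sending rate (Mbit/s)

    def on_second_message(self, flow_id: int, factor: float = 0.5) -> None:
        """Reduce the rate of the one flow named in the message."""
        if flow_id in self.rates:
            self.rates[flow_id] *= factor


def second_node_detects(five_tuple: FiveTuple) -> dict:
    """Second intermediate node: first message carries the five-tuple."""
    return {"five_tuple": five_tuple}


def first_node_translates(table: Dict[FiveTuple, int], first_msg: dict) -> dict:
    """First intermediate node: second message carries the flow identifier."""
    return {"flow_id": table[first_msg["five_tuple"]]}


def run_once(source: Source, table: Dict[FiveTuple, int],
             congested: FiveTuple) -> None:
    first = second_node_detects(congested)
    second = first_node_translates(table, first)
    source.on_second_message(second["flow_id"])
```

Only the flow whose five-tuple was reported is slowed down; other flows from the same source keep their rate.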
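In the technical solution provided by the embodiments of the present invention, when network congestion is detected, the five-tuple of the congested packet is obtained, the flow identifier of the congested packet is obtained from the five-tuple, and a congestion control message is then generated and sent back to the source node of the packet, instructing it to reduce the rate of packets having that flow identifier. The solution can be used in Layer 3 networks.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
A person of ordinary skill in the art will appreciate that all or some of the steps of the above methods may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium such as a ROM, a RAM, or an optical disc.
In summary, the foregoing are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.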

Claims (38)

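  1. A network congestion control method, comprising:
    when network congestion is detected, obtaining a five-tuple of a packet that causes the network congestion;
    generating a congestion control message, wherein the congestion control message carries the five-tuple of the packet; and
    sending the congestion control message to a source node of the packet or to an access forwarding device connected to the source node of the packet.
  2. The method according to claim 1, further comprising:
    determining a type of the source node of the packet.
  3. The method according to claim 2, wherein the type of the source node of the packet is determined by one of the following fields: a device type field, a priority field, or a source IP address field.
  4. The method according to any one of claims 1 to 3, wherein the five-tuple of the packet is encapsulated in a payload field of the congestion control message.
  5. The method according to any one of claims 1 to 4, wherein a destination address of the congestion control message is an Internet Protocol (IP) address of the source node of the packet, or an IP address of the access forwarding device connected to the source node of the packet.
  6. The method according to any one of claims 1 to 5, wherein the congestion control message further carries information indicating a degree of congestion.
  7. The method according to claim 6, wherein the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, a number of bytes currently available in a transmit queue of the congestion point, and a difference between two successive readings of the number of available bytes in the transmit queue of the congestion point.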
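  8. A network congestion control method, comprising:
    receiving a first congestion control message, wherein the first congestion control message carries a five-tuple of a packet that causes congestion;
    obtaining a flow identifier of the packet according to the five-tuple of the packet that causes the congestion;
    generating a second congestion control message, wherein the second congestion control message carries the flow identifier of the packet; and
    sending the second congestion control message to a source node of the packet.
  9. The method according to claim 8, further comprising:
    receiving one or more packets from the source node; and
    establishing a mapping between five-tuples and flow identifiers of the one or more packets.
  10. The method according to claim 8 or 9, wherein the second congestion control message is a congestion notification message, the congestion notification message carries a congestion notification tag (CN-TAG) field, and the flow identifier is carried in a reaction point identifier (RPID) field of the CN-TAG field.
  11. The method according to any one of claims 8 to 10, wherein the first congestion control message further carries information indicating a degree of congestion.
  12. The method according to claim 11, wherein the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, a number of bytes currently available in a transmit queue of the congestion point, and a difference between two successive readings of the number of available bytes in the transmit queue of the congestion point.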
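  13. A network congestion control method, comprising:
    receiving one or more packets from a source node;
    establishing a mapping between five-tuples and flow identifiers in the one or more packets;
    when network congestion is detected, obtaining a five-tuple of a congested packet;
    obtaining a flow identifier of the congested packet according to the mapping;
    generating a congestion control message, wherein the congestion control message carries the flow identifier; and
    sending the congestion control message to the source node, to instruct the source node to reduce a transmission rate of packets having the flow identifier.
  14. The method according to claim 13, further comprising:
    determining a type of the source node of the packet.
  15. The method according to claim 14, wherein the type of the source node of the packet is determined by one of the following fields: a device type field, a priority field, or a source IP address field.
  16. The method according to any one of claims 13 to 15, wherein the congestion control message is a congestion notification message, the congestion notification message carries a congestion notification tag (CN-TAG) field, and the flow identifier is carried in a reaction point identifier (RPID) field of the CN-TAG field.
  17. The method according to any one of claims 13 to 16, wherein the congestion control message further carries information indicating a degree of congestion.
  18. The method according to claim 17, wherein the information indicating the degree of congestion is one or more of the following:
    a quantized feedback value of the congestion control message, a congestion point identifier, a number of bytes currently available in a transmit queue of the congestion point, and a difference between two successive readings of the number of available bytes in the transmit queue of the congestion point.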
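  19. A network device, comprising:
    a processing unit, configured to: when network congestion is detected, obtain a five-tuple of a packet that causes the network congestion; and generate a congestion control message, wherein the congestion control message carries the five-tuple of the packet; and
    a sending unit, configured to send the congestion control message to a source node of the packet or to an access forwarding device connected to the source node of the packet.
  20. The network device according to claim 19, wherein the processing unit is further configured to
    determine a type of the source node of the packet.
  21. The network device according to claim 20, wherein the type of the source node of the packet is determined by one of the following fields: a device type field, a priority field, or a source IP address field.
  22. The network device according to claim 19, wherein the five-tuple of the packet is encapsulated in a payload field of the congestion control message.
  23. The network device according to claim 19 or 22, wherein a destination address of the congestion control message is an Internet Protocol (IP) address of the source node of the packet, or an IP address of the access forwarding device of the source node of the packet.
  24. The network device according to any one of claims 19 to 23, wherein the congestion control message further carries information indicating a degree of congestion.
  25. The network device according to claim 24, wherein the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, a number of bytes currently available in a transmit queue of the congestion point, and a difference between two successive readings of the number of available bytes in the transmit queue of the congestion point.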
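  26. A network device, comprising:
    a receiving unit, configured to receive a first congestion control message, wherein the first congestion control message carries a five-tuple of a packet that causes congestion;
    a processing unit, configured to: obtain a flow identifier of the packet according to the five-tuple of the packet; and generate a second congestion control message, wherein the second congestion control message carries the flow identifier of the packet; and
    a sending unit, configured to send the second congestion control message to a source node of the packet.
  27. The network device according to claim 26, wherein the receiving unit is further configured to receive one or more packets from the source node of the packet; and
    the network device further comprises a storage unit, configured to store a mapping table between five-tuples and flow identifiers of the one or more packets.
  28. The network device according to claim 26 or 27, wherein the second congestion control message is a congestion notification message, the congestion notification message carries a congestion notification tag (CN-TAG) field, and the flow identifier is carried in a reaction point identifier (RPID) field of the CN-TAG field.
  29. The network device according to any one of claims 26 to 28, wherein the first congestion control message further carries information indicating a degree of congestion.
  30. The network device according to claim 29, wherein the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, a number of bytes currently available in a transmit queue of the congestion point, and a difference between two successive readings of the number of available bytes in the transmit queue of the congestion point.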
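  31. A network device, comprising:
    a receiving unit, configured to receive one or more packets from a source node;
    a storage unit, configured to store a mapping table between five-tuples in the one or more packets and their flow identifiers;
    a processing unit, configured to: when network congestion is detected, obtain a five-tuple of a congested packet; obtain a flow identifier of the packet according to the mapping table; and generate a congestion control message, wherein the congestion control message carries the flow identifier; and
    a sending unit, configured to send the congestion control message to the source node, so that the source node reduces, according to the congestion control message, a transmission rate of packets having the flow identifier.
  32. The network device according to claim 31, wherein the processing unit is further configured to
    determine a type of the source node of the packet.
  33. The network device according to claim 32, wherein the type of the source node of the packet is determined by one of the following fields: a device type field, a priority field, or a source IP address field.
  34. The network device according to claim 31 or 32, wherein the congestion control message is a congestion notification message, the congestion notification message carries a congestion notification tag (CN-TAG) field, and the flow identifier is carried in a reaction point identifier (RPID) field of the CN-TAG field.
  35. The network device according to any one of claims 31 to 34, wherein the congestion control message further carries information indicating a degree of congestion.
  36. The network device according to claim 35, wherein the information indicating the degree of congestion is one or more of the following: a quantized feedback value of the congestion control message, a congestion point identifier, a number of bytes currently available in a transmit queue of the congestion point, and a difference between two successive readings of the number of available bytes in the transmit queue of the congestion point.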
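  37. A network system, comprising at least a source node, an access forwarding device, an aggregation device, and a destination node, wherein the source node is connected to one port of the access forwarding device, another port of the access forwarding device is connected to one port of the aggregation device, and another port of the aggregation device is connected directly or indirectly to the destination node, and wherein:
    the source node is configured to send one or more packets to the access forwarding device;
    the access forwarding device is configured to: receive the one or more packets; establish a mapping between five-tuples and flow identifiers of the one or more packets; and forward the one or more packets to the aggregation device;
    the aggregation device is configured to: when network congestion is detected, obtain a five-tuple of a congested packet; generate a first congestion control message, wherein the first congestion control message carries the five-tuple; and send the first congestion control message to the access forwarding device;
    the access forwarding device is configured to: receive the first congestion control message; obtain a flow identifier of the congested packet according to the mapping between five-tuples and flow identifiers; generate a second congestion control message, wherein the second congestion control message carries the flow identifier; and send the second congestion control message to the source node; and
    the source node is configured to receive the second congestion control message and reduce the rate of packets having the flow identifier.
  38. A network system, comprising a source node, an access forwarding device, an aggregation device, and a destination node, wherein the source node is connected to one port of the access forwarding device, another port of the access forwarding device is connected to one port of the aggregation device, and another port of the aggregation device is connected directly or indirectly to the destination node, and wherein:
    the source node is configured to send one or more packets to the access forwarding device;
    the access forwarding device is configured to: receive the one or more packets; establish a mapping between five-tuples and flow identifiers of the one or more packets; when network congestion is detected, obtain a five-tuple of a congested packet; obtain a flow identifier of the congested packet according to the mapping between five-tuples and flow identifiers; generate a congestion control message, wherein the congestion control message carries the flow identifier of the congested packet; and send the congestion control message to the source node; and
    the source node is configured to receive the congestion control message and reduce the rate of packets having the flow identifier.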
PCT/CN2017/084627 2016-06-13 2017-05-17 Network congestion control method, device, and system WO2017215392A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17812503.5A EP3461082B1 (en) 2016-06-13 2017-05-17 Network congestion control method and device
US16/217,821 US11115339B2 (en) 2016-06-13 2018-12-12 Network congestion control method, device, and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610421670.XA CN107493238A (zh) 2016-06-13 2016-06-13 Network congestion control method, device, and system
CN201610421670.X 2016-06-13

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/217,821 Continuation US11115339B2 (en) 2016-06-13 2018-12-12 Network congestion control method, device, and system

Publications (1)

Publication Number Publication Date
WO2017215392A1 true WO2017215392A1 (zh) 2017-12-21

Family

ID=60642796

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/084627 WO2017215392A1 (zh) 2016-06-13 2017-05-17 Network congestion control method, device, and system

Country Status (4)

Country Link
US (1) US11115339B2 (zh)
EP (1) EP3461082B1 (zh)
CN (1) CN107493238A (zh)
WO (1) WO2017215392A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
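CN102255808A (zh) * 2011-07-08 2011-11-23 Fujian Star-net Ruijie Networks Co., Ltd. Congestion notification method, apparatus, system, and network device
CN103907323A (zh) * 2012-10-30 2014-07-02 Huawei Technologies Co., Ltd. Method for relieving network congestion, core network device, and access network device
CN104852855A (zh) * 2014-02-19 2015-08-19 Huawei Technologies Co., Ltd. Congestion control method, apparatus, and device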
US20150381505A1 (en) * 2014-06-30 2015-12-31 Vmware, Inc. Framework for Early Congestion Notification and Recovery in a Virtualized Environment

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060203730A1 (en) * 2005-03-14 2006-09-14 Zur Uri E Method and system for reducing end station latency in response to network congestion
US8121038B2 (en) * 2007-08-21 2012-02-21 Cisco Technology, Inc. Backward congestion notification
CN102255790A (zh) * 2010-05-18 2011-11-23 中兴通讯股份有限公司 拥塞控制信息的通知方法和系统
CN102480471B (zh) * 2010-11-24 2014-09-17 杭州华三通信技术有限公司 实现监控RRPP环中QoS处理的方法和网络节点
WO2012101763A1 (ja) * 2011-01-25 2012-08-02 富士通株式会社 通信装置、通信システム、通信方法、および通信プログラム
EP2764747B1 (en) * 2011-10-04 2015-12-30 Telefonaktiebolaget LM Ericsson (PUBL) Congestion handling in a base station of a mobile network
US8730806B2 (en) * 2012-04-03 2014-05-20 Telefonaktiebolaget L M Ericsson (Publ) Congestion control and resource allocation in split architecture networks
US9197568B2 (en) * 2012-10-22 2015-11-24 Electronics And Telecommunications Research Institute Method for providing quality of service in software-defined networking based network and apparatus using the same
US9185015B2 (en) * 2013-02-19 2015-11-10 Broadcom Corporation Application aware elephant flow identification
US9094445B2 (en) * 2013-03-15 2015-07-28 Centripetal Networks, Inc. Protecting networks from cyber attacks and overloading
US20160192233A1 (en) * 2013-06-13 2016-06-30 Telefonaktiebolaget L M Ericsson (Publ) Congestion control for a multimedia session
CN105850083B (zh) * 2013-12-31 2019-12-31 意大利电信股份公司 多播通信网络中的拥塞管理
US20160044530A1 (en) * 2014-08-08 2016-02-11 Industrial Technology Research Institute Method and apparatus of congestion management
US20160380884A1 (en) * 2015-06-26 2016-12-29 Futurewei Technologies, Inc. Flow-Based Distribution in Hybrid Access Networks
US10009277B2 (en) * 2015-08-04 2018-06-26 Mellanox Technologies Tlv Ltd. Backward congestion notification in layer-3 networks
US10237376B2 (en) * 2015-09-29 2019-03-19 Mellanox Technologies, Ltd. Hardware-based congestion control for TCP traffic
US10320681B2 (en) * 2016-04-12 2019-06-11 Nicira, Inc. Virtual tunnel endpoints for congestion-aware load balancing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3461082A4 *

Also Published As

Publication number Publication date
EP3461082A4 (en) 2019-04-10
CN107493238A (zh) 2017-12-19
EP3461082B1 (en) 2022-06-15
US20190116126A1 (en) 2019-04-18
US11115339B2 (en) 2021-09-07
EP3461082A1 (en) 2019-03-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17812503

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017812503

Country of ref document: EP

Effective date: 20181218