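WO2021244240A1 - Method, apparatus, device, system, and storage medium for controlling network congestion - Google Patents

Method, apparatus, device, system, and storage medium for controlling network congestion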

Info

Publication number
WO2021244240A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
flow control
switch
control information
message
Prior art date
Application number
PCT/CN2021/093165
Inventor
严金丰
郑合文
韩磊
刘和洋
陶佩莹
王煜
姚学军
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP21817960.4A (EP4152705A4)
Publication of WO2021244240A1
Priority to US18/071,263 (US20230107366A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/11 Identifying congestion
    • H04L47/115 Identifying congestion using a dedicated packet
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • H04L47/26 Flow control; Congestion control using explicit feedback to the source, e.g. choke packets

Definitions

  • This application relates to the field of communication technology, and in particular to a method, apparatus, device, system, and storage medium for controlling network congestion.
  • TCP/IP transmission control protocol/internet protocol
  • RDMA remote direct memory access
  • The most widely used RDMA protocol at present is RDMA over Converged Ethernet (RoCE).
  • RoCE RDMA over Converged Ethernet
  • The embodiments of this application provide a method, apparatus, device, system, and storage medium for controlling network congestion, to solve the problems in the related art.
  • the technical solutions are as follows:
  • A method for controlling network congestion includes: a first switch receives a target signaling message sent by a second switch that is in a target network congestion state, where the target signaling message carries traffic source information.
  • The first switch sends target flow control information to the network device corresponding to the traffic source information according to the target signaling message, where the target flow control information is used to instruct the network device to perform flow control.
  • After receiving the target signaling message sent by the second switch in the target network congestion state, the first switch sends the target flow control information to the network device corresponding to the traffic source information carried in the target signaling message, instructing that device to perform flow control. This suppresses the queue backlog on the congested side and keeps service latency low without affecting service throughput, supports large-scale RoCE networking, and solves the problem of DCQCN rate control failing in large-scale, high-concurrency scenarios.
  • Sending the target flow control information to the network device corresponding to the traffic source information according to the target signaling message includes: sending first flow control information to the network device corresponding to the traffic source information according to the target signaling message, where the first flow control information is used to instruct the network device to suspend sending data packets of a target queue, and the target queue is one or more queues of the network device.
  • Sending the first flow control information to the network device corresponding to the traffic source information according to the target signaling message includes: constructing a first priority-based flow control (PFC) message according to the target signaling message, where the value of the time field of the first PFC message is a first value, and the first value is used to indicate the first flow control information; and sending the first PFC message to the network device corresponding to the traffic source information.
  • Receiving the target signaling message sent by the second switch in the target network congestion state includes: receiving a first signaling message sent by the second switch in the target network congestion state, where the first signaling message is used to instruct the first switch to send the first flow control information.
  • Receiving the first signaling message sent by the second switch in the target network congestion state includes: receiving a first congestion notification packet (CNP) message sent by the second switch in the target network congestion state, where the value of a designated field in the frame header of the first CNP message is a first characteristic value, and the first characteristic value is used to instruct the first switch to send the first flow control information.
  • Sending the target flow control information to the network device corresponding to the traffic source information according to the target signaling message includes: sending second flow control information to the network device corresponding to the traffic source information according to the target signaling message, where the second flow control information is used to instruct the network device to continue sending data packets of the target queue, and the target queue is one or more queues of the network device.
  • The second flow control information instructs the network device corresponding to the traffic source information to continue sending data packets of the target queue, so that service throughput is not affected.
  • Sending the second flow control information to the network device corresponding to the traffic source information according to the target signaling message includes: constructing a second priority-based flow control (PFC) message according to the target signaling message, where the value of the time field of the second PFC message is a second value, and the second value is used to indicate the second flow control information; and sending the second PFC message to the network device corresponding to the traffic source information.
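The PFC construction described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the standard IEEE 802.1Qbb frame layout (MAC Control EtherType 0x8808, opcode 0x0101, a class-enable vector, and eight 16-bit time fields), and it assumes that the "first value" is a nonzero pause time and the "second value" is zero, which the text above does not state. The function name `build_pfc_frame` is hypothetical.

```python
import struct
from typing import Sequence

PFC_DEST_MAC = bytes.fromhex("0180c2000001")  # MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
MIN_FRAME_LEN = 60  # minimum Ethernet frame length, excluding the FCS


def build_pfc_frame(src_mac: bytes, priorities: Sequence[int], quanta: int) -> bytes:
    """Build a PFC frame whose time field is `quanta` for each listed priority.

    Assumption: quanta > 0 models the "first value" (pause) and
    quanta == 0 models the "second value" (continue sending).
    """
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("quanta must fit in 16 bits")

    enable_vector = 0
    times = [0] * 8
    for prio in priorities:
        if not 0 <= prio <= 7:
            raise ValueError("priority must be in 0..7")
        enable_vector |= 1 << prio  # mark this priority's time field as valid
        times[prio] = quanta

    header = PFC_DEST_MAC + src_mac + struct.pack(
        "!HH", MAC_CONTROL_ETHERTYPE, PFC_OPCODE
    )
    body = struct.pack("!H8H", enable_vector, *times)
    # Pad with zeros up to the minimum Ethernet frame size.
    return (header + body).ljust(MIN_FRAME_LEN, b"\x00")
```

Under these assumptions, a pause for priority 3 would be `build_pfc_frame(mac, [3], 0xFFFF)` and the matching resume would be `build_pfc_frame(mac, [3], 0)`.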
  • Receiving the target signaling message sent by the second switch in the target network congestion state includes: receiving a second signaling message sent by the second switch in the target network congestion state, where the second signaling message is used to instruct the first switch to send the second flow control information.
  • Receiving the second signaling message sent by the second switch in the target network congestion state includes: receiving a second congestion notification packet (CNP) message sent by the second switch in the target network congestion state, where the value of a designated field in the frame header of the second CNP message is a second characteristic value, and the second characteristic value is used to instruct the first switch to send the second flow control information.
  • Sending the target flow control information to the network device corresponding to the traffic source information according to the target signaling message includes: determining a traffic source port according to the traffic source information carried in the target signaling message; and sending the target flow control information to the network device corresponding to the traffic source information through the traffic source port.
  • Sending the target flow control information to the network device corresponding to the traffic source information through the traffic source port includes: sending third flow control information to the network device corresponding to the traffic source information through the traffic source port, where the third flow control information is used to instruct the network device to suspend sending data packets of the queue corresponding to the traffic source port.
  • Sending the target flow control information to the network device corresponding to the traffic source information through the traffic source port includes: sending fourth flow control information to the network device corresponding to the traffic source information through the traffic source port, where the fourth flow control information is used to instruct the network device to continue sending data packets of the queue corresponding to the traffic source port.
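The port lookup step above can be sketched as a small model of the first switch. Everything here is hypothetical: the text does not say what form the traffic source information takes, so this sketch assumes it is a source address string that the switch can map to one of its ports, and it records the pause or resume action instead of sending a real frame.

```python
from typing import Dict, List, Optional, Tuple


class FirstSwitch:
    """Hypothetical model of the first switch; names are illustrative only."""

    def __init__(self, source_to_port: Dict[str, int]) -> None:
        # Assumed mapping from traffic source information to a local port.
        self.source_to_port = source_to_port
        # Log of (port, action) pairs standing in for transmitted flow control.
        self.sent: List[Tuple[int, str]] = []

    def on_signaling(self, traffic_source: str, pause: bool) -> Optional[int]:
        """Resolve the traffic source port and emit flow control through it."""
        port = self.source_to_port.get(traffic_source)
        if port is None:
            return None  # unknown source: nothing to send
        self.sent.append((port, "pause" if pause else "resume"))
        return port
```

For example, `FirstSwitch({"10.0.0.5": 3}).on_signaling("10.0.0.5", True)` resolves to port 3 and records a pause for that port.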
  • A method applied to a second switch includes: the second switch identifies a network congestion state; and in response to the network congestion state being a target network congestion state, sends a target signaling message to a first switch, where the target signaling message carries traffic source information and is used to instruct the first switch to perform flow control.
  • Because the target signaling message instructs the first switch to perform flow control, the queue backlog on the congested side is suppressed and service latency stays low without affecting service throughput. The method can support large-scale RoCE networking and solve the problem of DCQCN rate control failing in large-scale, high-concurrency scenarios.
  • The target signaling message includes a first signaling message or a second signaling message, and sending the target signaling message to the first switch in response to the network congestion state being the target network congestion state includes: in response to the network congestion state being the target network congestion state and the current queue length being greater than a first threshold, sending the first signaling message to the first switch, where the first signaling message is used to instruct the first switch to send first flow control information, the first flow control information is used to instruct the network device corresponding to the traffic source information to suspend sending data packets of a target queue, and the target queue is one or more queues of the network device; or,
  • in response to the network congestion state being the target network congestion state and the current queue length being less than a second threshold, sending the second signaling message to the first switch, where the second signaling message is used to instruct the first switch to send second flow control information, the second flow control information is used to instruct the network device to continue sending data packets of the target queue, and the second threshold is less than the first threshold.
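The two-threshold decision on the second switch can be sketched as a small function. This is an illustration under assumptions: the function name and return labels are hypothetical, and the sketch assumes the second signaling message is triggered when the queue length drops below the second threshold, which is inferred from the requirement that the second threshold be less than the first.

```python
from typing import Optional


def choose_signaling(
    queue_len: int,
    first_threshold: int,
    second_threshold: int,
    in_target_state: bool,
) -> Optional[str]:
    """Return "first" (ask for pause), "second" (ask for resume), or None."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be less than first threshold")
    if not in_target_state:
        return None  # only act in the target network congestion state
    if queue_len > first_threshold:
        return "first"
    if queue_len < second_threshold:
        return "second"
    return None  # between the thresholds: keep the previous decision
```

Keeping a gap between the two thresholds gives hysteresis, so a queue length that hovers near a single threshold does not cause rapid pause and resume toggling.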
  • Before sending the first signaling message to the first switch, the method further includes: acquiring a first CNP message, setting the value of a designated field in the frame header of the first CNP message to a first characteristic value, and using the first CNP message as the first signaling message;
  • Before sending the second signaling message to the first switch, the method further includes: acquiring a second CNP message, setting the value of a designated field in the frame header of the second CNP message to a second characteristic value, and using the second CNP message as the second signaling message.
  • Constructing the first signaling message or the second signaling message from a CNP message is only an example; the first signaling message or the second signaling message can also be constructed using other message formats. The embodiments of this application do not limit this.
  • Identifying the network congestion state includes: reading the current queue length and an explicit congestion notification (ECN) threshold range, where the ECN threshold range is used to indicate the probability of adding an ECN identifier, and the ECN identifier is used to indicate that the network is congested; and identifying the network congestion state according to the current queue length and the ECN threshold range.
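How an ECN threshold range can indicate a marking probability may be easier to see in code. The sketch below assumes the common three-parameter model (a lower threshold, an upper threshold, and a maximum probability, here called `k_min`, `k_max`, and `p_max`); these parameter names and the linear ramp are assumptions, not something the text above specifies.

```python
def ecn_mark_probability(
    queue_len: float, k_min: float, k_max: float, p_max: float
) -> float:
    """Probability of adding an ECN identifier for a given queue length.

    Assumed model: no marking at or below k_min, a linear ramp up to p_max
    between k_min and k_max, and marking of every packet at or above k_max.
    """
    if not 0 <= k_min < k_max:
        raise ValueError("require 0 <= k_min < k_max")
    if not 0.0 <= p_max <= 1.0:
        raise ValueError("p_max must be a probability")
    if queue_len <= k_min:
        return 0.0
    if queue_len >= k_max:
        return 1.0
    return p_max * (queue_len - k_min) / (k_max - k_min)
```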
  • a method for controlling network congestion is provided.
  • the method is applied to a network device.
  • The method includes: the network device receives target flow control information sent by a first switch, where the target flow control information is used to instruct flow control and is sent after the first switch receives a target signaling message sent by a second switch in a target network congestion state; and the network device performs flow control according to the target flow control information.
  • After receiving the target flow control information sent by the first switch, the network device performs flow control based on it, which suppresses the queue backlog on the congested side and keeps service latency low without affecting service throughput. The method can support large-scale RoCE networking and solve the problem of DCQCN rate control failing in large-scale, high-concurrency scenarios.
  • Receiving the target flow control information sent by the first switch includes: receiving first flow control information sent by the first switch, where the first flow control information is used to instruct the network device to suspend sending data packets of a target queue, and the target queue is one or more queues of the network device;
  • Performing flow control according to the target flow control information includes: suspending sending data packets of the target queue according to the first flow control information.
  • Receiving the first flow control information sent by the first switch includes: receiving a first PFC message sent by the first switch, where the value of the time field of the first PFC message is a first value, and the first value is used to indicate the first flow control information;
  • Suspending sending data packets of the target queue according to the first flow control information includes: determining, according to the value of the time field of the first PFC message, a length of time for which to suspend sending, and suspending sending data packets of the target queue for that length of time.
  • Receiving the target flow control information sent by the first switch includes: receiving second flow control information sent by the first switch, where the second flow control information is used to instruct the network device to continue sending data packets of the target queue, and the target queue is one or more queues of the network device;
  • Performing flow control according to the target flow control information includes: continuing to send data packets of the target queue according to the second flow control information.
  • Receiving the second flow control information sent by the first switch includes: receiving a second PFC message sent by the first switch, where the value of the time field of the second PFC message is a second value, and the second value is used to indicate the second flow control information;
  • Continuing to send data packets of the target queue according to the second flow control information includes: continuing to send data packets of the target queue according to the value of the time field of the second PFC message.
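The way a network device might turn the PFC time field into a pause duration can be sketched as follows. This assumes standard IEEE 802.1Qbb semantics, in which the time field counts pause quanta of 512 bit times each and a value of zero lifts the pause; the text above does not give the unit, and the function names are hypothetical.

```python
PAUSE_QUANTUM_BITS = 512  # one pause quantum is 512 bit times (assumed, per 802.1Qbb)


def pause_duration_seconds(quanta: int, link_speed_bps: float) -> float:
    """Convert a PFC time field value into a pause duration in seconds."""
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("quanta must fit in 16 bits")
    if link_speed_bps <= 0:
        raise ValueError("link speed must be positive")
    return quanta * PAUSE_QUANTUM_BITS / link_speed_bps


def paused_until(now: float, quanta: int, link_speed_bps: float) -> float:
    """Time until which the target queue stays paused.

    A time field of zero yields `now`, meaning the queue may send at once.
    """
    return now + pause_duration_seconds(quanta, link_speed_bps)
```

On a 100 Gbit/s link, the maximum time field value 0xFFFF corresponds to roughly 0.34 ms under these assumptions, so a sustained pause needs the PFC message to be refreshed.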
  • A device for controlling network congestion includes:
  • a receiving module, configured to receive a target signaling message sent by a second switch in a target network congestion state, where the target signaling message carries traffic source information; and
  • a sending module, configured to send target flow control information to the network device corresponding to the traffic source information according to the target signaling message, where the target flow control information is used to instruct the network device to perform flow control.
  • The sending module is configured to send first flow control information to the network device corresponding to the traffic source information according to the target signaling message, where the first flow control information is used to instruct the network device to suspend sending data packets of a target queue, and the target queue is one or more queues of the network device.
  • The sending module is configured to construct a first priority-based flow control (PFC) message according to the target signaling message, where the value of the time field of the first PFC message is a first value, and the first value is used to indicate the first flow control information; and to send the first PFC message to the network device corresponding to the traffic source information.
  • The receiving module is configured to receive a first signaling message sent by the second switch in the target network congestion state, where the first signaling message is used to instruct the device to send the first flow control information.
  • The receiving module is configured to receive a first congestion notification packet (CNP) message sent by the second switch in the target network congestion state, where the value of a designated field in the frame header of the first CNP message is a first characteristic value, and the first characteristic value is used to instruct the device to send the first flow control information.
  • The sending module is configured to send second flow control information to the network device corresponding to the traffic source information according to the target signaling message, where the second flow control information is used to instruct the network device to continue sending data packets of the target queue, and the target queue is one or more queues of the network device.
  • The sending module is configured to construct a second priority-based flow control (PFC) message according to the target signaling message, where the value of the time field of the second PFC message is a second value, and the second value is used to indicate the second flow control information; and to send the second PFC message to the network device corresponding to the traffic source information.
  • The receiving module is configured to receive a second signaling message sent by the second switch in the target network congestion state, where the second signaling message is used to instruct the device to send the second flow control information.
  • The receiving module is configured to receive a second congestion notification packet (CNP) message sent by the second switch in the target network congestion state, where the value of a designated field in the frame header of the second CNP message is a second characteristic value, and the second characteristic value is used to instruct the device to send the second flow control information.
  • The sending module is configured to determine a traffic source port according to the traffic source information carried in the target signaling message, and to send the target flow control information to the network device corresponding to the traffic source information through the traffic source port.
  • The sending module is configured to send third flow control information to the network device corresponding to the traffic source information through the traffic source port, where the third flow control information is used to instruct the network device to suspend sending data packets of the queue corresponding to the traffic source port.
  • The sending module is configured to send fourth flow control information to the network device corresponding to the traffic source information through the traffic source port, where the fourth flow control information is used to instruct the network device to continue sending data packets of the queue corresponding to the traffic source port.
  • A device for controlling network congestion includes:
  • an identification module, configured to identify a network congestion state; and
  • a sending module, configured to send a target signaling message to a first switch in response to the network congestion state being a target network congestion state, where the target signaling message carries traffic source information and is used to instruct the first switch to perform flow control.
  • The target signaling message includes a first signaling message or a second signaling message, and the sending module is configured to: in response to the network congestion state being the target network congestion state and the current queue length being greater than a first threshold, send the first signaling message to the first switch, where the first signaling message is used to instruct the first switch to send first flow control information, the first flow control information is used to instruct the network device corresponding to the traffic source information to suspend sending data packets of a target queue, and the target queue is one or more queues of the network device; or,
  • in response to the network congestion state being the target network congestion state and the current queue length being less than a second threshold, send the second signaling message to the first switch, where the second signaling message is used to instruct the first switch to send second flow control information, the second flow control information is used to instruct the network device to continue sending data packets of the target queue, and the second threshold is less than the first threshold.
  • The device further includes:
  • an acquiring module, configured to acquire a first CNP message, set the value of a designated field in the frame header of the first CNP message to a first characteristic value, and use the first CNP message as the first signaling message;
  • The acquiring module is further configured to acquire a second CNP message, set the value of a designated field in the frame header of the second CNP message to a second characteristic value, and use the second CNP message as the second signaling message.
  • The identification module is configured to read the current queue length and an explicit congestion notification (ECN) threshold range, where the ECN threshold range is used to indicate the probability of adding an ECN identifier and the ECN identifier is used to indicate that the network is congested; and to identify the network congestion state according to the current queue length and the ECN threshold range.
  • The target network congestion state includes an ECN failure state or a congestion notification packet (CNP) failure state, and the identification module is configured to: in response to the current queue length being greater than the maximum value of a reference range and no CNP message having been supplemented, determine that the network congestion state is the ECN failure state, where the reference range is determined based on the ECN threshold range; and in response to the current queue length being greater than the maximum value of the reference range and CNP messages having been supplemented, determine that the network congestion state is the CNP failure state.
  • A device for controlling network congestion includes:
  • a receiving module, configured to receive target flow control information sent by a first switch, where the target flow control information is used to instruct flow control and is sent after the first switch receives a target signaling message sent by a second switch in a target network congestion state; and
  • a control module, configured to perform flow control according to the target flow control information.
  • The receiving module is configured to receive first flow control information sent by the first switch, where the first flow control information is used to instruct the device to suspend sending data packets of a target queue, and the target queue is one or more queues of the network device;
  • the control module is configured to suspend sending data packets of the target queue according to the first flow control information.
  • The receiving module is configured to receive a first PFC message sent by the first switch, where the value of the time field of the first PFC message is a first value, and the first value is used to indicate the first flow control information;
  • the control module is configured to determine, according to the value of the time field of the first PFC message, a length of time for which to suspend sending, and to suspend sending data packets of the target queue for that length of time.
  • The receiving module is configured to receive second flow control information sent by the first switch, where the second flow control information is used to instruct the device to continue sending data packets of the target queue, and the target queue is one or more queues of the network device;
  • the control module is configured to continue sending data packets of the target queue according to the second flow control information.
  • The receiving module is configured to receive a second PFC message sent by the first switch, where the value of the time field of the second PFC message is a second value, and the second value is used to indicate the second flow control information;
  • the control module is configured to continue sending data packets of the target queue according to the value of the time field of the second PFC message.
  • The target network congestion state includes an explicit congestion notification (ECN) failure state or a congestion notification packet (CNP) failure state;
  • ECN explicit congestion notification
  • CNP congestion notification packet
  • The ECN failure state is the state in which the current queue length of the second switch is greater than the maximum value of the reference range and no CNP message has been supplemented.
  • The CNP failure state is the state in which the current queue length of the second switch is greater than the maximum value of the reference range and CNP messages have been supplemented; the reference range is determined based on the ECN threshold range.
  • the ECN threshold range is used to indicate the probability of adding an ECN identifier, and the ECN identifier is used to indicate network congestion.
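The two failure states defined above reduce to a simple classification, sketched here. The function name, the string labels, and the boolean flag `cnp_supplemented` are hypothetical; the sketch takes the maximum of the reference range as an input because the text only says that the range is determined based on the ECN threshold range, without saying how.

```python
from typing import Optional


def classify_congestion(
    queue_len: float, reference_max: float, cnp_supplemented: bool
) -> Optional[str]:
    """Classify the second switch's state as ECN failure, CNP failure, or neither."""
    if queue_len <= reference_max:
        return None  # queue is within the reference range: not a target state
    # Queue exceeds the reference range: the state depends on whether
    # CNP messages have already been supplemented.
    return "CNP_FAILURE" if cnp_supplemented else "ECN_FAILURE"
```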
  • A network congestion control device is provided, comprising a memory and a processor, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method for controlling network congestion according to any implementation of the first aspect.
  • A network congestion control device is provided, comprising a memory and a processor, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method for controlling network congestion according to any implementation of the second aspect.
  • A network congestion control device is provided, comprising a memory and a processor, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method for controlling network congestion according to any implementation of the third aspect.
  • A network congestion control system is also provided, which includes the above three devices.
  • A computer-readable storage medium is also provided, storing at least one instruction that is loaded and executed by a processor to implement the method for controlling network congestion according to any one of the first aspect to the third aspect.
  • A communication device is also provided, which includes a transceiver, a memory, and a processor.
  • The transceiver, the memory, and the processor communicate with each other through an internal connection path; the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory to control the transceiver to receive signals and to send signals.
  • When the processor executes the instructions stored in the memory, the processor is caused to execute the method in the first aspect or any possible implementation of the first aspect.
  • When the processor executes the instructions stored in the memory, the processor is caused to execute the method in the second aspect or any possible implementation of the second aspect.
  • When the processor executes the instructions stored in the memory, the processor is caused to execute the method in the third aspect or any possible implementation of the third aspect.
  • There are one or more processors and one or more memories.
  • The memory may be integrated with the processor, or the memory and the processor may be provided separately.
  • The memory can be a non-transitory memory, such as a read only memory (ROM), which can be integrated with the processor on the same chip or placed on a different chip. The embodiments of this application do not limit the type of the memory or the way the memory and the processor are arranged.
  • ROM read only memory
  • a computer program (product) is provided, the computer program (product) includes: computer program code, when the computer program code is run by a computer, the computer executes the methods in the above aspects.
  • A chip is provided, including a processor configured to call, from a memory, and execute the instructions stored in that memory, so that a communication device in which the chip is installed executes the methods in the foregoing aspects.
  • Another chip is provided, including: an input interface, an output interface, a processor, and a memory.
  • The input interface, the output interface, the processor, and the memory are connected through an internal connection path; the processor is used to execute the code in the memory, and when the code is executed, the processor executes the methods in the foregoing aspects.
  • Figure 1 is a schematic diagram of a network architecture provided in related technologies
  • Figure 2 is a schematic diagram of an ECN identification model provided by an embodiment of the application.
  • Figure 3 is a schematic diagram of a traffic model provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of a streaming transmission process provided by an embodiment of the application.
  • Figure 6 is a schematic diagram of a traffic model provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of the relationship between the number of QPs of the server and the delay provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of a traffic model provided by an embodiment of the application.
  • Fig. 9 is a schematic diagram of changes in the flow rate and the number of CNPs provided by related technologies.
  • FIG. 10 is a schematic structural diagram of a data center network provided by an embodiment of this application.
  • FIG. 11 is a flowchart of a method for controlling network congestion provided by an embodiment of the application.
  • Figure 12 is a schematic diagram of an application scenario provided by an embodiment of the application.
  • FIG. 13 is a schematic structural diagram of a network congestion control apparatus provided by an embodiment of the application.
  • FIG. 14 is a schematic structural diagram of a network congestion control device provided by an embodiment of the application.
  • FIG. 15 is a schematic structural diagram of a network congestion control device provided by an embodiment of the application.
  • FIG. 16 is a schematic structural diagram of a network congestion control device provided by an embodiment of the application.
  • RoCEv1 is an RDMA protocol implemented based on the Ethernet link layer
  • RoCEv2 is an RDMA protocol implemented based on the user datagram protocol (UDP) layer of the Ethernet TCP/IP protocol.
  • the network architecture includes a sending end network interface controller (NIC), a switch, and a receiving end NIC.
  • the sending end NIC is a reaction point (RP)
  • the switch is a congestion point (CP)
  • the receiving end NIC is a notification point (NP).
  • the switch checks the queue buffer length of the switch's outgoing port. If the queue buffer length at the egress end of the switch exceeds a given threshold, the data packet is marked with an explicit congestion notification (ECN) identifier according to the marking probability.
  • ECN was originally defined in RFC 3168 and is implemented by embedding a congestion indicator in the IP header and a congestion acknowledgment in the TCP header.
  • the RoCEv2 standard defines RoCEv2 Congestion Management (RCM). After enabling ECN, once the switch detects that RoCEv2 traffic is congested, it will mark the ECN field in the IP header of the packet.
  • the congestion indicator is interpreted by the destination terminal node according to the forward explicit congestion notification (FECN) congestion indication identifier in the base transport header (BTH) existing in the IB data segment.
  • the congestion notification is fed back to the source node, and the source node then rate-limits the network data packets of the problematic queue pair (QP) in response to the congestion notification.
  • the marking probability is determined as follows: when the queue buffer length at the egress end of the switch is less than the given lower threshold Kmin, the marking probability is 0%; when the queue buffer length at the egress end of the switch is greater than the given upper threshold Kmax, the marking probability is 100%; when the queue buffer length at the egress end of the switch is between Kmin and Kmax, the marking probability increases linearly with the queue buffer length.
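  • The marking rule above can be sketched as follows. This is a minimal illustration, not the switch's actual implementation: the function name is hypothetical, and the linear ramp is assumed to run from 0 at Kmin to 1 at Kmax.

```python
def ecn_marking_probability(queue_len: float, kmin: float, kmax: float) -> float:
    """Return the ECN marking probability for a given egress queue length."""
    if kmin >= kmax:
        raise ValueError("kmin must be smaller than kmax")
    if queue_len < kmin:
        return 0.0          # below the lower threshold: never mark
    if queue_len > kmax:
        return 1.0          # above the upper threshold: always mark
    # Between the thresholds: linear ramp (assumed to go from 0 to 1).
    return (queue_len - kmin) / (kmax - kmin)
```

  • For example, with hypothetical thresholds Kmin = 100 and Kmax = 400, a queue length of 250 lies halfway between them and gives a marking probability of 0.5.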
  • when a data packet carrying the ECN identifier arrives at the receiving end NIC, it indicates that congestion has occurred in the network, and the receiving end NIC returns a congestion notification packet (CNP) message for the message carrying the ECN identifier, in order to pass the congestion information to the sender NIC through the CNP message.
  • the receiving end NIC immediately sends a CNP message.
  • the length of the reference time period is N microseconds, and the value of N can be configured as 0, that is, the receiving end NIC returns a CNP message every time it receives a message carrying an ECN identifier.
  • CNP messages are also subject to delay and packet loss. Every hop of equipment and every link between the sender and the receiver adds a certain delay, which eventually increases the time it takes for the CNP message to reach the sender. Meanwhile, congestion on the outgoing port of the switch gradually increases. If the sending end cannot slow down in time, packet loss may still occur. Therefore, on the sender side, the sender NIC controls the sending rate of each data stream. For example, the speed reduction control of a data stream is triggered by a received CNP message, and the speed increase control of a data stream is triggered by a timer and a byte counter. The speed increase control and the speed reduction control are independent of each other.
  • the bandwidth allocated to each data stream decreases, the number of CNP packets sent per unit time also decreases, and the CNP packet interval increases accordingly.
  • for example, if the bandwidth limit of the sender allows it to send out two thousand data packets per second, the interval between data packets is 500us, so even if every data packet is marked with an ECN mark, the minimum interval of CNP packets returned by the receiver is 500us.
  • the speed-up period of the DCQCN source end is 300us, so in this case the interval of CNP packets is greater than the speed-up period.
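  • The arithmetic behind this comparison can be checked with a short sketch. The function names are hypothetical; the figures (two thousand packets per second, a 300us speed-up period) are the ones given above, and the sketch assumes at most one CNP per received data packet.

```python
def min_cnp_interval_us(packets_per_second: float) -> float:
    """Smallest possible CNP spacing when every data packet triggers one CNP."""
    return 1_000_000 / packets_per_second


def cnp_slower_than_speedup(packets_per_second: float, speedup_period_us: float) -> bool:
    """True when CNPs cannot arrive as often as the source's speed-up period."""
    return min_cnp_interval_us(packets_per_second) > speedup_period_us
```

  • With 2000 packets per second the minimum CNP interval is 500us, which exceeds the 300us speed-up period, so the source's speed-up timer can expire between two consecutive CNP messages.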
  • take as an example a single top of rack (ToR) switch that carries data streams from multiple senders to one receiver, with the TCP protocol and the RoCE protocol transmitting at a 9:1 bandwidth ratio.
  • the relationship between the number of data streams, ECN, and queue backlog is shown in Figure 4. It can be seen from Figure 4 that when the number of data streams is small, for example, less than 4, DCQCN works very well: the ECN marking rate increases with the number of data streams, the queue buffer length is kept very low, and the service delay is also very small.
  • the related technology also proposes a method that records, on the switch side, the congestion degree of the switch's outgoing port and the time of the received CNP message, and determines whether to supplement the CNP message according to the congestion degree of the switch's outgoing port.
  • the switch checks the time interval between CNP packets of the data flow passing through the outgoing port. If no CNP message passes during the DCQCN speed-up period, the switch sends a supplementary CNP message to the sender and updates the CNP message time.
  • if a CNP message does pass during the DCQCN speed-up period, the switch only updates the elapsed time of the CNP packet and does not supplement a CNP packet.
  • if the outgoing port of the switch is not congested, only the elapsed time of the CNP message is updated, and no CNP message is supplemented.
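  • One possible reading of this switch-side supplement logic is sketched below. The class and function names are hypothetical, the 300us default is the DCQCN speed-up period mentioned earlier, and the per-flow timestamp is an assumed bookkeeping detail.

```python
from dataclasses import dataclass


@dataclass
class FlowCnpState:
    last_cnp_time_us: float  # time recorded for the last CNP of this flow


def check_flow(state: FlowCnpState, now_us: float, port_congested: bool,
               cnp_passed: bool, speedup_period_us: float = 300.0) -> bool:
    """Return True when the switch should send a supplementary CNP."""
    if cnp_passed:
        state.last_cnp_time_us = now_us  # a real CNP passed: only record the time
        return False
    if not port_congested:
        state.last_cnp_time_us = now_us  # no congestion: only update the time
        return False
    if now_us - state.last_cnp_time_us >= speedup_period_us:
        state.last_cnp_time_us = now_us  # supplement a CNP and record the time
        return True
    return False
```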
  • the traffic model shown in Figure 8 is used for further analysis.
  • sender 1 (sender1) and sender 2 (sender2) in the traffic model simultaneously send data streams to the receiver (receiver), and the receiver can construct as many CNP packets as needed and send them to sender1 and sender2.
  • the relationship between the number of CNP packets processed by the server per second and the flow rate can be shown in Figure 9.
  • the rate of the server decreases as the number of CNP packets processed per unit time increases, but once the rate drops to a certain value, it no longer declines. This phenomenon is called CNP failure.
  • the test results shown in Table 1 are obtained. It can be seen from the test results shown in Table 1 below that CNP failure occurs at every ratio. In that case, the queue backlog is very deep and the service delay is very large.
  • Table 1
    TCP:RoCE bandwidth ratio | Number of data streams | Port accumulation | Time delay
    90:10 | 7*11 | 19446KB | 31637.50us
    70:30 | 7*40 | 19757KB | 10360.06us
    50:50 | 7*65 | 18845KB | 5987.04us
    30:70 | 7*85 | 16829KB | 3824.71us
    10:90 | 7*115 | 18160KB | 3206.68us
  • an embodiment of the present application provides a method for controlling network congestion.
  • This method divides the congestion state of the network into ECN normal state, ECN failure state and CNP failure state.
  • a priority-based flow control (PFC) message is generated according to the path-level trigger type, and the traffic is controlled through PFC, thereby suppressing the queue backlog on the congested side, ensuring low service latency without affecting service throughput, and supporting large-scale RoCE networking.
  • the method provided in the embodiment of the present application can be applied to the data center network shown in FIG. 10.
  • the data center network is a Clos network architecture.
  • the Clos network architecture is a rearrangeable, non-blocking, and scalable switching architecture used for high-performance computing, high-performance distributed storage, big data, artificial intelligence, and the like.
  • the Clos network architecture includes: a first switch, a second switch, and a source server.
  • the first switch is a ToR switch shown by T in FIG. 10
  • the second switch is a ToR switch shown by T different from the first switch in FIG. 10
  • the source server is a server (Host) shown by H in FIG. 10.
  • the first switch can be used as the source switch
  • the second switch can be used as the congested side switch
  • the source server can be used as the network device corresponding to the traffic source.
  • the functions of each switch are as follows.
  • ECN failure status and CNP failure status can be regarded as the target network congestion status.
  • Second switch: The second switch constructs a target signaling message to notify the source switch, that is, the first switch, that the network is currently in the target network congestion state, for example, that the CNP failure state has been entered and the PFC supplement operation is to be performed. For example, if the queue length is greater than the given upper limit, the second switch constructs the first signaling message and sends it to the first switch. Taking the first signaling message being a CNP message as an example, the second switch fills a designated field of the frame header of the CNP message, for example, the reserved field, with the first characteristic value, such as 1. If the queue length is less than the given lower limit, the second switch constructs a second signaling message and sends it to the first switch. Still taking a CNP message as an example, the second switch fills the reserved field of the frame header with the second characteristic value, for example, 2.
  • First switch: The first switch receives and parses the target signaling message sent by the second switch, and finds that the notification object is the first switch itself. If the reserved field is the first characteristic value 1, the first switch identifies the target signaling message as the first signaling message, constructs the first PFC message, and sends it to the corresponding source server (H) to trigger the corresponding source server to suspend sending data packets of the target queue. For example, the first PFC message constructed by the first switch is an xoff PFC message.
  • if the reserved field is the second characteristic value 2, the first switch identifies the target signaling message as the second signaling message, constructs a second PFC message, and sends it to the corresponding source server to trigger the corresponding source server to continue sending data packets of the target queue.
  • for example, the second PFC message constructed by the first switch is an xon PFC message.
  • the target queue is one or more queues of the source server, which can be indicated by a PFC message.
  • the first switch can also perform flow control through the flow source port. For example, the corresponding target flow control information is sent to the source server corresponding to the flow source information through the flow source port to instruct to suspend or continue sending the data packets of the queue corresponding to the flow source port.
  • the intermediate switches are the leaf (Leaf) switches shown by L in FIG. 10 and the spine (Spine) switches shown by S in FIG. 10.
  • Intermediate switch (L, S): The intermediate switch receives and parses the target signaling message sent by the second switch. If the notification object is not the intermediate switch itself, the intermediate switch does not perform special operations and continues to forward the target signaling message.
  • the number of switches shown in FIG. 10 is only an exemplary description, and the embodiment of the present application does not limit the number of switches.
  • the source server is only an example of a network device corresponding to the traffic source. In addition to the source server, it may also be other switches.
  • the embodiment of the present application does not limit the type of the network device corresponding to the traffic source.
  • the method provided by the embodiment of the present application is not limited to the data center network scenario shown in FIG. 10, but can also be applied to other scenarios where the DCQCN technology is used. The embodiment of the present application does not limit the application scenario.
  • the method for controlling network congestion provided by the embodiment of the present application will be described.
  • the method can be implemented through interaction between the first switch, the second switch, and the network device corresponding to the traffic source.
  • the network device is the source server.
  • the method provided by the embodiment of the present application includes the following steps.
  • the second switch recognizes the network congestion state.
  • the second switch recognizes the network congestion state, including but not limited to the following two processes 1101A and 1101B.
  • the threshold range of the ECN is used to indicate the probability of adding an ECN mark, and the ECN mark is used to indicate that the network is congested.
  • the second switch is a switch that detects congestion
  • the current queue length is the queue length of the data currently buffered by the outgoing port of the second switch.
  • the second switch checks the queue length of the data buffered by the outgoing port of the second switch. If the queue length of the buffered data at the egress end of the second switch exceeds a given threshold, the data packet is marked with an ECN identifier according to the marking probability. Therefore, the ECN identifier can be used to indicate network congestion.
  • the embodiment of the present application does not limit the size of a given threshold. For example, it can be set based on application scenarios or based on experience.
  • the marking probability is determined based on the ECN threshold range and queue length.
  • the ECN threshold range refers to the range formed by a given lower threshold Kmin and a given upper threshold Kmax.
  • the threshold range of the ECN is used to indicate the probability of adding an ECN identifier. The more serious the network congestion, the greater the number of packets with the ECN identifier.
  • the method provided in the embodiments of the present application recognizes the network congestion state through the current queue length and the threshold range of the ECN.
  • the application of this method to the application scenario shown in FIG. 12 is taken as an example.
  • the network architecture includes four L switches, namely switch L1, switch L2, switch L3, and switch L4, and includes two S switches, switch S1 and switch S2, respectively.
  • the switch L4 is the second switch and reads the current queue length, the current kmin and kmax values of the ECN.
  • the second switch recognizes the network congestion state according to the current queue length and the ECN threshold range.
  • the network congestion state includes but is not limited to the ECN normal state, the ECN failure state, and the congestion notification packet CNP failure state.
  • the network congestion state is identified according to the current queue length and the ECN threshold range, including but not limited to the following three identification results:
  • Recognition result 1: In response to the current queue length being within the reference range, the network congestion state is the ECN normal state.
  • the embodiments of the present application do not limit the reference range, and may be set based on experience, or may be adjusted based on application scenarios.
  • the reference range is determined based on the ECN threshold range. Taking the range between 0 and 1.5 times the maximum Kmax as the reference range as an example, if the current queue length is within the reference range of 0-1.5 Kmax, the network congestion state is the ECN normal state.
  • taking the maximum value of the reference range being 1.5Kmax as an example, if the current queue length is greater than 1.5Kmax and no CNP packets have been supplemented, the network congestion state is the ECN failure state.
  • the second switch may supplement the CNP message. For example, the congestion degree of the outgoing port of the second switch and the time of the received CNP packet are recorded on the second switch side, and whether to supplement the CNP packet is determined according to the congestion degree of the outgoing port of the second switch.
  • for the manner of supplementing the CNP message, refer to the related description of FIG. 5 above; details are not repeated here.
  • taking the maximum value of the reference range being 1.5Kmax as an example, if the current queue length is greater than 1.5Kmax and CNP packets have been supplemented, the network congestion state is the CNP failure state.
  • switch L4 is congested. After reading the current queue length and the current kmin and kmax values of the ECN, if the current queue length is within a reference range around kmin to kmax, for example, the reference range [0, 1.5*kmax], the switch L4 recognizes that the current network congestion state is the ECN normal state. If the queue length is much greater than 1.5*kmax, for example, greater than 3*kmax, and no CNP packet has been supplemented, the switch L4 recognizes that the current network congestion state is the ECN failure state and turns on the intelligent CNP packet supplement function. If the queue length is then still much greater than 1.5*kmax, for example, greater than 3*kmax, the switch L4 recognizes that the current network congestion state is the CNP failure state.
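  • The three recognition results can be summarized in a short sketch. The names are hypothetical, and the factor 1.5 is only the example value used above for the maximum of the reference range; the embodiment does not fix it.

```python
from enum import Enum


class CongestionState(Enum):
    ECN_NORMAL = "ECN normal"
    ECN_FAILURE = "ECN failure"
    CNP_FAILURE = "CNP failure"


def classify_congestion(queue_len: float, kmax: float, cnp_supplemented: bool,
                        range_factor: float = 1.5) -> CongestionState:
    """Classify the state from the queue length and the ECN upper threshold."""
    reference_max = range_factor * kmax  # reference range is [0, 1.5 * Kmax] here
    if queue_len <= reference_max:
        return CongestionState.ECN_NORMAL
    # Queue is beyond the reference range: the state depends on whether
    # CNP messages have already been supplemented.
    if cnp_supplemented:
        return CongestionState.CNP_FAILURE
    return CongestionState.ECN_FAILURE
```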
  • the second switch In response to the network congestion state being the target network congestion state, the second switch sends a target signaling message to the first switch, the target signaling message carries traffic source information, and the target signaling message is used to instruct the first switch to perform traffic control.
  • the target signaling message includes the first signaling message or the second signaling message, and sending the target signaling message to the first switch in response to the network congestion state being the target network congestion state includes, but is not limited to, the following two sending situations.
  • Sending situation 1: In response to the network congestion state being the target network congestion state and the current queue length being greater than the first threshold, the first signaling message is sent to the first switch. The first signaling message is used to instruct the first switch to send first flow control information, where the first flow control information is used to instruct the network device corresponding to the flow source information to suspend sending data packets of the target queue, and the target queue is one or more queues of the network device.
  • the size of the first threshold can be set based on experience, can also be set based on application scenarios, and can also be adjusted during the implementation of the method.
  • the current queue length is greater than the first threshold, indicating that the congestion situation is serious and flow control needs to be activated.
  • the first switch is instructed, through the first signaling message, to perform the first type of flow control, that is, to send the first flow control information.
  • the first signaling message may be a message in any format, which can instruct the first switch to perform the first type of flow control.
  • before sending the first signaling message to the first switch, the method further includes: acquiring the first CNP message, setting the value of the designated field in the frame header of the first CNP message to the first characteristic value, and using the first CNP message as the first signaling message.
  • the frame header of the CNP message includes an 8-bit opcode (opcode) field, a 1-bit solicited event (SE) field, a 1-bit migration request (migreq, M) field, a 2-bit padding count (Pad Count) field, a 4-bit header version (Head version) field, a 16-bit partition key (P_KEY) field, an 8-bit reserved (Reserved) field, a 24-bit destination queue pair (DestQP) field, a 1-bit acknowledge request (Ack request) field, a 7-bit reserved field, and a 24-bit packet sequence number (PSN) field.
  • the value of the second reserved field of the CNP message is set as the first characteristic value.
  • taking as an example that the value of the 7-bit reserved field in the frame header of the CNP message is set as the first characteristic value, and that the first characteristic value is 1, represented as 0000001, the frame header of the CNP message is shown in Table 3.
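  • As an illustration of how the characteristic value could be carried, the sketch below packs the header fields listed above (96 bits, 12 bytes) in the listed order and writes the characteristic value into the 7-bit reserved field. The function names are hypothetical, and the default opcode 0x81 and partition key 0xFFFF are assumptions made for this example rather than values taken from the embodiment.

```python
CNP_OPCODE = 0x81  # assumption: opcode commonly used for RoCEv2 CNP


def pack_cnp_header(dest_qp: int, characteristic_value: int, opcode: int = CNP_OPCODE,
                    se: int = 0, m: int = 0, pad_count: int = 0, version: int = 0,
                    p_key: int = 0xFFFF, reserved8: int = 0, ack_req: int = 0,
                    psn: int = 0) -> bytes:
    """Pack the header fields, most significant field first, into 12 bytes."""
    fields = [(opcode, 8), (se, 1), (m, 1), (pad_count, 2), (version, 4),
              (p_key, 16), (reserved8, 8), (dest_qp, 24), (ack_req, 1),
              (characteristic_value, 7), (psn, 24)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word.to_bytes(12, "big")


def read_characteristic_value(header: bytes) -> int:
    """Byte 8 holds the 1-bit Ack request followed by the 7-bit reserved field."""
    return header[8] & 0x7F
```

  • A receiver of such a header only needs the second helper to tell the first signaling message (value 1) from the second (value 2).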
  • Sending situation 2: In response to the network congestion state being the target network congestion state and the current queue length being less than the second threshold, a second signaling message is sent to the first switch. The second signaling message is used to instruct the first switch to send second flow control information, where the second flow control information is used to instruct the network device to continue sending data packets of the target queue, and the second threshold is less than the first threshold.
  • the size of the second threshold can be set based on experience, or set based on application scenarios, and can also be adjusted during the implementation of the method.
  • the second threshold is less than the first threshold, and the current queue length is less than the second threshold, indicating that the congestion situation is alleviated and another type of flow control needs to be activated.
  • the second signaling message is used to instruct the first switch to perform the second type of flow control.
  • the second signaling message may be a message in any format, which can instruct the first switch to perform the second type of flow control.
  • before sending the second signaling message to the first switch, the method further includes: acquiring the second CNP message, setting the value of the designated field in the frame header of the second CNP message to the second characteristic value, and using the second CNP message as the second signaling message.
  • the value of the 7-bit reserved field in the frame header of the CNP message is set as the second characteristic value, and the second characteristic value is 2.
  • the frame header of the CNP message is shown in Table 4.
  • when the switch L4 recognizes that the current network congestion state is the target network congestion state, it compares the current queue length with the given first threshold thh and second threshold thl. If the current queue length is greater than thh, the switch L4 constructs the first signaling message and sends it to the first switch. For example, a CNP message is used as the signaling message, and characteristic value 1 is filled in the reserved field of the frame header of the CNP message. If the current queue length is less than thl, the switch L4 constructs a second signaling message and sends it to the first switch.
  • for example, a CNP message is used as the signaling message, and characteristic value 2 is filled in the reserved field of the frame header of the CNP message.
  • the reserved field in the frame header of the CNP message is all 0s by default.
  • the embodiment of the present application uses the reserved field to construct the first and second signaling messages to notify the first switch to perform different actions, that is, to adopt different flow control methods.
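  • The threshold comparison performed by the second switch can be sketched as follows. The function name is hypothetical; returning None when the queue length lies between thl and thh (no signaling message sent) is an assumption of this sketch, since the text only describes the two threshold crossings.

```python
from typing import Optional

FIRST_CHARACTERISTIC_VALUE = 1   # first signaling message: suspend sending
SECOND_CHARACTERISTIC_VALUE = 2  # second signaling message: continue sending


def select_characteristic_value(queue_len: float, thh: float, thl: float,
                                in_target_state: bool) -> Optional[int]:
    """Pick the value to write into the reserved field, or None for no message."""
    if thl >= thh:
        raise ValueError("the second threshold must be less than the first")
    if not in_target_state:
        return None  # only the ECN failure or CNP failure state triggers signaling
    if queue_len > thh:
        return FIRST_CHARACTERISTIC_VALUE
    if queue_len < thl:
        return SECOND_CHARACTERISTIC_VALUE
    return None      # assumed: between the thresholds nothing is sent
```

  • Using two separate thresholds gives hysteresis, so a queue length hovering near a single threshold does not make the switch alternate between suspend and continue messages.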
  • the target network congestion state includes an ECN failure state or a CNP failure state, where the ECN failure state refers to a state in which the current queue length of the second switch is greater than the maximum value of the reference range and the CNP message has not been supplemented;
  • the CNP failure state refers to a state in which the current queue length of the second switch is greater than the maximum value of the reference range and CNP packets have been supplemented.
  • the first switch receives a target signaling message sent by the second switch in the target network congestion state, where the target signaling message carries traffic source information.
  • the first switch receives the target signaling message sent by the second switch in the target network congestion state, including but not limited to the following two receiving situations.
  • Receiving situation 1: The first signaling message sent by the second switch in the target network congestion state is received, where the first signaling message is used to instruct the first switch to send the first flow control information to perform the first type of flow control.
  • receiving the first signaling message sent by the second switch in the target network congestion state includes: receiving the first CNP message sent by the second switch in the target network congestion state, where the value of the designated field in the frame header of the first CNP message is the first characteristic value, and the first characteristic value is used to instruct to send the first flow control information.
  • Receiving situation 2: The second signaling message sent by the second switch in the target network congestion state is received, where the second signaling message is used to instruct the first switch to send second flow control information to perform the second type of flow control.
  • receiving the second signaling message sent by the second switch in the target network congestion state includes: receiving the second CNP message sent by the second switch in the target network congestion state, where the value of the designated field in the frame header of the second CNP message is the second characteristic value, and the second characteristic value is used to instruct to send the second flow control information.
  • the first switch sends target flow control information to a network device corresponding to the flow source information according to the target signaling message, where the target flow control information is used to instruct to perform flow control.
  • the first switch sends the target flow control information to the network device corresponding to the flow source information according to the target signaling message, including but not limited to the following two situations.
  • Case 1: Send first flow control information to the network device corresponding to the traffic source information according to the target signaling message.
  • the first flow control information is used to instruct the network device to suspend sending data packets of the target queue, and the target queue is one or more queues of the network device.
  • sending the first flow control information to the network device corresponding to the traffic source information according to the target signaling message includes: constructing the first PFC message according to the target signaling message, where the value of the time field of the first PFC message is the first value, and the first value is used to indicate the first flow control information; and sending the first PFC message to the network device corresponding to the flow source information.
  • Case 2: Send second flow control information to the network device corresponding to the traffic source information according to the target signaling message.
  • the second flow control information is used to instruct the network device to continue sending data packets of the target queue, and the target queue is one or more queues of the network device.
  • sending the second flow control information to the network device corresponding to the traffic source information according to the target signaling message includes: constructing a second priority-based flow control (PFC) message according to the target signaling message, where the value of the time field of the second PFC message is the second value, and the second value is used to indicate the second flow control information; and sending the second PFC message to the network device corresponding to the flow source information.
  • PFC is a technology in the IEEE Data Center Bridging (DCB) protocol suite and an enhanced version of flow control.
  • if the target signaling message received by the first switch is the first signaling message, the first PFC message needs to be sent to the corresponding network device.
  • Exemplarily, the first PFC message includes but is not limited to an XOFF PFC message. If the reserved field of the target signaling message received by the first switch is 0000010, it means that the target signaling message received by the first switch is the second signaling message, and the first switch needs to send the second PFC message to the corresponding network device.
  • Exemplarily, the second PFC message includes but is not limited to an XON PFC message.
  • XON/XOFF is a software data flow communication protocol that controls the data flow between a computer and other devices.
  • X represents the transmitter.
  • XON/XOFF is often referred to as "software flow control".
  • when the receiver cannot receive any more data (for example, because it needs time to process something), it sends an XOFF character to the transmitter; when it can receive more data again, it sends an XON character to the transmitter.
  • XON/XOFF PFC messages are used as flow control messages to implement priority-based flow control. Exemplarily, the format of the PFC message is shown in Table 5.
  • the intermediate switch receives and parses the target signaling message. If the notification object is not the intermediate switch itself, no special operation is performed and the target signaling message continues to be forwarded.
  • switch L1 receives the target signaling message, extracts the reserved field from the frame header and parses it, and finds that it is the characteristic value 1, then constructs a PFC message in the format shown in Table 5 and sends it to the corresponding network device, such as the source server . After that, the target signaling message can be discarded.
  • the time[n] field of the PFC message is 65535, which is used to indicate to suspend sending the data packet of the target queue n within the time indicated by 65535.
  • the first switch L2 receives the target signaling message, extracts the reserved field from the frame header and parses it, and finds that it is the characteristic value 2; it then constructs a PFC message in the format shown in Table 5 and sends it to the corresponding network device, such as the source server. After that, the target signaling message is discarded. The time[n] field of this PFC message is 0, which is used to indicate to continue sending the data packets of the target queue n.
  • the Priority_enable_vector field e[n] indicates whether the time value of the queue with priority n is valid. Taking a network device with queues of n priority levels as an example, if the network device is required to suspend sending data packets in all queues, the target queue includes n queues, the values of e[1] to e[n] are all non-zero, and the value of time is set according to the pause time.
  • the value of time is set according to the pause time.
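  • The handling at the first switch can be sketched as below: the characteristic value read from the signaling message selects either an XOFF frame (time[n] = 65535) or an XON frame (time[n] = 0). Table 5 is not reproduced in this section, so the frame layout here is an assumption based on the usual IEEE 802.1Qbb PFC frame (destination MAC 01-80-C2-00-00-01, EtherType 0x8808, opcode 0x0101, a 16-bit priority enable vector, and eight 16-bit time fields); all function names are hypothetical.

```python
import struct
from typing import Iterable, Optional

PFC_DEST_MAC = bytes.fromhex("0180c2000001")  # assumed MAC control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
XOFF_TIME = 65535  # suspend sending for the maximum time
XON_TIME = 0       # continue sending


def build_pfc_frame(src_mac: bytes, priorities: Iterable[int], pause_time: int) -> bytes:
    """Build a PFC frame (without FCS) that sets time[n] for each given priority."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= pause_time <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    enable_vector = 0
    times = [0] * 8
    for n in priorities:
        if not 0 <= n < 8:
            raise ValueError("priority must be in 0..7")
        enable_vector |= 1 << n   # e[n] = 1: time[n] is valid
        times[n] = pause_time
    body = struct.pack("!HHH8H", MAC_CONTROL_ETHERTYPE, PFC_OPCODE, enable_vector, *times)
    return (PFC_DEST_MAC + src_mac + body).ljust(60, b"\x00")  # pad to minimum size


def pfc_for_signaling(characteristic_value: int, src_mac: bytes,
                      priorities: Iterable[int]) -> Optional[bytes]:
    """Map the reserved-field value of the signaling message to a PFC frame."""
    if characteristic_value == 1:
        return build_pfc_frame(src_mac, priorities, XOFF_TIME)
    if characteristic_value == 2:
        return build_pfc_frame(src_mac, priorities, XON_TIME)
    return None  # default all-zero reserved field: an ordinary CNP, no PFC
```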
  • the first switch may also not specify the target queue, but only instruct, through the target flow control information, the network device to suspend or continue sending data packets, and the network device determines which queue's data packets to suspend or continue sending.
  • the foregoing process in the embodiment of the present application only uses the method of constructing a PFC message to carry target flow control information as an example for description.
  • the PAUSE message can also be used for implementation.
  • the embodiment of the present application does not limit the type of the message carrying the target flow control information.
  • the PAUSE message is a message used to control MAC data flow.
  • when the opposite end has too much data to process in time, it sends a PAUSE message to the upstream MAC of the data (in the embodiment of this application, the network device corresponding to the traffic source) to notify the upstream MAC to stop sending data for a period of time.
  • the stop time is recorded in the PAUSE_TIMING field of the PAUSE message.
  • the PAUSE_TIMING field recorded with the stop time is used to carry target flow control information.
  • when the upstream MAC receives a valid PAUSE message from the opposite end, it starts a timer and stops sending data, which prevents FIFO overflow or data loss at the opposite end when the opposite end cannot process data in time. If the timer expires and no new PAUSE message has been received, sending resumes. If the timer has not expired and the PAUSE_TIMING field of a newly received PAUSE message is all 0s, sending may resume: the timer is stopped and data is sent again.
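  • The timer behavior described above can be sketched as follows. This is a minimal model, not an implementation from this application: the class name `PauseTimer` is hypothetical, and time is passed in explicitly as seconds so that the logic is deterministic.

```python
class PauseTimer:
    """Models whether an upstream MAC may send, given received PAUSE messages."""

    def __init__(self) -> None:
        self.paused_until = 0.0  # point in time at which sending may resume

    def on_pause(self, now: float, pause_seconds: float) -> None:
        # A non-zero pause time (re)starts the timer; zero stops it at once.
        if pause_seconds > 0:
            self.paused_until = now + pause_seconds
        else:
            self.paused_until = now

    def may_send(self, now: float) -> bool:
        # Sending is allowed once the timer has expired or been stopped.
        return now >= self.paused_until


timer = PauseTimer()
timer.on_pause(now=0.0, pause_seconds=1.0)   # stop sending for 1 s
blocked = timer.may_send(now=0.5)            # timer still running
timer.on_pause(now=0.6, pause_seconds=0.0)   # all-zero PAUSE_TIMING
released = timer.may_send(now=0.6)           # sending resumes early
```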
  • the first switch sends the target flow control information to the network device corresponding to the traffic source information according to the target signaling message, including: determining the traffic source port according to the traffic source information carried in the target signaling message; and sending the target flow control information to the network device corresponding to the traffic source information through the traffic source port.
  • sending the target flow control information to the network device corresponding to the traffic source information through the traffic source port includes: sending third flow control information to the network device corresponding to the traffic source information through the traffic source port, where the third flow control information is used to instruct to suspend sending data packets of the queues corresponding to the traffic source port.
  • the third flow control information is used to control the suspension of sending data packets of all the queues corresponding to the flow source port, thereby realizing port-level control.
  • the embodiment of this application does not limit the sending mode of the third flow control information.
  • a PAUSE message is sent to the network device corresponding to the traffic source information, and the third flow control information is carried in the PAUSE message to indicate that sending of the data packets of all queues corresponding to the traffic source port is suspended.
  • the value of the PAUSE_TIMING field of the PAUSE message is set to be non-zero, so as to carry the third flow control information, which is used to instruct to suspend sending data packets within the time indicated by the field.
  • sending the target flow control information to the network device corresponding to the traffic source information through the traffic source port includes: sending fourth flow control information to the network device corresponding to the traffic source information through the traffic source port, where the fourth flow control information is used to instruct to continue sending data packets of the queues corresponding to the traffic source port.
  • the fourth flow control information is used to control the continued sending of the data packets of all the queues corresponding to the flow source port, thereby achieving port-level control.
  • the embodiment of this application does not limit the sending mode of the fourth flow control information.
  • a PAUSE message is sent to the network device corresponding to the traffic source information, and the fourth flow control information is carried in the PAUSE message to indicate that the data packets of all queues corresponding to the traffic source port continue to be sent.
  • the value of the PAUSE_TIMING field of the PAUSE message is set to 0, so as to carry the fourth flow control information, which is used to instruct to continue sending the data packet.
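  • For the port-level case, a PAUSE frame carrying the third or fourth flow control information might be built as in the sketch below. It assumes the standard IEEE 802.3 MAC Control PAUSE layout (EtherType 0x8808, opcode 0x0001, one 16-bit pause time); the name `build_pause_frame` is hypothetical and the parameter `pause_timing` stands for the PAUSE_TIMING field named above.

```python
import struct

PAUSE_DEST_MAC = bytes.fromhex("0180c2000001")  # MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_PAYLOAD = 46  # minimum Ethernet payload, FCS not included


def build_pause_frame(src_mac: bytes, pause_timing: int) -> bytes:
    """Build a PAUSE frame; non-zero pause_timing suspends, 0 continues."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= pause_timing <= 0xFFFF:
        raise ValueError("pause_timing must fit in 16 bits")
    payload = struct.pack("!HH", PAUSE_OPCODE, pause_timing)
    payload = payload.ljust(MIN_PAYLOAD, b"\x00")  # zero padding
    header = PAUSE_DEST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    return header + payload


third_info = build_pause_frame(bytes(6), 65535)  # suspend all queues of the port
fourth_info = build_pause_frame(bytes(6), 0)     # continue sending
```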
  • the method for controlling network congestion includes the following processes.
  • the network device receives the target flow control information sent by the first switch, where the target flow control information is used to instruct flow control and is sent by the first switch after the first switch receives the target signaling message sent by the second switch in the target network congestion state.
  • receiving the target flow control information sent by the first switch includes but is not limited to the following two situations.
  • Case 1: Receive first flow control information sent by the first switch, where the first flow control information is used to instruct to suspend sending data packets of the target queue, and the target queue is one or more queues of the network device.
  • receiving the first flow control information sent by the first switch includes: receiving the first PFC message sent by the first switch, where the value of the time field of the first PFC message is a first value, and the first value is used to indicate the first flow control information.
  • Case 2: Receive second flow control information sent by the first switch, where the second flow control information is used to instruct to continue sending data packets of the target queue, and the target queue is one or more queues of the network device.
  • receiving the second flow control information sent by the first switch includes: receiving the second PFC message sent by the first switch, where the value of the time field of the second PFC message is a second value, and the second value is used to indicate the second flow control information.
  • the network device performs flow control according to the target flow control information.
  • the flow control is performed according to the target flow control information, including but not limited to the following two control methods.
  • Control method 1: Suspend sending data packets of the target queue according to the first flow control information.
  • suspending sending the data packets of the target queue according to the first flow control information includes: determining a time length for suspending sending data packets according to the value of the time field of the first PFC message, and suspending sending the data packets of the target queue within the time length.
  • Control method 2: Continue to send data packets of the target queue according to the second flow control information.
  • continuing to send the data packet of the target queue according to the second flow control information includes: continuing to send the data packet of the target queue according to the value of the time field of the second PFC message.
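  • How a network device might turn the time field into a time length is sketched below. It assumes that the time field counts pause quanta and that one quantum equals 512 bit times at the link rate, as in standard IEEE 802.3 MAC Control; this application does not state the unit, and the function name and the link rate used in the example are hypothetical.

```python
BITS_PER_QUANTUM = 512  # assumption: one pause quantum is 512 bit times


def pause_duration_seconds(time_field: int, link_rate_bps: int) -> float:
    """Convert a 16-bit time field into a pause duration in seconds."""
    if not 0 <= time_field <= 0xFFFF:
        raise ValueError("time field must fit in 16 bits")
    if link_rate_bps <= 0:
        raise ValueError("link rate must be positive")
    return time_field * BITS_PER_QUANTUM / link_rate_bps


# First PFC message (time = 65535) on a 10 Gbit/s link: about 3.36 ms.
longest_pause = pause_duration_seconds(65535, 10_000_000_000)
# Second PFC message (time = 0): no pause, so sending continues at once.
no_pause = pause_duration_seconds(0, 10_000_000_000)
```

  • Under these assumptions the longest pause that a single message can request is only a few milliseconds on a fast link, so a sender stays suspended only for as long as suspend messages keep arriving.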
  • the method provided in the embodiment of the present application is used to control network congestion and is compared, for the CNP failure state, with the related technology of supplementing CNP messages; the test results obtained are shown in Table 6.
  • the method provided by the embodiments of the present application recognizes the network congestion state and, in the CNP failure state, instructs the first switch through a signaling message to perform flow control, thereby suppressing the queue backlog on the congested side, ensuring low service delay without affecting service throughput, and supporting large-scale RoCE networking. This solves not only the problem of CNP failure but also the problem of DCQCN rate control failure in large-scale, high-concurrency scenarios.
  • the embodiment of the present application provides a device for controlling network congestion, and the device is used to execute the function performed by the first switch in the method for controlling network congestion shown in FIG. 11.
  • the device includes:
  • the receiving module 1301 is configured to receive a target signaling message sent by the second switch in a congested state of the target network, where the target signaling message carries traffic source information;
  • the sending module 1302 is configured to send target flow control information to the network device corresponding to the flow source information according to the target signaling message, and the target flow control information is used to instruct to perform flow control.
  • the sending module 1302 is configured to send first flow control information to the network device corresponding to the traffic source information according to the target signaling message, where the first flow control information is used to instruct the network device to suspend sending data packets of the target queue, and the target queue is one or more queues of the network device.
  • the sending module 1302 is configured to construct a first priority-based flow control (PFC) message according to the target signaling message, where the value of the time field of the first PFC message is a first value, and the first value is used to indicate the first flow control information; and send the first PFC message to the network device corresponding to the traffic source information.
  • the receiving module 1301 is configured to receive a first signaling message sent by the second switch in a congested state of the target network, and the first signaling message is used to instruct to send the first flow control information.
  • the receiving module 1301 is configured to receive the first congestion notification packet (CNP) message sent by the second switch in the target network congestion state, where the value of the designated field in the frame header of the first CNP message is the first characteristic value, and the first characteristic value is used to indicate to send the first flow control information.
  • the sending module 1302 is configured to send second flow control information to the network device corresponding to the traffic source information according to the target signaling message, where the second flow control information is used to instruct the network device to continue to send data packets of the target queue, and the target queue is one or more queues of the network device.
  • the sending module 1302 is configured to construct a second priority-based flow control (PFC) message according to the target signaling message, where the value of the time field of the second PFC message is a second value, and the second value is used to indicate the second flow control information; and send the second PFC message to the network device corresponding to the traffic source information.
  • the receiving module 1301 is configured to receive a second signaling message sent by the second switch in a congested state of the target network, and the second signaling message is used to instruct to send the second flow control information.
  • the receiving module 1301 is configured to receive the second congestion notification packet (CNP) message sent by the second switch in the target network congestion state, where the value of the designated field in the frame header of the second CNP message is the second characteristic value, and the second characteristic value is used to indicate to send the second flow control information.
  • the sending module 1302 is configured to determine the flow source port according to the flow source information carried in the target signaling message; and send the target flow control information to the network device corresponding to the flow source information through the flow source port.
  • the sending module 1302 is configured to send third flow control information to the network device corresponding to the traffic source information through the traffic source port, where the third flow control information is used to instruct to suspend sending data packets of the queues corresponding to the traffic source port.
  • the sending module 1302 is configured to send fourth flow control information to the network device corresponding to the traffic source information through the traffic source port, where the fourth flow control information is used to instruct to continue sending data packets of the queues corresponding to the traffic source port.
  • the device provided by the embodiment of the present application, after receiving the target signaling message sent by the second switch in the target network congestion state, sends the target flow control information to the network device corresponding to the traffic source information carried in the target signaling message to instruct flow control, thereby suppressing the queue backlog on the congested side, ensuring low service delay without affecting service throughput, supporting large-scale RoCE networking, and solving the problem of DCQCN rate control failure in large-scale, high-concurrency scenarios.
  • the embodiment of the present application provides a device for controlling network congestion, and the device is used to execute the function performed by the second switch in the method for controlling network congestion shown in FIG. 11.
  • the device includes:
  • the identification module 1401 is used to identify the network congestion state
  • the sending module 1402 is configured to send a target signaling message to the first switch in response to the network congestion state being the target network congestion state, the target signaling message carries traffic source information, and the target signaling message is used to instruct the first switch to perform flow control.
  • the target signaling message includes a first signaling message or a second signaling message.
  • the sending module 1402 is configured to: in response to the network congestion state being the target network congestion state and the current queue length being greater than the first threshold, send a first signaling message to the first switch, where the first signaling message is used to instruct the first switch to send first flow control information, the first flow control information is used to instruct the network device corresponding to the traffic source information to suspend sending data packets of the target queue, and the target queue is one or more queues of the network device; or,
  • in response to the network congestion state being the target network congestion state and the current queue length being less than the second threshold, send a second signaling message to the first switch, where the second signaling message is used to instruct the first switch to send second flow control information, the second flow control information is used to instruct the network device to continue sending data packets of the target queue, and the second threshold is less than the first threshold.
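  • The two-threshold decision made by the sending module can be sketched as below. The function and enum names are hypothetical, and the numeric values 1 and 2 simply mirror the characteristic values mentioned earlier for the first and second signaling messages; the thresholds in the usage lines are illustrative only.

```python
from enum import Enum
from typing import Optional


class Signaling(Enum):
    FIRST = 1   # asks the first switch to send the first (suspend) information
    SECOND = 2  # asks the first switch to send the second (continue) information


def choose_signaling(in_target_state: bool, queue_length: int,
                     first_threshold: int,
                     second_threshold: int) -> Optional[Signaling]:
    """Return the signaling message to send, or None if none is needed."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be less than first threshold")
    if not in_target_state:
        return None
    if queue_length > first_threshold:
        return Signaling.FIRST
    if queue_length < second_threshold:
        return Signaling.SECOND
    return None  # between the thresholds: keep the current behavior


decision_high = choose_signaling(True, 120, first_threshold=100, second_threshold=40)
decision_low = choose_signaling(True, 30, first_threshold=100, second_threshold=40)
```

  • Because the second threshold is less than the first, queue lengths between the two thresholds trigger neither message, which keeps the switch from alternating between suspend and continue on small fluctuations.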
  • the device further includes:
  • An acquiring module configured to acquire the first CNP message, set the value of the designated field in the frame header of the first CNP message to the first characteristic value, and use the first CNP message as the first signaling message;
  • the acquiring module is configured to acquire the second CNP message, set the value of the designated field in the frame header of the second CNP message as the second characteristic value, and use the second CNP message as the second signaling message.
  • the identification module 1401 is used to read the current queue length and the explicit congestion notification (ECN) threshold range, where the ECN threshold range is used to indicate the probability of adding an ECN identifier, and the ECN identifier is used to indicate network congestion; and to identify the network congestion state according to the current queue length and the ECN threshold range.
  • the device provided by the embodiment of the application identifies the network congestion state and, in the target network congestion state, instructs the first switch through the target signaling message to perform flow control, thereby suppressing the queue backlog on the congested side, ensuring low service delay without affecting service throughput, supporting large-scale RoCE networking, and solving the problem of DCQCN rate control failure in large-scale, high-concurrency scenarios.
  • the embodiment of the present application provides a network congestion control device, which is used to perform the functions performed by the network device in the network congestion control method shown in FIG. 11.
  • the device includes:
  • the receiving module 1501 is configured to receive target flow control information sent by the first switch, where the target flow control information is used to instruct flow control and is sent by the first switch after the first switch receives the target signaling message sent by the second switch in the target network congestion state;
  • the control module 1502 is configured to perform flow control according to the target flow control information.
  • the receiving module 1501 is configured to receive first flow control information sent by the first switch, where the first flow control information is used to instruct to suspend sending data packets of the target queue, and the target queue is one or more queues of the network device;
  • the control module 1502 is configured to suspend sending data packets of the target queue according to the first flow control information.
  • the receiving module 1501 is configured to receive the first PFC message sent by the first switch, the value of the time field of the first PFC message is a first value, and the first value is used to indicate the first flow control information;
  • the control module 1502 is configured to determine the length of time for suspending sending data packets according to the value of the time field of the first PFC message, and suspend sending the data packets of the target queue within the time length.
  • the receiving module 1501 is configured to receive second flow control information sent by the first switch, where the second flow control information is used to instruct to continue sending data packets of the target queue, and the target queue is one or more queues of the network device;
  • the control module 1502 is configured to continue to send data packets of the target queue according to the second flow control information.
  • the receiving module 1501 is configured to receive the second PFC message sent by the first switch, the value of the time field of the second PFC message is the second value, and the second value is used to indicate the second flow control information;
  • the control module 1502 is configured to continue to send data packets of the target queue according to the value of the time field of the second PFC message.
  • the device, after receiving the target flow control information sent by the first switch, performs flow control based on the target flow control information, thereby suppressing the queue backlog on the congested side, ensuring low service delay without affecting service throughput, supporting large-scale RoCE networking, and solving the problem of DCQCN rate control failure in large-scale, high-concurrency scenarios.
  • the target network congestion state involved includes but is not limited to the ECN failure state or the CNP failure state. The ECN failure state refers to the state in which the current queue length of the second switch is greater than the maximum value of the reference range and no CNP message has been supplemented; the CNP failure state refers to the state in which the current queue length of the second switch is greater than the maximum value of the reference range and CNP messages have been supplemented.
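  • The distinction between the two failure states can be expressed as a small classification function. This is a sketch under the definitions given above; the names `CongestionState` and `classify_congestion` and the parameter `reference_max` (the maximum value of the reference range) are hypothetical, and the state `OTHER` merely stands for any state that is not one of the two target states.

```python
from enum import Enum


class CongestionState(Enum):
    OTHER = "other"              # not one of the two target states
    ECN_FAILURE = "ecn_failure"  # queue too long, no CNP supplemented yet
    CNP_FAILURE = "cnp_failure"  # queue too long even though CNP supplemented


def classify_congestion(queue_length: int, reference_max: int,
                        cnp_supplemented: bool) -> CongestionState:
    """Classify the congestion state seen by the second switch."""
    if queue_length <= reference_max:
        return CongestionState.OTHER
    if cnp_supplemented:
        return CongestionState.CNP_FAILURE
    return CongestionState.ECN_FAILURE


state = classify_congestion(queue_length=150, reference_max=100,
                            cnp_supplemented=True)
```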
  • FIG. 16 is a schematic diagram of the hardware structure of a network congestion control device 1600 according to an embodiment of the application.
  • the network congestion control device 1600 shown in FIG. 16 can execute the corresponding steps in the network congestion control method provided in the above-mentioned embodiment shown in FIG. 11.
  • the control device 1600 for network congestion includes a processor 1601, a memory 1602, an interface 1603, and a bus 1604.
  • the interface 1603 may be implemented in a wireless or wired manner.
  • the interface 1603 may be a network card.
  • the aforementioned processor 1601, memory 1602, and interface 1603 are connected through a bus 1604.
  • the interface 1603 may include a transmitter and a receiver for communicating with other communication devices.
  • the processor 1601 is configured to perform processing related steps 301 to 304 in the embodiment shown in FIG. 3.
  • when the control device 1600 for network congestion shown in FIG. 16 is the first switch in FIG. 11, the processor 1601 reads instructions in the memory 1602 so that the control device 1600 for network congestion shown in FIG. 16 executes all or part of the operations performed by the first switch.
  • when the control device 1600 for network congestion shown in FIG. 16 is the second switch in FIG. 11, the processor 1601 reads instructions in the memory 1602 so that the control device 1600 for network congestion shown in FIG. 16 executes all or part of the operations performed by the second switch.
  • when the control device 1600 for network congestion shown in FIG. 16 is the network device in FIG. 11, the processor 1601 reads instructions in the memory 1602 so that the control device 1600 for network congestion shown in FIG. 16 executes all or part of the operations performed by the network device.
  • the memory 1602 includes an operating system 16021 and an application program 16022, and is used to store programs, codes or instructions. When the processor or a hardware device executes these programs, codes or instructions, the processing performed by the control device 1600 for network congestion in the method embodiment can be completed.
  • the memory 1602 may include a read-only memory (English: Read-only Memory, abbreviation: ROM) and a random access memory (English: Random Access Memory, abbreviation: RAM).
  • the ROM includes a basic input/output system (BIOS) or an embedded system.
  • RAM includes application programs and operating system.
  • the system is booted by the BIOS solidified in the ROM or by the bootloader in the embedded system, which brings the control device 1600 for network congestion into a normal operating state.
  • the application program and the operating system run in the RAM, thereby completing the processing performed by the control device 1600 for network congestion in the method embodiment.
  • FIG. 16 only shows a simplified design of the control device 1600 for network congestion.
  • the control device 1600 for network congestion may include any number of interfaces, processors or memories.
  • the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or any conventional processor. It is worth noting that the processor can be a processor that supports an advanced RISC machine (advanced RISC machines, ARM) architecture.
  • the foregoing memory may include a read-only memory and a random access memory, and provide instructions and data to the processor.
  • the memory may also include non-volatile random access memory.
  • the memory can also store device type information.
  • the memory may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, for example, static random access memory (static RAM, SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • a computer-readable storage medium is also provided, and at least one instruction is stored in the storage medium.
  • the instruction is loaded and executed by a processor to implement the method for controlling network congestion as described above.
  • This application provides a computer program.
  • the computer program When the computer program is executed by a computer, it can cause a processor or computer to execute the corresponding steps and/or processes in the foregoing method embodiments.
  • a chip is provided, including a processor configured to call, from a memory, and execute instructions stored in the memory, so that a communication device on which the chip is installed executes the methods in the foregoing aspects.
  • another chip is provided, including: an input interface, an output interface, a processor, and a memory.
  • the input interface, the output interface, the processor, and the memory are connected through an internal connection path; the processor is configured to execute code in the memory, and when the code is executed, the processor performs the methods in the foregoing aspects.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk).
  • the terms "first", "second", and the like are used to distinguish between identical or similar items whose effects and functions are basically the same. It should be understood that there is no logical or temporal dependency among "first", "second", and "nth", and no restriction on the execution order. It should also be understood that although the terms first, second, etc. are used in the description to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another.


Abstract

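This application discloses a network congestion control method, apparatus, device, system, and storage medium. The method includes: a first switch receives a target signaling message sent by a second switch in a target network congestion state, where the target signaling message carries traffic source information. The first switch sends target flow control information to the network device corresponding to the traffic source information according to the target signaling message, where the target flow control information is used to instruct flow control. After receiving the target signaling message sent by the second switch in the target network congestion state, the first switch sends the target flow control information to the network device corresponding to the traffic source information carried in the target signaling message to instruct flow control, thereby suppressing the queue backlog on the congested side, ensuring low service latency without affecting service throughput, supporting large-scale RoCE networking, and solving the problem of DCQCN rate control failure in large-scale, high-concurrency scenarios.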

Description

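Network congestion control method, apparatus, device, system, and storage medium
This application claims priority to Chinese Patent Application No. 202010480552.2, filed on May 30, 2020 and entitled "Network congestion control method, apparatus, device, system, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communication technologies, and in particular, to a network congestion control method, apparatus, device, system, and storage medium.
Background
With the emergence and widespread use of applications such as high-performance computing and distributed storage, data center networks and protocols are required to provide high throughput, low latency, and low central processing unit (CPU) overhead. The conventional transmission control protocol/internet protocol (TCP/IP) stack incurs very high CPU overhead and cannot adequately meet the requirements of these applications. The remote direct memory access (RDMA) protocol therefore emerged; it allows user-mode applications to read from and write to remote memory directly, without kernel intervention or memory copies.
Currently, a widely used RDMA protocol is RDMA over converged Ethernet (RoCE). In a RoCE network, effective control of network congestion is key to reducing service latency and supporting large-scale RoCE networking.
Summary
Embodiments of this application provide a network congestion control method, apparatus, device, system, and storage medium to solve the problems in the related art. The technical solutions are as follows: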
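According to a first aspect, a network congestion control method is provided. Taking application of the method to a first switch as an example, the method includes: the first switch receives a target signaling message sent by a second switch in a target network congestion state, where the target signaling message carries traffic source information. The first switch sends target flow control information to the network device corresponding to the traffic source information according to the target signaling message, where the target flow control information is used to instruct flow control.
In the method provided in the embodiments of this application, after the target signaling message sent by the second switch in the target network congestion state is received, target flow control information is sent to the network device corresponding to the traffic source information carried in the target signaling message to instruct flow control. This suppresses the queue backlog on the congested side, ensures low service latency without affecting service throughput, supports large-scale RoCE networking, and solves the problem of DCQCN rate control failure in large-scale, high-concurrency scenarios.
In a possible implementation of the first aspect, sending the target flow control information to the network device corresponding to the traffic source information according to the target signaling message includes: sending first flow control information to the network device corresponding to the traffic source information according to the target signaling message, where the first flow control information is used to instruct the network device to suspend sending data packets of a target queue, and the target queue is one or more queues of the network device.
Using the first flow control information to instruct the network device corresponding to the traffic source information to suspend sending data packets of the target queue effectively suppresses the queue backlog on the congested side and further ensures low service latency.
In a possible implementation of the first aspect, sending the first flow control information to the network device corresponding to the traffic source information according to the target signaling message includes: constructing a first priority-based flow control (PFC) message according to the target signaling message, where the value of the time field of the first PFC message is a first value, and the first value is used to indicate the first flow control information; and sending the first PFC message to the network device corresponding to the traffic source information.
In a possible implementation of the first aspect, receiving the target signaling message sent by the second switch in the target network congestion state includes: receiving a first signaling message sent by the second switch in the target network congestion state, where the first signaling message is used to instruct to send the first flow control information.
In a possible implementation of the first aspect, receiving the first signaling message sent by the second switch in the target network congestion state includes: receiving a first congestion notification packet (CNP) message sent by the second switch in the target network congestion state, where the value of a designated field in the frame header of the first CNP message is a first characteristic value, and the first characteristic value is used to instruct to send the first flow control information.
In a possible implementation of the first aspect, sending the target flow control information to the network device corresponding to the traffic source information according to the target signaling message includes: sending second flow control information to the network device corresponding to the traffic source information according to the target signaling message, where the second flow control information is used to instruct the network device to continue sending data packets of a target queue, and the target queue is one or more queues of the network device.
Using the second flow control information to instruct the network device corresponding to the traffic source information to continue sending data packets of the target queue avoids affecting service throughput.
In a possible implementation of the first aspect, sending the second flow control information to the network device corresponding to the traffic source information according to the target signaling message includes: constructing a second priority-based flow control (PFC) message according to the target signaling message, where the value of the time field of the second PFC message is a second value, and the second value is used to indicate the second flow control information; and sending the second PFC message to the network device corresponding to the traffic source information.
In a possible implementation of the first aspect, receiving the target signaling message sent by the second switch in the target network congestion state includes: receiving a second signaling message sent by the second switch in the target network congestion state, where the second signaling message is used to instruct to send the second flow control information.
In a possible implementation of the first aspect, receiving the second signaling message sent by the second switch in the target network congestion state includes: receiving a second congestion notification packet (CNP) message sent by the second switch in the target network congestion state, where the value of a designated field in the frame header of the second CNP message is a second characteristic value, and the second characteristic value is used to instruct to send the second flow control information.
In a possible implementation of the first aspect, sending the target flow control information to the network device corresponding to the traffic source information according to the target signaling message includes: determining a traffic source port according to the traffic source information carried in the target signaling message; and sending the target flow control information to the network device corresponding to the traffic source information through the traffic source port.
In a possible implementation of the first aspect, sending the target flow control information to the network device corresponding to the traffic source information through the traffic source port includes: sending third flow control information to the network device corresponding to the traffic source information through the traffic source port, where the third flow control information is used to instruct to suspend sending data packets of the queues corresponding to the traffic source port.
In a possible implementation of the first aspect, sending the target flow control information to the network device corresponding to the traffic source information through the traffic source port includes: sending fourth flow control information to the network device corresponding to the traffic source information through the traffic source port, where the fourth flow control information is used to instruct to continue sending data packets of the queues corresponding to the traffic source port.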
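According to a second aspect, a method applied to a second switch is provided. The method includes: the second switch identifies a network congestion state; and in response to the network congestion state being a target network congestion state, sends a target signaling message to a first switch, where the target signaling message carries traffic source information and is used to instruct the first switch to perform flow control.
In the method provided in the embodiments of this application, the network congestion state is identified and, in the target network congestion state, the first switch is instructed through the target signaling message to perform flow control. This suppresses the queue backlog on the congested side, ensures low service latency without affecting service throughput, supports large-scale RoCE networking, and solves the problem of DCQCN rate control failure in large-scale, high-concurrency scenarios.
In a possible implementation of the second aspect, the target signaling message includes a first signaling message or a second signaling message, and sending the target signaling message to the first switch in response to the network congestion state being the target network congestion state includes: in response to the network congestion state being the target network congestion state and the current queue length being greater than a first threshold, sending the first signaling message to the first switch, where the first signaling message is used to instruct the first switch to send first flow control information, the first flow control information is used to instruct the network device corresponding to the traffic source information to suspend sending data packets of a target queue, and the target queue is one or more queues of the network device; or,
in response to the network congestion state being the target network congestion state and the current queue length being less than a second threshold, sending the second signaling message to the first switch, where the second signaling message is used to instruct the first switch to send second flow control information, the second flow control information is used to instruct the network device to continue sending data packets of the target queue, and the second threshold is less than the first threshold.
In a possible implementation of the second aspect, before sending the first signaling message to the first switch, the method further includes: obtaining a first CNP message, setting the value of a designated field in the frame header of the first CNP message to a first characteristic value, and using the first CNP message as the first signaling message;
before sending the second signaling message to the first switch, the method further includes: obtaining a second CNP message, setting the value of a designated field in the frame header of the second CNP message to a second characteristic value, and using the second CNP message as the second signaling message.
Constructing the first signaling message or the second signaling message from a CNP message is only an example; the first signaling message or the second signaling message may also be constructed using other types of message formats, which is not limited in the embodiments of this application.
In a possible implementation of the second aspect, identifying the network congestion state includes: reading the current queue length and an explicit congestion notification (ECN) threshold range, where the ECN threshold range is used to indicate the probability of adding an ECN identifier, and the ECN identifier is used to indicate that network congestion has occurred; and identifying the network congestion state according to the current queue length and the ECN threshold range.
Identifying different network congestion states makes it possible to subsequently perform the corresponding network congestion control based on each state.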
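According to a third aspect, a network congestion control method is provided. The method is applied to a network device and includes: the network device receives target flow control information sent by a first switch, where the target flow control information is used to instruct flow control and is sent by the first switch after the first switch receives a target signaling message sent by a second switch in a target network congestion state; and performs flow control according to the target flow control information.
In the method provided in the embodiments of this application, after the target flow control information sent by the first switch is received, flow control is performed based on the target flow control information. This suppresses the queue backlog on the congested side, ensures low service latency without affecting service throughput, supports large-scale RoCE networking, and solves the problem of DCQCN rate control failure in large-scale, high-concurrency scenarios.
In a possible implementation of the third aspect, receiving the target flow control information sent by the first switch includes: receiving first flow control information sent by the first switch, where the first flow control information is used to instruct to suspend sending data packets of a target queue, and the target queue is one or more queues of the network device;
performing flow control according to the target flow control information includes: suspending sending data packets of the target queue according to the first flow control information.
In a possible implementation of the third aspect, receiving the first flow control information sent by the first switch includes: receiving a first PFC message sent by the first switch, where the value of the time field of the first PFC message is a first value, and the first value is used to indicate the first flow control information;
suspending sending data packets of the target queue according to the first flow control information includes: determining a time length for suspending sending data packets according to the value of the time field of the first PFC message, and suspending sending data packets of the target queue within the time length.
In a possible implementation of the third aspect, receiving the target flow control information sent by the first switch includes: receiving second flow control information sent by the first switch, where the second flow control information is used to instruct to continue sending data packets of a target queue, and the target queue is one or more queues of the network device;
performing flow control according to the target flow control information includes: continuing to send data packets of the target queue according to the second flow control information.
In a possible implementation of the third aspect, receiving the second flow control information sent by the first switch includes: receiving a second PFC message sent by the first switch, where the value of the time field of the second PFC message is a second value, and the second value is used to indicate the second flow control information;
continuing to send data packets of the target queue according to the second flow control information includes: continuing to send data packets of the target queue according to the value of the time field of the second PFC message.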
第四方面,提供了一种网络拥塞的控制装置,装置包括:
接收模块,用于接收第二交换机在目标网络拥塞状态发送的目标信令报文,目标信令报文携带流量来源信息;
发送模块,用于根据目标信令报文向流量来源信息对应的网络设备发送目标流量控制信息,目标流量控制信息用于指示进行流量控制。
在第四方面的一种可能的实现方式中,发送模块,用于根据目标信令报文向流量来源信息对应的网络设备发送第一流量控制信息,第一流量控制信息用于指示网络设备暂停发送目标队列的数据包,目标队列为网络设备的一个或多个队列。
在第四方面的一种可能的实现方式中,发送模块,用于根据目标信令报文构造第一基于优先级的流量控制PFC报文,第一PFC报文的时间字段的值为第一值,第一值用于指示第一流量控制信息;向流量来源信息对应的网络设备发送第一PFC报文。
在第四方面的一种可能的实现方式中,接收模块,用于接收第二交换机在目标网络拥塞状态发送的第一信令报文,第一信令报文用于指示发送第一流量控制信息。
在第四方面的一种可能的实现方式中,接收模块,用于接收第二交换机在目标网络拥塞状态发送的第一拥塞通知包CNP报文,第一CNP报文的帧头中的指定字段的值为第一特征值,第一特征值用于指示发送第一流量控制信息。
在第四方面的一种可能的实现方式中,发送模块,用于根据目标信令报文向流量来源信息对应的网络设备发送第二流量控制信息,第二流量控制信息用于指示网络设备继续发送目标队列的数据包,目标队列为网络设备的一个或多个队列。
在第四方面的一种可能的实现方式中,发送模块,用于根据目标信令报文构造第二基于优先级的流量控制PFC报文,第二PFC报文的时间字段的值为第二值,第二值用于指示第二流量控制信息;向流量来源信息对应的网络设备发送第二PFC报文。
在第四方面的一种可能的实现方式中,接收模块,用于接收第二交换机在目标网络拥塞状态发送的第二信令报文,第二信令报文用于指示发送第二流量控制信息。
在第四方面的一种可能的实现方式中,接收模块,用于接收第二交换机在目标网络拥塞状态发送的第二拥塞通知包CNP报文,第二CNP报文的帧头中的指定字段的值为第二特征值,第二特征值用于指示发送第二流量控制信息。
在第四方面的一种可能的实现方式中,发送模块,用于根据目标信令报文携带的流量来源信息确定流量来源端口;通过流量来源端口向流量来源信息对应的网络设备发送目标流量控制信息。
在第四方面的一种可能的实现方式中,发送模块,用于通过流量来源端口向流量来源信息对应的网络设备发送第三流量控制信息,所述第三流量控制信息用于指示暂停发送所述流量来源端口所对应的队列的数据包。
在第四方面的一种可能的实现方式中,发送模块,用于通过流量来源端口向流量来源信息对应的网络设备发送第四流量控制信息,所述第四流量控制信息用于指示继续发送所述流量来源端口所对应的队列的数据包。
第五方面,提供了一种网络拥塞的控制装置,装置包括:
识别模块,用于识别网络拥塞状态;
发送模块,用于响应于网络拥塞状态为目标网络拥塞状态,向第一交换机发送目标信令报文,目标信令报文携带流量来源信息,目标信令报文用于指示第一交换机进行流量控制。
在第五方面的一种可能的实现方式中,目标信令报文包括第一信令报文或第二信令报文,发送模块,用于响应于网络拥塞状态为目标网络拥塞状态,且当前的队列长度大于第一阈值,向第一交换机发送第一信令报文,第一信令报文用于指示第一交换机发送第一流量控制信息,第一流量控制信息用于指示流量来源信息对应的网络设备暂停发送目标队列的数据包,目标队列为网络设备的一个或多个队列;或者,
响应于网络拥塞状态为目标网络拥塞状态,且当前的队列长度小于第二阈值,向第一交换机发送第二信令报文,第二信令报文用于指示第一交换机发送第二流量控制信息,第二流量控制信息用于指示网络设备继续发送目标队列的数据包,第二阈值小于第一阈值。
在第五方面的一种可能的实现方式中,装置还包括:
获取模块,用于获取第一CNP报文,将第一CNP报文的帧头中的指定字段的值设为第一特征值,将第一CNP报文作为第一信令报文;
或者,获取模块,用于获取第二CNP报文,将第二CNP报文的帧头中的指定字段的值设为第二特征值,将第二CNP报文作为第二信令报文。
在第五方面的一种可能的实现方式中,识别模块,用于读取当前的队列长度及显式拥塞通知ECN阈值范围,ECN阈值范围用于指示添加ECN标识的概率,ECN标识用于指示网络发生拥塞;根据当前的队列长度及ECN阈值范围识别网络拥塞状态。
在第五方面的一种可能的实现方式中,目标网络拥塞状态包括ECN失效状态或拥塞通知 包CNP失效状态,识别模块,用于响应于当前的队列长度大于参考范围的最大值,且未补充CNP报文,则网络拥塞状态为ECN失效状态,参考范围基于ECN阈值范围确定;响应于当前的队列长度大于参考范围的最大值,且已补充CNP报文,则网络拥塞状态为CNP失效状态。
第六方面,提供了一种网络拥塞的控制装置,装置包括:
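a receiving module, configured to receive target flow control information sent by a first switch, where the target flow control information instructs flow control to be performed, and the target flow control information is sent by the first switch after the first switch receives a target signaling message that a second switch sent in a target network congestion state;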
控制模块,用于根据目标流量控制信息进行流量控制。
在第六方面的一种可能的实现方式中,接收模块,用于接收第一交换机发送的第一流量控制信息,第一流量控制信息用于指示暂停发送目标队列的数据包,目标队列为网络设备的一个或多个队列;
控制模块,用于根据第一流量控制信息暂停发送目标队列的数据包。
在第六方面的一种可能的实现方式中,接收模块,用于接收第一交换机发送的第一PFC报文,第一PFC报文的时间字段的值为第一值,第一值用于指示第一流量控制信息;
控制模块,用于根据第一PFC报文的时间字段的值确定暂停发送数据包的时间长度,在时间长度内暂停发送目标队列的数据包。
在第六方面的一种可能的实现方式中,接收模块,用于接收第一交换机发送的第二流量控制信息,第二流量控制信息用于指示继续发送目标队列的数据包,目标队列为网络设备的一个或多个队列;
控制模块,用于根据第二流量控制信息继续发送目标队列的数据包。
在第六方面的一种可能的实现方式中,接收模块,用于接收第一交换机发送的第二PFC报文,第二PFC报文的时间字段的值为第二值,第二值用于指示第二流量控制信息;
控制模块,用于根据第二PFC报文的时间字段的值继续发送目标队列的数据包。
在第一方面至第六方面的一种可能的实现方式中,目标网络拥塞状态包括显式拥塞通知(explicit congestion notification,ECN)失效状态或拥塞通知包(congestion notification packet,CNP)失效状态;
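The ECN failure state is a state in which the current queue length of the second switch is greater than the maximum value of a reference range and no CNP message has been supplemented. The CNP failure state is a state in which the current queue length of the second switch is greater than the maximum value of the reference range and a CNP message has already been supplemented. The reference range is determined based on an ECN threshold range; the ECN threshold range indicates the probability of adding an ECN mark, and the ECN mark indicates that the network is congested.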
还提供一种网络拥塞的控制设备,该设备包括:存储器及处理器,所述存储器中存储有至少一条指令,所述至少一条指令由所述处理器加载并执行,以实现上述第一方面任一所述的网络拥塞的控制方法。
还提供一种网络拥塞的控制设备,该设备包括:存储器及处理器,所述存储器中存储有至少一条指令,所述至少一条指令由所述处理器加载并执行,以实现上述第二方面任一所述的网络拥塞的控制方法。
还提供一种网络拥塞的控制设备,该设备包括:存储器及处理器,所述存储器中存储有 至少一条指令,所述至少一条指令由所述处理器加载并执行,以实现上述第三方面任一所述的网络拥塞的控制方法。
还提供了一种网络拥塞的控制系统,该系统包括上述三种设备。
还提供了一种计算机可读存储介质,所述存储介质中存储有至少一条指令,所述指令由处理器加载并执行以实现如上第一方面至第三方面中任一所述的网络拥塞的控制方法。
提供了另一种通信装置,该装置包括:收发器、存储器和处理器。其中,该收发器、该存储器和该处理器通过内部连接通路互相通信,该存储器用于存储指令,该处理器用于执行该存储器存储的指令,以控制收发器接收信号,并控制收发器发送信号,并且当该处理器执行该存储器存储的指令时,使得该处理器执行第一方面或第一方面的任一种可能的实施方式中的方法。或者,当该处理器执行该存储器存储的指令时,使得该处理器执行第二方面或第二方面的任一种可能的实施方式中的方法。或者,当该处理器执行该存储器存储的指令时,使得该处理器执行第三方面或第三方面的任一种可能的实施方式中的方法。
作为一种示例性实施例,所述处理器为一个或多个,所述存储器为一个或多个。
作为一种示例性实施例,所述存储器可以与所述处理器集成在一起,或者所述存储器与处理器分离设置。
在具体实现过程中,存储器可以为非瞬时性(non-transitory)存储器,例如只读存储器(read only memory,ROM),其可以与处理器集成在同一块芯片上,也可以分别设置在不同的芯片上,本申请实施例对存储器的类型以及存储器与处理器的设置方式不做限定。
提供了一种计算机程序(产品),所述计算机程序(产品)包括:计算机程序代码,当所述计算机程序代码被计算机运行时,使得所述计算机执行上述各方面中的方法。
提供了一种芯片,包括处理器,用于从存储器中调用并运行所述存储器中存储的指令,使得安装有所述芯片的通信设备执行上述各方面中的方法。
提供另一种芯片,包括:输入接口、输出接口、处理器和存储器,所述输入接口、输出接口、所述处理器以及所述存储器之间通过内部连接通路相连,所述处理器用于执行所述存储器中的代码,当所述代码被执行时,所述处理器用于执行上述各方面中的方法。
附图说明
图1为相关技术中提供的一种网络架构示意图;
图2为本申请实施例提供的ECN标识模型示意图;
图3为本申请实施例提供的流量模型示意图;
图4为本申请实施例提供的ECN打标率与队列积压关系示意图;
图5为本申请实施例提供的流传输过程示意图;
图6为本申请实施例提供的流量模型示意图;
图7为本申请实施例提供的服务器qp数与时延的关系示意图;
图8为本申请实施例提供的流量模型示意图;
图9为相关技术提供的流速率与CNP数目变化示意图;
图10为本申请实施例提供的数据中心网络的结构示意图;
图11为本申请实施例提供的网络拥塞的控制方法流程图;
图12为本申请实施例提供的应用场景示意图;
图13为本申请实施例提供的网络拥塞的控制装置的结构示意图;
图14为本申请实施例提供的网络拥塞的控制装置的结构示意图;
图15为本申请实施例提供的网络拥塞的控制装置的结构示意图;
图16为本申请实施例提供的网络拥塞的控制设备的结构示意图。
具体实施方式
本申请的实施方式部分使用的术语仅用于对本申请的实施例进行解释,而非旨在限定本申请。
由于传统的TCP/IP协议CPU开销极大,不能很好的满足数据中心网络和协议提出的高吞吐、低时延、低CPU开销的需求,因此,RDMA协议应运而生。
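The most widely used RDMA protocol at present is RoCE, which has two versions, RoCEv1 and RoCEv2. RoCEv1 is an RDMA protocol implemented on the Ethernet link layer, while RoCEv2 is an RDMA protocol implemented on the User Datagram Protocol (UDP) layer of the TCP/IP stack over Ethernet. Whichever version is used, effective control of network congestion in a RoCE network is the key to reducing service latency and supporting large-scale RoCE networking.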
对此,相关技术提供了一种数据中心量化拥塞通知(data center quantized congestion notification,DCQCN)算法来对RoCE网络进行拥塞控制。以图1所示的网络架构为例,该网络架构包括发送端网络接口控制器(network interface controller,NIC)、交换机及接收端NIC。其中,发送端NIC为响应点(Reaction Point,RP),交换机为拥塞点(congestion point,CP),接收端NIC为通知点(Notification Point,NP)。
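On the switch side, when a data packet arrives at an egress port of the switch, the switch checks the queue buffer length of that egress port. If the queue buffer length of the egress port exceeds a given threshold, the switch marks the packet with an explicit congestion notification (ECN) mark according to a marking probability. ECN was originally defined in RFC 3168: when a switch detects congestion, it signals this by embedding a congestion indicator in the IP header, and the congestion is acknowledged through the TCP header. The RoCEv2 standard defines RoCEv2 congestion management (RCM). With ECN enabled, once a switch detects congestion of RoCEv2 traffic, it marks the ECN field in the IP header of the data packet.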
其中,拥塞指示器被目的终端节点按照存在于IB数据段中的基本传输头(base transport header,BTH)中的前向显式拥塞通告(forward explicit congestion notification,FECN)拥塞指示标识来解释意义。换句话说,当被ECN标记过的数据包到达原本要到达的目的地时,拥塞通知就会被反馈给源节点,源节点再通过对有问题的队列对(queue pairs,QP)进行网络数据包的速率限制来回应拥塞通知。
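Take marking data packets with ECN according to the ECN marking model shown in Figure 2 as an example. When the queue buffer length of the switch's egress port is less than the given lower threshold Kmin, the marking probability is 0%. When it is greater than the given upper threshold Kmax, the marking probability is 100%. When it lies between Kmin and Kmax, the marking probability increases linearly with the queue buffer length.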
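The marking model described above can be sketched as a small function. This is only an illustration of the piecewise-linear rule stated in the text; the function name and the choice of units are assumptions, not part of the application.

```python
def ecn_mark_probability(queue_len, kmin, kmax):
    """Return the ECN marking probability (0.0 to 1.0) for an egress queue length.

    Hypothetical helper: below kmin nothing is marked, above kmax everything
    is marked, and in between the probability grows linearly.
    """
    if kmin >= kmax:
        raise ValueError("kmin must be smaller than kmax")
    if queue_len < kmin:
        return 0.0
    if queue_len > kmax:
        return 1.0
    return (queue_len - kmin) / (kmax - kmin)
```

The queue length and both thresholds only need to share one unit (bytes, cells, or packets) for the rule to hold.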
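On the receiver side, when a data packet carrying an ECN mark arrives at the receiver NIC, this indicates that congestion has occurred in the network. The receiver NIC then returns a congestion notification packet (CNP) message for the ECN-marked packet, so that the congestion information is conveyed to the sender NIC. For example, if an ECN-marked packet of a given data flow arrives at the receiver NIC and no corresponding CNP message has been sent in the preceding reference time period, the receiver NIC immediately sends a CNP message. The reference time period is N microseconds long, and N can be configured as 0, in which case the receiver NIC returns one CNP message for every ECN-marked packet it receives.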
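The receiver-side rule (at most one CNP per flow per reference period of N microseconds) can be sketched as follows. The class and method names are hypothetical, and time is passed in explicitly so that the sketch stays deterministic.

```python
class NotificationPoint:
    """Sketch of the receiver NIC's CNP generation rule (names are assumptions)."""

    def __init__(self, window_us):
        self.window_us = window_us   # reference period N, in microseconds
        self.last_cnp_us = {}        # flow id -> time the last CNP was sent

    def on_packet(self, flow_id, ecn_marked, now_us):
        """Return True if a CNP should be sent for this packet."""
        if not ecn_marked:
            return False
        last = self.last_cnp_us.get(flow_id)
        if last is not None and now_us - last < self.window_us:
            return False             # a CNP was already sent within the window
        self.last_cnp_us[flow_id] = now_us
        return True
```

With `window_us=0` the condition in the middle never holds, so every ECN-marked packet triggers a CNP, which matches the N = 0 case in the text.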
CNP报文作为拥塞控制报文,也会存在延迟和丢包,从发送端到接收端经过的每一跳设备、每一条链路都会有一定的延迟,会最终加大发送端接收到CNP报文的时间。与此同时,交换机的出端口下的拥塞也会逐步增多,若发送端不能及时降速,仍然可能造成丢包。因此,在发送端侧,发送端NIC控制每条数据流的发送速率。例如,通过收到的CNP报文来触发对数据流的降速控制,通过时间定时器和字节计数器来触发对数据流的升速控制,升速控制与降速控制相互独立。
通过上述过程不难看出,DCQCN算法的升速控制基于发送端NIC的定时器触发,降速控制基于接收端NIC发送的CNP报文触发。当升速控制和降速控制不能协调工作时,将会导致DCQCN控速失效。尤其是网络数据流规模增大时,更容易发生DCQCN控速失效的问题。
例如,随着数据流数目的增加,每条数据流分到的带宽随之减小,单位时间内发出的CNP报文数量也随之变少,而CNP报文间隔随之变大。比如发送端带宽限制每秒发出两千个数据包,每个数据包的间隔是500us,这样即使每个数据包都标记ECN标识,那么接收端返回的CNP报文间隔最小也是500us。但是DCQCN源端的升速周期是300us,这时CNP的间隔大于升速周期。当拥塞发生时,会使得发送端的数据流在拥塞状态下仍错误的升速,导致DCQCN控速失败。
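The arithmetic in this example can be checked with a few lines. The sketch below only restates the numbers given in the text (2000 packets per second, a 300 us rate-increase period); the function names are hypothetical.

```python
def cnp_interval_us(packets_per_second):
    """Smallest possible CNP spacing if every packet of the flow is ECN-marked."""
    return 1_000_000 / packets_per_second


def increase_can_fire_between_cnps(packets_per_second, increase_period_us):
    """True when the sender's rate-increase timer can expire between two CNPs."""
    return cnp_interval_us(packets_per_second) > increase_period_us
```

For 2000 packets per second the interval is 500 us, which is longer than the 300 us increase period, so the sender can raise its rate while the path is still congested.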
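Take the data traffic model shown in Figure 3 as an example, in which a single top-of-rack (ToR) switch carries data flows from multiple senders to one receiver, and TCP packets and RoCE packets are transmitted at a bandwidth ratio of 9:1. The relationship among the number of data flows, ECN, and queue backlog is shown in Figure 4. As Figure 4 shows, when the number of data flows is small, for example fewer than 4, DCQCN works well: the ECN marking rate rises as the number of flows increases, the queue buffer length stays very low, and service latency is small. However, once the number of flows reaches a certain level, for example more than 4 in Figure 4, the queue buffer length jumps to the MB level and service latency degrades to the ms level even though the ECN marking rate has already reached 100%. At this point ECN has failed.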
为了解决ECN失效的问题,相关技术还提出了一种在交换机侧记录交换机的出端口的拥塞程度和收到的CNP报文时间,根据交换机的出端口的拥塞程度来确定是否补充CNP报文的方法。以图5所示的数据流传输示意图为例,当交换机的出端口进入拥塞状态时,则交换机检查经过该出端口的数据流的CNP报文的时间间隔。如果在DCQCN的升速周期内没有CNP报文经过,则交换机侧补充一个CNP报文发往发送端并更新CNP报文时间。如果在DCQCN的升速周期内有CNP报文经过,则交换机只更新CNP报文经过的时间,不再补充CNP报文。当交换机的出端口不拥塞时,则只更新CNP报文经过的时间,不再补充CNP报文。
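The switch-side CNP supplementing logic described above can be sketched like this. It is a simplified per-flow model under stated assumptions: the class name, the per-flow dictionary, and the explicit clock argument are all hypothetical.

```python
class CnpSupplementer:
    """Sketch of switch-side CNP supplementing (hypothetical names)."""

    def __init__(self, increase_period_us):
        self.increase_period_us = increase_period_us  # DCQCN rate-increase period
        self.last_cnp_us = {}                         # flow id -> last CNP time

    def on_cnp_seen(self, flow_id, now_us):
        """Record a CNP that passed through the switch."""
        self.last_cnp_us[flow_id] = now_us

    def check(self, flow_id, congested, now_us):
        """Return True if the switch should inject a CNP toward the sender."""
        if not congested:
            return False   # uncongested port: only pass-through times are tracked
        last = self.last_cnp_us.get(flow_id)
        if last is not None and now_us - last < self.increase_period_us:
            return False   # a CNP already passed within the increase period
        self.last_cnp_us[flow_id] = now_us  # the injected CNP updates the time
        return True
```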
然而,上述补充CNP报文的方式,在数据流进一步增加时仍会出现队列积压,时延劣化的问题。例如,以图6所示的流量模型,由单个ToR交换机实现多个发送端到一个接收端的数据流传输,且以TCP协议的报文与RoCE协议的报文按照9:1的带宽比例进行传输为例,不断增加每个服务器的数据流数(qp数)。服务器的qp数与时延的关系可如图7的测试结果所示,可以看出当每个服务器输出的数据流数小于9时,时延较小。但是当服务器输出的数据流数大于9时,时延突变劣化为ms级,导致该方案仍然存在CNP失效。
为便于理解,以图8所示的流量模型做进一步分析。如图8所示,流量模型中的发送端1 (sender1)和发送端2(sender2)同时向接收端(receiver)发送数据流,receiver可根据需要构造任意多的CNP报文给sender1和sender2。服务器每秒处理的CNP报文数与流速率的关系可如图9所示,服务器的速率随着单位时间内处理的CNP报文数增加而减少,但是当速率下降到一定值后,不再下降,称这种现象为CNP失效。因此当大规模高并发场景下,每个数据流分配的带宽小于网卡能控速的极限值时,CNP失效必然出现,这时队列会压不住,业务时延会很大。
例如,以在图6所示的流量模型下进一步测试为例,得到如表1所示的测试结果。从下面表1所示的测试结果可以看出,在任何比例下,CNP失效都会出现,这时队列积压很深,业务时延很大。
Table 1
TCP:RoCE bandwidth ratio    Number of data flows    Port backlog    Latency
90:10                       7*11                    19446 KB        31637.50 us
70:30                       7*40                    19757 KB        10360.06 us
50:50                       7*65                    18845 KB        5987.04 us
30:70                       7*85                    16829 KB        3824.71 us
10:90                       7*115                   18160 KB        3206.68 us
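To solve the CNP failure problem, that is, the problem of DCQCN rate control failure in large-scale, highly concurrent scenarios, the embodiments of this application provide a network congestion control method. The method divides the network congestion state into three states: ECN normal, ECN failure, and CNP failure. It identifies the current network congestion state and, depending on that state, triggers the generation of priority-based flow control (PFC) messages at path level. Traffic is then controlled through PFC, which suppresses queue backlog on the congested side and keeps service latency low without affecting service throughput, thereby supporting large-scale RoCE networking.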
示例性地,本申请实施例提供的方法可应用于图10所示的数据中心网络中。该数据中心网络是一种Clos网络架构,Clos网络架构是一种交换架构,能够做到可重排无阻塞、可扩展,用于高性能计算、高性能分布式存储、大数据、人工智能等。该Clos网络架构中包括:第一交换机、第二交换机和源服务器。第一交换机如图10中T所示的ToR交换机,第二交换机如图10中与第一交换机不同的T所示的ToR交换机,源服务器如图10中H所示的服务器(Host)。第一交换机可作为源端交换机,第二交换机作为拥塞侧交换机,源服务器作为流量来源对应的网络设备。其中,各个交换机的功能如下。
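Second switch (T): reads the queue length and the current ECN threshold range (for example, the watermark values kmin and kmax). If the queue length is around kmin-kmax, the second switch identifies the network congestion state as the ECN normal state. If the queue length is far greater than kmax, it identifies the state as the ECN failure state and intelligently adds CNP messages. If the queue length is still far greater than kmax after CNP messages have been added, it identifies the state as the CNP failure state. The ECN failure state and the CNP failure state can serve as the target network congestion state. When the second switch identifies that the current network congestion state is the target network congestion state, it constructs a target signaling message to notify the source-side switch, that is, the first switch, that the target network congestion state has been entered, for example that the CNP failure state has been entered and PFC supplementing is to be performed. For example, if the queue length is greater than a given upper limit, the second switch constructs a first signaling message and sends it to the first switch. Taking a CNP message as the first signaling message, the second switch fills a designated field in the CNP frame header, for example a reserved field, with a first characteristic value, for example 1. If the queue length is less than a given lower limit, the second switch constructs a second signaling message and sends it to the first switch. Again taking a CNP message as the second signaling message, the second switch fills the reserved field in the CNP frame header with a second characteristic value, for example 2.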
第一交换机(T):第一交换机接收并解析第二交换机发送的目标信令报文,发现通告对象是第一交换机自身,如果reserved字段为第一特征值1,标识该目标信令报文是第一信令报文,则第一交换机构造第一PFC报文发送给对应的源服务器(H),以触发对应的源服务器暂停发送目标队列的数据包。例如,第一交换机构造的第一PFC报文为xoff PFC报文。如果reserved字段为第二特征值2,标识该目标信令报文是第二信令报文,则第一交换机构造第二PFC报文发送给对应的源服务器,以触发对应的源服务器继续发送目标队列的数据包。例如,第一交换机构造的第二PFC报文为xon PFC报文。其中,目标队列是源服务器的一个或多个队列,可通过PFC报文来指示。此外,第一交换机也可通过流量来源端口来进行流量控制。例如,通过该流量来源端口向流量来源信息对应的源服务器发送对应的目标流量控制信息,以指示暂停或继续发送该流量来源端口所对应的队列的数据包。
该Clos网络架构中除了包括第一交换机和第二交换机外,在示例性实施例中,第一交换机与第二交换机之间还具有中间交换机,该中间交换机如图10中L所示的叶(Leaf)交换机以及图10中S所示的汇聚(Spine)交换机。
中间交换机(L、S):中间交换机接收并解析第二交换机发送的目标信令报文,如果通告对象不是中间交换机自身,则中间交换机不作特殊操作,将该目标信令报文继续转发。
需要说明的是,图10所示的各个交换机的数量仅是示例性说明,本申请实施例不对各个交换机的数量进行限定。此外,源服务器仅是一种作为流量来源对应的网络设备的举例,除了源服务器,还可以是其他交换机,本申请实施例不对流量来源对应的网络设备的类型进行限定。再有,本申请实施例提供的方法不仅仅局限于图10所示数据中心网络场景,也可应用于其他使用DCQCN技术的场景,本申请实施例对应用场景不进行限定。
接下来,对本申请实施例提供的网络拥塞的控制方法进行说明。该方法可通过第一交换机、第二交换机及流量来源对应的网络设备之间的交互实现,示例性地,网络设备为源服务器。参见图11,本申请实施例提供的方法包括如下几个步骤。
1101,第二交换机识别网络拥塞状态。
在示例性实施例中,第二交换机识别网络拥塞状态,包括但不限于如下1101A和1101B两个过程。
1101A:读取当前的队列长度及ECN的阈值范围,ECN的阈值范围用于指示添加ECN标识的概率,ECN标识用于指示网络发生拥塞。
在示例性实施例中,第二交换机为检测拥塞的交换机,当前的队列长度为第二交换机的出端口当前缓存数据的队列长度。当一个数据包到达第二交换机的出端口时,该第二交换机查看第二交换机的出端口缓存数据的队列长度。如果第二交换机的出口端缓存数据的队列长度超过给定阈值,则对这个数据包按标记概率标记ECN标识。因此,ECN标识能够用于指示网络拥塞。本申请实施例不对给定阈值的大小进行限定,例如可基于应用场景设置,也可以基于经验设置。
对数据包标记ECN标识时,标记概率基于ECN的阈值范围和队列长度来确定,例如图2所示的ECN标记模型所示,该ECN的阈值范围是指给定下限阈值Kmin与给定上限阈值Kmax 构成的范围。该ECN的阈值范围用于指示添加ECN标识的概率,网络拥塞越严重,具有ECN标识的报文数量越多。
本申请实施例提供的方法通过当前的队列长度和ECN的阈值范围来识别网络拥塞状态。为便于理解,以该方法应用于图12所示的应用场景为例。如图12所示,该网络架构包括4个L交换机,分别为交换机L1、交换机L2、交换机L3和交换机L4,包括2个S交换机,分别为交换机S1和交换机S2。以交换机L4侧发生拥塞为例,该交换机L4即为第二交换机,读取当前的队列长度、当前ECN的kmin和kmax值。
1101B:第二交换机根据当前的队列长度及ECN的阈值范围识别网络拥塞状态。
示例性地,网络拥塞状态包括但不限于ECN正常状态、ECN失效状态和拥塞通知包CNP失效状态。
在示例性实施例中,根据当前的队列长度及ECN的阈值范围识别网络拥塞状态,包括但不限于如下三种识别结果:
识别结果一:响应于当前的队列长度在参考范围内,则网络拥塞状态为ECN正常状态。
其中,本申请实施例不对参考范围进行限定,可基于经验设置,也可基于应用场景进行调整。例如,参考范围基于ECN阈值范围确定。以将0与1.5倍的最大值Kmax之间的范围作为参考范围为例,如果当前的队列长度在0-1.5Kmax的参考范围内,则网络拥塞状态为ECN正常状态。
识别结果二:响应于当前的队列长度大于参考范围的最大值,且未补充CNP报文,则网络拥塞状态为ECN失效状态。
例如,仍以0与1.5倍的最大值Kmax之间的区域作为参考范围为例,参考范围的最大值为1.5Kmax,如果当前的队列长度大于1.5Kmax,且未补充CNP报文,则网络拥塞状态为ECN失效状态。针对识别结果是网络拥塞状态为ECN失效状态的情况,第二交换机可补充CNP报文。例如,在第二交换机侧记录第二交换机的出端口的拥塞程度和收到的CNP报文时间,根据第二交换机的出端口的拥塞程度来确定是否补充CNP报文。补充CNP报文的方式参见上述图5的相关描述,此处不再赘述。
识别结果三:响应于当前的队列长度大于参考范围的最大值,且已补充CNP报文,则网络拥塞状态为CNP失效状态。
例如,仍以0与1.5倍的最大值Kmax之间的范围作为参考范围为例,参考范围的最大值为1.5Kmax,如果当前的队列长度大于1.5Kmax,且已补充CNP报文,则网络拥塞状态为CNP失效状态。
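In summary, take the application scenario shown in Figure 12 as an example. Switch L4 becomes congested and reads the current queue length and the current ECN kmin and kmax values. If the current queue length lies within the reference range derived from kmin and kmax, for example the reference range [0, 1.5*kmax], switch L4 identifies the current network congestion state as the ECN normal state. If the queue length is far greater than 1.5*kmax, for example greater than 3*kmax, and no CNP message has been supplemented, switch L4 identifies the state as the ECN failure state and turns on intelligent CNP supplementing. If CNP messages have been supplemented and the queue length is still far greater than 1.5*kmax, for example greater than 3*kmax, switch L4 identifies the state as the CNP failure state.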
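The three identification results can be sketched as a single classification function. This is a minimal sketch assuming the example reference range [0, 1.5*Kmax] used in the text; the enum, the function name, and the `factor` parameter are hypothetical.

```python
from enum import Enum


class CongestionState(Enum):
    ECN_NORMAL = "ecn_normal"
    ECN_FAILURE = "ecn_failure"
    CNP_FAILURE = "cnp_failure"


def classify(queue_len, kmax, cnp_supplemented, factor=1.5):
    """Classify the congestion state from the queue length and the ECN threshold.

    The reference range is taken as [0, factor * kmax]; 1.5 is only the
    example value from the text and would be tuned per deployment.
    """
    reference_max = factor * kmax
    if queue_len <= reference_max:
        return CongestionState.ECN_NORMAL
    if cnp_supplemented:
        return CongestionState.CNP_FAILURE
    return CongestionState.ECN_FAILURE
```

Only the two failure states count as the target network congestion state that triggers signaling toward the first switch.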
1102,第二交换机响应于网络拥塞状态为目标网络拥塞状态,向第一交换机发送目标信令报文,目标信令报文携带流量来源信息,目标信令报文用于指示第一交换机进行流量控制。
在示例性实施例中,目标信令报文包括第一信令报文或第二信令报文,响应于网络拥塞 状态为目标网络拥塞状态,向第一交换机发送目标信令报文,包括但不限于如下两种发送情况。
发送情况一:响应于网络拥塞状态为目标网络拥塞状态,且当前的队列长度大于第一阈值,向第一交换机发送第一信令报文,第一信令报文用于指示第一交换机发送第一流量控制信息,该第一流量控制信息用于指示流量来源信息对应的网络设备暂停发送目标队列的数据包,目标队列为网络设备的一个或多个队列。
针对发送情况一,第一阈值的大小可基于经验设置,也可基于应用场景设置,还可在实施方法过程中进行调整。当前的队列长度大于第一阈值,说明拥塞情况较为严重,需要启动流量控制。例如,通过第一信令报文来指示第一交换机进行第一类流量控制,即指示第一交换机发送第一流量控制信息。其中,第一信令报文可是任意格式的报文,能够指示第一交换机进行第一类流量控制即可。
示例性地,向第一交换机发送第一信令报文之前,还包括:获取第一CNP报文,将第一CNP报文的帧头中的指定字段的值设为第一特征值,将第一CNP报文作为第一信令报文。
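Take the CNP message format shown in Table 2 as an example. The first row of Table 2 gives the number of bits of each field, the second row gives the field names, and the third row gives the field values. The CNP frame header consists of an 8-bit opcode field, a 1-bit solicited event (SE) field, a 1-bit migration request (MigReq, M) field, a 2-bit pad count field, a 4-bit header version field, a 16-bit partition key (P_KEY) field, an 8-bit reserved field, a 24-bit destination queue pair (DestQP) field, a 1-bit acknowledge request (Ack request) field, a 7-bit reserved field, and a 24-bit packet sequence number (PSN) field.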
表2
Figure PCTCN2021093165-appb-000001
在本申请实施例中,将CNP报文的第2个保留字段即7比特的保留字段的值设为第一特征值。例如,在表2所示的CNP报文的帧头基础上,将CNP报文的帧头中的7比特的保留字段的值设为第一特征值,以该第一特征值为1,用0000001表示为例,该CNP报文的帧头如表3所示。
表3
Figure PCTCN2021093165-appb-000002
发送情况二:响应于网络拥塞状态为目标网络拥塞状态,且当前的队列长度小于第二阈值,向第一交换机发送第二信令报文,第二信令报文用于指示第一交换机发送第二流量控制信息,第二流量控制信息用于指示网络设备继续发送目标队列的数据包,第二阈值小于第一阈值。
针对发送情况二,第二阈值的大小可基于经验设置,也可基于应用场景设置,还可在实 施方法过程中进行调整。第二阈值小于第一阈值,当前的队列长度小于第二阈值,说明拥塞情况得到缓解,需要启动另一类流量控制。例如,通过第二信令报文来指示第一交换机进行第二类流量控制。其中,第二信令报文可是任意格式的报文,能够指示第一交换机进行第二类流量控制即可。
示例性地,向第一交换机发送第二信令报文之前,还包括:获取第二CNP报文,将第二CNP报文的帧头中的指定字段的值设为第二特征值,将第二CNP报文作为第二信令报文。
仍以上面表2所示的CNP报文的帧头的格式为例,将CNP报文的帧头中的7比特的保留字段的值设为第二特征值,以该第二特征值为2,用0000010表示为例,该CNP报文的帧头如表4所示。
表4
Figure PCTCN2021093165-appb-000003
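Because Tables 2 to 4 survive only as image placeholders, the header layout can be illustrated in code instead. The sketch below packs the 12-byte header using the field order and bit widths listed in the text and writes the characteristic value into the 7-bit reserved field. The function names are hypothetical, and the opcode used in the usage note is an assumed value for a RoCEv2 CNP, not something the text specifies.

```python
import struct


def pack_cnp_header(opcode, p_key, dest_qp, psn, signal_value=0,
                    se=0, m=0, pad_count=0, version=0, ack_req=0):
    """Pack the 96-bit header; signal_value goes into the 7-bit reserved field."""
    if not 0 <= signal_value < 2 ** 7:
        raise ValueError("signal_value must fit in 7 bits")
    # SE (1 bit) | M (1 bit) | pad count (2 bits) | header version (4 bits)
    byte1 = (se << 7) | (m << 6) | (pad_count << 4) | version
    # Ack request (1 bit) | 7-bit reserved field carrying the characteristic value
    byte8 = (ack_req << 7) | signal_value
    return (struct.pack("!BBHB", opcode, byte1, p_key, 0)  # 8-bit reserved stays 0
            + dest_qp.to_bytes(3, "big")
            + bytes([byte8])
            + psn.to_bytes(3, "big"))


def signal_value_of(header):
    """Read the characteristic value back from the 7-bit reserved field."""
    return header[8] & 0x7F
```

For example, `pack_cnp_header(0x81, 0xFFFF, 0x12, 0, signal_value=1)` would represent a first signaling message and `signal_value=2` a second one, with 0 leaving the header as an ordinary CNP.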
仍以图12所示的应用场景为例,当交换机L4识别当前的网络拥塞状态为目标网络拥塞状态时,则比较当前的队列长度与给定门限第一阈值thh和第二阈值thl。如果当前的队列长度大于thh,则交换机L4构造第一信令报文发往第一交换机。例如,使用CNP报文作为信令报文,在CNP报文的帧头的reserved字段填入特征值1。如果当前的队列长度小于thl,则交换机L4构造第二信令报文发往第一交换机。例如,使用CNP报文作为信令报文,在CNP报文的帧头的reserved字段填入特征值2。CNP报文的帧头中reserved字段为保留字段,默认为全0。本申请实施例利用该保留字段构造第一、第二信令报文,通告第一交换机做不同动作,即采用不同的流量控制方式。
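The two-threshold comparison performed by switch L4 amounts to a hysteresis rule. A minimal sketch follows, assuming the characteristic values 1 and 2 from the text; the constant and function names are hypothetical.

```python
XOFF_SIGNAL = 1  # first characteristic value: ask the first switch to pause the source
XON_SIGNAL = 2   # second characteristic value: ask the first switch to resume the source


def choose_signal(queue_len, thh, thl, in_target_state):
    """Return the characteristic value to send, or None if no signaling is needed."""
    if thl >= thh:
        raise ValueError("the second threshold must be smaller than the first")
    if not in_target_state:
        return None
    if queue_len > thh:
        return XOFF_SIGNAL
    if queue_len < thl:
        return XON_SIGNAL
    return None  # between the thresholds: keep the current behaviour
```

Keeping a gap between `thl` and `thh` avoids flapping between pause and resume when the queue length hovers around a single threshold.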
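In an exemplary embodiment, the target network congestion state includes the ECN failure state or the CNP failure state. The ECN failure state is a state in which the current queue length of the second switch is greater than the maximum value of the reference range and no CNP message has been supplemented; the CNP failure state is a state in which the current queue length of the second switch is greater than the maximum value of the reference range and a CNP message has already been supplemented.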
1103,第一交换机接收第二交换机在目标网络拥塞状态发送的目标信令报文,目标信令报文携带流量来源信息。
在示例性实施例中,针对1102中第二交换机发送目标信令报文的两种情况,第一交换机接收第二交换机在目标网络拥塞状态发送的目标信令报文,包括但不限于如下两种接收情况。
接收情况一:接收第二交换机在目标网络拥塞状态发送的第一信令报文,第一信令报文用于指示第一交换机发送第一流量控制信息,进行第一类流量控制。
示例性地,接收第二交换机在目标网络拥塞状态发送的第一信令报文,包括:接收第二交换机在目标网络拥塞状态发送的第一CNP报文,第一CNP报文的帧头中的指定字段的值为第一特征值,第一特征值用于指示发送第一流量控制信息。
接收情况二:接收第二交换机在目标网络拥塞状态发送的第二信令报文,第二信令报文用于指示第一交换机发送第二流量控制信息,进行第二类流量控制。
示例性地,接收第二交换机在目标网络拥塞状态发送的第二信令报文,包括:接收第二 交换机在目标网络拥塞状态发送的第二拥塞通知包CNP报文,第二CNP报文的帧头中的指定字段的值为第二特征值,第二特征值用于指示发送第二流量控制信息。
1104,第一交换机根据目标信令报文向流量来源信息对应的网络设备发送目标流量控制信息,目标流量控制信息用于指示进行流量控制。
在示例性实施例中,第一交换机根据目标信令报文向流量来源信息对应的网络设备发送目标流量控制信息,包括但不限于如下两种情况。
情况一:根据目标信令报文向流量来源信息对应的网络设备发送第一流量控制信息,第一流量控制信息用于指示网络设备暂停发送目标队列的数据包,目标队列为网络设备的一个或多个队列。
示例性地,根据目标信令报文向流量来源信息对应的网络设备发送第一流量控制信息,包括:根据目标信令报文构造第一PFC报文,第一PFC报文的时间字段的值为第一值,第一值用于指示第一流量控制信息;向流量来源信息对应的网络设备发送第一PFC报文。
情况二:根据目标信令报文向流量来源信息对应的网络设备发送第二流量控制信息,第二流量控制信息用于指示网络设备继续发送目标队列的数据包,目标队列为网络设备的一个或多个队列。
示例性地,根据目标信令报文向流量来源信息对应的网络设备发送第二流量控制信息,包括:根据目标信令报文构造第二基于优先级的流量控制PFC报文,第二PFC报文的时间字段的值为第二值,第二值用于指示第二流量控制信息;向流量来源信息对应的网络设备发送第二PFC报文。
PFC是IEEE数据中心桥接(Data Center Bridge,DCB)协议族中的技术,是流量控制的增强版。本申请实施例提供的方法在识别网络拥塞状态为CNP失效状态后,触发第一交换机向流量来源信息对应的网络设备发送对应的PFC报文,以进行流量控制。
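For example, as shown in Table 3, if the 7-bit reserved field of the target signaling message received by the first switch is 0000001, the target signaling message is the first signaling message, and the first switch needs to send the first PFC message to the corresponding network device. For example, the first PFC message includes but is not limited to an XOFF PFC message. If the reserved field of the target signaling message received by the first switch is 0000010, as shown in Table 4, the target signaling message is the second signaling message, and the first switch needs to send the second PFC message to the corresponding network device. For example, the second PFC message includes but is not limited to an XON PFC message.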
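XON/XOFF is a software flow-control protocol for controlling the flow of data between a computer and other devices, where X stands for the transmitter. XON/XOFF is often referred to as "software flow control". Typically, a receiver sends an XOFF character when it cannot accept any more data (for example, because it needs time to process something), and sends an XON character to the transmitter when it is able to accept data again. In the embodiments of this application, XON/XOFF PFC messages are used as flow control messages to implement priority-based flow control. For example, the format of a PFC message is shown in Table 5.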
表5
Figure PCTCN2021093165-appb-000004
Figure PCTCN2021093165-appb-000005
仍以图12所示的应用场景为例,中间交换机接收目标信令报文并解析,如果通告对象不是中间交换机自身,则不作特殊操作,将目标信令报文继续转发。
例如,交换机L1收到目标信令报文,从帧头中提取reserved字段并解析,发现是特征值1,则构造如表5所示格式的PFC报文发送给对应的网络设备,例如源服务器。之后,可将该目标信令报文丢弃。其中,PFC报文的time[n]字段为65535,用于指示在65535所示时间内暂停发送目标队列n的数据包。
又例如,第一交换机L2收到目标信令报文,从帧头中提取reserved字段并解析,发现是特征值2,则构造表5所示格式的PFC报文给对应的网络设备,例如源服务器。之后,将该目标信令报文丢弃。其中,PFC报文的time[n]字段为0,用于指示继续发送目标队列n的数据包。
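The handling of a target signaling message at intermediate switches and at the first switch can be sketched as a small dispatch function. The switch identifiers, the `notify_target` argument, and the returned tuples are hypothetical; only the values 1, 2, 65535, and 0 come from the text.

```python
PAUSE_TIME_MAX = 65535  # time[n] value used for the XOFF case in the text


def handle_signal(switch_id, notify_target, characteristic_value, queue_n):
    """Decide what a switch does with a received target signaling message."""
    if switch_id != notify_target:
        # Intermediate switch: no special handling, keep forwarding.
        return ("forward", None)
    if characteristic_value == 1:
        # First signaling message: send an XOFF PFC for queue n, then discard.
        return ("send_pfc", {"queue": queue_n, "time": PAUSE_TIME_MAX})
    if characteristic_value == 2:
        # Second signaling message: send an XON PFC for queue n, then discard.
        return ("send_pfc", {"queue": queue_n, "time": 0})
    return ("ignore", None)  # ordinary CNP or unknown value
```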
需要说明的是,由于PFC是基于优先级的流量控制报文,Priority_enable_vector字段e[n]指示优先级为n的队列的time值是否有效。以网络设备具有n个优先级的队列为例,如果需要网络设备暂停发送所有队列的数据包,则目标队列包括n个队列,则e[1]至e[n]的值均为非0,将time的值按照暂停时间进行设置。如果需要网络设备暂停发送部分队列的数据包,以目标队列包括1个队列为例,则目标队列是哪个优先级的队列,哪个优先级队列所对应的e[n]的值为非0,将time的值按照暂停时间进行设置。
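Since Table 5 is not reproduced here, the relationship between the enable vector e[n] and the time[n] fields can be illustrated with a frame builder. The sketch below assumes the standard IEEE 802.1Qbb layout (MAC control EtherType 0x8808, opcode 0x0101, eight priorities numbered 0 to 7), which may differ in numbering from the e[1] to e[n] convention used in the text. The function name and its arguments are hypothetical.

```python
import struct

PFC_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101


def build_pfc_frame(src_mac, pause_times):
    """Build a PFC frame; pause_times maps priority (0-7) to a time value (0-65535)."""
    enable_vector = 0
    times = [0] * 8
    for priority, quanta in pause_times.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be in 0..7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("time value must fit in 16 bits")
        enable_vector |= 1 << priority  # e[n] = 1 marks time[n] as valid
        times[priority] = quanta
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *times)
    frame = (PFC_DST_MAC + src_mac
             + struct.pack("!H", MAC_CONTROL_ETHERTYPE) + payload)
    return frame.ljust(60, b"\x00")  # pad to the minimum frame size (without FCS)
```

A time value of 65535 for an enabled priority corresponds to the XOFF case and 0 to the XON case; priorities whose enable bit is 0 are left untouched by the receiver.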
除了通过上述方式由第一交换机向网络设备发送目标流量控制信息来指示网络设备暂停或继续发送目标队列的数据包的方式外,第一交换机也可以不指定目标队列,只是通过目标流量控制信息指示网络设备暂停或继续发送数据包,由网络设备来确定暂停或发送哪个队列的数据包。
另外,本申请实施例上述过程仅以构造PFC报文来携带目标流量控制信息的方式为例进行说明。除了PFC报文之外,还可以采用PAUSE报文实现,本申请实施例不对携带目标流量控制信息的报文的类型进行限定。
PAUSE报文是一种用于控制MAC数据流量的报文。当对端数据量过大,将无法及时处理数据时,会向数据上游MAC(在本申请实施例中是流量来源对应的网络设备)发送PAUSE报 文,通知上游MAC在一段时间内停止发送数据,停止时间记录在PAUSE报文的PAUSE_TIMING字段。也就是说,该记录有停止时间的PAUSE_TIMING字段用于携带目标流量控制信息。当上游MAC接收到对端的有效PAUSE报文时,会开始计时,并会停止发送数据,防止对端无法及时处理数据,导致对端FIFO溢出或者数据丢失。若计时结束,并且没有收到新的PAUSE报文,将重新发送数据。若计时没有结束,且新收到的PAUSE报文PAUSE_TIMING字段为全0,则表示可以重新发送数据,此时停止计时,重新开始发送数据。
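The timer behaviour of the upstream MAC described above can be sketched as a tiny state holder. Time is expressed in abstract units and passed in explicitly; the class and method names are hypothetical, and real hardware counts the pause time in link-speed-dependent quanta rather than these units.

```python
class PauseTimer:
    """Sketch of how an upstream MAC reacts to PAUSE messages (names are assumptions)."""

    def __init__(self):
        self.paused_until = 0  # time at which sending may resume

    def on_pause_frame(self, pause_timing, now):
        # A non-zero PAUSE_TIMING (re)starts the timer; an all-zero value
        # ends the pause immediately because paused_until becomes `now`.
        self.paused_until = now + pause_timing

    def may_send(self, now):
        return now >= self.paused_until
```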
在示例性实施例中,第一交换机根据目标信令报文向流量来源信息对应的网络设备发送目标流量控制信息,包括:根据目标信令报文携带的流量来源信息确定流量来源端口;通过流量来源端口向流量来源信息对应的网络设备发送目标流量控制信息。
示例性地,通过流量来源端口向流量来源信息对应的网络设备发送目标流量控制信息,包括:通过流量来源端口向流量来源信息对应的网络设备发送第三流量控制信息,该第三流量控制信息用于指示暂停发送流量来源端口所对应的队列的数据包。该种情况下,通过第三流量控制信息来控制暂停发送该流量来源端口所对应的所有队列的数据包,实现了端口级控制。本申请实施例不对第三流量控制信息的发送方式进行限定,例如,向流量来源信息对应的网络设备发送PAUSE报文,通过PAUSE报文来携带第三流量控制信息,以指示该流量来源端口所对应的队列的数据包均暂停发送。例如,将PAUSE报文的PAUSE_TIMING字段的值设为非0,以用来携带第三流量控制信息,用于指示在该字段所表示的时间内暂停发送数据包。
示例性地,通过流量来源端口向流量来源信息对应的网络设备发送目标流量控制信息,包括:通过流量来源端口向流量来源信息对应的网络设备发送第四流量控制信息,该第四流量控制信息用于指示继续发送所述流量来源端口所对应的队列的数据包。该种情况下,通过第四流量控制信息来控制继续发送该流量来源端口所对应的所有队列的数据包,实现了端口级控制。本申请实施例不对第四流量控制信息的发送方式进行限定,例如,向流量来源信息对应的网络设备发送PAUSE报文,通过PAUSE报文来携带第四流量控制信息,以指示该流量来源端口所对应的队列的数据包均继续发送。例如,将PAUSE报文的PAUSE_TIMING字段的值设为0,以用来携带第四流量控制信息,用于指示继续发送数据包。
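Port-level control can be sketched as a lookup followed by a PAUSE decision. The text does not say what form the traffic source information takes, so the sketch assumes it is a key (for example a source address) that the first switch can map to a port through a table; the table, the function name, and the returned dictionary are all hypothetical.

```python
def port_level_control(source_info, port_table, pause):
    """Pick the traffic source port and describe the PAUSE message to send on it."""
    port = port_table.get(source_info)
    if port is None:
        return None  # unknown source: nothing to send
    # Non-zero PAUSE_TIMING pauses all queues of the port (third flow control
    # information); zero resumes them (fourth flow control information).
    return {"port": port, "message": "PAUSE",
            "pause_timing": 0xFFFF if pause else 0}
```

Unlike the PFC case, no queue is named here, so every queue behind the selected port is paused or resumed together.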
以第一交换机与流量来源信息对应的网络设备之间的交互过程为例,如图11所示,该网络拥塞的控制方法包括如下几个过程。
1105,网络设备接收第一交换机发送的目标流量控制信息,目标流量控制信息用于指示进行流量控制,目标流量控制信息是第一交换机接收到第二交换机在目标网络拥塞状态发送的目标信令报文之后发送的。
在示例性实施例中,接收第一交换机发送的目标流量控制信息,包括但不限于如下两种情况。
情况一:接收第一交换机发送的第一流量控制信息,第一流量控制信息用于指示暂停发送目标队列的数据包,目标队列为网络设备的一个或多个队列。
示例性地,接收第一交换机发送的第一流量控制信息,包括:接收第一交换机发送的第一PFC报文,第一PFC报文的时间字段的值为第一值,第一值用于指示第一流量控制信息。
情况二:接收第一交换机发送的第二流量控制信息,第二流量控制信息用于指示继续发送目标队列的数据包,目标队列为网络设备的一个或多个队列。
示例性地,接收第一交换机发送的第二流量控制信息,包括:接收第一交换机发送的第二PFC报文,第二PFC报文的时间字段的值为第二值,第二值用于指示第二流量控制信息。
1106,网络设备根据目标流量控制信息进行流量控制。
在示例性实施例中,根据目标流量控制信息进行流量控制,包括但不限于如下两种控制方式。
控制方式一:根据第一流量控制信息暂停发送目标队列的数据包。
示例性地,根据第一流量控制信息暂停发送目标队列的数据包,包括:根据第一PFC报文的时间字段的值确定暂停发送数据包的时间长度,在时间长度内暂停发送目标队列的数据包。
控制方式二:根据第二流量控制信息继续发送目标队列的数据包。
示例性地,根据第二流量控制信息继续发送目标队列的数据包,包括:根据第二PFC报文的时间字段的值继续发送目标队列的数据包。
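On the network device side, steps 1105 and 1106 reduce to tracking a pause deadline per queue. The sketch below assumes eight priority queues and abstract time units; the class and method names are hypothetical.

```python
class QueuePauseState:
    """Sketch of per-queue pause handling at the network device (names are assumptions)."""

    def __init__(self, num_queues=8):
        self.paused_until = [0] * num_queues

    def on_pfc(self, enable_vector, times, now):
        """Apply a PFC message: only queues whose enable bit is set are updated."""
        for n in range(len(self.paused_until)):
            if enable_vector & (1 << n):
                # A non-zero time pauses queue n; zero resumes it at once.
                self.paused_until[n] = now + times[n]

    def may_send(self, n, now):
        return now >= self.paused_until[n]
```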
采用本申请实施例提供的方法进行网络拥塞的控制,针对CNP失效状态,将本申请实施例提供的方法与补充CNP报文的相关技术进行对比,得到的测试结果如表6所示。
表6
Figure PCTCN2021093165-appb-000006
从表6所示的CNP失效场景下的测试结果可以看出,本申请实施例提供的方法在端口堆积和业务时延均得到了数量级的优化,且对吞吐几乎无影响。
本申请实施例提供的方法,通过识别网络拥塞状态,当处于CNP失效状态时,通过信令报文指示第一交换机进行流量控制,从而抑制拥塞侧的队列积压,保证业务低时延,且不影响业务的吞吐量,能够支持大规模RoCE组网。不仅解决了CNP失效的问题,还解决了大规模高并发场景下DCQCN控速失效的问题。
本申请实施例提供了一种网络拥塞的控制装置,该装置用于执行图11所示的网络拥塞的控制方法中第一交换机所执行的功能。参见图13,该装置包括:
接收模块1301,用于接收第二交换机在目标网络拥塞状态发送的目标信令报文,目标信令报文携带流量来源信息;
发送模块1302,用于根据目标信令报文向流量来源信息对应的网络设备发送目标流量控制信息,目标流量控制信息用于指示进行流量控制。
在示例性实施例中,发送模块1302,用于根据目标信令报文向流量来源信息对应的网络设备发送第一流量控制信息,第一流量控制信息用于指示网络设备暂停发送目标队列的数据包,目标队列为网络设备的一个或多个队列。
在示例性实施例中,发送模块1302,用于根据目标信令报文构造第一基于优先级的流量 控制PFC报文,第一PFC报文的时间字段的值为第一值,第一值用于指示第一流量控制信息;向流量来源信息对应的网络设备发送第一PFC报文。
在示例性实施例中,接收模块1301,用于接收第二交换机在目标网络拥塞状态发送的第一信令报文,第一信令报文用于指示发送第一流量控制信息。
在示例性实施例中,接收模块1301,用于接收第二交换机在目标网络拥塞状态发送的第一拥塞通知包CNP报文,第一CNP报文的帧头中的指定字段的值为第一特征值,第一特征值用于指示发送第一流量控制信息。
在示例性实施例中,发送模块1302,用于根据目标信令报文向流量来源信息对应的网络设备发送第二流量控制信息,第二流量控制信息用于指示网络设备继续发送目标队列的数据包,目标队列为网络设备的一个或多个队列。
在示例性实施例中,发送模块1302,用于根据目标信令报文构造第二基于优先级的流量控制PFC报文,第二PFC报文的时间字段的值为第二值,第二值用于指示第二流量控制信息;向流量来源信息对应的网络设备发送第二PFC报文。
在示例性实施例中,接收模块1301,用于接收第二交换机在目标网络拥塞状态发送的第二信令报文,第二信令报文用于指示发送第二流量控制信息。
在示例性实施例中,接收模块1301,用于接收第二交换机在目标网络拥塞状态发送的第二拥塞通知包CNP报文,第二CNP报文的帧头中的指定字段的值为第二特征值,第二特征值用于指示发送第二流量控制信息。
在示例性实施例中,发送模块1302,用于根据目标信令报文携带的流量来源信息确定流量来源端口;通过流量来源端口向流量来源信息对应的网络设备发送目标流量控制信息。
在示例性实施例中,发送模块1302,用于通过流量来源端口向流量来源信息对应的网络设备发送第三流量控制信息,第三流量控制信息用于指示暂停发送流量来源端口所对应的队列的数据包。
在示例性实施例中,发送模块1302,用于通过流量来源端口向流量来源信息对应的网络设备发送第四流量控制信息,第四流量控制信息用于指示继续发送流量来源端口所对应的队列的数据包。
本申请实施例提供的装置,接收到第二交换机在目标网络拥塞状态发送的目标信令报文后,通过向目标信令报文中携带的流量来源信息所对应的网络设备发送目标流量控制信息,以指示进行流量控制,从而抑制拥塞侧的队列积压,保证业务低时延,且不影响业务的吞吐量,能够支持大规模RoCE组网,解决了大规模高并发场景下DCQCN控速失效的问题。
本申请实施例提供了一种网络拥塞的控制装置,该装置用于执行图11所示的网络拥塞的控制方法中第二交换机所执行的功能。参见图14,该装置包括:
识别模块1401,用于识别网络拥塞状态;
发送模块1402,用于响应于网络拥塞状态为目标网络拥塞状态,向第一交换机发送目标信令报文,目标信令报文携带流量来源信息,目标信令报文用于指示第一交换机进行流量控制。
在示例性实施例中,目标信令报文包括第一信令报文或第二信令报文,发送模块1402,用于响应于网络拥塞状态为目标网络拥塞状态,且当前的队列长度大于第一阈值,向第一交 换机发送第一信令报文,第一信令报文用于指示第一交换机发送第一流量控制信息,第一流量控制信息用于指示流量来源信息对应的网络设备暂停发送目标队列的数据包,目标队列为网络设备的一个或多个队列;或者,
响应于网络拥塞状态为目标网络拥塞状态,且当前的队列长度小于第二阈值,向第一交换机发送第二信令报文,第二信令报文用于指示第一交换机发送第二流量控制信息,第二流量控制信息用于指示网络设备继续发送目标队列的数据包,第二阈值小于第一阈值。
在示例性实施例中,该装置还包括:
获取模块,用于获取第一CNP报文,将第一CNP报文的帧头中的指定字段的值设为第一特征值,将第一CNP报文作为第一信令报文;
或者,获取模块,用于获取第二CNP报文,将第二CNP报文的帧头中的指定字段的值设为第二特征值,将第二CNP报文作为第二信令报文。
在示例性实施例中,识别模块1401,用于读取当前的队列长度及显式拥塞通知ECN阈值范围,ECN阈值范围用于指示添加ECN标识的概率,ECN标识用于指示网络发生拥塞;根据当前的队列长度及ECN阈值范围识别网络拥塞状态。
本申请实施例提供的装置,通过识别网络拥塞状态,在目标网络拥塞状态,通过目标信令报文指示第一交换机进行流量控制,从而抑制拥塞侧的队列积压,保证业务低时延,且不影响业务的吞吐量,能够支持大规模RoCE组网,解决大规模高并发场景下DCQCN控速失效的问题。
本申请实施例提供了一种网络拥塞的控制装置,该装置用于执行图11所示的网络拥塞的控制方法中网络设备所执行的功能。参见图15,该装置包括:
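a receiving module 1501, configured to receive target flow control information sent by a first switch, where the target flow control information instructs flow control to be performed, and the target flow control information is sent by the first switch after the first switch receives a target signaling message that a second switch sent in a target network congestion state;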
控制模块1502,用于根据目标流量控制信息进行流量控制。
在示例性实施例中,接收模块1501,用于接收第一交换机发送的第一流量控制信息,第一流量控制信息用于指示暂停发送目标队列的数据包,目标队列为网络设备的一个或多个队列;
控制模块1502,用于根据第一流量控制信息暂停发送目标队列的数据包。
在示例性实施例中,接收模块1501,用于接收第一交换机发送的第一PFC报文,第一PFC报文的时间字段的值为第一值,第一值用于指示第一流量控制信息;
控制模块1502,用于根据第一PFC报文的时间字段的值确定暂停发送数据包的时间长度,在时间长度内暂停发送目标队列的数据包。
在示例性实施例中,接收模块1501,用于接收第一交换机发送的第二流量控制信息,第二流量控制信息用于指示继续发送目标队列的数据包,目标队列为网络设备的一个或多个队列;
控制模块1502,用于根据第二流量控制信息继续发送目标队列的数据包。
在示例性实施例中,接收模块1501,用于接收第一交换机发送的第二PFC报文,第二PFC报文的时间字段的值为第二值,第二值用于指示第二流量控制信息;
控制模块1502,用于根据第二PFC报文的时间字段的值继续发送目标队列的数据包。
本申请实施例提供的装置,接收到第一交换机发送的目标流量控制信息后,基于该目标流量控制信息进行流量控制,从而抑制拥塞侧的队列积压,保证业务低时延,且不影响业务的吞吐量,能够支持大规模RoCE组网,解决大规模高并发场景下DCQCN控速失效的问题。
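It should be understood that when the apparatuses provided in Figures 13 to 15 implement their functions, the division into the functional modules described above is only an example. In practical applications, the functions can be assigned to different functional modules as required; that is, the internal structure of the device can be divided into different functional modules to perform all or some of the functions described above. In addition, the apparatuses provided in the above embodiments are based on the same concept as the method embodiments; for their specific implementation, see the method embodiments, which are not repeated here. Furthermore, in an exemplary embodiment, the target network congestion state involved when the apparatuses in Figures 13 to 15 implement their functions includes but is not limited to the ECN failure state or the CNP failure state. The ECN failure state is a state in which the current queue length of the second switch is greater than the maximum value of the reference range and no CNP message has been supplemented; the CNP failure state is a state in which the current queue length of the second switch is greater than the maximum value of the reference range and a CNP message has already been supplemented.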
图16为本申请实施例的网络拥塞的控制设备1600的硬件结构示意图。图16所示的网络拥塞的控制设备1600可以执行上述图11所示实施例提供的网络拥塞的控制方法中的相应步骤。
如图16所示,网络拥塞的控制设备1600包括处理器1601、存储器1602、接口1603和总线1604。其中接口1603可以通过无线或有线的方式实现,示例性地,该接口1603可以是网卡。上述处理器1601、存储器1602和接口1603通过总线1604连接。
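The interface 1603 may include a transmitter and a receiver for communicating with other communication devices. The processor 1601 is configured to perform the processing-related steps 1101-1106 of the embodiment shown in Figure 11 above, and/or other processes of the techniques described herein.
For example, the network congestion control device 1600 shown in Figure 16 is the first switch in Figure 11. The processor 1601 reads the instructions in the memory 1602, so that the network congestion control device 1600 shown in Figure 16 can perform all or some of the operations performed by the first switch.
As another example, the network congestion control device 1600 shown in Figure 16 is the second switch in Figure 11. The processor 1601 reads the instructions in the memory 1602, so that the network congestion control device 1600 shown in Figure 16 can perform all or some of the operations performed by the second switch.
As a further example, the network congestion control device 1600 shown in Figure 16 is the network device in Figure 11. The processor 1601 reads the instructions in the memory 1602, so that the network congestion control device 1600 shown in Figure 16 can perform all or some of the operations performed by the network device.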
存储器1602包括操作系统16021和应用程序16022,用于存储程序、代码或指令,当处理器或硬件设备执行这些程序、代码或指令时可以完成方法实施例中涉及网络拥塞的控制设备1600的处理过程。可选的,存储器1602可以包括只读存储器(英文:Read-only Memory,缩写:ROM)和随机存取存储器(英文:Random Access Memory,缩写:RAM)。其中,ROM包括基本输入/输出系统(英文:Basic Input/Output System,缩写:BIOS)或嵌入式系统;RAM包括应用程序和操作系统。当需要运行网络拥塞的控制设备1600时,通过固化在ROM中的BIOS或者嵌入式系统中的bootloader引导系统进行启动,引导网络拥塞的控制设备1600进入正 常运行状态。在网络拥塞的控制设备1600进入正常运行状态后,运行在RAM中的应用程序和操作系统,从而,完成方法实施例中涉及网络拥塞的控制设备1600的处理过程。
可以理解的是,图16仅仅示出了网络拥塞的控制设备1600的简化设计。在实际应用中,网络拥塞的控制设备1600可以包含任意数量的接口,处理器或者存储器。
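It should be understood that the above processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor. It is worth noting that the processor may be a processor that supports the advanced RISC machines (ARM) architecture.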
进一步地,在一种可选的实施例中,上述存储器可以包括只读存储器和随机存取存储器,并向处理器提供指令和数据。存储器还可以包括非易失性随机存取存储器。例如,存储器还可以存储设备类型的信息。
The memory may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, for example static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
还提供了一种计算机可读存储介质,存储介质中存储有至少一条指令,指令由处理器加载并执行以实现如上任一所述的网络拥塞的控制方法。
本申请提供了一种计算机程序,当计算机程序被计算机执行时,可以使得处理器或计算机执行上述方法实施例中对应的各个步骤和/或流程。
提供了一种芯片,包括处理器,用于从存储器中调用并运行所述存储器中存储的指令,使得安装有所述芯片的通信设备执行上述各方面中的方法。
提供另一种芯片,包括:输入接口、输出接口、处理器和存储器,所述输入接口、输出接口、所述处理器以及所述存储器之间通过内部连接通路相连,所述处理器用于执行所述存储器中的代码,当所述代码被执行时,所述处理器用于执行上述各方面中的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算 机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘Solid State Disk)等。
本申请中术语“第一”、“第二”等字样用于对作用和功能基本相同的相同项或相似项进行区分,应理解,“第一”、“第二”、“第n”之间不具有逻辑或时序上的依赖关系,也不对执行顺序进行限定。还应理解,尽管描述中使用术语第一、第二等来描述各种元素,但这些元素不应受术语的限制。这些术语只是用于将一元素与另一元素区别分开。
以上所述的具体实施方式,对本申请的目的、技术方案和有益效果进行了进一步详细说明,所应理解的是,以上所述仅为本申请的具体实施方式而已,并不用于限定本申请的保护范围,凡在本申请的技术方案的基础之上,所做的任何修改、等同替换、改进等,均应包括在本申请的保护范围之内。

Claims (49)

  1. 一种网络拥塞的控制方法,其特征在于,所述方法应用于第一交换机,所述方法包括:
    所述第一交换机接收第二交换机在目标网络拥塞状态发送的目标信令报文,所述目标信令报文携带流量来源信息;
    根据所述目标信令报文向所述流量来源信息对应的网络设备发送目标流量控制信息,所述目标流量控制信息用于指示进行流量控制。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述目标信令报文向所述流量来源信息对应的网络设备发送目标流量控制信息,包括:
    根据所述目标信令报文向所述流量来源信息对应的网络设备发送第一流量控制信息,所述第一流量控制信息用于指示所述网络设备暂停发送目标队列的数据包,所述目标队列为所述网络设备的一个或多个队列。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述目标信令报文向所述流量来源信息对应的网络设备发送第一流量控制信息,包括:
    根据所述目标信令报文构造第一基于优先级的流量控制PFC报文,所述第一PFC报文的时间字段的值为第一值,所述第一值用于指示所述第一流量控制信息;
    向所述流量来源信息对应的网络设备发送所述第一PFC报文。
  4. 根据权利要求2或3所述的方法,其特征在于,所述接收第二交换机在目标网络拥塞状态发送的目标信令报文,包括:
    接收所述第二交换机在目标网络拥塞状态发送的第一信令报文,所述第一信令报文用于指示发送所述第一流量控制信息。
  5. 根据权利要求4所述的方法,其特征在于,所述接收所述第二交换机在目标网络拥塞状态发送的第一信令报文,包括:
    接收所述第二交换机在目标网络拥塞状态发送的第一拥塞通知包CNP报文,所述第一CNP报文的帧头中的指定字段的值为第一特征值,所述第一特征值用于指示发送所述第一流量控制信息。
  6. 根据权利要求1所述的方法,其特征在于,所述根据所述目标信令报文向所述流量来源信息对应的网络设备发送目标流量控制信息,包括:
    根据所述目标信令报文向所述流量来源信息对应的网络设备发送第二流量控制信息,所述第二流量控制信息用于指示所述网络设备继续发送目标队列的数据包,所述目标队列为所述网络设备的一个或多个队列。
  7. 根据权利要求6所述的方法,其特征在于,所述根据所述目标信令报文向所述流量来源 信息对应的网络设备发送第二流量控制信息,包括:
    根据所述目标信令报文构造第二基于优先级的流量控制PFC报文,所述第二PFC报文的时间字段的值为第二值,所述第二值用于指示所述第二流量控制信息;
    向所述流量来源信息对应的网络设备发送所述第二PFC报文。
  8. 根据权利要求6或7所述的方法,其特征在于,所述接收第二交换机在目标网络拥塞状态发送的目标信令报文,包括:
    接收所述第二交换机在目标网络拥塞状态发送的第二信令报文,所述第二信令报文用于指示发送所述第二流量控制信息。
  9. 根据权利要求8所述的方法,其特征在于,所述接收所述第二交换机在目标网络拥塞状态发送的第二信令报文,包括:
    接收所述第二交换机在目标网络拥塞状态发送的第二拥塞通知包CNP报文,所述第二CNP报文的帧头中的指定字段的值为第二特征值,所述第二特征值用于指示发送所述第二流量控制信息。
  10. 根据权利要求1-9任一所述的方法,其特征在于,所述根据所述目标信令报文向所述流量来源信息对应的网络设备发送目标流量控制信息,包括:
    根据所述目标信令报文携带的流量来源信息确定流量来源端口;
    通过所述流量来源端口向所述流量来源信息对应的网络设备发送目标流量控制信息。
  11. 根据权利要求1-10任一所述的方法,其特征在于,所述目标网络拥塞状态包括显式拥塞通知ECN失效状态或拥塞通知包CNP失效状态;
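    the ECN failure state is a state in which the current queue length of the second switch is greater than the maximum value of a reference range and no CNP message has been supplemented; the CNP failure state is a state in which the current queue length of the second switch is greater than the maximum value of the reference range and a CNP message has already been supplemented; the reference range is determined based on an ECN threshold range, the ECN threshold range indicates the probability of adding an ECN mark, and the ECN mark indicates that the network is congested.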
  12. 一种网络拥塞的控制方法,其特征在于,所述方法应用于第二交换机,所述方法包括:
    所述第二交换机识别网络拥塞状态;
    响应于网络拥塞状态为目标网络拥塞状态,向第一交换机发送目标信令报文,所述目标信令报文携带流量来源信息,所述目标信令报文用于指示所述第一交换机进行流量控制。
  13. 根据权利要求12所述的方法,其特征在于,所述目标信令报文包括第一信令报文或第二信令报文,所述响应于网络拥塞状态为目标网络拥塞状态,向第一交换机发送目标信令报文,包括:
    响应于网络拥塞状态为目标网络拥塞状态,且当前的队列长度大于第一阈值,向所述第一交换机发送第一信令报文,所述第一信令报文用于指示所述第一交换机发送第一流量控制信息,所述第一流量控制信息用于指示所述流量来源信息对应的网络设备暂停发送目标队列 的数据包,所述目标队列为所述网络设备的一个或多个队列;或者,
    响应于网络拥塞状态为目标网络拥塞状态,且所述当前的队列长度小于第二阈值,向所述第一交换机发送第二信令报文,所述第二信令报文用于指示所述第一交换机发送第二流量控制信息,所述第二流量控制信息用于指示所述网络设备继续发送目标队列的数据包,所述第二阈值小于所述第一阈值。
  14. 根据权利要求13所述的方法,其特征在于,所述向所述第一交换机发送第一信令报文之前,还包括:获取第一拥塞通知包CNP报文,将所述第一CNP报文的帧头中的指定字段的值设为第一特征值,将所述第一CNP报文作为所述第一信令报文;
    所述向所述第一交换机发送第二信令报文之前,还包括:获取第二CNP报文,将所述第二CNP报文的帧头中的指定字段的值设为第二特征值,将所述第二CNP报文作为所述第二信令报文。
  15. 根据权利要求12-14任一所述的方法,其特征在于,所述识别网络拥塞状态,包括:
    读取当前的队列长度及显式拥塞通知ECN阈值范围,所述ECN阈值范围用于指示添加ECN标识的概率,所述ECN标识用于指示网络发生拥塞;
    根据所述当前的队列长度及所述ECN阈值范围识别网络拥塞状态。
  16. 根据权利要求15所述的方法,其特征在于,所述目标网络拥塞状态包括ECN失效状态或拥塞通知包CNP失效状态,所述根据所述当前的队列长度及所述ECN阈值范围识别网络拥塞状态,包括:
    响应于所述当前的队列长度大于参考范围的最大值,且未补充CNP报文,则所述网络拥塞状态为ECN失效状态,所述参考范围基于所述ECN阈值范围确定;
    响应于所述当前的队列长度大于所述参考范围的最大值,且已补充CNP报文,则所述网络拥塞状态为CNP失效状态。
  17. 一种网络拥塞的控制方法,其特征在于,所述方法应用于网络设备,所述方法包括:
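    the network device receives target flow control information sent by a first switch, where the target flow control information instructs flow control to be performed, and the target flow control information is sent by the first switch after the first switch receives a target signaling message that a second switch sent in a target network congestion state;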
    根据所述目标流量控制信息进行流量控制。
  18. 根据权利要求17所述的方法,其特征在于,所述接收第一交换机发送的目标流量控制信息,包括:
    接收所述第一交换机发送的第一流量控制信息,所述第一流量控制信息用于指示暂停发送目标队列的数据包,所述目标队列为所述网络设备的一个或多个队列;
    所述根据所述目标流量控制信息进行流量控制,包括:
    根据所述第一流量控制信息暂停发送所述目标队列的数据包。
  19. 根据权利要求18所述的方法,其特征在于,所述接收所述第一交换机发送的第一流量控制信息,包括:
    接收所述第一交换机发送的第一基于优先级的流量控制PFC报文,所述第一PFC报文的时间字段的值为第一值,所述第一值用于指示所述第一流量控制信息;
    所述根据所述第一流量控制信息暂停发送所述目标队列的数据包,包括:
    根据所述第一PFC报文的时间字段的值确定暂停发送数据包的时间长度,在所述时间长度内暂停发送所述目标队列的数据包。
  20. 根据权利要求17所述的方法,其特征在于,所述接收第一交换机发送的目标流量控制信息,包括:
    接收所述第一交换机发送的第二流量控制信息,所述第二流量控制信息用于指示继续发送目标队列的数据包,所述目标队列为所述网络设备的一个或多个队列;
    所述根据所述目标流量控制信息进行流量控制,包括:
    根据所述第二流量控制信息继续发送所述目标队列的数据包。
  21. 根据权利要求20所述的方法,其特征在于,所述接收所述第一交换机发送的第二流量控制信息,包括:
    接收所述第一交换机发送的第二基于优先级的流量控制PFC报文,所述第二PFC报文的时间字段的值为第二值,所述第二值用于指示所述第二流量控制信息;
    所述根据所述第二流量控制信息继续发送所述目标队列的数据包,包括:
    根据所述第二PFC报文的时间字段的值继续发送所述目标队列的数据包。
  22. 根据权利要求17-21任一所述的方法,其特征在于,所述目标网络拥塞状态包括显式拥塞通知ECN失效状态或拥塞通知包CNP失效状态;
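    the ECN failure state is a state in which the current queue length of the second switch is greater than the maximum value of a reference range and no CNP message has been supplemented; the CNP failure state is a state in which the current queue length of the second switch is greater than the maximum value of the reference range and a CNP message has already been supplemented; the reference range is determined based on an ECN threshold range, the ECN threshold range indicates the probability of adding an ECN mark, and the ECN mark indicates that the network is congested.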
  23. 一种网络拥塞的控制装置,其特征在于,所述装置包括:
    接收模块,用于接收第二交换机在目标网络拥塞状态发送的目标信令报文,所述目标信令报文携带流量来源信息;
    发送模块,用于根据所述目标信令报文向所述流量来源信息对应的网络设备发送目标流量控制信息,所述目标流量控制信息用于指示进行流量控制。
  24. 根据权利要求23所述的装置,其特征在于,所述发送模块,用于根据所述目标信令报文向所述流量来源信息对应的网络设备发送第一流量控制信息,所述第一流量控制信息用于指示所述网络设备暂停发送目标队列的数据包,所述目标队列为所述网络设备的一个或多个队列。
  25. 根据权利要求24所述的装置,其特征在于,所述发送模块,用于根据所述目标信令报文构造第一基于优先级的流量控制PFC报文,所述第一PFC报文的时间字段的值为第一值,所述第一值用于指示所述第一流量控制信息;向所述流量来源信息对应的网络设备发送所述第一PFC报文。
  26. 根据权利要求24或25所述的装置,其特征在于,所述接收模块,用于接收所述第二交换机在目标网络拥塞状态发送的第一信令报文,所述第一信令报文用于指示发送所述第一流量控制信息。
  27. 根据权利要求26所述的装置,其特征在于,所述接收模块,用于接收所述第二交换机在目标网络拥塞状态发送的第一拥塞通知包CNP报文,所述第一CNP报文的帧头中的指定字段的值为第一特征值,所述第一特征值用于指示发送所述第一流量控制信息。
  28. 根据权利要求23所述的装置,其特征在于,所述发送模块,用于根据所述目标信令报文向所述流量来源信息对应的网络设备发送第二流量控制信息,所述第二流量控制信息用于指示所述网络设备继续发送目标队列的数据包,所述目标队列为所述网络设备的一个或多个队列。
  29. 根据权利要求28所述的装置,其特征在于,所述发送模块,用于根据所述目标信令报文构造第二基于优先级的流量控制PFC报文,所述第二PFC报文的时间字段的值为第二值,所述第二值用于指示所述第二流量控制信息;向所述流量来源信息对应的网络设备发送所述第二PFC报文。
  30. 根据权利要求28或29所述的装置,其特征在于,所述接收模块,用于接收所述第二交换机在目标网络拥塞状态发送的第二信令报文,所述第二信令报文用于指示发送所述第二流量控制信息。
  31. 根据权利要求30所述的装置,其特征在于,所述接收模块,用于接收所述第二交换机在目标网络拥塞状态发送的第二拥塞通知包CNP报文,所述第二CNP报文的帧头中的指定字段的值为第二特征值,所述第二特征值用于指示发送所述第二流量控制信息。
  32. 根据权利要求23-31任一所述的装置,其特征在于,所述发送模块,用于根据所述目标信令报文携带的流量来源信息确定流量来源端口;通过所述流量来源端口向所述流量来源信息对应的网络设备发送目标流量控制信息。
  33. 根据权利要求23-32任一所述的装置,其特征在于,所述目标网络拥塞状态包括显式拥塞通知ECN失效状态或拥塞通知包CNP失效状态;
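    the ECN failure state is a state in which the current queue length of the second switch is greater than the maximum value of a reference range and no CNP message has been supplemented; the CNP failure state is a state in which the current queue length of the second switch is greater than the maximum value of the reference range and a CNP message has already been supplemented; the reference range is determined based on an ECN threshold range, the ECN threshold range indicates the probability of adding an ECN mark, and the ECN mark indicates that the network is congested.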
  34. 一种网络拥塞的控制装置,其特征在于,所述装置包括:
    识别模块,用于识别网络拥塞状态;
    发送模块,用于响应于网络拥塞状态为目标网络拥塞状态,向第一交换机发送目标信令报文,所述目标信令报文携带流量来源信息,所述目标信令报文用于指示所述第一交换机进行流量控制。
  35. 根据权利要求34所述的装置,其特征在于,所述目标信令报文包括第一信令报文或第二信令报文,所述发送模块,用于响应于网络拥塞状态为目标网络拥塞状态,且当前的队列长度大于第一阈值,向所述第一交换机发送第一信令报文,所述第一信令报文用于指示所述第一交换机发送第一流量控制信息,所述第一流量控制信息用于指示所述流量来源信息对应的网络设备暂停发送目标队列的数据包,所述目标队列为所述网络设备的一个或多个队列;或者,
    响应于网络拥塞状态为目标网络拥塞状态,且所述当前的队列长度小于第二阈值,向所述第一交换机发送第二信令报文,所述第二信令报文用于指示所述第一交换机发送第二流量控制信息,所述第二流量控制信息用于指示所述网络设备继续发送目标队列的数据包,所述第二阈值小于所述第一阈值。
  36. 根据权利要求35所述的装置,其特征在于,所述装置还包括:
    获取模块,用于获取第一拥塞通知包CNP报文,将所述第一CNP报文的帧头中的指定字段的值设为第一特征值,将所述第一CNP报文作为所述第一信令报文;
    或者,所述获取模块,用于获取第二CNP报文,将所述第二CNP报文的帧头中的指定字段的值设为第二特征值,将所述第二CNP报文作为所述第二信令报文。
  37. 根据权利要求34-36任一所述的装置,其特征在于,所述识别模块,用于读取当前的队列长度及显式拥塞通知ECN阈值范围,所述ECN阈值范围用于指示添加ECN标识的概率,所述ECN标识用于指示网络发生拥塞;根据所述当前的队列长度及所述ECN阈值范围识别网络拥塞状态。
  38. 根据权利要求37所述的装置,其特征在于,所述目标网络拥塞状态包括ECN失效状态或拥塞通知包CNP失效状态,所述识别模块,用于响应于所述当前的队列长度大于参考范围的最大值,且未补充CNP报文,则所述网络拥塞状态为ECN失效状态,所述参考范围基于所述ECN阈值范围确定;响应于所述当前的队列长度大于所述参考范围的最大值,且已补充CNP报文,则所述网络拥塞状态为CNP失效状态。
  39. 一种网络拥塞的控制装置,其特征在于,所述装置包括:
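    a receiving module, configured to receive target flow control information sent by a first switch, where the target flow control information instructs flow control to be performed, and the target flow control information is sent by the first switch after the first switch receives a target signaling message that a second switch sent in a target network congestion state;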
    控制模块,用于根据所述目标流量控制信息进行流量控制。
  40. 根据权利要求39所述的装置,其特征在于,所述接收模块,用于接收所述第一交换机发送的第一流量控制信息,所述第一流量控制信息用于指示暂停发送目标队列的数据包,所述目标队列为所述网络设备的一个或多个队列;
    所述控制模块,用于根据所述第一流量控制信息暂停发送所述目标队列的数据包。
  41. 根据权利要求40所述的装置,其特征在于,所述接收模块,用于接收所述第一交换机发送的第一基于优先级的流量控制PFC报文,所述第一PFC报文的时间字段的值为第一值,所述第一值用于指示所述第一流量控制信息;
    所述控制模块,用于根据所述第一PFC报文的时间字段的值确定暂停发送数据包的时间长度,在所述时间长度内暂停发送所述目标队列的数据包。
  42. 根据权利要求39所述的装置,其特征在于,所述接收模块,用于接收所述第一交换机发送的第二流量控制信息,所述第二流量控制信息用于指示继续发送目标队列的数据包,所述目标队列为所述网络设备的一个或多个队列;
    所述控制模块,用于根据所述第二流量控制信息继续发送所述目标队列的数据包。
  43. 根据权利要求42所述的装置,其特征在于,所述接收模块,用于接收所述第一交换机发送的第二基于优先级的流量控制PFC报文,所述第二PFC报文的时间字段的值为第二值,所述第二值用于指示所述第二流量控制信息;
    所述控制模块,用于根据所述第二PFC报文的时间字段的值继续发送所述目标队列的数据包。
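  44. The apparatus according to any one of claims 39 to 43, wherein the target network congestion state comprises an explicit congestion notification (ECN) failure state or a congestion notification packet (CNP) failure state;
    the ECN failure state is a state in which the current queue length of the second switch is greater than the maximum value of a reference range and no CNP message has been supplemented; the CNP failure state is a state in which the current queue length of the second switch is greater than the maximum value of the reference range and a CNP message has been supplemented; the reference range is determined based on an ECN threshold range, the ECN threshold range is used to indicate the probability of adding an ECN mark, and the ECN mark is used to indicate that network congestion has occurred.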
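The network-device side of claims 40 to 43 can be illustrated with a small sketch. It assumes the conventional PFC encoding, in which the time field counts pause quanta of 512 bit times and a value of zero means "continue"; the claims themselves refer only to a "first value" and a "second value", so treating zero as the second value is an assumption, and the `QueueController` class and its method names are hypothetical.

```python
from typing import Dict, Iterable

PAUSE_QUANTUM_BITS = 512  # one PFC pause quantum is 512 bit times


def pause_seconds(time_field: int, link_bps: float) -> float:
    """Convert a PFC time-field value into a pause duration in seconds."""
    return time_field * PAUSE_QUANTUM_BITS / link_bps


class QueueController:
    """Tracks, per target queue, the time until which sending is paused."""

    def __init__(self, link_bps: float) -> None:
        self.link_bps = link_bps
        self.paused_until: Dict[int, float] = {}

    def on_pfc(self, queue_ids: Iterable[int], time_field: int, now: float) -> None:
        for queue_id in queue_ids:
            if time_field == 0:
                # Assumed "second value": continue sending immediately.
                self.paused_until.pop(queue_id, None)
            else:
                # "First value": pause for the duration encoded in the time field.
                self.paused_until[queue_id] = now + pause_seconds(time_field, self.link_bps)

    def may_send(self, queue_id: int, now: float) -> bool:
        return now >= self.paused_until.get(queue_id, 0.0)
```

On a 10 Gbit/s link, a time-field value of 0xFFFF corresponds to roughly 3.36 ms of pause for the target queue; other queues of the same device are unaffected, which matches the per-queue scope of the flow control information in the claims.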
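  45. A device for controlling network congestion, wherein the device comprises:
    a memory and a processor, wherein the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method for controlling network congestion according to any one of claims 1 to 11.
  46. A device for controlling network congestion, wherein the device comprises:
    a memory and a processor, wherein the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method for controlling network congestion according to any one of claims 12 to 16.
  47. A device for controlling network congestion, wherein the device comprises:
    a memory and a processor, wherein the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method for controlling network congestion according to any one of claims 17 to 22.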
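  48. A system for controlling network congestion, wherein the system comprises: the device according to claim 45, the device according to claim 46, and the device according to claim 47.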
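  49. A computer-readable storage medium, wherein the storage medium stores at least one instruction, and the instruction is loaded and executed by a processor to implement the method for controlling network congestion according to any one of claims 1 to 22.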
PCT/CN2021/093165 2020-05-30 2021-05-11 Network congestion control method, apparatus, device, and system, and storage medium WO2021244240A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21817960.4A EP4152705A4 (en) 2020-05-30 2021-05-11 METHOD AND APPARATUS FOR NETWORK OVERLOAD CONTROL, APPARATUS, SYSTEM AND STORAGE MEDIUM
US18/071,263 US20230107366A1 (en) 2020-05-30 2022-11-29 Network congestion control method, apparatus, device, and system, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010480552.2A CN113746744A (zh) 2020-05-30 Network congestion control method, apparatus, device, and system, and storage medium
CN202010480552.2 2020-05-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/071,263 Continuation US20230107366A1 (en) 2020-05-30 2022-11-29 Network congestion control method, apparatus, device, and system, and storage medium

Publications (1)

Publication Number Publication Date
WO2021244240A1 (zh) 2021-12-09

Family

ID=78727769

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/093165 WO2021244240A1 (zh) 2020-05-30 2021-05-11 Network congestion control method, apparatus, device, and system, and storage medium

Country Status (4)

Country Link
US (1) US20230107366A1 (zh)
EP (1) EP4152705A4 (zh)
CN (1) CN113746744A (zh)
WO (1) WO2021244240A1 (zh)

Also Published As

Publication number Publication date
EP4152705A1 (en) 2023-03-22
CN113746744A (zh) 2021-12-03
US20230107366A1 (en) 2023-04-06
EP4152705A4 (en) 2023-11-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21817960

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021817960

Country of ref document: EP

Effective date: 20221213