WO2021147704A1 - Congestion control method and device - Google Patents

Congestion control method and device

Info

Publication number
WO2021147704A1
Authority
WO
WIPO (PCT)
Prior art keywords
congestion
message
data stream
congested
time interval
Application number
PCT/CN2021/071251
Other languages
English (en)
French (fr)
Inventor
刘和洋
郑合文
韩磊
严金丰
吴炳晖
温华锋
陶佩莹
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP21744459.5A (EP4087199A4)
Publication of WO2021147704A1 (zh)
Priority to US17/870,700 (US20220368633A1)

Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L47/00 Traffic control in data switching networks
            • H04L47/10 Flow control; Congestion control
              • H04L47/12 Avoiding congestion; Recovering from congestion
                • H04L47/122 Avoiding congestion; Recovering from congestion by diverting traffic away from congested entities
              • H04L47/11 Identifying congestion
                • H04L47/115 Identifying congestion using a dedicated packet
              • H04L47/28 Flow control; Congestion control in relation to timing considerations
              • H04L47/31 Flow control; Congestion control by tagging of packets, e.g. using discard eligibility [DE] bits
              • H04L47/35 Flow control; Congestion control by embedding flow control information in regular packets, e.g. piggybacking
              • H04L47/29 Flow control; Congestion control using a combination of thresholds
          • H04L43/00 Arrangements for monitoring or testing data switching networks
            • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
              • H04L43/106 Active monitoring, e.g. heartbeat, ping or trace-route using time related information in packets, e.g. by adding timestamps

Definitions

  • This application relates to the field of communication technology, and in particular to a congestion control method and device.
  • RDMA remote direct memory access
  • In RDMA, data is sent and received directly by the network interface cards (NICs) of the end nodes through registered memory buffers.
  • NICs network interface cards
  • The network protocols are all implemented on the NICs, bypassing the host's network protocol stack, which significantly reduces data handling in the host.
  • RDMA over Converged Ethernet (RoCE) has two versions: RoCEv1 and RoCEv2.
  • RoCEv1 is an RDMA protocol based on the Ethernet link layer.
  • RoCEv2 is an RDMA protocol implemented over the user datagram protocol (UDP) layer of the transmission control protocol/internet protocol (TCP/IP) suite.
  • DCQCN data center quantized congestion notification
  • ECN explicit congestion notification
  • CE congestion encountered
  • CNP congestion notification packet
  • In DCQCN, if a CE packet of a flow arrives at the destination device and the destination device has not sent a CNP packet for that flow in the past n microseconds, the destination device immediately sends a CNP packet. That is, if multiple CE packets of a flow arrive within one time window (n microseconds), the destination device generates at most one CNP packet for the flow every n microseconds. When the source device receives a CNP packet, it reduces its sending rate and updates the rate reduction factor. When the source device receives no CNP packet for a continuous period of time, it increases its sending rate according to a certain algorithm.
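The destination-side pacing rule above can be sketched as follows. This is a minimal illustration, not a NIC implementation: the class name `CnpGenerator`, microsecond timestamps, and the 50 µs window in the usage lines are assumptions for the example, and treating an elapsed time of exactly n µs as "outside the window" is a boundary choice the text does not settle.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CnpGenerator:
    """Hypothetical destination-side (NP) CNP pacing, tracked per flow."""

    window_us: float  # the "n microseconds" window from the text
    # flow id -> time (us) at which the last CNP was sent for that flow
    last_cnp_us: Dict[str, float] = field(default_factory=dict)

    def on_ce_packet(self, flow_id: str, now_us: float) -> bool:
        """Return True if a CNP should be sent for this CE-marked packet."""
        last = self.last_cnp_us.get(flow_id)
        if last is None or now_us - last >= self.window_us:
            self.last_cnp_us[flow_id] = now_us
            return True
        return False  # a CNP already went out for this flow within the window


gen = CnpGenerator(window_us=50.0)
print([gen.on_ce_packet("flow-A", t) for t in (0.0, 10.0, 49.0, 50.0, 120.0)])
# [True, False, False, True, True]
print(gen.on_ce_packet("flow-B", 10.0))  # True: pacing is tracked per flow
```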
  • FIG. 1 is a schematic diagram of DCQCN rate control failure.
  • A CNP time interval is set for each flow, that is, the minimum time interval at which the flow can obtain CNP packets.
  • If the source device receives no CNP packets within this period, it increases the rate of the congested flow, so rate-control convergence fails and packet transmission efficiency suffers.
  • The present application provides a congestion control method and device that prevent a congested data stream from having its rate increased, thereby improving packet transmission efficiency.
  • A congestion control method is provided, comprising: acquiring time information of one or more sent congestion packets in a first data stream, where the one or more congestion packets carry a congestion mark; if the first data stream is congested, obtaining a first congestion notification packet according to the time information of the one or more congestion packets in the first data stream, where the first congestion notification packet is used to provide congestion notification for packets that have been congested for longer than a first set time interval; and sending the first congestion notification packet. This prevents a congested data stream from having its rate increased and improves packet transmission efficiency.
  • the congestion mark may be an ECN mark, and the name of the congestion mark is not limited in this application.
  • the congestion notification message may be a CNP message, and the name of the congestion notification message is not limited in this application.
  • Obtaining the first congestion notification packet according to the time information of the one or more congestion packets in the first data stream includes: if the time interval between the current time and the time at which a first congestion packet was sent is greater than or equal to the first set time interval, obtaining the first congestion notification packet, where the first congestion packet is the last packet of the first data stream sent before the current time. In other words, if no new congestion packet has been sent within a certain interval after a congestion packet, a supplementary congestion notification packet needs to be sent to the source device so that the source device does not increase the rate of the data stream.
  • The first set time interval may be the rate-increase interval of the data stream.
  • Obtaining the first congestion notification packet includes: when the first congestion packet is sent, starting a timer whose timing duration is the first set time interval; and if the next packet of the first data stream has not been sent when the timer expires, obtaining the first congestion notification packet.
  • The timer starts timing when the last congestion packet is sent.
  • If the timer expires before the next packet is sent, a supplementary congestion notification packet can be sent to the source device so that the source device does not increase the rate of the data stream.
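A minimal sketch of the timer-based variant, driven by a simulated clock so that it can be stepped deterministically. The class `SupplementaryCnpTimer`, the rule that any sent packet of the flow cancels the timer while a congestion packet restarts it, and firing only once per started timer are assumptions of this illustration, not details stated in the text.

```python
from typing import Optional


class SupplementaryCnpTimer:
    """Hypothetical per-flow timer for supplementary CNPs (simulated clock)."""

    def __init__(self, interval_t: float) -> None:
        self.interval_t = interval_t  # the "first set time interval"
        self.deadline: Optional[float] = None  # expiry time of the running timer

    def on_packet_sent(self, now: float, congestion_marked: bool) -> None:
        # Assumption: sending any packet of the flow cancels the running timer;
        # sending a congestion packet (re)starts it.
        self.deadline = now + self.interval_t if congestion_marked else None

    def advance_to(self, now: float) -> bool:
        """Move the clock to `now`; True means 'obtain a supplementary CNP'."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None  # assumption: fire once per started timer
            return True
        return False


timer = SupplementaryCnpTimer(interval_t=100.0)
timer.on_packet_sent(0.0, congestion_marked=True)
print(timer.advance_to(60.0))   # False: timer still running
print(timer.advance_to(100.0))  # True: no further packet within the interval
print(timer.advance_to(150.0))  # False: the timer has already fired
```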
  • Obtaining the first congestion notification packet includes: monitoring, with a set period, whether the time interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval; and if it is, obtaining the first congestion notification packet. By periodically checking whether a next packet has been sent after the last congestion packet, a supplementary congestion notification packet can be sent to the source device if it has not, so that the source device does not increase the rate of the data stream.
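The periodic-monitoring variant can be sketched as a single table of flows scanned once per set period, rather than one timer per flow. `PeriodicCongestionMonitor`, the removal of a flow from the table once its supplementary CNP is obtained, and the treatment of ordinary packets are hypothetical choices made for this example.

```python
from typing import Dict, List


class PeriodicCongestionMonitor:
    """Hypothetical monitor that scans all flows once per set period."""

    def __init__(self, interval_t: float) -> None:
        self.interval_t = interval_t  # the "first set time interval"
        # flow id -> send time of that flow's last congestion packet
        self.last_congestion_tx: Dict[str, float] = {}

    def on_packet_sent(self, flow: str, now: float, congestion_marked: bool) -> None:
        if congestion_marked:
            self.last_congestion_tx[flow] = now
        else:
            # Assumption: a later ordinary packet means the flow is not stalled.
            self.last_congestion_tx.pop(flow, None)

    def scan(self, now: float) -> List[str]:
        """Run once per period; return flows that need a supplementary CNP."""
        due = sorted(f for f, t in self.last_congestion_tx.items()
                     if now - t >= self.interval_t)
        for f in due:
            del self.last_congestion_tx[f]  # assumption: one supplementary CNP per gap
        return due


mon = PeriodicCongestionMonitor(interval_t=100.0)
mon.on_packet_sent("A", 0.0, congestion_marked=True)
mon.on_packet_sent("B", 30.0, congestion_marked=True)
for now in range(0, 201, 50):  # set period = 50 time units
    due = mon.scan(float(now))
    if due:
        print(now, due)
# 100 ['A']
# 150 ['B']
```

Flow B's interval expires at time 130 but is only detected by the scan at time 150: with periodic monitoring, detection can lag by up to one period, which is the price of not keeping a timer per flow.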
  • Alternatively, obtaining the first congestion notification packet according to the time information of the one or more congestion packets in the first data stream includes: if the time interval between the time at which a second congestion packet is sent and the time at which the first congestion packet was sent is greater than or equal to the first set time interval, obtaining the first congestion notification packet, where the first congestion packet is the congestion packet immediately preceding the second congestion packet, and both belong to the first data flow. That is, if two consecutive congestion packets are sent too far apart, a supplementary congestion notification packet can also be sent to the source device so that the source device does not increase the rate of the data stream.
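A sketch of the variant that compares consecutive congestion packets of the same flow. `CongestionGapDetector` is a hypothetical name; as written, the check can only run when the second congestion packet is sent, so it detects a long gap after the fact.

```python
from typing import Dict


class CongestionGapDetector:
    """Hypothetical detector: flag a supplementary CNP when consecutive
    congestion packets of one flow are sent at least interval_t apart."""

    def __init__(self, interval_t: float) -> None:
        self.interval_t = interval_t  # the "first set time interval"
        self.prev_tx: Dict[str, float] = {}  # flow -> send time of previous congestion packet

    def on_congestion_packet_sent(self, flow: str, now: float) -> bool:
        prev = self.prev_tx.get(flow)
        self.prev_tx[flow] = now  # this packet becomes the "previous" one next time
        return prev is not None and now - prev >= self.interval_t


det = CongestionGapDetector(interval_t=100.0)
print([det.on_congestion_packet_sent("A", t) for t in (0.0, 40.0, 180.0, 200.0)])
# [False, False, True, False]
```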
  • Sending the first congestion notification packet includes: monitoring whether the time interval between obtaining the first congestion notification packet and the current time is greater than or equal to a second set time interval; and if it is, sending the first congestion notification packet.
  • That is, the supplementary congestion notification packet may be sent after a certain delay so that, as far as possible, it arrives after the congestion notification packet for the last sent congestion packet and before the congestion notification packet for the next sent congestion packet. There is, however, no one-to-one correspondence between congestion notification packets and congestion packets.
  • All congestion notification packets are identical; only the number of congestion packets sent and the number of congestion notification packets sent may correspond to some extent.
  • the foregoing second set time interval may be determined according to the time interval between the time when the first congestion message is sent and the time when the second congestion notification message sent by the destination device is received.
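The delayed-sending option can be sketched as below, assuming the second set time interval is simply the most recently measured delay between sending a congestion packet and receiving the destination's CNP for it. The text says only that the interval is determined from this delay, so smoothing or a safety margin would be equally plausible. `DelayedCnpSender` is a hypothetical name, and with no measurement yet the sketch falls back to sending immediately.

```python
from typing import List, Optional


class DelayedCnpSender:
    """Hypothetical holder of supplementary CNPs until the second set interval elapses."""

    def __init__(self) -> None:
        self.second_interval: Optional[float] = None  # learned from feedback
        self.pending: List[float] = []  # times at which supplementary CNPs were obtained

    def on_feedback_sample(self, congestion_tx_time: float, cnp_rx_time: float) -> None:
        # Assumption: use the latest measured delay between sending a congestion
        # packet and receiving the destination's CNP, without smoothing.
        self.second_interval = cnp_rx_time - congestion_tx_time

    def on_cnp_obtained(self, now: float) -> None:
        self.pending.append(now)

    def pop_due(self, now: float) -> int:
        """Return how many pending CNPs should be sent now, and drop them."""
        delay = self.second_interval if self.second_interval is not None else 0.0
        due = [t for t in self.pending if now - t >= delay]
        self.pending = [t for t in self.pending if now - t < delay]
        return len(due)


sender = DelayedCnpSender()
sender.on_feedback_sample(congestion_tx_time=10.0, cnp_rx_time=18.0)  # delay = 8
sender.on_cnp_obtained(100.0)
print(sender.pop_due(105.0), sender.pop_due(108.0), sender.pop_due(110.0))
# 0 1 0
```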
  • Alternatively, sending the first congestion notification packet includes sending it immediately after it is obtained. This lets the source device receive the congestion notification in time and reduce its rate promptly.
  • the method further includes: obtaining transmission parameters of one or more congestion packets in the first data stream. According to the transmission parameters of the one or more congestion packets, it is determined that the first data stream is congested. According to the transmission parameters of the congested message, it can be judged whether the data stream is congested.
  • the method further includes: sampling and copying one or more congestion packets of the first data stream at a set sampling rate to obtain a mirrored data stream corresponding to the first data stream, Wherein, the mirrored data stream includes one or more mirrored packets.
  • Congestion identification and control can be performed by the forwarding chip, or a new coprocessor can be added. The coprocessor samples and copies congested packets in the data stream from the forwarding chip to perform congestion identification and control.
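Sampling and copying congestion packets for the coprocessor could look like the following. The `Packet` fields and deterministic 1-in-N sampling are assumptions for illustration; the text only says congestion packets are sampled and copied at a set sampling rate, and hardware might equally sample at random.

```python
import copy
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Packet:
    flow: str
    seq: int
    length: int  # bytes
    ce: bool     # True if ECN-marked "congestion encountered"


def mirror_congestion_packets(packets: Iterable[Packet], one_in_n: int) -> List[Packet]:
    """Copy every one_in_n-th congestion packet into a mirrored stream."""
    mirrored: List[Packet] = []
    seen = 0  # congestion packets seen so far
    for pkt in packets:
        if not pkt.ce:
            continue  # ordinary packets are not sampled
        if seen % one_in_n == 0:
            mirrored.append(copy.copy(pkt))  # the original is still forwarded
        seen += 1
    return mirrored


stream = [Packet("A", seq, 1000, ce=(seq % 2 == 0)) for seq in range(10)]
print([p.seq for p in mirror_congestion_packets(stream, one_in_n=2)])
# [0, 4, 8]
```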
  • the transmission parameters include the number of messages and the length of the messages
  • Determining, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested includes: determining the congestion packet rate of a first queue according to the number and length of the congestion packets of one or more first data flows sent by the first queue within a first time period; and if the congestion packet rate of the queue is greater than or equal to a first rate threshold, determining that the first queue is congested and enters the congested state.
  • the congestion packet rate of the queue can be accurately determined according to the number of packets and the packet length of one or more data streams sent in a queue, so as to determine whether the queue is congested. If the queue is congested, all data streams in the queue are congested.
  • the method further includes: if the congestion message rate of the queue is less than or equal to a second rate threshold, determining that the first queue is not congested, and the first queue exits the congested state, Wherein, the second rate threshold is less than the first rate threshold.
  • In other words, once the congestion packet rate of the queue falls to a certain value, the queue can be judged to have exited the congested state.
  • When the congestion packet rate lies between the second rate threshold and the first rate threshold, a queue that has entered the congested state remains congested, and congestion control still needs to be performed on it.
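The queue-level decision with two rate thresholds is a hysteresis rule. A minimal sketch, assuming the congestion packet rate is computed in bits per second as the total length of the congestion packets a queue sent in one period divided by the period length; `QueueCongestionState` and the threshold values in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueueCongestionState:
    enter_threshold_bps: float  # first rate threshold
    exit_threshold_bps: float   # second rate threshold, must be lower
    congested: bool = False

    def __post_init__(self) -> None:
        if self.exit_threshold_bps >= self.enter_threshold_bps:
            raise ValueError("second rate threshold must be below the first")

    def update(self, congestion_pkt_lengths: Sequence[int], period_s: float) -> float:
        """Feed the lengths (bytes) of congestion packets the queue sent in one period."""
        # Packet count and packet length together give the total bits sent.
        rate_bps = 8 * sum(congestion_pkt_lengths) / period_s
        if rate_bps >= self.enter_threshold_bps:
            self.congested = True   # queue enters the congested state
        elif rate_bps <= self.exit_threshold_bps:
            self.congested = False  # queue exits the congested state
        # Between the thresholds the previous state is kept.
        return rate_bps


q = QueueCongestionState(enter_threshold_bps=6e9, exit_threshold_bps=2e9)
for n_pkts in (1000, 500, 200):  # 1000-byte congestion packets per 1 ms period
    q.update([1000] * n_pkts, period_s=1e-3)
    print(n_pkts, q.congested)
# 1000 True
# 500 True
# 200 False
```

The gap between the two thresholds keeps a queue whose rate hovers near a single threshold from flapping in and out of the congested state.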
  • The transmission parameters include the sequence numbers of one or more congestion packets in the first data stream, and determining, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested includes: obtaining the degree of continuity of the sequence numbers of congestion packets in the first data stream; and if the degree of continuity is greater than or equal to a first threshold, determining that the first data stream is congested and enters the congested state. For a single data flow, the packet sequence number is carried in the packet header, so the degree of continuity of the congestion packets' sequence numbers shows how continuous the congestion packets are and thus whether the data flow is congested.
  • the degree of continuity of the sequence numbers of the congested message may refer to the number of consecutive sequence numbers of the congested message.
  • Alternatively, the degree of continuity of the congestion packets may refer to the number of packets corresponding to the consecutive sequence numbers of the congestion packets.
  • the number of consecutive sequence numbers of the congested message may be the number of consecutive sequence numbers in the sampled data stream.
  • The degree of continuity may also be expressed through the differences between the sequence numbers of congestion packets: if, among a certain number of packets sent by a network device, the differences between the congestion packets' sequence numbers are less than a set threshold, the degree of continuity of those sequence numbers can likewise be considered greater than the set threshold, and the data stream is judged to be congested.
  • The method further includes: if the degree of continuity is less than or equal to a second threshold, determining that the first data stream is not congested and exits the congested state, where the second threshold is less than the first threshold. That is, once the continuity of the congestion packets' sequence numbers drops to a certain threshold, the data stream can be judged to have exited the congested state.
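A sketch of the flow-level check based on sequence-number continuity. Here the "degree of continuity" is assumed to be the number of recent congestion packets (at most `window` of them) whose sequence number follows the previous congestion packet's by at most `max_gap`; `max_gap=1` counts strictly consecutive numbers, while a larger value approximates the difference-based variant or a sampled stream. `FlowContinuityDetector` and all parameters are hypothetical, and sequence-number wraparound is ignored.

```python
from collections import deque
from typing import Deque, Optional


class FlowContinuityDetector:
    """Hypothetical flow-level congestion state from sequence-number continuity."""

    def __init__(self, window: int, enter_threshold: int, exit_threshold: int,
                 max_gap: int = 1) -> None:
        if exit_threshold >= enter_threshold:
            raise ValueError("second threshold must be below the first")
        self.enter_threshold = enter_threshold  # first threshold
        self.exit_threshold = exit_threshold    # second threshold
        self.max_gap = max_gap
        # One entry per recent congestion packet: True if its sequence number
        # continued the previous congestion packet's within max_gap.
        self.links: Deque[bool] = deque(maxlen=window)
        self.prev_seq: Optional[int] = None
        self.congested = False

    def on_congestion_packet(self, seq: int) -> bool:
        if self.prev_seq is not None:
            self.links.append(0 < seq - self.prev_seq <= self.max_gap)
        self.prev_seq = seq
        continuity = sum(self.links)  # number of True entries
        if continuity >= self.enter_threshold:
            self.congested = True   # flow enters the congested state
        elif continuity <= self.exit_threshold:
            self.congested = False  # flow exits the congested state
        return self.congested


det = FlowContinuityDetector(window=4, enter_threshold=3, exit_threshold=1)
print([det.on_congestion_packet(s) for s in (10, 11, 12, 13, 20, 30, 40)])
# [False, False, False, True, True, True, False]
```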
  • A congestion control device is provided, including: a first acquiring unit, configured to acquire time information of one or more sent congestion packets in a first data stream, the one or more congestion packets carrying a congestion mark; a second acquiring unit, configured to acquire, when the first data stream is congested, a first congestion notification packet according to the time information of the one or more congestion packets in the first data stream, where the first congestion notification packet is used to provide congestion notification for packets that have been congested for longer than a first set time interval; and a first sending unit, configured to send the first congestion notification packet.
  • The second acquiring unit is configured to acquire the first congestion notification packet if the time interval between the current time and the time at which a first congestion packet was sent is greater than or equal to the first set time interval, where the first congestion packet is the last packet of the first data stream sent before the current time.
  • The second acquiring unit includes: a starting unit, configured to start a timer when the first congestion packet is sent, where the timing duration of the timer is the first set time interval; and a third acquiring unit, configured to acquire the first congestion notification packet if the next packet of the first data stream has not been sent when the timer expires.
  • The second acquiring unit includes: a first monitoring unit, configured to monitor whether the time interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval; and a fourth acquiring unit, configured to acquire the first congestion notification packet if that time interval is greater than or equal to the first set time interval.
  • The second acquiring unit is configured to acquire the first congestion notification packet if the time interval between the time at which a second congestion packet is sent and the time at which the first congestion packet was sent is greater than or equal to the first set time interval, where the first congestion packet is the congestion packet immediately preceding the second congestion packet, and both belong to the first data flow.
  • the first sending unit includes: a second monitoring unit, configured to monitor whether a time interval between the current time and obtaining the first congestion notification message is greater than or equal to a second set time interval; And a second sending unit, configured to send the first congestion notification message if the time interval between the current time and obtaining the first congestion notification message is greater than or equal to the second set time interval, wherein: The second set time interval is determined according to the time interval between the time when the first congestion message is sent and the time when the second congestion notification message sent by the destination device is received.
  • the first sending unit is configured to send the first congestion notification message immediately after acquiring the first congestion notification message.
  • The device further includes: a fifth acquiring unit, configured to acquire transmission parameters of one or more congestion packets in the first data stream; and a first determining unit, configured to determine, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested.
  • the device further includes: a sampling and copying unit, configured to sample and copy one or more congestion packets of the first data stream at a set sampling rate to obtain the first data stream The corresponding mirrored data stream, wherein the mirrored data stream includes one or more mirrored packets.
  • the transmission parameters include the number of messages and the length of the messages
  • The first determining unit includes: a second determining unit, configured to determine the congestion packet rate of a first queue according to the number and length of the congestion packets of one or more first data flows sent by the first queue within a first time period; and a third determining unit, configured to determine, if the congestion packet rate of the queue is greater than or equal to a first rate threshold, that the first queue is congested and enters the congested state.
  • The device further includes: a fourth determining unit, configured to determine, if the congestion packet rate of the queue is less than or equal to a second rate threshold, that the first queue is not congested and exits the congested state, where the second rate threshold is less than the first rate threshold.
  • The transmission parameters include the sequence numbers of one or more congestion packets in the first data stream.
  • The first determining unit includes: a sixth acquiring unit, configured to acquire the degree of continuity of the sequence numbers of congestion packets in the first data stream; and a fifth determining unit, configured to determine, if the degree of continuity is greater than or equal to a first threshold, that the first data stream is congested and enters the congested state.
  • The device further includes: a sixth determining unit, configured to determine, if the degree of continuity is less than or equal to a second threshold, that the first data stream is not congested and exits the congested state, where the second threshold is less than the first threshold.
  • A congestion control device is provided, including a processor and a physical interface, where the processor is configured to perform the method according to the first aspect or any implementation of the first aspect.
  • A computer-readable storage medium is provided, storing instructions that, when run on a computer, cause the computer to perform the method according to the first aspect or any implementation of the first aspect.
  • A computer program product containing instructions is provided that, when run on a computer, causes the computer to perform the method according to the first aspect or any implementation of the first aspect.
  • Figure 1 is a schematic diagram of DCQCN speed control failure
  • FIG. 2 is a schematic structural diagram of a communication system that implements the congestion control method of an embodiment of the present application
  • FIG. 3 is a schematic diagram of a typical scenario where the congestion control method according to an embodiment of the application is applicable;
  • FIG. 4 is a schematic flowchart of a congestion control method provided by an embodiment of this application.
  • FIG. 5 is a schematic flowchart of another congestion control method provided by an embodiment of this application.
  • Figure 6 is a schematic diagram of identifying congestion in a queue
  • Figure 7 is a schematic diagram of identifying congestion according to queue depth
  • Figure 8 is a schematic diagram of queue congestion control
  • Figure 9 is a schematic diagram of another queue congestion control
  • Figure 10 is a schematic diagram of identifying data stream congestion
  • Figure 11 is a schematic diagram of flow congestion control
  • Figure 12 is a schematic diagram of another flow congestion control
  • FIG. 13 is a schematic structural diagram of a congestion control device provided by an embodiment of the application.
  • FIG. 14 is a schematic structural diagram of another congestion control apparatus provided by an embodiment of the application.
  • FIG. 2 is an example of a communication system that implements the congestion control method of the embodiment of the present application.
  • the communication system may include a source device 100, 1 to n network devices 200, and a destination device 300, where n ≥ 1.
  • the source device refers to the sender of the data stream
  • the destination device refers to the receiver of the data stream
  • the network device may be a switch or the like.
  • network equipment implements congestion control.
  • the above-mentioned source device may also be called a reaction point (RP)
  • the above-mentioned network device may also be called a congestion point (CP)
  • the above-mentioned destination device may also be called a notification point (NP).
  • RP reaction point
  • CP congestion point
  • NP notification point
  • FIG 3 is an example of a typical scenario where the congestion control method of this application is applicable.
  • this scenario is a data center network, which can be applied to high-performance computing, high-performance distributed storage, big data, artificial intelligence, etc.
  • servers can transmit data to other servers through leaf nodes and root nodes. Its data transmission uses RDMA technology.
  • the transmission rate between the server and the leaf node is lower than the transmission rate between the leaf node and the root node.
  • the transmission rate between the server and the leaf node is 25G/s
  • the transmission rate between the leaf node and the root node is 100G/s.
  • the data center network may be a CLOS network.
  • the CLOS network includes a server, a core switch, and an access switch.
  • the server accesses the network through the access switch and the core switch
  • the source device may be one or more servers in the network.
  • the destination device can be another server or multiple servers in the network
  • the network device can be an access switch and a core switch in the network.
  • the embodiments of the application provide a congestion control method and device.
  • Congestion notification is performed for packets that have been congested for longer than a certain time interval, so that a congested data stream does not have its rate increased, which improves packet transmission efficiency.
  • FIG. 4 is a schematic flowchart of a congestion control method provided by an embodiment of the application. As shown in FIG. 4, the congestion control method may include:
  • the first data stream refers to a data stream sent from a source device to a destination device through one or more network devices.
  • the first data stream may be one data stream, or multiple data streams of different service types with the same source and destination, that is, an aggregate stream.
  • The congestion identification and congestion control in this embodiment may be performed by any one network device, or by every network device, between the source device and the destination device. Specifically, they may be performed by the forwarding chip of the network device or by a coprocessor. If they are performed by the forwarding chip, no additional operations are needed to identify and control congestion of the forwarded data stream, and timeliness is high. If they are performed by the coprocessor, the coprocessor samples and copies the congestion packets forwarded by the forwarding chip and then performs congestion identification and control on the resulting mirrored data stream.
  • the first data stream may include congestion messages and ordinary messages. Each message carries the corresponding content in the first data stream.
  • Congested packets refer to packets marked with ECN.
  • Ordinary messages refer to messages that are not marked with ECN.
  • For the congestion packets in the first data stream, the network device obtains their transmission parameters.
  • The transmission parameters include the transmission rate, packet length, queue depth, and so on. These parameters largely determine whether a packet can be delivered smoothly to the destination device, so the transmission parameters of each congestion packet in the first data stream need to be obtained.
  • S102 Determine that the first data stream is congested according to the transmission parameters of one or more congestion packets.
  • the congested packets received by the network equipment from the upstream enter the queue of its outgoing port.
  • the network device may include one or more queues.
  • One or more data streams can enter a queue and wait for transmission.
  • a queue has a certain depth. According to the transmission parameters of congestion packets, if too many congestion packets are sent in a queue, it can be determined that the queue or a certain data flow in the queue is congested.
  • S101 and S102 are optional steps, which are represented by dashed lines in the figure. That is, it is possible to determine whether the data stream is congested before executing the following congestion control process, or the following congestion control process can be executed on the premise of knowing that the data stream has been congested.
  • the network device forwards each congestion message in the first data stream, and can record the time information for sending each congestion message.
  • the time information may be, for example, the absolute time or the relative time of sending the congestion message.
  • If congestion control is performed by the forwarding chip, the forwarding chip only needs to record the time at which it sends each congestion packet. If congestion control is performed by the coprocessor, the forwarding chip records this time information and then sends it to the coprocessor.
  • S104 Obtain a first congestion notification message according to the time information of one or more congestion messages in the first data stream.
  • The first congestion notification packet is used to provide congestion notification for packets that have been congested for longer than a first set time interval.
  • When a CE packet arrives, the destination device immediately sends a CNP packet to notify the source device that the data stream is congested.
  • When the source device receives a CNP packet, it reduces its sending rate and updates the rate reduction factor.
  • the source device will increase the sending rate according to a certain algorithm when it does not receive a CNP message for a continuous period of time.
  • However, the interval at which the data stream receives CNP packets may be longer than the interval at which the source device increases its sending rate. The source device then increases the rate of a data stream that is still congested, so rate control fails to converge, which reduces message transmission efficiency.
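  • To see why late CNP feedback defeats rate control, the following toy model may help. It is a sketch under stated assumptions, not the algorithm of this application: the source is assumed to follow a simplified DCQCN-style rule (multiplicative decrease driven by a reduction factor `alpha` on each CNP, and a halfway step back toward line rate after each rate-increase interval without a CNP). `ToySource`, `simulate`, the gain `g`, and all rates are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ToySource:
    """Hypothetical, simplified DCQCN-style reaction point (illustration only)."""
    rate: float          # current sending rate, Gbit/s
    line_rate: float     # rate the source recovers toward
    alpha: float = 1.0   # rate reduction factor
    g: float = 1 / 16    # assumed gain used to update alpha

    def on_cnp(self) -> None:
        # CNP received: cut the rate and update the reduction factor.
        self.rate *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_quiet_interval(self) -> None:
        # A whole rate-increase interval passed with no CNP: decay alpha, raise the rate.
        self.alpha = (1 - self.g) * self.alpha
        self.rate = (self.rate + self.line_rate) / 2


def simulate(cnp_in_interval: Sequence[bool], start_rate: float = 50.0) -> float:
    """Return the final rate after one event per rate-increase interval."""
    src = ToySource(rate=start_rate, line_rate=100.0)
    for got_cnp in cnp_in_interval:
        if got_cnp:
            src.on_cnp()
        else:
            src.on_quiet_interval()
    return src.rate


# The flow stays congested throughout, but only the first run receives CNPs in time.
print(simulate([True, True, True]))     # 6.25
print(simulate([False, False, False]))  # 93.75
```

Because the source decides once per rate-increase interval, any interval in which the CNP arrives late is treated as a quiet interval and the rate climbs, which is the gap the network-generated first CNP message is meant to fill.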
  • In that case the destination device does not feed back a CNP message, so the network device obtains a first CNP message itself. The first CNP message may be one that was previously generated and stored in the network device, or one generated in real time; its content is the same as that of a CNP message fed back by the destination device.
  • If a packet is congested for more than the first set time interval, the destination device, having received no CE packet, does not feed back a CNP packet to the source device; and if the source device receives no CNP packet for a continuous period of time, it increases the sending rate according to a certain algorithm. After the network device obtains the first CNP message, it sends it to the source device to indicate that the data stream is congested and its rate must not be increased. This effectively controls the rate of the data flow and improves message transmission efficiency.
  • In the congestion control method provided in this embodiment, time information of the congested packets in a data stream is obtained, and if the data stream is congested, congestion notification is performed for packets that have been congested for more than a certain time interval. This prevents a congested data stream from still being rate-increased, which improves message transmission efficiency.
  • FIG. 5 is a schematic flowchart of another congestion control method provided by an embodiment of the application. As shown in FIG. 5, the congestion control method may include:
  • S201 Sample and copy, at a set sampling rate, one or more first data streams in the first queue sent in the first time period, to obtain mirrored data streams corresponding to the one or more first data streams, where each mirrored data stream includes one or more mirrored congestion packets.
  • In this embodiment, the coprocessor may implement congestion identification and control.
  • The coprocessor copies the data stream in the forwarding chip to obtain a mirrored data stream.
  • A mirrored data stream includes one or more mirrored congestion packets.
  • Congestion identification and control may also be implemented by the forwarding chip; that is, the forwarding chip can identify and control congestion of the data streams it forwards without copying them, so copying the data stream is optional. This embodiment is described using the example in which the coprocessor implements congestion identification and control.
  • The data stream may be sampled at a set sampling rate, or all congested packets may be processed without sampling.
  • Congested packets that the network device receives from upstream enter a queue of its egress port.
  • The network device may include one or more queues.
  • One or more data streams may enter a queue and wait for transmission. The congestion packet sampling and copying described above is performed on each data stream in each queue.
  • Steps S202 to S207 below identify whether the queue is congested.
  • The coprocessor may intercept the header of a mirrored congestion message to obtain its transmission parameters.
  • The transmission parameters may include the number and the length of the congested packets. For example, if 4 data streams enter the first queue, the coprocessor receives a total of N mirrored CE congestion packets from the 4 mirrored data streams in 1 s, and each mirrored congestion packet is M bits long.
  • S203 Determine the congestion message rate of the queue according to the number of congestion messages and the length of the congestion message in one or more mirrored data flows in the first queue sent in the first time period.
  • From these parameters, the congestion packet rate of the queue can be determined.
  • The congestion packet rate of a queue is the rate at which congestion packets enter the queue. In the above example, the congestion packet rate of the queue is N×M bit/s.
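  • The calculation in S203 can be sketched as follows, assuming no sampling is applied and the coprocessor keeps the lengths (in bits) of the mirrored congestion packets of every data stream in the queue seen during the first time period. `queue_congestion_rate` and the sample values chosen for N and M are hypothetical.

```python
from typing import Dict, List


def queue_congestion_rate(mirrored_lengths_bits: Dict[str, List[int]], period_s: float) -> float:
    """Congestion packet rate of a queue in bit/s.

    mirrored_lengths_bits maps each data stream in the queue to the lengths (bits)
    of its mirrored congestion packets observed during the first time period.
    """
    if period_s <= 0:
        raise ValueError("the first time period must be positive")
    total_bits = sum(sum(lengths) for lengths in mirrored_lengths_bits.values())
    return total_bits / period_s


# Hypothetical example: 4 streams, N = 1000 mirrored CE packets in total over 1 s,
# each M = 8192 bits long, giving N * M = 8,192,000 bit/s.
streams = {"f1": [8192] * 400, "f2": [8192] * 300, "f3": [8192] * 200, "f4": [8192] * 100}
print(queue_congestion_rate(streams, 1.0))  # 8192000.0
```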
  • S204 Determine whether the congestion packet rate of the queue is greater than or equal to the first rate threshold, if yes, proceed to S205; otherwise, continue to perform S204 or proceed to S212.
  • The first rate threshold may be a first set ratio of the queue's ROCE traffic configuration value, for example 90% of that value.
  • The ROCE traffic configuration value of a queue is the ROCE traffic value pre-configured for that queue.
  • If the congestion packet rate is greater than or equal to the first rate threshold, the first queue is congested and enters the congested state. The first queue may be any queue in the network device; that is, the above congestion identification may be performed on any queue.
  • S206 Determine whether the congestion packet rate of the queue is less than or equal to the second rate threshold, if yes, proceed to S207; otherwise, continue to execute S206.
  • After the first queue enters the congested state, its congestion packet rate continues to be monitored. If the rate decreases, it is determined whether it is less than or equal to the second rate threshold. If it is, for example if it is less than or equal to 60% of the queue's ROCE traffic configuration value, the first queue is determined to be no longer congested and exits the congested state. If the rate lies between the first and second rate thresholds, monitoring continues until it is less than or equal to the second rate threshold.
  • The second rate threshold is less than the first rate threshold.
  • The second rate threshold may be a second set ratio of the queue's ROCE traffic configuration value, for example 60% of that value.
  • In other words, if the queue's congestion packet rate is greater than or equal to 90% of the queue's ROCE traffic configuration value, the queue is determined to be congested. If the rate then falls below 90% but stays above 60% of that value, many congestion packets are still being sent from the queue and it has not exited the congested state. Only when the rate is less than or equal to 60% of the queue's ROCE traffic configuration value is the queue determined to be no longer congested, and it then exits the congested state.
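  • The enter/exit rule of S204 to S207 is a hysteresis check with two thresholds. A minimal sketch, assuming the 90% and 60% ratios from the example above; the class name `QueueCongestionState` and the value 100 used for the ROCE traffic configuration value are hypothetical.

```python
class QueueCongestionState:
    """Tracks whether a queue is in the congested state (S204-S207)."""

    def __init__(self, roce_config_value: float,
                 first_ratio: float = 0.9, second_ratio: float = 0.6) -> None:
        if not 0 < second_ratio < first_ratio:
            raise ValueError("the second rate threshold must be below the first")
        self.first_threshold = roce_config_value * first_ratio    # enter threshold
        self.second_threshold = roce_config_value * second_ratio  # exit threshold
        self.congested = False

    def update(self, congestion_rate: float) -> bool:
        """Feed one measured congestion packet rate; return the resulting state."""
        if not self.congested and congestion_rate >= self.first_threshold:
            self.congested = True    # S205: the queue enters the congested state
        elif self.congested and congestion_rate <= self.second_threshold:
            self.congested = False   # S207: the queue exits the congested state
        return self.congested        # otherwise keep monitoring in the current state


q = QueueCongestionState(roce_config_value=100.0)
print([q.update(r) for r in (50, 95, 80, 65, 59, 70)])
# [False, True, True, True, False, False]
```

Rates between the two thresholds leave the state unchanged, which keeps the queue from flapping in and out of the congested state.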
  • One or more data streams of a source device are sent to the destination device through one or more network devices, and each data stream includes one or more congestion packets.
  • The egress port of a network device includes one or more queues, and each queue includes one or more data streams.
  • One or more of the network devices can be used to control queue congestion.
  • For example, service flow 1 sent by source device 1 and service flow 2 sent by source device 2 are sent through this network device to the same destination device, and both enter queue 1 of the network device.
  • Service flow 1 and service flow 2 respectively include several congestion messages. Congestion control can be performed by the forwarding chip or coprocessor of the network device.
  • The coprocessor obtains mirrored copies of the CE packets of service flow 1 and service flow 2 in queue 1. Specifically, the coprocessor copies the CE messages forwarded by the forwarding chip to obtain their mirrored messages. For example, it obtains mirrored copies of CE packets f1-1, f1-2, and f1-3 of service flow 1, and of CE packets f2-1 and f2-2 of service flow 2. Further, the CE packets in service flow 1 and service flow 2 may be sampled at a certain sampling ratio.
  • The coprocessor calculates the congestion packet rate of queue 1 from the lengths of CE packets f1-1, f1-2, f1-3, f2-1, and f2-2, and then judges whether this rate is greater than or equal to the first rate threshold. If it is, queue 1 is determined to be congested; if not, the judgment is repeated. After queue 1 is determined to be congested, the coprocessor judges whether the calculated rate is less than or equal to the second rate threshold. If it is, queue 1 exits the congested state; otherwise, the judgment is repeated.
  • Once the queue is congested, the congestion needs to be controlled.
  • Steps S208 to S211 below perform congestion control when the queue is congested. If the first queue is congested, all data streams in it are congested, so the congestion control of steps S208 to S211 is performed on each data stream.
  • The forwarding chip forwards each message in the first data stream and may record the time at which each congestion message is sent.
  • The time information may be, for example, the absolute or relative time at which the congestion message is sent.
  • After the forwarding chip records this time information and sends it to the coprocessor, the coprocessor can obtain the time information of one or more mirrored congestion packets in each mirrored data stream.
  • S209 Determine whether the time interval between the current time and the time of sending the first congestion message is greater than or equal to the first set time interval, if yes, proceed to S210; otherwise, continue to execute S209.
  • The network device performs congestion notification for CE packets that have been congested for more than the first set time interval.
  • The first congestion message is the last message of the first data stream sent before the current time. That is, after the first congestion message is sent, the device judges how long it has been since the data stream last sent a message.
  • A timer may be set in the coprocessor and started when the first congestion message is sent, where the duration of the timer is the first set time interval. If the next congestion message of the first data stream has not been sent when the timer expires, the first CNP message is obtained.
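  • The timer variant can be sketched with Python's standard `sched` module. This is a minimal sketch, not the coprocessor's implementation: a simulated clock stands in for hardware time, and `SimClock`, `FlowIdleTimer`, the 50 ms first set time interval, and the send times are hypothetical. The timer is restarted each time the flow sends a congestion packet, so it only expires after a gap of at least the first set time interval.

```python
import sched
from typing import List, Optional


class SimClock:
    """Simulated clock so the example runs instantly and deterministically."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, delay: float) -> None:
        self.now += delay  # advance simulated time instead of blocking


class FlowIdleTimer:
    """Restart a timer of length first_interval each time the flow sends a congestion packet."""

    def __init__(self, scheduler: sched.scheduler, clock: SimClock, first_interval: float) -> None:
        self.scheduler = scheduler
        self.clock = clock
        self.first_interval = first_interval
        self.pending: Optional[sched.Event] = None
        self.cnp_times: List[float] = []  # when a first CNP message would be obtained

    def on_congestion_packet_sent(self) -> None:
        if self.pending is not None:
            self.scheduler.cancel(self.pending)  # the flow sent again: restart the timer
        self.pending = self.scheduler.enter(self.first_interval, 1, self._expired)

    def _expired(self) -> None:
        # No further congestion packet within first_interval: obtain the first CNP here.
        self.pending = None
        self.cnp_times.append(self.clock.time())


clock = SimClock()
scheduler = sched.scheduler(clock.time, clock.sleep)
timer = FlowIdleTimer(scheduler, clock, first_interval=0.05)  # hypothetical 50 ms interval
for t in (0.00, 0.02, 0.04, 0.20):  # send times of the flow's CE packets, in seconds
    scheduler.enterabs(t, 0, timer.on_congestion_packet_sent)
scheduler.run()
print([round(t, 3) for t in timer.cnp_times])  # [0.09, 0.25]
```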
  • The first CNP message may be one that was previously generated and stored in the network device, or one generated in real time.
  • Alternatively, the network device may check in each set period whether the time interval between the current time and the time at which the first congestion message was sent is greater than or equal to the first set time interval. If it is, monitoring is exited and the first CNP packet is obtained; if the interval in the current period is still less than the first set time interval, the check is repeated in the next period, until the interval between the current time and the time at which the first congestion message was sent is greater than or equal to the first set time interval.
  • The period is less than or equal to the first set time interval.
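  • The periodic variant can be sketched as a polling loop. Times are integer milliseconds to keep the arithmetic exact; `periodic_idle_check`, the 50 ms interval, the periods, and the send times are hypothetical, and producing only one CNP per idle gap is an assumption based on monitoring being exited once the interval is reached.

```python
from typing import List, Optional


def periodic_idle_check(send_times_ms: List[int], first_interval_ms: int,
                        period_ms: int, horizon_ms: int) -> List[int]:
    """Poll once per period; return the poll times at which a first CNP is obtained."""
    if not 0 < period_ms <= first_interval_ms:
        raise ValueError("the period must be positive and not exceed the first set time interval")
    sends = sorted(send_times_ms)
    next_send = 0                  # index of the next send not yet observed
    last_send: Optional[int] = None
    notified = False               # assumption: one CNP per idle gap, then monitoring stops
    cnp_times: List[int] = []
    for now in range(0, horizon_ms + 1, period_ms):
        while next_send < len(sends) and sends[next_send] <= now:
            last_send = sends[next_send]   # the "first congestion message" is the latest send
            next_send += 1
            notified = False
        if last_send is not None and not notified and now - last_send >= first_interval_ms:
            cnp_times.append(now)          # interval reached: exit monitoring, obtain the CNP
            notified = True
    return cnp_times


sends = [0, 20, 40, 200]  # hypothetical send times of the flow's CE packets (ms)
print(periodic_idle_check(sends, first_interval_ms=50, period_ms=10, horizon_ms=300))  # [90, 250]
print(periodic_idle_check(sends, first_interval_ms=50, period_ms=20, horizon_ms=300))  # [100, 260]
```

A coarser period detects the idle gap later (up to one period late), which is consistent with keeping the period no longer than the first set time interval.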
  • Alternatively, it may be determined whether the time interval between sending the second congestion message and sending the first congestion message is greater than or equal to the first set time interval. If it is, the first CNP message is obtained, where the first congestion message is the message sent immediately before the second congestion message. That is, the interval between successive congestion messages is used to determine whether the CE message (the second message) has been congested for longer than the first set time interval.
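  • The inter-send-gap variant needs no timer: each time a congestion message of the flow is sent, its send time is compared with that of the previous one. A minimal sketch; `gaps_needing_cnp` and the send times are hypothetical.

```python
from typing import List, Tuple


def gaps_needing_cnp(send_times_ms: List[int], first_interval_ms: int) -> List[Tuple[int, int]]:
    """Return (first, second) send-time pairs of consecutive congestion messages of one flow
    whose gap is at least the first set time interval."""
    ordered = sorted(send_times_ms)
    return [
        (first, second)
        for first, second in zip(ordered, ordered[1:])
        if second - first >= first_interval_ms
    ]


print(gaps_needing_cnp([0, 20, 40, 200, 230], first_interval_ms=50))  # [(40, 200)]
```

Unlike the timer and periodic variants, this check can only trigger when the second congestion message is actually sent, so it reacts after the gap has ended rather than while it is still open.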
  • The network device applies ECN marks to packets newly enqueued on the egress port queue.
  • Specifically, two queue depth thresholds are defined: Kmin and Kmax.
  • When the queue depth is below Kmin, packets are not marked; when it exceeds Kmax, all messages passing through the queue are marked with ECN marks.
  • When the queue depth is between Kmin and Kmax, the probability of a packet being ECN-marked increases with the queue depth.
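  • The Kmin/Kmax marking rule resembles RED-style ECN marking as used in DCQCN. The sketch below assumes the probability rises linearly from 0 at Kmin to a maximum `p_max` at Kmax; the linear ramp, `p_max`, and the function names are assumptions, not taken from this application.

```python
import random
from typing import Optional


def ecn_mark_probability(queue_depth: float, k_min: float, k_max: float, p_max: float = 1.0) -> float:
    """Probability that a newly enqueued packet is ECN-marked (assumed linear ramp)."""
    if not 0 <= k_min < k_max:
        raise ValueError("require 0 <= Kmin < Kmax")
    if queue_depth <= k_min:
        return 0.0                      # shallow queue: no marking
    if queue_depth > k_max:
        return 1.0                      # deep queue: every packet is marked
    return p_max * (queue_depth - k_min) / (k_max - k_min)


def should_mark(queue_depth: float, k_min: float, k_max: float,
                p_max: float = 1.0, rng: Optional[random.Random] = None) -> bool:
    """Decide whether to set the CE mark on one packet."""
    rng = rng if rng is not None else random.Random()
    return rng.random() < ecn_mark_probability(queue_depth, k_min, k_max, p_max)


print(ecn_mark_probability(10, k_min=20, k_max=100))             # 0.0
print(ecn_mark_probability(60, k_min=20, k_max=100, p_max=0.2))  # 0.1
print(ecn_mark_probability(150, k_min=20, k_max=100))            # 1.0
```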
  • When the destination device receives a CE message carrying an ECN mark, this indicates that the network is congested, so the destination device passes the network congestion information to the source device.
  • As an example, when the network device finishes sending CE message f2-1, it records the time at which f2-1 was sent and then monitors, using a timer or a set period, whether the interval between the current time and that time is greater than or equal to the first set time interval. When the interval reaches the first set time interval and the network device has sent no further CE message, the network device obtains the first CNP message.
  • When the destination device receives CE message f2-1, it feeds back a second CNP message to the source device, and when it receives CE message f2-2, it feeds back another second CNP message. It should be understood that the second CNP messages need not correspond one-to-one in time with the received CE messages; they may correspond only in number.
  • If no CE message is sent within the first set time interval after f2-1, the destination device does not feed back a CNP message. If the source device receives no CNP packet within a certain period of time, it increases the rate of the data stream. This contradicts the fact that packets are still congested in the queue of the network device, and therefore causes rate control to fail.
  • Therefore, the network device generates or obtains the first CNP packet when the interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval.
  • The first set time interval may be the rate-increase interval of the data stream.
  • The network device may send the first CNP message to the source device immediately. For example, if the coprocessor starts a timer when the first congestion message is sent and sends the first CNP message when the timer expires, the first CNP message may be sent before the second CNP messages of the two CE messages, between them, or after them.
  • Alternatively, the network device may monitor whether the interval between the current time and the time at which the first CNP message was obtained is greater than or equal to a second set time interval, which is determined from the interval between the time at which the first congestion message is sent and the time at which the second CNP message sent by the destination device is received. If it is, the first CNP message is sent. That is, as shown in FIG. 9, another schematic diagram of queue congestion control, the coprocessor can estimate when CE message f2-1 reaches the destination device and when the corresponding second CNP message is received, and send the first CNP message after CE message f2-1 and before CE message f2-2, so that the first CNP message falls between the second CNP messages of the two CE messages as far as possible.
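  • The delayed-send rule can be sketched as below. The application only says that the second set time interval is determined from the time between sending a congestion message and receiving the destination device's second CNP message; smoothing those samples with an exponentially weighted moving average (weight 1/8, as in TCP's smoothed RTT) and sending immediately when no sample exists yet are assumptions, and `DelayedCnpSender` is hypothetical.

```python
from typing import Optional


class DelayedCnpSender:
    """Hold the first CNP message until the second set time interval has elapsed."""

    def __init__(self, weight: float = 0.125) -> None:
        self.weight = weight                        # assumed EWMA weight
        self.second_interval: Optional[float] = None

    def record_feedback(self, ce_sent_at: float, cnp_received_at: float) -> None:
        """One sample: time from sending a CE message to receiving the destination's second CNP."""
        sample = cnp_received_at - ce_sent_at
        if self.second_interval is None:
            self.second_interval = sample
        else:
            self.second_interval += self.weight * (sample - self.second_interval)

    def should_send(self, now: float, cnp_obtained_at: float) -> bool:
        """True once now - (time the first CNP was obtained) >= second set time interval."""
        if self.second_interval is None:
            return True  # assumption: no estimate yet, so send immediately
        return now - cnp_obtained_at >= self.second_interval


sender = DelayedCnpSender()
sender.record_feedback(ce_sent_at=0.0, cnp_received_at=30.0)    # hypothetical times (us)
sender.record_feedback(ce_sent_at=100.0, cnp_received_at=138.0)
print(sender.second_interval)                                   # 31.0
print(sender.should_send(now=100.0, cnp_obtained_at=70.0))      # False
print(sender.should_send(now=101.0, cnp_obtained_at=70.0))      # True
```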
  • As noted above, when the first queue is congested, each data stream in the first queue is congested.
  • Steps S212 to S217 below identify whether the first data stream is congested.
  • The first data stream may be any data stream in the queue.
  • The transmission parameters described above may also include the sequence numbers of the congested messages.
  • The sequence number of a congested message can be obtained from its header.
  • The coprocessor may intercept the header of a mirrored congestion message to obtain the sequence number of the congestion message.
  • The degree of continuity of the sequence numbers of the congestion messages may be measured as the number of consecutive sequence numbers, which is compared with a set value. In this case, the number of consecutive sequence numbers of the congested packets in the first data stream is obtained.
  • The degree of continuity may also be measured as the maximum difference between the sequence numbers of the congested messages, with high continuity meaning that this maximum is less than a set value. In this case, the maximum difference between the sequence numbers of the congested packets in the first data stream is obtained.
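  • Both continuity measures can be computed from the sequence numbers read from the mirrored packet headers. A minimal sketch, assuming the sequence numbers of one data stream are plain integers without wraparound; the function names are hypothetical.

```python
from typing import Iterable


def longest_consecutive_run(seqs: Iterable[int]) -> int:
    """Largest number of congested packets whose sequence numbers are consecutive."""
    best = run = 0
    prev = None
    for s in sorted(set(seqs)):
        run = run + 1 if prev is not None and s == prev + 1 else 1
        best = max(best, run)
        prev = s
    return best


def max_sequence_gap(seqs: Iterable[int]) -> int:
    """Largest difference between adjacent sequence numbers of congested packets (0 if < 2)."""
    ordered = sorted(set(seqs))
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0)


print(longest_consecutive_run([1, 3, 4, 5, 6, 7]))  # 5
print(max_sequence_gap([1, 3, 4, 5, 6, 7]))         # 2
print(max_sequence_gap([5, 7, 9]))                  # 2
```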
  • The degree of continuity of the sequence numbers of the congestion messages corresponds to the degree of continuity of the congestion messages themselves. Therefore, the degree of continuity of the congested packets in the first data stream can be determined from the continuity of their sequence numbers.
  • The degree of continuity of the sequence numbers of the congested packets in the first data stream may be obtained by sampling, and the sampled value may be used to indicate the degree of continuity of the congested packets.
  • S213 Determine whether the continuity degree of the sequence numbers of the congested packets in the first data stream is greater than or equal to the first threshold, if yes, proceed to S214; otherwise, continue to execute S213.
  • If the continuity of the sequence numbers in a data stream exceeds a certain threshold, the messages being sent are all congested messages; it can be determined that the data stream is congested, and the data stream enters the congested state.
  • For example, the CE messages of the forwarding chip received at the coprocessor are f1-1 f2-5 f1-3 f2-6 f1-4 f1-5 f1-6 f1-7, where fi denotes the i-th data stream.
  • Messages f1-3, f1-4, f1-5, f1-6, and f1-7 of the first data stream have consecutive sequence numbers; if the first threshold is, for example, 5 messages, f1 is identified as a congested flow.
  • Similarly, if the maximum difference between the sequence numbers of the congested packets is less than a set threshold, the continuity of the sequence numbers of the congested packets in the data stream can also be considered greater than the set threshold, so the data stream is determined to be congested. For example, if a network device sends 100 packets and, among the differences between the sequence numbers of the congested packets, only one reaches the maximum value of 2 while all others are 1, then 98 of the 100 packets can be determined to be congested packets, and the data stream is congested.
  • For example, source device 1 and source device 2 send data streams to the same destination device.
  • Congestion identification may be performed on network device 1, or on the last-hop network device close to the destination device (network device 3).
  • If the coprocessor of network device 3 obtains three consecutive packets f1-1, f1-2, and f1-3 of data stream f1, the number of packets with consecutive sequence numbers is greater than or equal to 3, so it can be determined that data stream f1 is congested and enters the congested state.
  • After the first data stream enters the congested state, it is necessary to monitor whether it remains congested or exits the congested state, so the degree of continuity of the sequence numbers of its congested packets may be obtained again.
  • If the continuity of the sequence numbers in a data stream is less than a certain threshold, the messages being sent are ordinary messages, and it can be determined that the data stream has exited the congested state.
  • Likewise, if the maximum difference between the sequence numbers of the congested packets is greater than a set threshold, the continuity of the sequence numbers of the congested packets in the data stream can be considered less than the set threshold, so it is determined that the data stream is not congested.
  • S216 Determine whether the continuity degree of the sequence numbers of the congested packets in the first data stream is less than or equal to the second threshold, if yes, proceed to S217; otherwise, continue to execute S216.
  • If the continuity of the sequence numbers in the sent first data stream is less than a certain threshold, that is, the first data stream contains few congestion packets, it can be determined that the data stream is not congested, and the data stream exits the congested state.
  • For example, the CE messages of the forwarding chip obtained at the coprocessor are f1-1 f2-5 f1-3 f2-7 f1-4 f2-9 f1-5 f1-6 f1-7, where fi denotes the i-th data stream.
  • Messages f2-5, f2-7, and f2-9 of data stream f2 are not consecutive: of the 5 sequence numbers from f2-5 to f2-9, 2 are missing, so f2 is identified as not congested.
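  • The enter/exit logic of S213 to S217 can be sketched by applying a continuity measure per flow to traces like the two examples above. This is a sketch under assumptions: continuity is measured as the longest run of consecutive sequence numbers, the first threshold is the 5 messages from the example, the second threshold of 2 is hypothetical, and the helper names are hypothetical. `longest_consecutive_run` is defined here so the block is self-contained.

```python
from collections import defaultdict
from typing import Dict, List


def parse_ce_trace(trace: str) -> Dict[str, List[int]]:
    """Split a trace like 'f1-1 f2-5' into per-flow sequence numbers."""
    flows: Dict[str, List[int]] = defaultdict(list)
    for token in trace.split():
        flow, seq = token.rsplit("-", 1)
        flows[flow].append(int(seq))
    return dict(flows)


def longest_consecutive_run(seqs: List[int]) -> int:
    best = run = 0
    prev = None
    for s in sorted(set(seqs)):
        run = run + 1 if prev is not None and s == prev + 1 else 1
        best = max(best, run)
        prev = s
    return best


def update_flow_states(trace: str, states: Dict[str, bool],
                       first_threshold: int = 5, second_threshold: int = 2) -> Dict[str, bool]:
    """S213-S217: enter the congested state at >= first_threshold, exit at <= second_threshold."""
    if not second_threshold < first_threshold:
        raise ValueError("the second threshold must be below the first")
    for flow, seqs in parse_ce_trace(trace).items():
        run = longest_consecutive_run(seqs)
        if run >= first_threshold:
            states[flow] = True             # flow enters the congested state
        elif run <= second_threshold:
            states[flow] = False            # flow is not (or no longer) congested
        else:
            states.setdefault(flow, False)  # between thresholds: keep the previous state
    return states


states: Dict[str, bool] = {}
print(update_flow_states("f1-1 f2-5 f1-3 f2-6 f1-4 f1-5 f1-6 f1-7", states))
# {'f1': True, 'f2': False}
print(update_flow_states("f1-1 f2-5 f1-3 f2-7 f1-4 f2-9 f1-5 f1-6 f1-7", states))
# {'f1': True, 'f2': False}
print(update_flow_states("f1-10 f1-12 f1-14", states))  # hypothetical sparse trace
# {'f1': False, 'f2': False}
```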
  • Alternatively, the congestion control of S208 to S211 may be performed only on the first data stream.
  • FIG. 11 is a schematic diagram of flow congestion control.
  • The network device obtains the first CNP message.
  • When the destination device receives CE message f2-1, it feeds back a second CNP message to the source device, and when it receives CE message f2-2, it feeds back another second CNP message.
  • The second CNP messages need not correspond one-to-one in time with the received CE messages; they may correspond only in number.
  • If the source device receives no CNP packet within a certain period of time, it increases the rate of the data stream. This contradicts the fact that packets are still congested in the queue of the network device, and therefore causes rate control to fail. Therefore, the network device generates or obtains the first CNP packet when the interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval.
  • The first set time interval may be the rate-increase interval of the data stream.
  • The network device may send the first CNP message to the source device immediately, in which case the first CNP message may be sent before the second CNP messages of the two CE messages, between them, or after them.
  • The network device may also monitor whether the interval between the current time and the time at which the first CNP message was obtained is greater than or equal to a second set time interval, which is determined from the interval between the time at which the first congestion message is sent and the time at which the second CNP packet sent by the destination device is received. If it is, the first CNP message is sent. That is, as shown in FIG. 12, another schematic diagram of flow congestion control, the coprocessor can estimate when CE message f2-1 reaches the destination device and when the corresponding second CNP message is received, and send the first CNP message after CE message f2-1 and before CE message f2-2, so that the first CNP message falls between the second CNP messages of the two CE messages as far as possible.
  • Queue congestion identification and control and flow congestion identification and control need not be executed in a particular order; congestion identification and control may also be performed only on data streams or only on queues.
  • In the congestion control method provided in this embodiment, time information of the congested packets in a data stream is obtained, and if the data stream is congested, congestion notification is performed for packets that have been congested for more than a certain time interval. This prevents a congested data stream from still being rate-increased, which improves message transmission efficiency.
  • An embodiment of the present application further provides a congestion control device 1000, which includes a first acquiring unit 11, a second acquiring unit 12, and a first sending unit 13, and may further include a sampling and copying unit 14, a fifth acquiring unit 15, and a first determining unit 16 (shown with dashed lines in the figure), where:
  • the first acquiring unit 11 is configured to acquire time information of one or more sent congestion messages in the first data stream, the one or more congestion messages carrying a congestion flag;
  • the second acquiring unit 12 is configured to, if congestion occurs in the first data stream, obtain a first congestion notification message according to the time information of the one or more congestion messages in the first data stream, the first congestion notification message being used to perform congestion notification for messages congested for more than a first set time interval; and the first sending unit 13 is configured to send the first congestion notification message.
  • In one implementation, the second acquiring unit 12 is configured to obtain the first congestion notification message if the interval between the current time and the time at which the first congestion message was sent is greater than or equal to the first set time interval, where the first congestion message is the last message of the first data stream sent before the current time.
  • In one implementation, the second acquiring unit 12 includes: a starting unit, configured to start a timer when the first congestion message is sent, where the duration of the timer is the first set time interval; and a third acquiring unit, configured to acquire the first congestion notification message if the next message of the first data stream has not been sent when the timer expires.
  • In one implementation, the second acquiring unit 12 includes: a first monitoring unit, configured to monitor, according to a set period, whether the interval between the current time and the time at which the first congestion message was sent is greater than or equal to the first set time interval; and a fourth acquiring unit, configured to obtain the first congestion notification message if that interval is greater than or equal to the first set time interval.
  • In one implementation, the second acquiring unit 12 is configured to acquire the first congestion notification message if the interval between the time at which the second congestion message is sent and the time at which the first congestion message was sent is greater than or equal to the first set time interval, where the first congestion message is the message immediately preceding the second congestion message, and both belong to the first data stream.
  • In one implementation, the first sending unit 13 includes: a second monitoring unit, configured to monitor whether the interval between the current time and the time at which the first congestion notification message was obtained is greater than or equal to a second set time interval; and a second sending unit, configured to send the first congestion notification message if that interval is greater than or equal to the second set time interval, where the second set time interval is determined from the interval between the time at which the first congestion message is sent and the time at which the second congestion notification message sent by the destination device is received.
  • In another implementation, the first sending unit 13 is configured to send the first congestion notification message immediately after acquiring it.
  • In one implementation, the device further includes: a fifth acquiring unit 15, configured to obtain transmission parameters of one or more congestion packets in the first data stream; and a first determining unit 16, configured to determine, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested.
  • In one implementation, the device further includes: a sampling and copying unit 14, configured to sample and copy one or more congestion packets of the first data stream at a set sampling rate to obtain a mirrored data stream corresponding to the first data stream, where the mirrored data stream includes one or more mirrored packets.
  • In one implementation, the transmission parameters include the number and the length of the messages, and the first determining unit 16 includes: a second determining unit, configured to determine the congestion message rate of the queue according to the number and length of the congestion messages of the one or more first data streams sent in the first queue; and a third determining unit, configured to determine, if the congestion message rate of the queue is greater than or equal to the first rate threshold, that the first queue is congested and enters the congested state.
  • In one implementation, the device further includes: a fourth determining unit, configured to determine, if the congestion message rate of the queue is less than or equal to a second rate threshold, that the first queue is not congested and exits the congested state, where the second rate threshold is less than the first rate threshold.
  • the transmission parameter includes a sequence number of one or more congestion packets in the first data stream
  • the first determining unit 16 includes: a sixth acquiring unit configured to acquire the degree of continuity of the sequence numbers of congested packets in the first data stream; and a fifth determining unit configured to determine, if the degree of continuity is greater than or equal to a first threshold, that the first data stream is congested and enters the congested state.
  • the device further includes: a sixth determining unit, configured to determine, if the degree of continuity is less than or equal to a second threshold, that the first data stream is not congested and exits the congested state, wherein the second threshold is less than the first threshold.
  • with this congestion control device, time information of congested packets in a data stream is acquired, and if the data stream is congested, congestion notification is performed for packets that have been congested for longer than a certain time interval. This prevents the data stream from still being rate-increased while it is congested, which improves the efficiency of message transmission.
  • an embodiment of the present application further provides another congestion control device.
  • the congestion control device 2000 includes a processor 21 and a physical interface 22.
  • the number of processors 21 may be one or more.
  • the processor 21 includes a central processing unit, a network processor, a graphics processing unit (GPU), an application specific integrated circuit, a programmable logic device (PLD) or any combination thereof.
  • the above-mentioned PLD may be a complex programmable logic device, a field programmable gate array, a general array logic or any combination thereof.
  • the processor 21 may include a control plane 211 and a forwarding plane 212.
  • the forwarding plane 212 may specifically include a forwarding chip A.
  • the forwarding chip A receives the service message from the source device and forwards it to the destination device; and in one implementation, the forwarding chip A is also used to implement the embodiment of the present application Congestion control.
  • the forwarding plane 212 may also include a coprocessor B.
  • the coprocessor B copies the CE message from the forwarding chip A, that is, obtains a mirrored message of the CE message, and implements the Congestion control.
  • the control plane 211 and the forwarding plane 212 can be implemented by independent circuits, or can be integrated into one circuit.
  • the processor 21 is a multi-core CPU.
  • One or some of the multiple cores implement the control plane 211, and the other cores implement the forwarding plane 212.
  • the control plane 211 is implemented by a CPU
  • the forwarding plane 212 is implemented by a network processor (NP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or any combination thereof.
  • the congestion control device is a chassis-based network device
  • the control plane 211 is implemented by the main control card
  • the forwarding plane 212 is implemented by the line card.
  • both the control plane 211 and the forwarding plane 212 are implemented by NPs with control plane capabilities.
  • the physical interface 22 is used to send and receive service messages and send CNP messages. Specifically, the physical interface 22 is used to receive service packets from the source end device and forward the service packets to the destination end device; and receive CNP packets from the destination end device or obtain CNP packets from itself, and send them to the source end equipment.
  • the number of physical interfaces 22 may be one or more.
  • the physical interface 22 may include a wireless interface and/or a wired interface.
  • the wireless interface may include a wireless local area network (WLAN) interface, a Bluetooth interface, a cellular network interface, or any combination thereof.
  • the wired interface may include an Ethernet interface, an asynchronous transfer mode interface, a fiber channel interface, or any combination thereof.
  • the Ethernet interface can be an electrical interface or an optical interface.
  • the physical interface 22 does not necessarily include (although usually includes) an Ethernet interface.
  • with this congestion control device, time information of congested packets in a data stream is acquired, and if the data stream is congested, congestion notification is performed for packets that have been congested for longer than a certain time interval. This prevents the data stream from still being rate-increased while it is congested, which improves the efficiency of message transmission.
  • This application also provides a computer-readable storage medium with a computer program stored on the computer-readable storage medium.
  • when the computer program is executed by a computer, the computer executes the steps and/or processing performed by the network device in any of the above-mentioned method embodiments.
  • the computer program product includes computer program code.
  • when the computer program code runs on a computer, the computer executes the steps and/or processing performed by the network device in any of the above-mentioned method embodiments.
  • the disclosed system, device, and method can be implemented in other ways.
  • the division of the unit is only a logical function division. In actual implementation, there can be other divisions.
  • multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling, or direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the above embodiments may be implemented entirely or partly by software, hardware, firmware, or any combination thereof.
  • when software is used, the embodiments can be implemented entirely or partly in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions can be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium.
  • the computer instructions can be sent from a website, computer, server, or data center to another via wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) A website, computer, server or data center for transmission.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium can be read-only memory (ROM), or random access memory (RAM), or magnetic media, such as floppy disks, hard disks, magnetic tapes, magnetic disks, or optical media, for example, Digital versatile disc (DVD), or semiconductor media, for example, solid state disk (SSD), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

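This application discloses a congestion control method and apparatus. In this application, a network device obtains time information of one or more congestion packets in a first data stream that it sends, where the one or more congestion packets carry an explicit congestion notification (ECN) mark. If the first data stream is congested, the network device obtains a first congestion notification packet according to the time information of the one or more congestion packets in the first data stream. The first congestion notification packet is used to provide congestion notification for packets that have been congested for longer than a first set time interval. The network device then sends the first congestion notification packet. The solution of this application prevents a data stream from still being rate-increased while it is congested, thereby improving packet transmission efficiency.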

Description

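Congestion Control Method and Apparatus
This application claims priority to Chinese Patent Application No. 202010077053.9, filed with the China National Intellectual Property Administration on January 23, 2020 and entitled "Congestion Control Method and Apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communications technologies, and in particular, to a congestion control method and apparatus.
Background
To reduce server-side data processing latency in network transmission, remote direct memory access (RDMA) technology is used for data transmission. RDMA allows a client application to read and write server memory directly and remotely. With RDMA, the network interface cards (NICs) of the end nodes send and receive data directly through registered buffers. The entire network protocol is deployed on the NICs and does not pass through the host's network protocol stack, which significantly reduces the occupancy of the host central processing unit (CPU) and the overall latency. The RDMA over Converged Ethernet (RoCE) protocol has two versions, RoCEv1 and RoCEv2. The main difference is that RoCEv1 is an RDMA protocol implemented on the Ethernet link layer, whereas RoCEv2 is an RDMA protocol implemented over the user datagram protocol (UDP) layer of the transmission control protocol/internet protocol (TCP/IP) suite on Ethernet.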
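After a network protocol that meets the requirements of high throughput, ultra-low latency, and low CPU overhead is deployed, a congestion control algorithm is needed to achieve reliable, lossless transmission over the network. Data center quantized congestion notification (DCQCN) was proposed for this purpose. DCQCN specifies that when the destination device receives a packet carrying an explicit congestion notification (ECN) mark (that is, a congestion encountered (CE) packet), the network is congested. The destination device therefore conveys this congestion information to the source device, and the RoCEv2 protocol defines an explicit congestion notification packet (CNP) for this purpose. If a CE packet arrives for a flow and the destination device has not sent a CNP for that flow in the past n microseconds, the destination device immediately sends a CNP. In other words, if multiple CE packets of a flow arrive within the time window of n microseconds, the destination device generates at most one CNP for that flow every n microseconds. When the source device receives a CNP, it reduces its sending rate and updates its rate reduction factor. If the source device receives no CNP for a continuous period of time, it increases its sending rate according to a specific algorithm.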
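The destination-side pacing rule described above can be sketched in a few lines: at most one CNP is sent per flow per n-microsecond window. This is a minimal illustration, not the RoCEv2 or DCQCN reference implementation. The class name `CnpPacer` and the string flow identifiers are assumptions. The sketch also assumes that a gap of exactly n microseconds already allows a new CNP.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CnpPacer:
    """Hypothetical notification-point (destination) CNP pacer.

    A CNP is generated for a CE packet only if no CNP was sent for the same
    flow during the last `window_us` microseconds (the "n" in the text).
    """

    window_us: float
    _last_cnp_us: Dict[str, float] = field(default_factory=dict)

    def on_ce_packet(self, flow_id: str, now_us: float) -> bool:
        """Return True if a CNP should be sent for this CE packet."""
        last = self._last_cnp_us.get(flow_id)
        # Assumption: a gap of exactly window_us counts as "window elapsed".
        if last is None or now_us - last >= self.window_us:
            self._last_cnp_us[flow_id] = now_us
            return True
        return False


if __name__ == "__main__":
    pacer = CnpPacer(window_us=50.0)
    # Five CE packets of flow f1: only the first, fourth and fifth trigger a CNP.
    print([pacer.on_ce_packet("f1", t) for t in (0.0, 10.0, 49.0, 50.0, 120.0)])
```

The pacer only reacts to CE packets that actually arrive. A flow whose CE packets are held in a switch queue therefore produces no CNPs at all, and this is the situation the method in this application addresses.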
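However, when the number of flows is large, the average bandwidth available to each flow is small, and congestion occurs easily. FIG. 1 illustrates DCQCN rate control failure. For a congested flow, the packet time interval of the flow (that is, the minimum time interval at which the flow can obtain a CNP) may be greater than the rate increase interval. Because the flow is congested, the source device receives no CNP, so a flow in the congested state is rate-increased. Rate control then fails to converge, which degrades packet transmission efficiency.
Summary
This application provides a congestion control method and apparatus, to prevent a data stream from still being rate-increased while it is congested, thereby improving packet transmission efficiency.
According to a first aspect, a congestion control method is provided. The method includes: obtaining time information of one or more congestion packets in a sent first data stream, where the one or more congestion packets carry a congestion mark; if the first data stream is congested, obtaining a first congestion notification packet according to the time information of the one or more congestion packets in the first data stream, where the first congestion notification packet is used to provide congestion notification for packets that have been congested for longer than a first set time interval; and sending the first congestion notification packet. This prevents a data stream from still being rate-increased while it is congested and improves packet transmission efficiency. The congestion mark may be an ECN mark; this application does not limit the name of the congestion mark. The congestion notification packet may be a CNP; this application does not limit the name of the congestion notification packet.
In an implementation, the obtaining, if the first data stream is congested, a first congestion notification packet according to the time information of the one or more congestion packets in the first data stream includes: if the time interval between the current time and the time at which a first congestion packet was sent is greater than or equal to the first set time interval, obtaining the first congestion notification packet, where the first congestion packet is the last packet of the first data stream sent before the current time. If no new congestion packet has been sent for more than a certain time interval after a congestion packet was sent, a supplementary congestion notification packet needs to be sent to the source device, so that the source device does not increase the rate of the data stream. This time interval may be the packet rate increase interval of the data stream.
In another implementation, the obtaining a first congestion notification packet if the time interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval includes: starting a timer when the first congestion packet has been sent, where the timing duration of the timer is the first set time interval; and if the next packet of the first data stream has not been sent when the timing duration expires, obtaining the first congestion notification packet. The timer starts counting when the previous congestion packet is sent. If the next packet has not been sent when the timer expires, a supplementary congestion notification packet can be sent to the source device, so that the source device does not increase the rate of the data stream.
In another implementation, the obtaining a first congestion notification packet if the time interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval includes: monitoring, according to a set period, whether the time interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval; and if so, obtaining the first congestion notification packet. The device periodically checks whether the next packet has been sent since the previous congestion packet. If it has not, a supplementary congestion notification packet can be sent to the source device, so that the source device does not increase the rate of the data stream.
In another implementation, the obtaining, if the first data stream is congested, a first congestion notification packet according to the time information of the one or more congestion packets in the first data stream includes: if the time interval between the time at which a second congestion packet was sent and the time at which a first congestion packet was sent is greater than or equal to the first set time interval, obtaining the first congestion notification packet, where the first congestion packet is the packet preceding the second congestion packet, and both the first congestion packet and the second congestion packet belong to the first data stream. If the interval between sending two congestion packets is too long, a supplementary congestion notification packet can likewise be sent to the source device, so that the source device does not increase the rate of the data stream.
In another implementation, the sending the first congestion notification packet includes: monitoring whether the time interval between the current time and the time at which the first congestion notification packet was obtained is greater than or equal to a second set time interval; and if so, sending the first congestion notification packet. In other words, after the supplementary congestion notification packet is obtained, it can be sent after a delay. The goal is for it to fall, as far as possible, after the congestion notification packet for the previously sent congestion packet and before the one for the next sent congestion packet. Note, however, that congestion notification packets and congestion packets do not correspond one to one. Every congestion notification packet has the same content, and any correspondence is only between the numbers of congestion packets and congestion notification packets sent. The second set time interval may be determined from the time interval between sending the first congestion packet and receiving a second congestion notification packet sent by the destination device.
In another implementation, the sending the first congestion notification packet includes: sending the first congestion notification packet immediately after it is obtained. Sending the congestion notification packet immediately allows the source device to receive the congestion notification promptly and reduce its rate in time.
In another implementation, the method further includes: obtaining transmission parameters of the one or more congestion packets in the first data stream; and determining, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested. Whether a data stream is congested can be determined from the transmission parameters of its congestion packets.
In another implementation, the method further includes: sampling and copying the one or more congestion packets of the first data stream at a set sampling rate to obtain a mirrored data stream corresponding to the first data stream, where the mirrored data stream includes one or more mirrored packets. Congestion identification and control may be performed by the forwarding chip. Alternatively, a coprocessor may be added that samples and copies the congestion packets in the data stream from the forwarding chip and performs congestion identification and control.
In another implementation, the transmission parameters include a packet quantity and a packet length. The determining, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested includes: determining a queue congestion packet rate according to the quantity and length of congestion packets of one or more first data streams in a first queue sent within a first time period; and if the queue congestion packet rate is greater than or equal to a first rate threshold, determining that the first queue is congested, where the first queue enters a congested state. The congestion packet rate of a queue can be determined accurately from the quantity and length of packets of the one or more data streams sent from that queue, and used to judge whether the queue is congested. If a queue is congested, all data streams in the queue are congested.
In another implementation, the method further includes: if the queue congestion packet rate is less than or equal to a second rate threshold, determining that the first queue is not congested, where the first queue exits the congested state, and the second rate threshold is less than the first rate threshold. After a queue becomes congested, it can be judged to have exited the congested state once its congestion packet rate falls below a certain value. While the congestion packet rate lies between the first rate threshold and the second rate threshold, however, the queue remains in the congested state and congestion control still needs to be performed on it.
In another implementation, the transmission parameters include sequence numbers of the one or more congestion packets in the first data stream. The determining, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested includes: obtaining a degree of continuity of the sequence numbers of congestion packets in the first data stream; and if the degree of continuity is greater than or equal to a first threshold, determining that the first data stream is congested, where the first data stream enters a congested state. Within a single data stream, every packet header carries the packet's sequence number. The degree of continuity of the sequence numbers of congestion packets therefore shows how continuous the congestion packets are, and from this it can be judged whether the stream is congested. For example, the degree of continuity of the sequence numbers may be the number of consecutive sequence numbers among the congestion packets. Correspondingly, the degree of continuity of the congestion packets may be the number of packets corresponding to those consecutive sequence numbers, which may be counted in the sampled data stream. As another example, the degree of continuity may be measured by the differences between the sequence numbers of congestion packets. If, among a certain number of packets sent by the network device, these differences are less than a set threshold, the degree of continuity can also be considered to exceed a set threshold, and the data stream is judged to be congested.
In another implementation, the method further includes: if the degree of continuity is less than or equal to a second threshold, determining that the first data stream is not congested, where the first data stream exits the congested state, and the second threshold is less than the first threshold. If the degree of continuity of the sequence numbers of congestion packets falls below a certain threshold, the data stream can be judged to have exited the congested state.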
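According to a second aspect, a congestion control apparatus is provided, including: a first obtaining unit, configured to obtain time information of one or more congestion packets in a sent first data stream, where the one or more congestion packets carry a congestion mark; a second obtaining unit, configured to: if the first data stream is congested, obtain a first congestion notification packet according to the time information of the one or more congestion packets in the first data stream, where the first congestion notification packet is used to provide congestion notification for packets that have been congested for longer than a first set time interval; and a first sending unit, configured to send the first congestion notification packet.
In an implementation, the second obtaining unit is configured to: if the time interval between the current time and the time at which a first congestion packet was sent is greater than or equal to the first set time interval, obtain the first congestion notification packet, where the first congestion packet is the last packet of the first data stream sent before the current time.
In another implementation, the second obtaining unit includes: a starting unit, configured to start a timer when the first congestion packet has been sent, where the timing duration of the timer is the first set time interval; and a third obtaining unit, configured to obtain the first congestion notification packet if the next packet of the first data stream has not been sent when the timing duration expires.
In another implementation, the second obtaining unit includes: a first monitoring unit, configured to monitor, according to a set period, whether the time interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval; and a fourth obtaining unit, configured to obtain the first congestion notification packet if that time interval is greater than or equal to the first set time interval.
In another implementation, the second obtaining unit is configured to: if the time interval between the time at which a second congestion packet was sent and the time at which a first congestion packet was sent is greater than or equal to the first set time interval, obtain the first congestion notification packet, where the first congestion packet is the packet preceding the second congestion packet, and both belong to the first data stream.
In another implementation, the first sending unit includes: a second monitoring unit, configured to monitor whether the time interval between the current time and the time at which the first congestion notification packet was obtained is greater than or equal to a second set time interval; and a second sending unit, configured to send the first congestion notification packet if that time interval is greater than or equal to the second set time interval, where the second set time interval is determined according to the time interval between the time at which the first congestion packet was sent and the time at which a second congestion notification packet sent by the destination device was received.
In another implementation, the first sending unit is configured to send the first congestion notification packet immediately after obtaining it.
In another implementation, the apparatus further includes: a fifth obtaining unit, configured to obtain transmission parameters of the one or more congestion packets in the first data stream; and a first determining unit, configured to determine, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested.
In another implementation, the apparatus further includes: a sampling and copying unit, configured to sample and copy the one or more congestion packets of the first data stream at a set sampling rate to obtain a mirrored data stream corresponding to the first data stream, where the mirrored data stream includes one or more mirrored packets.
In another implementation, the transmission parameters include a packet quantity and a packet length, and the first determining unit includes: a second determining unit, configured to determine a queue congestion packet rate according to the quantity and length of congestion packets of one or more first data streams in a first queue sent within a first time period; and a third determining unit, configured to determine, if the queue congestion packet rate is greater than or equal to a first rate threshold, that the first queue is congested, where the first queue enters a congested state.
In another implementation, the apparatus further includes: a fourth determining unit, configured to determine, if the queue congestion packet rate is less than or equal to a second rate threshold, that the first queue is not congested, where the first queue exits the congested state, and the second rate threshold is less than the first rate threshold.
In another implementation, the transmission parameters include sequence numbers of the one or more congestion packets in the first data stream, and the first determining unit includes: a sixth obtaining unit, configured to obtain a degree of continuity of the sequence numbers of congestion packets in the first data stream; and a fifth determining unit, configured to determine, if the degree of continuity is greater than or equal to a first threshold, that the first data stream is congested, where the first data stream enters a congested state.
In another implementation, the apparatus further includes: a sixth determining unit, configured to determine, if the degree of continuity is less than or equal to a second threshold, that the first data stream is not congested, where the first data stream exits the congested state, and the second threshold is less than the first threshold.
According to a third aspect, a congestion control apparatus is provided, including a processor and a physical interface, where the processor is configured to perform the method according to the first aspect or any implementation of the first aspect.
According to a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to the first aspect or any implementation of the first aspect.
According to a fifth aspect, a computer program product including instructions is provided. When the computer program product runs on a computer, the computer is caused to perform the method according to the first aspect or any implementation of the first aspect.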
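Brief Description of Drawings
FIG. 1 is a schematic diagram of DCQCN rate control failure;
FIG. 2 is a schematic structural diagram of a communications system that implements the congestion control method according to an embodiment of this application;
FIG. 3 is a schematic diagram of a typical scenario to which the congestion control method according to an embodiment of this application applies;
FIG. 4 is a schematic flowchart of a congestion control method according to an embodiment of this application;
FIG. 5 is a schematic flowchart of another congestion control method according to an embodiment of this application;
FIG. 6 is a schematic diagram of identifying queue congestion;
FIG. 7 is a schematic diagram of identifying congestion based on queue depth;
FIG. 8 is a schematic diagram of queue congestion control;
FIG. 9 is a schematic diagram of another queue congestion control;
FIG. 10 is a schematic diagram of identifying data stream congestion;
FIG. 11 is a schematic diagram of flow congestion control;
FIG. 12 is a schematic diagram of another flow congestion control;
FIG. 13 is a schematic structural diagram of a congestion control apparatus according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of another congestion control apparatus according to an embodiment of this application.
Description of Embodiments
The following describes the embodiments of this application with reference to the accompanying drawings.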
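FIG. 2 shows an example of a communications system that implements the congestion control method according to the embodiments of this application. As shown in FIG. 2, the communications system may include a source device 100, network devices 200 numbered 1 to n (n ≥ 1), and a destination device 300. The source device is the sender of a data stream, the destination device is its receiver, and a network device may be, for example, a switch. In the embodiments of this application, congestion control is implemented by the network device. In a communications system that implements a DCQCN-based congestion control algorithm, the source device may also be called a reaction point (RP), the network device a congestion point (CP), and the destination device a notification point (NP).
FIG. 3 shows an example of a typical scenario to which the congestion control method of this application applies. As shown in FIG. 3, the scenario is a data center network, which may be used for high-performance computing, high-performance distributed storage, big data, artificial intelligence, and the like. In this data center network, a server exchanges data with other servers through leaf nodes and, where needed, root nodes, using RDMA. Generally, the transmission rate between a server and a leaf node is lower than that between a leaf node and a root node. For example, the server-to-leaf rate is 25 Gbit/s and the leaf-to-root rate is 100 Gbit/s. Specifically, the data center network may be a CLOS network that includes servers, core switches, and access switches, where the servers connect to the network through the access switches and core switches. The source device may be one or more servers in the network, and the destination device may be one or more other servers. The network devices may be the access switches and core switches in the network.
The embodiments of this application provide a congestion control method and apparatus. Time information of congestion packets in a data stream is obtained, and if the data stream is congested, congestion notification is provided for packets that have been congested for longer than a certain time interval. This prevents a data stream from still being rate-increased while it is congested and improves packet transmission efficiency.
FIG. 4 is a schematic flowchart of a congestion control method according to an embodiment of this application. As shown in FIG. 4, the congestion control method may include the following steps.
S101. Obtain transmission parameters of one or more congestion packets in a sent first data stream.
The first data stream is a data stream sent by the source device to the destination device through one or more network devices. It may be a single data stream, or multiple data streams of different service types with the same source and destination, that is, an aggregate flow. The congestion identification and congestion control in this embodiment may be performed by any one network device, or by every network device, between the source device and the destination device. Specifically, they may be performed by the forwarding chip of the network device or by a coprocessor. If the forwarding chip performs them, congestion identification and control can be applied to the forwarded data stream without any additional operation, which gives better timeliness. If a coprocessor performs them, the congestion packets forwarded by the forwarding chip are sampled and copied, and congestion identification and control are performed on the resulting mirrored data stream.
The first data stream may include congestion packets and ordinary packets, each carrying corresponding content of the first data stream. A congestion packet is a packet that carries an ECN mark; an ordinary packet is a packet that does not. In this embodiment, the network device acts on the congestion packets in the first data stream.
Packets are transmitted between network devices with corresponding transmission parameters, which include the transmission rate, packet length, queue depth, and the like. These transmission parameters largely determine whether a packet can be successfully delivered to the destination device. Therefore, the transmission parameters of each congestion packet in the first data stream need to be obtained.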
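S102. Determine, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested.
Congestion packets that the network device receives from upstream enter a queue of its egress port. A network device may have one or more queues, and one or more data streams may enter a queue and wait to be sent. Each queue has a certain depth. If the transmission parameters of the congestion packets show that too many congestion packets are being sent from a queue, it can be determined that the queue, or a data stream in the queue, is congested.
S101 and S102 are optional steps and are shown with dashed lines in the figure. That is, whether the data stream is congested may be determined before the following congestion control procedure is performed, or the procedure may be performed on the premise that the data stream is already known to be congested.
S103. Obtain time information of one or more congestion packets in the sent first data stream, where the one or more congestion packets carry a congestion mark.
When forwarding the congestion packets in the first data stream, the network device may record the time at which each congestion packet is sent. The time information may be, for example, the absolute or relative time at which a congestion packet was sent.
If congestion control is performed by the forwarding chip, the chip only needs to record the time information. If it is performed by the coprocessor, the forwarding chip records the time information and then sends it to the coprocessor.
S104. Obtain a first congestion notification packet according to the time information of the one or more congestion packets in the first data stream, where the first congestion notification packet is used to provide congestion notification for packets that have been congested for longer than a first set time interval.
At the destination device, when a CE packet arrives, the destination device immediately sends a CNP to notify the source device that the data stream is congested. At the source device, when a CNP is received, the source device reduces its sending rate and updates its rate reduction factor. In addition, if the source device receives no CNP for a continuous period of time, it increases its sending rate according to a specific algorithm. However, when there are many data streams, a congested data stream may obtain CNPs at an interval longer than the interval at which the source device increases its sending rate. A data stream in the congested state is then rate-increased, so rate control fails to converge and packet transmission efficiency suffers.
Therefore, in this embodiment, the network device obtains a first CNP if it determines that the first data stream is congested and, from the obtained time information of its congestion packets, finds a packet that has been congested for longer than the first set time interval. The first CNP provides congestion notification for such packets. Because the destination device returns no CNP for a CE packet that has not yet been sent out, the network device obtains the first CNP itself. Obtaining the first CNP may mean retrieving a CNP that was generated earlier and stored on the network device, or generating a CNP in real time. The content of this CNP is the same as that of a CNP returned by the destination device.
S105. Send the first congestion notification packet.
If some packet has been congested for longer than the first set time interval, the destination device has not received that CE packet and therefore returns no CNP to the source. If the source device receives no CNP for a continuous period of time, it increases its sending rate according to a specific algorithm. After obtaining the first CNP, the network device sends it to the source device. This promptly notifies the source device that the data stream is congested and must not be rate-increased, which keeps the rate of the data stream under control and improves packet transmission efficiency.
According to the congestion control method provided in this embodiment of this application, time information of congestion packets in a data stream is obtained, and if the data stream is congested, congestion notification is provided for packets that have been congested for longer than a certain time interval. This prevents a data stream from still being rate-increased while it is congested and improves packet transmission efficiency.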
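FIG. 5 is a schematic flowchart of another congestion control method according to an embodiment of this application. As shown in FIG. 5, the congestion control method may include the following steps.
S201. Sample and copy, at a set sampling rate, one or more first data streams in a first queue sent within a first time period to obtain mirrored data streams corresponding to the one or more first data streams, where each mirrored data stream includes one or more mirrored congestion packets.
In this embodiment, congestion identification and control may be performed by a coprocessor. The coprocessor copies the data streams in the forwarding chip to obtain mirrored data streams, each including one or more mirrored congestion packets. Alternatively, the forwarding chip may perform congestion identification and control on the data streams it forwards, in which case the data streams need not be copied; copying is therefore an optional step. This embodiment is described using the coprocessor as an example.
In addition, to make congestion identification and control more efficient, the data stream may be sampled at a set sampling rate. Alternatively, no sampling is performed and all congestion packets are processed.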
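Sampling CE packets at a set rate before mirroring them to the coprocessor can be sketched as a deterministic 1-in-N sampler. The `Packet` record and its field names are assumptions made for illustration, and so is the choice of deterministic rather than random sampling.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Packet:
    flow_id: str
    seq: int
    length_bits: int
    ce: bool  # True if the packet carries an ECN mark (a CE packet)


def sample_and_mirror(packets: Iterable[Packet], one_in: int) -> Iterator[Packet]:
    """Yield a copy of every `one_in`-th CE packet; ordinary packets are ignored.

    one_in == 1 corresponds to "no sampling": every CE packet is mirrored.
    """
    if one_in < 1:
        raise ValueError("one_in must be >= 1")
    ce_seen = 0
    for pkt in packets:
        if not pkt.ce:
            continue
        if ce_seen % one_in == 0:
            yield replace(pkt)  # mirrored copy; the original is still forwarded
        ce_seen += 1
```

A deterministic counter keeps the mirrored stream reproducible, which helps when testing the later continuity checks. A random sampler at the same average rate would be an equally valid reading of "set sampling rate".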
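Congestion packets that the network device receives from upstream enter a queue of its egress port. A network device may have one or more queues, and one or more data streams may enter a queue and wait to be sent. The foregoing sampling and copying of congestion packets is performed for each data stream in each queue.
Steps S202 to S207 below identify whether a queue is congested.
S202. Obtain transmission parameters of the one or more mirrored congestion packets in each mirrored data stream.
After obtaining the mirrored congestion packets, the coprocessor can extract their headers to obtain their transmission parameters, which may include the number of congestion packets and their lengths. For example, four data streams enter the first queue, and within 1 s the coprocessor receives a total of N mirrored CE congestion packets from the four mirrored data streams, each M bits long.
S203. Determine the queue congestion packet rate according to the number and length of congestion packets in the one or more mirrored data streams of the first queue sent within the first time period.
The queue congestion packet rate can be determined from the number and length of the mirrored congestion packets in each mirrored data stream of a queue. The queue congestion packet rate is the rate at which congestion packets enter the queue. In the foregoing example, the queue congestion packet rate is N × M bit/s.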
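The rate computation in S203 reduces to summing the lengths of the CE packets observed for a queue and dividing by the window length. The sketch assumes each mirrored packet is summarized as a `(flow_id, length_bits)` pair. It does not correct for sampling, which the text does not address.

```python
from typing import Iterable, Tuple


def queue_ce_rate_bps(ce_packets: Iterable[Tuple[str, int]], window_s: float) -> float:
    """Queue congestion packet rate in bit/s.

    ce_packets: (flow_id, length_bits) for every mirrored CE packet, of any flow,
    that entered the queue during a measurement window of window_s seconds.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    total_bits = sum(length_bits for _flow_id, length_bits in ce_packets)
    return total_bits / window_s


if __name__ == "__main__":
    # Four flows share the queue; N = 1000 CE packets of M = 8000 bits arrive in 1 s.
    packets = [(f"f{i % 4}", 8000) for i in range(1000)]
    print(queue_ce_rate_bps(packets, 1.0))  # N * M bit/s
```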
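S204. Determine whether the queue congestion packet rate is greater than or equal to the first rate threshold. If yes, go to S205; otherwise, continue to perform S204 or go to S212.
After the queue congestion packet rate is calculated, the queue is checked against a rate threshold. If the rate exceeds it, congestion packets are entering the queue too fast and too many accumulate, so the queue is determined to be congested and enters the congested state. Specifically, it is determined whether the queue congestion packet rate is greater than or equal to the first rate threshold. The first rate threshold may be a first set proportion of the queue's configured RoCE traffic value, for example, 90% of that value. The configured RoCE traffic value is a RoCE traffic value preconfigured for each queue.
S205. Determine that the first queue is congested, and the first queue enters the congested state.
If the queue congestion packet rate is greater than or equal to the first rate threshold, it is determined that the first queue is congested and enters the congested state. The first queue may be any queue in the network device; that is, the foregoing congestion identification can be performed on any queue.
S206. Determine whether the queue congestion packet rate is less than or equal to the second rate threshold. If yes, go to S207; otherwise, continue to perform S206.
After the first queue enters the congested state, the current queue congestion packet rate is monitored. When the rate falls, it is determined whether it is less than or equal to the second rate threshold, for example, 60% of the queue's configured RoCE traffic value. If so, the first queue is determined to be no longer congested and exits the congested state. If the rate lies between the first rate threshold and the second rate threshold, monitoring continues until it is less than or equal to the second rate threshold. The second rate threshold is less than the first rate threshold. It may be a second set proportion of the configured RoCE traffic value, for example, the 60% above.
S207. If the queue congestion packet rate is less than or equal to the second rate threshold, determine that the first queue is not congested, and the first queue exits the congested state.
In the foregoing example, the queue is determined to be congested if its congestion packet rate is greater than or equal to 90% of the configured RoCE traffic value. If the rate is below 90% but above 60% of that value, the queue is still sending many congestion packets and has not exited the congested state. Only when the rate is less than or equal to 60% of the configured value is the queue determined to be no longer congested and to exit the congested state.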
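The two-threshold rule of S204 to S207 is a hysteresis state machine. The sketch below uses the 90% and 60% ratios from the example. The class name and the way the configured RoCE traffic value is passed in are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class QueueCongestionDetector:
    """Enter/exit queue congestion with two thresholds (hysteresis).

    roce_config_bps: the queue's preconfigured RoCE traffic value.
    enter_ratio / exit_ratio: 0.9 and 0.6 follow the example in the text.
    """

    roce_config_bps: float
    enter_ratio: float = 0.9
    exit_ratio: float = 0.6
    congested: bool = False

    def __post_init__(self) -> None:
        if not self.exit_ratio < self.enter_ratio:
            raise ValueError("the exit threshold must be below the enter threshold")

    def update(self, ce_rate_bps: float) -> bool:
        """Feed one queue congestion packet rate sample; return the new state."""
        if not self.congested and ce_rate_bps >= self.enter_ratio * self.roce_config_bps:
            self.congested = True   # S205: the queue enters the congested state
        elif self.congested and ce_rate_bps <= self.exit_ratio * self.roce_config_bps:
            self.congested = False  # S207: the queue exits the congested state
        return self.congested
```

Between the two thresholds the state does not change, so a rate hovering around a single cut-off cannot make the queue flap in and out of the congested state.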
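Specifically, FIG. 6 is a schematic diagram of identifying queue congestion. One or more data streams from source devices are sent to a destination device through one or more network devices, and a data stream includes one or more congestion packets. The egress port of a network device has one or more queues, each containing one or more data streams. Queue congestion can be controlled by one or more of these network devices. As shown in FIG. 6, service flow 1 from source device 1 and service flow 2 from source device 2 both pass through the network device to the same destination device. Both flows enter queue 1 of the network device, and each includes several congestion packets. Congestion control may be performed by the forwarding chip or the coprocessor of the network device.
For example, if congestion control is performed by the coprocessor of the network device, the coprocessor obtains mirrored copies of the CE packets in service flows 1 and 2 of queue 1. Specifically, the coprocessor copies the CE packets forwarded by the forwarding chip to obtain mirrored CE packets. For example, it obtains mirrors of CE packets f1-1, f1-2, and f1-3 of service flow 1 and of CE packets f2-1 and f2-2 of service flow 2. Further, the CE packets in service flows 1 and 2 may be sampled at a certain sampling ratio.
The coprocessor calculates the congestion packet rate of queue 1 from the lengths of CE packets f1-1, f1-2, f1-3, f2-1, and f2-2 in queue 1, and determines whether this rate is greater than or equal to the first rate threshold. If yes, queue 1 is determined to be congested; if no, the coprocessor keeps checking whether the calculated rate is greater than or equal to the first rate threshold. After queue 1 is determined to be congested, the coprocessor determines whether the calculated rate is less than or equal to the second rate threshold. If yes, queue 1 is determined to exit the congested state; otherwise, the coprocessor keeps checking whether the rate is less than or equal to the second rate threshold.
After the first queue is determined to be congested, congestion control needs to be performed. Steps S208 to S211 below perform congestion control when queue congestion is identified. If the first queue is congested, all data streams in the queue are congested, so the congestion control of steps S208 to S211 needs to be performed on each data stream.
S208. Obtain time information of one or more congestion packets in the first data stream.
When forwarding the packets in the first data stream, the forwarding chip may record the time at which each congestion packet is sent. The time information may be, for example, the absolute or relative time at which a congestion packet was sent.
After recording the time information, the forwarding chip sends it to the coprocessor. The coprocessor thereby obtains the time information of the one or more mirrored congestion packets in each mirrored data stream.
S209. Determine whether the time interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval. If yes, go to S210; otherwise, continue to perform S209.
S210. Obtain the first congestion notification packet.
In this embodiment, the network device provides congestion notification for CE packets that have been congested for longer than the first set time interval.
The device can determine whether the time interval between the current time and the time at which the first packet was sent is greater than or equal to the first set time interval. The current time may be the system time of the network device, and the first packet is the last packet sent before the current time. That is, the device determines how long it has been since any packet was sent after the first packet.
Specifically, in one implementation, a timer may be set in the coprocessor and started when the first congestion packet has been sent, with a timing duration equal to the first set time interval. If the next congestion packet of the first data stream has not been sent when the timer expires, the first CNP is obtained. Obtaining the first CNP may mean retrieving a CNP that was generated earlier and stored on the network device, or generating a CNP in real time.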
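The timer variant can be simulated with explicit timestamps instead of hardware timers, which keeps the logic testable. This is an illustration under stated assumptions, not the patented implementation. Every CE packet of a flow (re)arms a deadline `t1` later, any later packet of the same flow cancels it, and an expired deadline yields one supplementary CNP. The text does not say whether the timer re-arms after firing, so this sketch does not re-arm it. The class and method names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


class SupplementaryCnpTimer:
    """Per-flow timers for supplementary CNPs (S209/S210, timer variant).

    Times are plain floats in any consistent unit; t1 is the first set time
    interval (for example, the source's rate increase interval).
    """

    def __init__(self, t1: float) -> None:
        self.t1 = t1
        self._deadline: Dict[str, float] = {}     # live deadline per flow
        self._heap: List[Tuple[float, str]] = []  # (deadline, flow); may hold stale entries

    def on_packet_sent(self, flow_id: str, now: float, ce: bool) -> None:
        if ce:
            deadline = now + self.t1
            self._deadline[flow_id] = deadline  # a newer CE packet re-arms the timer
            heapq.heappush(self._heap, (deadline, flow_id))
        else:
            # The next packet of the flow left the queue: cancel the pending timer.
            self._deadline.pop(flow_id, None)

    def advance(self, now: float) -> List[Tuple[str, float]]:
        """Fire all timers that expired by `now`; return (flow_id, expiry) per CNP."""
        fired: List[Tuple[str, float]] = []
        while self._heap and self._heap[0][0] <= now:
            deadline, flow_id = heapq.heappop(self._heap)
            if self._deadline.get(flow_id) == deadline:  # skip stale or cancelled entries
                del self._deadline[flow_id]
                fired.append((flow_id, deadline))
        return fired
```

Keeping stale heap entries and checking them against the live deadline on pop avoids deleting from the middle of the heap. This is the usual lazy-deletion pattern with `heapq`.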
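In another implementation, the device may monitor, according to a set period, whether the time interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval. If it is, the first CNP is obtained. That is, the check is performed once every set period. If, in the current period, the interval is greater than or equal to the first set time interval, monitoring stops and the first CNP is obtained. If the interval is still less than the first set time interval, the check is repeated in the next period until the interval reaches the first set time interval. The period is less than or equal to the first set time interval.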
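The polling variant can be sketched as a pass over a table of "last CE packet sent" times, repeated every `period`. The dictionary layout and function names are assumptions. Removing a flow after it fires models "monitoring stops" from the text.

```python
from typing import Dict, List, Tuple


def poll_once(last_ce_sent: Dict[str, float], now: float, t1: float) -> List[str]:
    """One polling pass: flows whose last CE packet was sent at least t1 ago.

    Qualifying flows are removed, so monitoring stops for them until a new CE
    packet of that flow is recorded in last_ce_sent.
    """
    due = [flow for flow, sent in last_ce_sent.items() if now - sent >= t1]
    for flow in due:
        del last_ce_sent[flow]
    return due


def run_polling(last_ce_sent: Dict[str, float], t1: float, period: float,
                start: float, end: float) -> List[Tuple[str, float]]:
    """Poll every `period` from start to end inclusive; return (flow, poll_time) per CNP."""
    if not 0 < period <= t1:
        raise ValueError("the polling period must satisfy 0 < period <= t1")
    fired: List[Tuple[str, float]] = []
    k = 0
    while start + k * period <= end:
        now = start + k * period  # computed from k to avoid accumulating float error
        fired.extend((flow, now) for flow in poll_once(last_ce_sent, now, t1))
        k += 1
    return fired
```

Take a period of 2 and t1 = 5. A flow whose last CE packet left at time 0 is detected at the poll at time 6, so detection can lag t1 by up to one period. This is one reason to keep the period no larger than the first set time interval.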
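As an alternative to S209 and S210, the device may determine whether the time interval between the time at which a second congestion packet was sent and the time at which a first congestion packet was sent is greater than or equal to the first set time interval. If it is, the first CNP is obtained, where the first congestion packet is the packet preceding the second congestion packet. That is, the interval between consecutive congestion packet transmissions shows whether a CE packet (the second packet) has been congested for longer than the first set time interval.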
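The inter-packet variant only needs the send time of the previous CE packet of each flow. The class name below is hypothetical.

```python
from typing import Dict


class CeGapDetector:
    """Obtain a CNP when two consecutive CE packets of the same flow are sent at
    least t1 apart (the first is the previous packet, the second the current one)."""

    def __init__(self, t1: float) -> None:
        self.t1 = t1
        self._prev_sent: Dict[str, float] = {}

    def on_ce_sent(self, flow_id: str, now: float) -> bool:
        """Record a CE send time; return True if a supplementary CNP is due."""
        prev = self._prev_sent.get(flow_id)
        self._prev_sent[flow_id] = now
        return prev is not None and now - prev >= self.t1
```

Unlike the timer and polling variants, this check can only fire when the second packet is actually sent. It therefore reacts after the long gap has ended rather than during it.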
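As another alternative to S209 and S210, as shown in FIG. 7, the device may determine whether the depth of the egress port queue exceeds a queue depth threshold. If it does, the network device adds an ECN mark to packets newly enqueued on that egress port queue. FIG. 7 defines two queue depth thresholds, Kmin and Kmax. When the queue depth is less than or equal to Kmin, packets are not ECN-marked. When the queue depth is greater than Kmax, all packets passing through the queue are ECN-marked. When the queue depth is between Kmin and Kmax, the probability of ECN marking increases gradually with the queue depth. When the destination device receives an ECN-marked CE packet, the network is congested, so the destination device conveys this congestion information to the source device.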
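The Kmin/Kmax scheme matches the familiar RED-style marking profile used with DCQCN. The linear ramp between the thresholds and the maximum in-band probability `p_max` are assumptions in this sketch, because the text only says the probability rises gradually with queue depth.

```python
import random


def ecn_mark_probability(depth: float, k_min: float, k_max: float, p_max: float) -> float:
    """Marking probability as a function of egress queue depth.

    depth <= k_min: never mark; depth > k_max: always mark; in between, rise
    linearly from 0 to p_max (the linear shape and p_max are assumptions).
    """
    if not 0 <= k_min < k_max:
        raise ValueError("require 0 <= k_min < k_max")
    if not 0.0 <= p_max <= 1.0:
        raise ValueError("p_max must lie in [0, 1]")
    if depth <= k_min:
        return 0.0
    if depth > k_max:
        return 1.0
    return p_max * (depth - k_min) / (k_max - k_min)


def should_mark(depth: float, k_min: float, k_max: float, p_max: float,
                rng: random.Random) -> bool:
    """Decide whether a newly enqueued packet gets an ECN (CE) mark."""
    return rng.random() < ecn_mark_probability(depth, k_min, k_max, p_max)
```

Passing an explicit `random.Random` instance makes marking decisions reproducible in tests. `random()` returns values in [0, 1), so a probability of 0 never marks and a probability of 1 always marks.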
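FIG. 8 is a schematic diagram of queue congestion control. After sending CE packet f2-1, the network device records the time at which f2-1 was sent. It then monitors, by using a timer or a set period, whether the time interval between the current time and that send time is greater than or equal to the first set time interval. If the interval reaches the first set time interval and the network device has sent no CE packet in the meantime, it obtains the first CNP. The destination device returns a second CNP to the source device when it receives CE packet f2-1, and another second CNP when it receives CE packet f2-2. Note that the second CNPs do not correspond to the received CE packets one to one in time; the correspondence may be only in number. When no CE packet is sent for longer than the first set time interval after f2-1, the destination device returns no CNP. A source device that receives no CNP within a certain time then increases the rate of the data stream. This contradicts the fact that the packets are congested in the queue of the network device, and causes rate control to fail. For this reason, the network device generates or obtains the first CNP once the time since the first packet was sent reaches the first set time interval. The first set time interval may be the rate increase interval of the data stream.
S211. Send the first CNP immediately.
After obtaining the first CNP, the network device may send it to the source device immediately. Specifically, the coprocessor may start the timer when the first congestion packet is sent and send the first CNP when the timer expires. In that case, the first CNP may be sent before, between, or after the second CNPs for the two CE packets.
Alternatively, the network device may monitor whether the time interval between the current time and the time at which the first CNP was obtained is greater than or equal to a second set time interval. The second set time interval is determined from the time interval between sending the first congestion packet and receiving the second CNP from the destination device. If the monitored interval reaches the second set time interval, the first CNP is sent. FIG. 9 is another schematic diagram of queue congestion control. As shown there, the coprocessor can estimate when CE packet f2-1 reaches the destination device and when the corresponding second CNP is received. It then sends the first CNP after CE packet f2-1 and before CE packet f2-2 are sent, so that the first CNP falls, as far as possible, between the second CNPs for the two CE packets.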
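The delayed-send option needs an estimate of the second set time interval t2 and a check of "now minus obtained time". In the sketch below, t2 is smoothed from samples of (CNP received from the destination minus matching CE packet sent). The exponentially weighted moving average (EWMA), its weight `alpha`, and the fallback of sending at once when no estimate exists yet are hypothetical choices, not taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayedCnpSender:
    """Hold a supplementary CNP for a second set interval t2 before sending it.

    t2 is estimated from samples of (time the destination's CNP was received
    minus time the matching CE packet was sent). The EWMA smoothing and the
    weight alpha are hypothetical choices.
    """

    alpha: float = 0.125
    t2: Optional[float] = None
    _obtained_at: Optional[float] = None

    def on_delay_sample(self, ce_sent: float, cnp_received: float) -> None:
        sample = cnp_received - ce_sent
        if self.t2 is None:
            self.t2 = sample
        else:
            self.t2 = (1 - self.alpha) * self.t2 + self.alpha * sample

    def on_cnp_obtained(self, now: float) -> None:
        self._obtained_at = now

    def poll(self, now: float) -> bool:
        """Return True exactly once, when now - obtained_at >= t2."""
        if self._obtained_at is None:
            return False
        wait = self.t2 if self.t2 is not None else 0.0  # no estimate yet: send at once
        if now - self._obtained_at >= wait:
            self._obtained_at = None
            return True
        return False
```

Smoothing the delay samples keeps a single delayed CNP from the destination from pushing the supplementary CNP too late. Whether smoothing is appropriate at all depends on how stable the path delay is.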
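Still referring to FIG. 5, after it is determined that the first queue is not congested, the device may further identify whether each data stream in the first queue is congested. Steps S212 to S217 below identify whether the first data stream is congested; it may be any data stream in the queue.
S212. If the queue congestion packet rate is less than the first rate threshold, obtain the degree of continuity of the sequence numbers of congestion packets in the first data stream.
The transmission parameters may further include the sequence numbers of congestion packets, which can be obtained from the packet headers. Specifically, the coprocessor can extract the headers of the mirrored congestion packets to obtain these sequence numbers.
The degree of continuity of the sequence numbers may be measured by the number of consecutive sequence numbers among the congestion packets, and is considered high when this number exceeds a set value. In this case, the number of consecutive sequence numbers of congestion packets in the first data stream is obtained.
The degree of continuity may also be measured by the maximum difference between the sequence numbers of congestion packets, and is considered high when this maximum is less than a set value. In this case, the maximum difference between the sequence numbers of congestion packets in the first data stream is obtained.
Each congestion packet has a unique sequence number. The continuity of the sequence numbers of congestion packets therefore corresponds to the continuity of the congestion packets themselves, and the latter can be determined from the former. The degree of continuity of the sequence numbers in the first data stream may be obtained through sampling, and the sampled value can be used to represent the degree of continuity of the congestion packets.
S213. Determine whether the degree of continuity of the sequence numbers of congestion packets in the first data stream is greater than or equal to the first threshold. If yes, go to S214; otherwise, continue to perform S213.
If the degree of continuity of sequence numbers in a data stream exceeds a certain threshold, essentially all packets being sent are congestion packets. The data stream is then determined to be congested and enters the congested state. For example, the coprocessor receives the following CE packets from the forwarding chip: f1-1 f2-5 f1-3 f2-6 f1-4 f1-5 f1-6 f1-7, where fi denotes the i-th data stream. The five packets f1-3, f1-4, f1-5, f1-6, and f1-7 of data stream 1 have consecutive sequence numbers. If the first threshold is, for example, five packets, f1 is identified as a congested flow.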
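The example above can be reproduced by taking the degree of continuity to be the longest run of consecutive CE sequence numbers per flow. The `fi-s` trace format follows the text. Choosing the longest run as the metric is an assumption of this sketch, as is assuming that each flow's CE packets are observed in sequence order.

```python
from typing import Dict, Iterable, List, Tuple


def parse_trace(trace: str) -> List[Tuple[str, int]]:
    """Parse tokens such as 'f1-3' into (flow_id, sequence_number) pairs."""
    pairs = []
    for token in trace.split():
        flow_id, seq = token.rsplit("-", 1)
        pairs.append((flow_id, int(seq)))
    return pairs


def longest_consecutive_run(ce_packets: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Per flow, the longest run of consecutive CE sequence numbers, in arrival order.

    Assumes each flow's CE packets are observed in sequence order.
    """
    best: Dict[str, int] = {}
    run: Dict[str, int] = {}
    last: Dict[str, int] = {}
    for flow_id, seq in ce_packets:
        # Extend the run only if this sequence number directly follows the last one.
        run[flow_id] = run[flow_id] + 1 if last.get(flow_id) == seq - 1 else 1
        last[flow_id] = seq
        best[flow_id] = max(best.get(flow_id, 0), run[flow_id])
    return best


if __name__ == "__main__":
    runs = longest_consecutive_run(parse_trace("f1-1 f2-5 f1-3 f2-6 f1-4 f1-5 f1-6 f1-7"))
    print(runs)
    print([flow for flow, r in runs.items() if r >= 5])  # first threshold = 5 packets
```

For the trace in the text, f1 has a run of five (f1-3 to f1-7) and f2 a run of two, so only f1 reaches a first threshold of five packets.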
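As another example, consider a certain number of packets sent by the network device. If the maximum difference between the sequence numbers of congestion packets is less than a set threshold, the degree of continuity can also be considered to exceed a set threshold, and the data stream is judged to be congested. For example, suppose the network device sends 100 packets and, among the differences between the sequence numbers of the congestion packets, only one difference (the maximum) is 2 while all the others are 1. It can then be determined that 98 of the 100 packets sent by the network device are congestion packets, so the data stream is congested.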
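The maximum-gap measure can be sketched as follows. Only the rule "a small maximum difference between CE sequence numbers indicates congestion" comes from the text. The threshold value and the function names are hypothetical.

```python
from typing import Sequence


def max_seq_gap(ce_seqs: Sequence[int]) -> int:
    """Largest difference between adjacent CE sequence numbers of one flow.

    The numbers are sorted first; a small maximum gap means the CE packets are
    nearly contiguous, which the text treats as a high degree of continuity.
    """
    ordered = sorted(ce_seqs)
    if len(ordered) < 2:
        raise ValueError("need at least two CE sequence numbers")
    return max(b - a for a, b in zip(ordered, ordered[1:]))


def congested_by_gap(ce_seqs: Sequence[int], gap_threshold: int) -> bool:
    """Hypothetical decision rule: congested if the maximum gap is below the threshold."""
    return max_seq_gap(ce_seqs) < gap_threshold
```

Compared with the longest-run measure, the maximum gap is insensitive to where the missing sequence numbers fall. A single large hole anywhere in the window is enough to rule out congestion.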
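S214. Determine that the first data stream is congested, and the first data stream enters the congested state.
FIG. 10 is a schematic diagram of identifying data stream congestion. Source device 1 and source device 2 send data streams to the same destination device. The first data stream f1 is congested at network device 1. Congestion can therefore be identified at network device 1, or at the last network device before the destination device (network device 3). In FIG. 10, the coprocessor of network device 3 obtains three packets of data stream f1 with consecutive sequence numbers: f1-1, f1-2, and f1-3. Because the number of packets with consecutive sequence numbers is greater than or equal to 3, data stream f1 is determined to be congested and enters the congested state.
S215. Obtain the degree of continuity of the sequence numbers of congestion packets in the first data stream again.
After the first data stream enters the congested state, the device needs to monitor whether it remains congested or exits the congested state. The degree of continuity of the sequence numbers of its congestion packets is therefore obtained again.
If the degree of continuity of sequence numbers in a data stream is less than a certain threshold, the packets being sent are mostly ordinary packets, and the data stream can be determined to exit the congested state.
As another example, consider a certain number of packets sent by the network device. If the maximum difference between the sequence numbers of congestion packets is greater than a set threshold, the degree of continuity can be considered less than a set threshold, and the data stream is judged not to be congested.
S216. Determine whether the degree of continuity of the sequence numbers of congestion packets in the first data stream is less than or equal to the second threshold. If yes, go to S217; otherwise, continue to perform S216.
If the degree of continuity of sequence numbers in the sent first data stream is less than a certain threshold, few congestion packets of the stream are being sent. The data stream is then determined to be no longer congested and exits the congested state. For example, the coprocessor obtains the following CE packets from the forwarding chip: f1-1 f2-5 f1-3 f2-7 f1-4 f2-9 f1-5 f1-6 f1-7, where fi denotes the i-th data stream. Packets f2-5, f2-7, and f2-9 of data stream f2 are not consecutive: two of the five sequence numbers from f2-5 to f2-9 are missing. In this case, f2 is identified as not congested.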
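The enter and exit decisions of S213 to S217 form a per-flow hysteresis on the degree of continuity. In the sketch, the enter threshold of 5 follows the earlier example. The exit threshold of 2 is a hypothetical value; the text only requires it to be below the enter threshold. With the longest-run measure, f2 in the example above (f2-5, f2-7, f2-9) has a continuity of 1, so a congested f2 would exit the congested state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlowCongestionTracker:
    """Per-flow enter/exit decision on the degree of continuity (S213 to S217).

    enter_threshold = 5 follows the example in the text; exit_threshold = 2 is a
    hypothetical value. The text only requires exit_threshold < enter_threshold.
    """

    enter_threshold: int = 5
    exit_threshold: int = 2
    congested: Dict[str, bool] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.exit_threshold < self.enter_threshold:
            raise ValueError("exit_threshold must be below enter_threshold")

    def update(self, flow_id: str, continuity: int) -> bool:
        """Feed one continuity measurement for a flow; return its new state."""
        state = self.congested.get(flow_id, False)
        if not state and continuity >= self.enter_threshold:
            state = True   # S214: the flow enters the congested state
        elif state and continuity <= self.exit_threshold:
            state = False  # S217: the flow exits the congested state
        self.congested[flow_id] = state
        return state
```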
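S217. Determine that the first data stream is not congested, and the first data stream exits the congested state.
After the first data stream is determined to be congested, S208 to S211, that is, flow congestion control, are performed. Unlike queue congestion control, S208 to S211 may here be performed on the first data stream only. FIG. 11 is a schematic diagram of flow congestion control. After sending CE packet f2-1, the network device records the time at which f2-1 was sent. It then monitors, by using a timer or a set period, whether the time interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval. If the interval reaches the first set time interval and the network device has sent no CE packet in the meantime, it obtains the first CNP. The destination device returns a second CNP to the source device when it receives CE packet f2-1, and another second CNP when it receives CE packet f2-2. Note that the second CNPs do not correspond to the received CE packets one to one in time; the correspondence may be only in number. When no CE packet is sent for longer than the first set time interval after f2-1, the destination device returns no CNP. A source device that receives no CNP within a certain time then increases the rate of the data stream. This contradicts the fact that the packets are congested in the queue of the network device, and causes rate control to fail. For this reason, the network device generates or obtains the first CNP once the time since the first packet was sent reaches the first set time interval. The first set time interval may be the rate increase interval of the data stream.
After obtaining the first CNP, the network device may send it to the source device immediately. The first CNP may then be sent before, between, or after the second CNPs for the two CE packets.
The network device may also monitor whether the time interval between the current time and the time at which the first CNP was obtained is greater than or equal to the second set time interval. The second set time interval is determined from the time interval between sending the first congestion packet and receiving the second CNP from the destination device. If the time since the first CNP was obtained reaches the second set time interval, the first CNP is sent. FIG. 12 is another schematic diagram of flow congestion control. As shown there, the coprocessor can estimate when CE packet f2-1 reaches the destination device and when the corresponding second CNP is received. It then sends the first CNP after CE packet f2-1 and before CE packet f2-2 are sent, so that the first CNP falls, as far as possible, between the second CNPs for the two CE packets.
It can be understood that queue congestion identification and control and flow congestion identification and control need not be performed in any particular order. Congestion identification and control may also be performed on data streams only, or on queues only.
According to the congestion control method provided in this embodiment of this application, time information of congestion packets in a data stream is obtained, and if the data stream is congested, congestion notification is provided for packets that have been congested for longer than a certain time interval. This prevents a data stream from still being rate-increased while it is congested and improves packet transmission efficiency.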
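Based on the same concept as the foregoing congestion control method, as shown in FIG. 13, an embodiment of this application further provides a congestion control apparatus 1000. The apparatus 1000 includes a first obtaining unit 11, a second obtaining unit 12, and a first sending unit 13. It may further include a sampling and copying unit 14, a fifth obtaining unit 15, and a first determining unit 16 (shown with dashed lines in the figure), where:
the first obtaining unit 11 is configured to obtain time information of one or more congestion packets in a sent first data stream, where the one or more congestion packets carry a congestion mark; the second obtaining unit 12 is configured to: if the first data stream is congested, obtain a first congestion notification packet according to the time information of the one or more congestion packets in the first data stream, where the first congestion notification packet is used to provide congestion notification for packets that have been congested for longer than a first set time interval; and the first sending unit 13 is configured to send the first congestion notification packet.
In an implementation, the second obtaining unit 12 is configured to: if the time interval between the current time and the time at which a first congestion packet was sent is greater than or equal to the first set time interval, obtain the first congestion notification packet, where the first congestion packet is the last packet of the first data stream sent before the current time.
In another implementation, the second obtaining unit 12 includes: a starting unit, configured to start a timer when the first congestion packet has been sent, where the timing duration of the timer is the first set time interval; and a third obtaining unit, configured to obtain the first congestion notification packet if the next packet of the first data stream has not been sent when the timing duration expires.
In another implementation, the second obtaining unit 12 includes: a first monitoring unit, configured to monitor, according to a set period, whether the time interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval; and a fourth obtaining unit, configured to obtain the first congestion notification packet if that time interval is greater than or equal to the first set time interval.
In another implementation, the second obtaining unit 12 is configured to: if the time interval between the time at which a second congestion packet was sent and the time at which a first congestion packet was sent is greater than or equal to the first set time interval, obtain the first congestion notification packet, where the first congestion packet is the packet preceding the second congestion packet, and both belong to the first data stream.
In another implementation, the first sending unit 13 includes: a second monitoring unit, configured to monitor whether the time interval between the current time and the time at which the first congestion notification packet was obtained is greater than or equal to a second set time interval; and a second sending unit, configured to send the first congestion notification packet if that time interval is greater than or equal to the second set time interval, where the second set time interval is determined according to the time interval between the time at which the first congestion packet was sent and the time at which a second congestion notification packet sent by the destination device was received.
In another implementation, the first sending unit 13 is configured to send the first congestion notification packet immediately after obtaining it.
In another implementation, the apparatus further includes: a fifth obtaining unit 15, configured to obtain transmission parameters of the one or more congestion packets in the first data stream; and a first determining unit 16, configured to determine, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested.
In another implementation, the apparatus further includes: a sampling and copying unit 14, configured to sample and copy the one or more congestion packets of the first data stream at a set sampling rate to obtain a mirrored data stream corresponding to the first data stream, where the mirrored data stream includes one or more mirrored packets.
In another implementation, the transmission parameters include a packet quantity and a packet length, and the first determining unit 16 includes: a second determining unit, configured to determine a queue congestion packet rate according to the quantity and length of congestion packets of one or more first data streams in a first queue sent within a first time period; and a third determining unit, configured to determine, if the queue congestion packet rate is greater than or equal to a first rate threshold, that the first queue is congested, where the first queue enters a congested state.
In another implementation, the apparatus further includes: a fourth determining unit, configured to determine, if the queue congestion packet rate is less than or equal to a second rate threshold, that the first queue is not congested, where the first queue exits the congested state, and the second rate threshold is less than the first rate threshold.
In another implementation, the transmission parameters include sequence numbers of the one or more congestion packets in the first data stream, and the first determining unit 16 includes: a sixth obtaining unit, configured to obtain a degree of continuity of the sequence numbers of congestion packets in the first data stream; and a fifth determining unit, configured to determine, if the degree of continuity is greater than or equal to a first threshold, that the first data stream is congested, where the first data stream enters a congested state.
In another implementation, the apparatus further includes: a sixth determining unit, configured to determine, if the degree of continuity is less than or equal to a second threshold, that the first data stream is not congested, where the first data stream exits the congested state, and the second threshold is less than the first threshold.
For specific implementations of the foregoing units, refer to the related descriptions of the congestion control methods shown in FIG. 4 and FIG. 5.
According to the congestion control apparatus provided in this embodiment of this application, time information of congestion packets in a data stream is obtained, and if the data stream is congested, congestion notification is provided for packets that have been congested for longer than a certain time interval. This prevents a data stream from still being rate-increased while it is congested and improves packet transmission efficiency.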
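As shown in FIG. 14, an embodiment of this application further provides another congestion control apparatus. The congestion control apparatus 2000 includes a processor 21 and a physical interface 22.
There may be one or more processors 21. The processor 21 includes a central processing unit, a network processor, a graphics processing unit (GPU), an application-specific integrated circuit, a programmable logic device (PLD), or any combination thereof. The PLD may be a complex programmable logic device, a field-programmable gate array, generic array logic, or any combination thereof. The processor 21 may include a control plane 211 and a forwarding plane 212. The forwarding plane 212 may specifically include a forwarding chip A, which receives service packets from the source device and forwards them to the destination device. In one implementation, forwarding chip A is also used to implement the congestion control in the embodiments of this application; implementing congestion control in the forwarding plane offers better timeliness. The forwarding plane 212 may further include a coprocessor B. In another implementation, coprocessor B copies CE packets from forwarding chip A, that is, obtains mirrored copies of the CE packets, and implements the congestion control in the embodiments of this application.
The control plane 211 and the forwarding plane 212 may be implemented by separate circuits or integrated into one circuit. For example, the processor 21 is a multi-core CPU, in which one or some of the cores implement the control plane 211 and the other cores implement the forwarding plane 212. As another example, the control plane 211 is implemented by a CPU, and the forwarding plane 212 is implemented by a network processor (NP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or any combination thereof. As another example, the congestion control apparatus is a chassis-based network device, the control plane 211 is implemented by a main control board, and the forwarding plane 212 is implemented by line cards. As another example, both the control plane 211 and the forwarding plane 212 are implemented by an NP with control plane capabilities.
The physical interface 22 is used to send and receive service packets and to send CNPs. Specifically, the physical interface 22 receives service packets from the source device and forwards them to the destination device. It also receives CNPs from the destination device, or obtains CNPs generated by the apparatus itself, and sends them to the source device.
There may be one or more physical interfaces 22. The physical interface 22 may include a wireless interface and/or a wired interface. For example, the wireless interface may include a wireless local area network (WLAN) interface, a Bluetooth interface, a cellular network interface, or any combination thereof. The wired interface may include an Ethernet interface, an asynchronous transfer mode interface, a fibre channel interface, or any combination thereof. The Ethernet interface may be an electrical interface or an optical interface. The physical interface 22 does not necessarily include (although it usually includes) an Ethernet interface.
According to the congestion control apparatus provided in this embodiment of this application, time information of congestion packets in a data stream is obtained, and if the data stream is congested, congestion notification is provided for packets that have been congested for longer than a certain time interval. This prevents a data stream from still being rate-increased while it is congested and improves packet transmission efficiency.
This application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a computer, the computer is caused to perform the steps and/or processing performed by the network device in any of the foregoing method embodiments.
This application further provides a computer program product including computer program code. When the computer program code runs on a computer, the computer is caused to perform the steps and/or processing performed by the network device in any of the foregoing method embodiments.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the foregoing system, apparatus, and units are the same as the corresponding processes in the foregoing method embodiments. Details are not described herein again.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the division into units is merely a logical function division, and other divisions are possible in actual implementation. Multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. The displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, all or some of the embodiments may be implemented in the form of a computer program product, which includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through it. They may be transmitted from one website, computer, server, or data center to another in a wired manner (for example, over coaxial cable, optical fiber, or a digital subscriber line (DSL)) or wirelessly (for example, by infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic medium (for example, a floppy disk, hard disk, magnetic tape, or magnetic disk), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).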

Claims (27)

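  1. A congestion control method, wherein the method comprises:
    obtaining time information of one or more congestion packets in a sent first data stream, wherein the congestion packets carry a congestion mark;
    if the first data stream is congested, obtaining a first congestion notification packet according to the time information of the one or more congestion packets in the first data stream, wherein the first congestion notification packet is used to provide congestion notification for packets that have been congested for longer than a first set time interval; and
    sending the first congestion notification packet.
  2. The method according to claim 1, wherein the obtaining, if the first data stream is congested, a first congestion notification packet according to the time information of the one or more congestion packets in the first data stream comprises:
    if a time interval between a current time and a time at which a first congestion packet was sent is greater than or equal to the first set time interval, obtaining the first congestion notification packet, wherein the first congestion packet is the last packet of the first data stream sent before the current time.
  3. The method according to claim 2, wherein the obtaining a first congestion notification packet if a time interval between a current time and a time at which a first congestion packet was sent is greater than or equal to the first set time interval comprises:
    starting a timer when the first congestion packet has been sent, wherein a timing duration of the timer is the first set time interval; and
    if a next packet of the first data stream has not been sent when the timing duration expires, obtaining the first congestion notification packet.
  4. The method according to claim 2, wherein the obtaining a first congestion notification packet if a time interval between a current time and a time at which a first congestion packet was sent is greater than or equal to the first set time interval comprises:
    monitoring, according to a set period, whether the time interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval; and
    if the time interval between the current time and the time at which the first congestion packet was sent is greater than or equal to the first set time interval, obtaining the first congestion notification packet.
  5. The method according to claim 1, wherein the obtaining, if the first data stream is congested, a first congestion notification packet according to the time information of the one or more congestion packets in the first data stream comprises:
    if a time interval between a time at which a second congestion packet was sent and a time at which a first congestion packet was sent is greater than or equal to the first set time interval, obtaining the first congestion notification packet, wherein the first congestion packet is the packet preceding the second congestion packet, and the first congestion packet and the second congestion packet belong to the first data stream.
  6. The method according to any one of claims 1 to 5, wherein the sending the first congestion notification packet comprises:
    monitoring whether a time interval between a current time and a time at which the first congestion notification packet was obtained is greater than or equal to a second set time interval; and
    if the time interval between the current time and the time at which the first congestion notification packet was obtained is greater than or equal to the second set time interval, sending the first congestion notification packet.
  7. The method according to any one of claims 1 to 5, wherein the sending the first congestion notification packet comprises:
    sending the first congestion notification packet immediately after obtaining the first congestion notification packet.
  8. The method according to any one of claims 1 to 7, wherein the method further comprises:
    obtaining transmission parameters of the one or more congestion packets in the first data stream; and
    determining, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested.
  9. The method according to claim 8, wherein the method further comprises:
    sampling and copying the one or more congestion packets of the first data stream at a set sampling rate to obtain a mirrored data stream corresponding to the first data stream, wherein the mirrored data stream comprises one or more mirrored packets.
  10. The method according to claim 8 or 9, wherein the transmission parameters comprise a packet quantity and a packet length, and the determining, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested comprises:
    determining a queue congestion packet rate according to a quantity and a length of congestion packets of one or more first data streams in a first queue sent within a first time period; and
    if the queue congestion packet rate is greater than or equal to a first rate threshold, determining that the first queue is congested, wherein the first queue enters a congested state.
  11. The method according to claim 10, wherein the method further comprises:
    if the queue congestion packet rate is less than or equal to a second rate threshold, determining that the first queue is not congested, wherein the first queue exits the congested state, and the second rate threshold is less than the first rate threshold.
  12. The method according to any one of claims 8 to 11, wherein the transmission parameters comprise sequence numbers of the one or more congestion packets in the first data stream, and the determining, according to the transmission parameters of the one or more congestion packets, that the first data stream is congested comprises:
    obtaining a degree of continuity of the sequence numbers of congestion packets in the first data stream; and
    if the degree of continuity is greater than or equal to a first threshold, determining that the first data stream is congested, wherein the first data stream enters a congested state.
  13. The method according to claim 12, wherein the method further comprises:
    if the degree of continuity is less than or equal to a second threshold, determining that the first data stream is not congested, wherein the first data stream exits the congested state, and the second threshold is less than the first threshold.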
  14. 一种拥塞控制装置,其特征在于,包括:
    第一获取单元,用于获取发送的第一数据流中的一个或多个拥塞报文的时间信息,所述拥塞报文携带有拥塞标记;
    第二获取单元,用于若所述第一数据流发生拥塞,根据所述第一数据流中的一个或多个拥塞报文的时间信息,获取第一拥塞通知报文,所述第一拥塞通知报文用于对被拥塞超过第一设定时间间隔的报文进行拥塞通知;
    第一发送单元,用于发送所述第一拥塞通知报文。
  15. 根据权利要求14所述的装置,其特征在于,所述第二获取单元用于若当前时间与发送第一拥塞报文的时间之间的时间间隔大于或等于所述第一设定时间间隔,获取所述第一拥塞通知报文,其中,所述第一拥塞报文为在所述当前时间之前发送的第一数据流的最后一个报文。
  16. 根据权利要求15所述的装置,其特征在于,所述第二获取单元包括:
    启动单元,用于发送完所述第一拥塞报文时,启动计时器,其中,所述计时器的计时时间为所述第一设定时间间隔;
    第三获取单元,用于若所述计时时间到达时,未发送所述第一数据流的下一个报文,获取所述第一拥塞通知报文。
  17. 根据权利要求14所述的装置,其特征在于,所述第二获取单元包括:
    第一监测单元,用于根据设定周期监测当前时间与发送所述第一拥塞报文的时间之间的时间间隔是否大于或等于所述第一设定时间间隔;
    第四获取单元,用于若当前时间与发送所述第一拥塞报文的时间之间的时间间隔大于或等于所述第一设定时间间隔,获取所述第一拥塞通知报文。
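  18. The apparatus according to claim 14, wherein the second obtaining unit is configured to obtain the first congestion notification packet if the time interval between the time at which a second congestion packet is sent and the time at which a first congestion packet was sent is greater than or equal to the first set time interval, wherein the first congestion packet is the packet immediately preceding the second congestion packet, and both the first congestion packet and the second congestion packet belong to the first data stream.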
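  19. The apparatus according to any one of claims 14 to 18, wherein the first sending unit comprises:
    a second monitoring unit, configured to monitor whether the time interval between the current time and the time at which the first congestion notification packet was obtained is greater than or equal to a second set time interval; and
    a second sending unit, configured to send the first congestion notification packet if the time interval between the current time and the time at which the first congestion notification packet was obtained is greater than or equal to the second set time interval.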
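  20. The apparatus according to any one of claims 14 to 18, wherein the first sending unit is configured to send the first congestion notification packet immediately after it is obtained.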
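  21. The apparatus according to any one of claims 14 to 20, further comprising:
    a fifth obtaining unit, configured to obtain transmission parameters of one or more congestion packets in the first data stream; and
    a first determining unit, configured to determine, based on the transmission parameters of the one or more congestion packets, that congestion occurs in the first data stream.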
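  22. The apparatus according to claim 21, further comprising:
    a sampling and copying unit, configured to sample and copy one or more congestion packets of the first data stream at a set sampling rate to obtain a mirrored data stream corresponding to the first data stream, wherein the mirrored data stream comprises one or more mirrored packets.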
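  23. The apparatus according to claim 21 or 22, wherein the transmission parameters comprise a packet count and a packet length, and the first determining unit comprises:
    a second determining unit, configured to determine a congestion packet rate of a queue based on the number and length of the congestion packets of one or more first data streams sent from a first queue within a first time period; and
    a third determining unit, configured to: if the congestion packet rate of the queue is greater than or equal to a first rate threshold, determine that the first queue is congested, and the first queue enters a congested state.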
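  24. The apparatus according to claim 23, further comprising:
    a fourth determining unit, configured to: if the congestion packet rate of the queue is less than or equal to a second rate threshold, determine that the first queue is not congested, and the first queue exits the congested state, wherein the second rate threshold is less than the first rate threshold.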
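  25. The apparatus according to any one of claims 21 to 24, wherein the transmission parameters comprise sequence numbers of one or more congestion packets in the first data stream, and the first determining unit comprises:
    a sixth obtaining unit, configured to obtain a degree of continuity of the sequence numbers of the congestion packets in the first data stream; and
    a fifth determining unit, configured to: if the degree of continuity is greater than or equal to a first threshold, determine that congestion occurs in the first data stream, and the first data stream enters a congested state.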
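  26. The apparatus according to claim 25, further comprising:
    a sixth determining unit, configured to: if the degree of continuity is less than or equal to a second threshold, determine that congestion does not occur in the first data stream, and the first data stream exits the congested state, wherein the second threshold is less than the first threshold.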
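  27. A congestion control apparatus, comprising a processor and a physical interface, wherein the processor is configured to perform the method according to any one of claims 1 to 13.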
PCT/CN2021/071251 2020-01-23 2021-01-12 Congestion control method and apparatus WO2021147704A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21744459.5A EP4087199A4 (en) 2020-01-23 2021-01-12 METHOD AND DEVICE FOR OVERLOAD CONTROL
US17/870,700 US20220368633A1 (en) 2020-01-23 2022-07-21 Congestion control method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010077053.9 2020-01-23
CN202010077053.9A CN113162862A (zh) 2020-01-23 2020-01-23 Congestion control method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/870,700 Continuation US20220368633A1 (en) 2020-01-23 2022-07-21 Congestion control method and apparatus

Publications (1)

Publication Number Publication Date
WO2021147704A1 (zh) 2021-07-29

Family

ID=76882168

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071251 WO2021147704A1 (zh) 2020-01-23 2021-01-12 Congestion control method and apparatus

Country Status (4)

Country Link
US (1) US20220368633A1 (zh)
EP (1) EP4087199A4 (zh)
CN (1) CN113162862A (zh)
WO (1) WO2021147704A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117692958A (zh) * 2022-09-05 2024-03-12 vivo Mobile Communication Co., Ltd. Information processing method, device, and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
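CN106533959A (zh) * 2016-12-23 2017-03-22 Ruijie Networks Co., Ltd. Method for determining the egress rate of a switching device, and switching device
CN108418767A (zh) * 2018-02-09 2018-08-17 Huawei Technologies Co., Ltd. Data transmission method, device, and computer storage medium
CN109039936A (zh) * 2018-08-30 2018-12-18 Huawei Technologies Co., Ltd. Transmission rate control method and apparatus, sending device, and receiving device
WO2019210725A1 (zh) * 2018-05-04 2019-11-07 Huawei Technologies Co., Ltd. Congestion control method, apparatus, device, and storage medium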
US20200021532A1 (en) * 2018-07-10 2020-01-16 Cisco Technology, Inc. Automatic rate limiting based on explicit network congestion notification in smart network interface card

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113709057B (zh) * 2017-08-11 2023-05-05 Huawei Technologies Co., Ltd. Network congestion notification method, proxy node, network node, and computer device
US10944660B2 (en) * 2019-02-08 2021-03-09 Intel Corporation Managing congestion in a network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4087199A4

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
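WO2023011179A1 (zh) * 2021-08-05 2023-02-09 Tsinghua University Congestion control method and apparatus
CN115174477A (zh) * 2022-06-29 2022-10-11 Wuxi Xinguang Interconnect Technology Research Institute Co., Ltd. Congestion control method and system based on priority flow control
CN115174477B (zh) * 2022-06-29 2024-04-05 Wuxi Xinguang Interconnect Technology Research Institute Co., Ltd. Congestion control method and system based on priority flow control
CN116915706A (zh) * 2023-09-13 2023-10-20 Institute of Artificial Intelligence, Hefei Comprehensive National Science Center (Anhui Provincial Artificial Intelligence Laboratory) Data center network congestion control method, apparatus, device, and storage medium
CN116915706B (zh) * 2023-09-13 2023-12-26 Institute of Artificial Intelligence, Hefei Comprehensive National Science Center (Anhui Provincial Artificial Intelligence Laboratory) Data center network congestion control method, apparatus, device, and storage medium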

Also Published As

Publication number Publication date
EP4087199A1 (en) 2022-11-09
EP4087199A4 (en) 2023-02-01
CN113162862A (zh) 2021-07-23
US20220368633A1 (en) 2022-11-17

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21744459

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021744459

Country of ref document: EP

Effective date: 20220802

NENP Non-entry into the national phase

Ref country code: DE