WO2020207479A1 - Method and device for controlling network congestion - Google Patents

Method and device for controlling network congestion

Info

Publication number
WO2020207479A1
WO2020207479A1 PCT/CN2020/084260 CN2020084260W WO2020207479A1 WO 2020207479 A1 WO2020207479 A1 WO 2020207479A1 CN 2020084260 W CN2020084260 W CN 2020084260W WO 2020207479 A1 WO2020207479 A1 WO 2020207479A1
Authority
WO
WIPO (PCT)
Prior art keywords
message
priority
rtt
data
timestamp
Prior art date
Application number
PCT/CN2020/084260
Other languages
French (fr)
Chinese (zh)
Inventor
徐永慧
周洪
郑合文
孙文昊
刘大伟
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2020207479A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/28 Flow control; Congestion control in relation to timing considerations
    • H04L47/283 Flow control; Congestion control in relation to timing considerations in response to processing delays, e.g. caused by jitter or round trip time [RTT]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/25 Flow control; Congestion control with rate being modified by the source upon detecting a change of network conditions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H04L43/0864 Round trip delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/29 Flow control; Congestion control using a combination of thresholds
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L43/106 Active monitoring, e.g. heartbeat, ping or trace-route using time related information in packets, e.g. by adding timestamps

Definitions

  • the embodiments of the present application relate to the field of communication technologies, and in particular, to a method and device for network congestion control.
  • RDMA remote direct memory access
  • NIC network interface card
  • RoCE RDMA over Converged Ethernet
  • An existing flow control method measures the round-trip delay of a data segment (64 KB), calculates the round-trip time (RTT), and adjusts the sending rate according to the RTT.
  • the calculation of RTT in the prior art is performed on a data segment.
  • the size of the data segment is 64KB and can contain multiple messages.
  • When host A sends the first message of this data segment, the timestamp $t_s$ at the time of sending is recorded, and when host A receives the acknowledgement (ACK) sent by host B, the completion time $t_{comp}$ is recorded.
  • The round-trip time is $\mathrm{RTT} = t_{comp} - t_s - t_{serial}$, where $t_{serial}$ is the serialized transmission time of the data segment, that is, the data segment size (64 KB) divided by the line rate.
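  • The prior-art calculation above can be sketched as follows. This is only an illustration of the formula: the function names, the 100 Gbit/s line rate, and the sample timestamps are assumptions chosen for the example, not values from the application.

```python
def serialization_time(segment_bytes: int, line_rate_bps: float) -> float:
    """Seconds needed to put segment_bytes on the wire at line_rate_bps."""
    return segment_bytes * 8 / line_rate_bps


def prior_art_rtt(t_s: float, t_comp: float,
                  segment_bytes: int, line_rate_bps: float) -> float:
    """RTT = t_comp - t_s - t_serial, with all times in seconds."""
    return t_comp - t_s - serialization_time(segment_bytes, line_rate_bps)


SEGMENT_BYTES = 64 * 1024   # 64 KB data segment, as in the text
LINE_RATE_BPS = 100e9       # assumed line rate (illustrative only)

t_serial = serialization_time(SEGMENT_BYTES, LINE_RATE_BPS)
rtt = prior_art_rtt(t_s=0.0, t_comp=25e-6,
                    segment_bytes=SEGMENT_BYTES, line_rate_bps=LINE_RATE_BPS)
print(f"t_serial = {t_serial * 1e6:.3f} us, RTT = {rtt * 1e6:.3f} us")
```

Because t_serial assumes a full 64 KB segment per ACK, the result is only meaningful when ACKs really arrive once per segment.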
  • The size of the data segment in this solution is 64 KB, but the size of the data block required by each request in actual applications is not fixed, so there is no guarantee of one ACK per 64 KB. If the data blocks are small, there are multiple ACKs per 64 KB data segment, and using different ACK completion times to calculate the RTT affects its accuracy. The RTT calculated by this scheme is therefore inaccurate, so the congestion depth of the network queue cannot be effectively controlled. In addition, the RTT measured by this scheme is affected by reverse-path congestion and cannot accurately reflect whether congestion occurs in the request direction or the response direction, which may cause misjudgment by the control system.
  • the embodiments of the present application provide a network congestion control method and device, which can avoid the influence of reverse path congestion, accurately control the congestion depth of network queues, and improve system performance.
  • a network congestion control method is provided.
  • the method is applied to a first device.
  • the first device is a device that sends a data packet.
  • The method includes: the first device sends a first message to a second device, where the first message carries a first timestamp, and the first timestamp is the local timestamp when the first message is sent; the first device receives a second message sent by the second device, where the second message carries the first timestamp; the first device subtracts the first timestamp from a second timestamp to obtain a first round-trip time (RTT), where the second timestamp is the local timestamp when the first device receives the second message; and the first device adjusts the sending rate of the data message according to the first RTT; wherein the priority of the first message is the same as the priority of the data message, and the priority of the second message is higher than the priority of the data message.
  • The measurement of the first RTT is not affected by whether the reverse path (the direction in which service messages are not transmitted) is congested, so the determined first RTT is more accurate. Therefore, when the first RTT is used to adjust the sending rate of data messages, network queue congestion can be reduced and system performance improved. It is understandable that the first RTT takes into account the queuing and processing time in the buffer of the switch (or router) while avoiding the influence of congestion on the reverse path. The first RTT is therefore related to the degree of network congestion, changes dynamically with it, and more accurately reflects the current degree of congestion. For this reason, the first RTT can be called a dynamic RTT.
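  • A minimal sketch of the sender-side measurement described above, assuming microsecond timestamps and a numeric priority where a larger value means higher priority. The `Message` type, the priority values, and the function names are hypothetical; the application does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class Message:
    timestamp_us: int   # sender's local timestamp carried in the message
    priority: int       # larger value = higher priority (assumed convention)


DATA_PRIORITY = 3       # hypothetical priority used for data messages


def build_first_message(now_us: int) -> Message:
    """First message: stamped at send time, same priority as data messages."""
    return Message(timestamp_us=now_us, priority=DATA_PRIORITY)


def first_rtt_us(second_message: Message, receive_time_us: int) -> int:
    """First RTT = second timestamp (local receive time) - echoed first timestamp."""
    return receive_time_us - second_message.timestamp_us


first = build_first_message(now_us=1_000)
# The second device echoes the first timestamp at a higher priority.
second = Message(timestamp_us=first.timestamp_us, priority=DATA_PRIORITY + 1)
rtt = first_rtt_us(second, receive_time_us=1_180)
print(rtt)
```

Because the timestamp travels inside the messages, the sender does not need to keep a table that maps sequence numbers to send times.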
  • The above method further includes: the first device sends a third message to the second device, where the third message carries a third timestamp, the third timestamp is the local timestamp when the third message is sent, and the priority of the third message is higher than the priority of the data message; the first device receives a fourth message sent by the second device, where the fourth message carries the third timestamp and the priority of the fourth message is higher than the priority of the data message; and the first device subtracts the third timestamp from a fourth timestamp to obtain a second RTT, where the fourth timestamp is the local timestamp when the first device receives the fourth message.
  • the second RTT can be measured more accurately by using the third message and the fourth message with a priority higher than the priority of the data message. It is understandable that under the condition that the data transmission path between the network card of the first device and the network card of the second device is unchanged, the value of the second RTT is basically fixed, and may slightly change with network performance. Therefore, the second RTT can be called a fixed RTT.
  • The foregoing adjusting of the sending rate of the data message according to the first RTT includes: subtracting the second RTT from the first RTT to obtain a time difference, where the time difference indicates the congestion depth of the network queue; and adjusting the sending rate of the data message according to the time difference. Based on this solution, the time difference between the first RTT and the second RTT accurately reflects the depth of network queue congestion between the first device and the second device. Therefore, adjusting the sending rate of data messages according to the time difference can effectively reduce the congestion depth of network queues and improve system performance.
  • The foregoing adjusting of the sending rate of the data message according to the time difference includes: if the time difference is less than a first preset threshold, increasing the sending rate of the data message; if the time difference is greater than a second preset threshold, reducing the sending rate of the data message; the first preset threshold is less than the second preset threshold. Based on this solution, the sending rate of data messages can be reduced when the network queue is relatively congested, so as to reduce the congestion depth of the network queue.
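  • The two-threshold adjustment can be sketched as below. The text only says that the rate is increased or reduced; the additive increase step and the multiplicative backoff factor used here are assumptions for illustration, as are all parameter names.

```python
def adjust_rate(rate_bps: float, first_rtt_us: float, second_rtt_us: float,
                low_us: float, high_us: float,
                step_bps: float = 1e9, backoff: float = 0.5) -> float:
    """Return the new sending rate given the dynamic and fixed RTT.

    step_bps and backoff are illustrative choices, not taken from the text.
    """
    if low_us >= high_us:
        raise ValueError("first threshold must be less than second threshold")
    time_diff_us = first_rtt_us - second_rtt_us   # indicates queue congestion depth
    if time_diff_us < low_us:
        return rate_bps + step_bps                # shallow queue: speed up
    if time_diff_us > high_us:
        return rate_bps * backoff                 # deep queue: slow down
    return rate_bps                               # between thresholds: hold


print(adjust_rate(10e9, first_rtt_us=60, second_rtt_us=50, low_us=20, high_us=100))
print(adjust_rate(10e9, first_rtt_us=200, second_rtt_us=50, low_us=20, high_us=100))
```

Leaving the rate unchanged between the two thresholds is also an assumption; the text does not state what happens in that band.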
  • The foregoing method further includes: if the first device determines that a first preset number of data packets have been sent cumulatively since the first message was last sent, obtaining a third RTT; or, if the first device determines that the time interval between the current time and the last sending of the first message reaches a first preset duration, obtaining the third RTT and recording the current timestamp.
  • the third RTT and the foregoing first RTT are dynamic RTTs at different moments. Based on this solution, since the depth of network congestion changes dynamically, the dynamic RTT can be detected periodically to obtain the current network congestion degree.
  • The foregoing method further includes: if the first device determines that a second preset number of data packets have been sent cumulatively since the third message was last sent, obtaining a fourth RTT; or, if the first device determines that the time interval between the current time and the last sending of the third message reaches a second preset duration, obtaining the fourth RTT and recording the current timestamp.
  • the fourth RTT and the foregoing second RTT are fixed RTTs at different times. Based on this solution, the fixed RTT can be periodically and cyclically detected, so that when the data transmission path between the first device and the second device changes, the fixed RTT corresponding to the new transmission path can be detected more accurately.
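  • The periodic re-measurement of the dynamic and fixed RTT can be sketched with a small trigger object. The text presents the packet-count condition and the elapsed-time condition as alternatives; combining both in one class, and the class and method names, are assumptions made for this example.

```python
class ProbeTrigger:
    """Decides when the next RTT probe (first or third message) is due."""

    def __init__(self, max_packets: int, max_interval_us: int, now_us: int = 0):
        self.max_packets = max_packets          # preset number of data packets
        self.max_interval_us = max_interval_us  # preset duration
        self.packets_since_probe = 0
        self.last_probe_us = now_us

    def on_data_packet_sent(self) -> None:
        self.packets_since_probe += 1

    def should_probe(self, now_us: int) -> bool:
        return (self.packets_since_probe >= self.max_packets
                or now_us - self.last_probe_us >= self.max_interval_us)

    def mark_probe_sent(self, now_us: int) -> None:
        """Record the current timestamp and restart the packet count."""
        self.packets_since_probe = 0
        self.last_probe_us = now_us


# One trigger per RTT type; the fixed RTT would typically use larger presets.
dynamic_trigger = ProbeTrigger(max_packets=3, max_interval_us=500)
for _ in range(3):
    dynamic_trigger.on_data_packet_sent()
due_by_count = dynamic_trigger.should_probe(now_us=100)
dynamic_trigger.mark_probe_sent(now_us=100)
due_right_after = dynamic_trigger.should_probe(now_us=101)
due_by_time = dynamic_trigger.should_probe(now_us=600)
```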
  • The foregoing third timestamp is carried in a reserved field in the base transport header (BTH) of the remote direct memory access (RDMA) message, or carried in the payload of the RDMA message.
  • If the first message is a data message, the foregoing first timestamp is carried in a reserved field in the BTH of the RDMA message; if the first message is different from the data message, the first timestamp is carried in a reserved field in the BTH of the RDMA message, or carried in the payload of the RDMA message.
  • By using the reserved field of the existing protocol to carry the timestamp, there is no need, compared with the prior art, to record the relationship between the timestamp and the message sequence number, and fewer resources are occupied.
  • A second aspect of the embodiments of the present application provides a network congestion control method. The method includes: a second device receives a first message sent by a first device, where the first message carries a first timestamp, the first timestamp is the local timestamp when the first message is sent, and the first device is the device that sends the data message; the second device sends a second message to the first device, where the second message carries the first timestamp; wherein the priority of the first message is the same as the priority of the data message, and the priority of the second message is higher than the priority of the data message.
  • Based on this solution, the first RTT measured by the first device is not affected by whether the reverse path (the direction in which service messages are not transmitted) is congested.
  • The above method further includes: the second device receives a third message sent by the first device, where the third message carries a third timestamp, the third timestamp is the local timestamp when the third message is sent, and the priority of the third message is higher than the priority of the data message; the second device sends a fourth message to the first device, where the fourth message carries the third timestamp and the priority of the fourth message is higher than the priority of the data message.
  • the second RTT measured by the first device is more accurate by using the third message and the fourth message whose priority is higher than the priority of the data message.
  • a network congestion control device is provided.
  • the device is a device for sending data packets.
  • The device includes a processing unit and a transceiver unit. The processing unit is configured to: send, through the transceiver unit, a first message to the second device, where the first message carries a first timestamp and the first timestamp is the local timestamp when the first message is sent; receive, through the transceiver unit, the second message sent by the second device, where the second message carries the first timestamp; subtract the first timestamp from the second timestamp to obtain the first round-trip time (RTT), where the second timestamp is the local timestamp when the device receives the second message; and adjust the sending rate of the data message according to the first RTT; wherein the priority of the first message is the same as the priority of the data message, and the priority of the second message is higher than the priority of the data message.
  • The processing unit is further configured to: send a third message to the second device through the transceiver unit, where the third message carries a third timestamp, the third timestamp is the local timestamp when the third message is sent, and the priority of the third message is higher than the priority of the data message; receive, through the transceiver unit, the fourth message sent by the second device, where the fourth message carries the third timestamp and the priority of the fourth message is higher than the priority of the data message; and subtract the third timestamp from the fourth timestamp to obtain the second RTT, where the fourth timestamp is the local timestamp when the device receives the fourth message.
  • The processing unit is specifically configured to: subtract the second RTT from the first RTT to obtain the time difference, where the time difference is used to indicate the depth of network queue congestion; and adjust the sending rate of data messages according to the time difference.
  • The processing unit is specifically configured to: if the time difference is less than a first preset threshold, increase the sending rate of the data message; if the time difference is greater than a second preset threshold, reduce the sending rate of the data message; the first preset threshold is less than the second preset threshold.
  • The processing unit is further configured to: if the processing unit determines that a first preset number of data packets have been sent cumulatively since the first message was last sent, obtain the third RTT; or, if the processing unit determines that the time interval between the current time and the last sending of the first message reaches the first preset duration, obtain the third RTT and record the current timestamp.
  • The processing unit is further configured to: if the processing unit determines that a second preset number of data packets have been sent cumulatively since the third message was last sent, obtain the fourth RTT; or, if the processing unit determines that the time interval between the current time and the last sending of the third message reaches the second preset duration, obtain the fourth RTT and record the current timestamp.
  • The third timestamp is carried in a reserved field in the base transport header (BTH) of the RDMA message, or carried in the payload of the RDMA message.
  • If the first message is a data message, the first timestamp is carried in a reserved field in the BTH of the RDMA message; if the first message and the data message are different, the first timestamp is carried in a reserved field in the BTH of the RDMA message, or carried in the payload of the RDMA message.
  • A network congestion control device is provided, which includes a processing unit and a transceiver unit. The processing unit is configured to: receive, through the transceiver unit, a first message sent by a first device, where the first message carries a first timestamp, the first timestamp is the local timestamp when the first message is sent, and the first device is a device that sends a data message; and send, through the transceiver unit, a second message to the first device, where the second message carries the first timestamp; wherein the priority of the first message is the same as the priority of the data message, and the priority of the second message is higher than the priority of the data message; or, the priority of the first message is higher than the priority of the data message, and the priority of the second message is the same as the priority of the data message.
  • The processing unit is further configured to: receive, through the transceiver unit, a third message sent by the first device, where the third message carries a third timestamp, the third timestamp is the local timestamp when the third message is sent, and the priority of the third message is higher than the priority of the data message; and send, through the transceiver unit, a fourth message to the first device, where the fourth message carries the third timestamp and the priority of the fourth message is higher than the priority of the data message.
  • a computer storage medium is provided, and computer program code is stored in the computer storage medium.
  • The processor executes the network congestion control method described in any of the above aspects.
  • the sixth aspect of the embodiments of the present application provides a computer program product that stores computer software instructions executed by the above-mentioned processor, and the computer software instructions include a program for executing the solution described in the above-mentioned aspect.
  • a seventh aspect of the embodiments of the present application provides a network congestion control device, which includes a transceiver, a processor, and a memory.
  • The transceiver is used for sending and receiving information or communicating with other network elements; the memory is used for storing computer-executable instructions; the processor is used to execute the computer-executable instructions to implement the network congestion control method described in any of the above aspects.
  • the eighth aspect of the embodiments of the present application provides a network congestion control device.
  • the device exists in the form of a chip.
  • the structure of the device includes a processor and a memory.
  • The memory is coupled with the processor and stores the program instructions and data necessary for the device; the processor is used to execute the program instructions stored in the memory, so that the device performs the functions of the device in the above method.
  • Figure 1 is a schematic diagram of a network congestion control solution provided by the prior art of this application.
  • Figure 2 is a schematic diagram of a network architecture provided by an embodiment of the application.
  • FIG. 3 is a schematic flowchart of a network congestion control method provided by an embodiment of this application.
  • FIG. 4 is a schematic diagram of a way to carry a timestamp according to an embodiment of the application.
  • FIG. 5 is a schematic diagram of another way of carrying time stamps according to an embodiment of the application.
  • FIG. 6 is a schematic flowchart of another network congestion control method provided by an embodiment of this application.
  • FIG. 7 is a schematic flowchart of another network congestion control method provided by an embodiment of this application.
  • FIG. 8 is a schematic diagram of a data transmission structure provided by an embodiment of this application.
  • FIG. 9 is a schematic diagram of the composition of a network congestion control device provided by an embodiment of this application.
  • FIG. 10 is a schematic diagram of the composition of another network congestion control device provided by an embodiment of the application.
  • FIG. 11 is a schematic diagram of the composition of another network congestion control device provided by an embodiment of the application.
  • FIG. 12 is a schematic diagram of the composition of another network congestion control apparatus provided by an embodiment of the application.
  • At least one of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can be single or multiple.
  • Remote direct memory access (RDMA) transfers data directly to the storage area of a computer through the network, quickly moving data from one system to the memory of a remote system without any impact on the operating system, so it requires little of the computer's processing capability. It eliminates the overhead of external memory copies and context switching.
  • the RDMA protocol enables the computer's network interface card (NIC) to read from or write data to the memory of another computer through the network without the intervention of the computer's operating system.
  • RDMA over Converged Ethernet (RoCE) was proposed by InfiniBand (IB) so that RDMA can run on Ethernet.
  • RoCEv1 is carried directly on, and runs on, the Ethernet link layer.
  • UDP User Datagram Protocol
  • The round-trip time (RTT) represents the total delay from when the sender starts sending data until the sender receives the acknowledgement from the receiving end (the receiving end sends the acknowledgement immediately after receiving the data).
  • RTT is determined by three parts: the propagation time of the link, the processing time of the end system, and the queuing and processing time in the buffer of the switch (or router). The propagation time of the link and the processing time of the end system are relatively fixed, whereas the queuing and processing time in the buffer of the switch (or router) changes with the degree of congestion of the entire network. Therefore, a change in RTT reflects, to a certain extent, a change in the degree of network congestion.
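  • The decomposition above, together with the dynamic and fixed RTT introduced earlier, can be written compactly. The symbols below are notation chosen for this illustration and do not appear in the application; the approximation assumes that the fixed RTT, measured with high-priority messages in both directions, contains almost no queuing delay.

```latex
\begin{aligned}
\mathrm{RTT} &= t_{\mathrm{prop}} + t_{\mathrm{end}} + t_{\mathrm{queue}} \\
\Delta t &= \mathrm{RTT}_{\mathrm{dynamic}} - \mathrm{RTT}_{\mathrm{fixed}} \approx t_{\mathrm{queue}}
\end{aligned}
```

Here $t_{\mathrm{prop}}$ is the link propagation time, $t_{\mathrm{end}}$ the end-system processing time, and $t_{\mathrm{queue}}$ the queuing and processing time in switch (or router) buffers.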
  • An embodiment of the present application provides a network congestion control method that can avoid the impact of reverse-path congestion, accurately control the depth of network queue congestion, and improve system performance.
  • the embodiment of the application provides a network congestion control method, which is applied to a computer node in a data center that uses the RoCE protocol for data exchange.
  • The computer nodes are interconnected through one or more switches; multiple switches are connected in a certain topological relationship (for example, a CLOS topology) to form a data center network with one or more paths.
  • the embodiment of the present application does not limit the topological relationship between the switches, which is only an exemplary description here.
  • Figure 2 is a network architecture provided by an embodiment of the application, including a computer node A and a computer node B.
  • the computer node A can be connected to the computer node B through one or more switches.
  • the computer node A includes a host A and a network card A.
  • Computer node B includes host B and network card B.
  • the network card A of the computer node A and the network card B of the computer node B exchange data through remote direct memory access (RDMA).
  • a communication queue pair (Queue Pair, QP) is created.
  • One queue of the communication queue pair is a sending queue and the other is a receiving queue.
  • QP is full-duplex communication.
  • the end sending a request is the requesting end, and the end receiving the request and responding is the responding end.
  • RoCE requests are issued by the application program, and the request types used for data exchange with remote computer nodes mainly include Write, Send, and Read.
  • Write and Send are requests in which the requesting end sends data, and the responding end replies with an acknowledgement (ACK) after receiving the data; Read is a request in which the requesting end sends a read request, and the responding end receives the request and responds with the data that was read. That is, for Write/Send the requester sends the data, and for Read the responder sends the data, so the data transmission directions of Write/Send and Read are different.
  • The data exchange is bidirectional.
  • When network card A sends a request, network card A is the requesting end and network card B is the responding end: for RDMA Write/Send, network card A sends the data; for RDMA Read, network card B sends the data.
  • When network card B sends a request, network card B is the requesting end and network card A is the responding end: for RDMA Write/Send, network card B sends the data; for RDMA Read, network card A sends the data.
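  • The relationship between request type and data direction can be captured in a few lines. This is only a sketch of the rule stated above; the function name and the string labels for the network cards are hypothetical.

```python
def data_sender(request_type: str, requester: str, responder: str) -> str:
    """Return which end sends the data messages for a given RoCE request type."""
    if request_type in ("Write", "Send"):
        return requester      # the requesting end carries the data
    if request_type == "Read":
        return responder      # the responding end carries the data
    raise ValueError(f"unsupported request type: {request_type}")


# Network card A issues the requests:
print(data_sender("Write", requester="NIC A", responder="NIC B"))
print(data_sender("Read", requester="NIC A", responder="NIC B"))
```

In the method described here, whichever end this function returns plays the role of the first device, because congestion is controlled in the direction in which data messages flow.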
  • the network congestion control method provided in the embodiment of the present application distinguishes the data transmission direction.
  • an embodiment of the present application provides a network congestion control method, which is applied to a first device, and the first device is a device that sends a data message.
  • the method may include steps S301-S307.
  • the network congestion control method of steps S301-S307 can be executed to reduce the congestion depth of the network queue.
  • the first device sends a first message to the second device.
  • the first message carries the first time stamp.
  • the first time stamp is a local time stamp when the first device sends the first message.
  • the priority of the first message is the same as the priority of the data message.
  • the first message may be a data message, or may be a message specifically for measuring delay, which is not limited in the embodiment of the present application.
  • When the first message is a message dedicated to measuring delay, the first message and the data message sent by the first device are different messages, but the first message and the data message have the same priority.
  • the first device in this embodiment is a device that sends a data packet, and the transmission direction of the data packet is from the first device to the second device.
  • For example, take the first device as computer node A and the second device as computer node B. When computer node A sends a Write or Send request message (for example, the first message is a Write or Send request message), the Write or Send request message carries data, so computer node A is the device that sends the data message, and the transmission direction of the data message is from computer node A to computer node B.
  • As another example, take the first device as computer node B and the second device as computer node A. When computer node A sends a Read request, computer node B sends the above-mentioned first message to computer node A; that is, computer node B is the device that sends the data message, so the transmission direction of the data message is from computer node B to computer node A.
  • In this case, the method may also include the second device sending a Read request to the first device.
  • the first message when the first message is a message specifically for measuring the delay sent by the first device, the first message carries the foregoing first time stamp.
  • the carrying manner of the first time stamp is shown in FIG. 4 or FIG. 5.
  • the first time stamp in the first message may be carried in a reserved field (reserved, rsvd) in the basic transport header (Base Transport Header, BTH) of the remote direct memory access RDMA of the message.
  • BTH Basic Transport Header
  • The first reserved field (rsvd) is the 5th byte in the BTH, and the second reserved field (rsvd) is the lower 7 bits of the 9th byte in the BTH.
  • The first timestamp can be carried in either the first reserved field (rsvd) or the second reserved field (rsvd), or in a combination of the two reserved fields, which is not limited in this embodiment of the application.
  • In Figure 4, the operation code (Operation Code, Opcode) indicates the type of the data packet or the higher-level protocol type in the IB payload; the solicited event bit (Solicited Event, SE) indicates that the responder should generate an event; the migration request bit (MigReq, M) identifies the migration status; the pad count (Pad Count, Pad) identifies how many extra bytes are padded in the IB payload; the transport header version (Transport Header Version, TVer) indicates the version number of the packet; the Partition Key identifies the logical memory partition associated with the packet; the Destination Queue Pair identifies the destination queue pair; the acknowledge request bit (Acknowledge Request, A) requests that the responder return an acknowledgement; and the packet sequence number (PSN) is used to detect lost or duplicate data packets.
  • PSN Packet Sequence Number
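  • As an illustration of carrying a timestamp in the combination of both reserved fields, the sketch below writes a 15-bit value into the 5th byte and the lower 7 bits of the 9th byte of a 12-byte BTH. The bit split (upper 8 bits in the first field, lower 7 bits in the second), the wrap-around handling, and all names are assumptions; the application does not specify an encoding.

```python
BTH_LEN = 12          # length of the base transport header in bytes
RSVD1_INDEX = 4       # 5th byte: first reserved field (8 bits)
RSVD2_INDEX = 8       # 9th byte: lower 7 bits are the second reserved field
RSVD2_MASK = 0x7F
TS_BITS = 15          # 8 + 7 reserved bits available in total
TS_MASK = (1 << TS_BITS) - 1


def put_timestamp(bth: bytes, ts: int) -> bytes:
    """Return a copy of bth with ts (wrapped to 15 bits) in the reserved fields."""
    if len(bth) != BTH_LEN:
        raise ValueError("BTH must be 12 bytes")
    ts &= TS_MASK
    out = bytearray(bth)
    out[RSVD1_INDEX] = ts >> 7                                         # upper 8 bits
    out[RSVD2_INDEX] = (out[RSVD2_INDEX] & 0x80) | (ts & RSVD2_MASK)   # keep the A bit
    return bytes(out)


def get_timestamp(bth: bytes) -> int:
    """Read the 15-bit timestamp back from the two reserved fields."""
    return (bth[RSVD1_INDEX] << 7) | (bth[RSVD2_INDEX] & RSVD2_MASK)


def elapsed(ts_sent: int, ts_now: int) -> int:
    """Difference of two 15-bit timestamps, tolerating one wrap-around."""
    return (ts_now - ts_sent) & TS_MASK


header = bytes([0x0A, 0x00, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x11, 0x80, 0x00, 0x00, 0x01])
stamped = put_timestamp(header, 0x1234)
echoed = get_timestamp(stamped)
```

With only 15 bits available, the timestamp unit has to be chosen so that the largest expected RTT fits before the counter wraps; carrying the timestamp in the payload avoids this limit.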
  • The first timestamp in the above first message can also be carried in the payload of the RDMA message, in which case the payload need not carry valid data. Carrying the first timestamp in the payload provides more storage space, so a more precise timestamp can be carried.
  • the meanings of other fields in FIG. 5 are the same as those in FIG. 4, and will not be repeated here.
  • When the first message is a data message, the foregoing first timestamp is carried in the data message.
  • The manner of carrying the first timestamp in the data message is shown in FIG. 4; that is, the first timestamp can be carried in a reserved field in the base transport header (BTH) of the RDMA message.
  • this embodiment uses the reserved field of the existing protocol to carry the time stamp. Compared with the prior art, there is no need to record the relationship between the time stamp and the message sequence number, and it takes up less resources.
  • the second device receives the first message.
  • the second device constructs a second message according to the first message.
  • the second device extracts the first time stamp from the first message, and constructs a second message, and the second message carries the first time stamp.
  • the manner of carrying the first timestamp in the second message is shown in FIG. 4 or FIG. 5, that is, the first timestamp in the second message can be carried in a reserved field in the RDMA base transport header (BTH) of the message, or carried in the payload of the RDMA message.
  • the second device sends a second message to the first device.
  • the second message carries the foregoing first time stamp.
  • the priority of the second message is higher than the priority of the data message.
  • the second message may be an out-of-band message.
  • the out-of-band message is a non-service message used to assist in measuring RTT.
  • the second message is completely independent of the service and can be sent through the underlying control module. Because the priority of the second message sent by the second device to the first device is higher than the priority of the data message, the second message can be sent ahead of data messages, so it is not affected by whether the network is congested in the transmission direction from the second device to the first device, and the measured RTT more accurately reflects the degree of queue congestion in the transmission direction from the first device to the second device.
  • the first device sends the first message, which has the same priority as the data message, to the second device.
  • the second device sends the second message, which has a priority higher than that of the data message, to the first device.
  • the second message can avoid the influence of congestion on the reverse path (the transmission direction from the second device to the first device), and the RTT can be measured more accurately.
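The effect of the higher priority on the reverse path can be sketched with a strict-priority egress queue in Python. The queue discipline, the priority encoding, and the class name `EgressPort` are assumptions for illustration; the source only states that the second message has a higher priority than data messages.

```python
import heapq
import itertools

HIGH, DATA = 0, 1  # assumed encoding: the smaller number is served first


class EgressPort:
    """Strict-priority egress queue; FIFO order within one priority level."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, priority, name):
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]


port = EgressPort()
for i in range(3):
    port.enqueue(DATA, f"data-{i}")   # backlog already waiting on the reverse path
port.enqueue(HIGH, "rtt-response")    # the second message arrives last
order = [port.dequeue() for _ in range(4)]
```

Although the response is enqueued after three data messages, it leaves first, so the reverse-path backlog adds no queuing delay to the measured RTT.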
  • For example, take the first device as computer node A and the second device as computer node B. Computer node A sends the first message to computer node B, and the priority of the first message is the same as the priority of the data message. After computer node B receives the first message, it sends the second message, with a priority higher than that of the data message, to computer node A. This avoids the influence of network congestion in the transmission direction from computer node B to computer node A, so that the measured RTT more accurately reflects the degree of queue congestion in the transmission direction from computer node A to computer node B.
  • As another example, computer node B sends a first message to computer node A. This first message is the first message that computer node B sends in reply, with the same priority as the data message, after receiving the Read request sent by computer node A; it can carry data, or it can be a message dedicated to measuring delay that has the same priority as the data message. After receiving the first message, computer node A sends a second message with a priority higher than that of the data message to computer node B. This avoids the influence of network congestion in the transmission direction from computer node A to computer node B, so that the measured RTT more accurately reflects the degree of queue congestion in the transmission direction from computer node B to computer node A.
  • this embodiment avoids the influence of congestion on the reverse path (the direction in which no service messages are transmitted) by sending, in that direction, a message with a priority higher than that of the data message, so that the measured delay more accurately reflects the degree of queue congestion in the direction in which service messages are transmitted.
  • the first device receives the second message.
  • the first device subtracts the first time stamp from the second time stamp to obtain the first RTT.
  • the second time stamp is a local time stamp when the first device receives the second message.
  • the second timestamp minus the first timestamp can be understood as the total time elapsed from the first device sending the first message to the first device receiving the second message sent by the peer (the second device); this time is the first RTT.
  • the first RTT takes into account the queuing and processing time in the buffers of the switches (or routers), and avoids the impact of congestion on the reverse path (the direction in which no service messages are transmitted). The first RTT is therefore related to the degree of network congestion, changes dynamically with it, and can more accurately reflect the current network congestion.
  • This first RTT may be referred to as dynamic RTT.
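The message exchange in steps S301-S306 can be sketched in Python as follows. The `Message` type and the function names are hypothetical, and timestamps are plain integers in an arbitrary unit.

```python
from dataclasses import dataclass


@dataclass
class Message:
    timestamp: int       # first device's local timestamp, echoed unchanged
    high_priority: bool  # True only for the reflected (second) message


def send_first_message(local_now):
    """First device: stamp the message with its own local clock."""
    return Message(timestamp=local_now, high_priority=False)


def reflect(first):
    """Second device: copy the timestamp into a higher-priority reply."""
    return Message(timestamp=first.timestamp, high_priority=True)


def first_rtt(second, local_now):
    """First device: second timestamp minus the echoed first timestamp."""
    return local_now - second.timestamp
```

Both timestamps come from the first device's clock, so in this sketch the two devices do not need synchronized clocks.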
  • S307 The first device adjusts the sending rate of the data packet according to the first RTT.
  • the first RTT can reflect the current congestion degree of the network queue.
  • the larger the first RTT, the more congested the network queue, so the sending rate of the data message can be adjusted according to the first RTT.
  • For example, take the first device as computer node A and the second device as computer node B. If computer node A sends a Write or Send request message (the first message), computer node A is the device that sends the data message, and the transmission direction of the data message is from computer node A to computer node B. The first RTT reflects the congestion degree of the network queue in the transmission direction from computer node A to computer node B, so computer node A can adjust the sending rate of the data message according to the first RTT.
  • As another example, take the first device as computer node B and the second device as computer node A. If computer node A sends a Read request, then after receiving the Read request, computer node B sends the above-mentioned first message to computer node A. Computer node B is the device that sends the data message, and the transmission direction of the data message is from computer node B to computer node A, so computer node B can adjust the sending rate of the data message according to the first RTT.
  • adjusting the sending rate of the data message by the first device according to the first RTT may include: if the first RTT is greater than a first preset threshold, it is determined that the network is currently congested, and the sending rate of data messages can be reduced; if the first RTT is less than a second preset threshold, it is determined that the network is not congested, and the sending rate of data messages can be appropriately increased to make full use of the network capacity.
  • the second preset threshold is less than or equal to the first preset threshold.
  • the method of steps S301-S307 can be used to adjust the sending rate of the data packet to reduce the congestion degree of the network queue.
  • the first device sends the first message to the second device; the second device receives the first message; the second device constructs the second message according to the first message; the second device sends the second message to the first device; the first device receives the second message; the first device subtracts the first timestamp from the second timestamp to obtain the first round-trip time (RTT); the first device adjusts the sending rate of data messages according to the first RTT.
  • the measurement of the first RTT is not affected by congestion on the reverse path (the direction in which no service messages are transmitted), so the determined first RTT is more accurate. Adjusting the sending rate of the data message according to the first RTT can therefore reduce network queue congestion and improve system performance.
  • the embodiment of the present application also provides a network congestion control method. As shown in FIG. 6, before the above step S307, the method further includes steps S601-S606.
  • the first device sends a third packet to the second device.
  • the third message carries a third timestamp, and the third timestamp is a local timestamp when the third message is sent.
  • the priority of the third message is higher than the priority of the data message.
  • the third message may be a message that does not carry data.
  • the third message may be an out-of-band message to assist in measuring RTT.
  • the manner of carrying the third timestamp in the third message is shown in FIG. 4 or FIG. 5, that is, the third timestamp in the third message can be carried in a reserved field in the RDMA base transport header (BTH) of the message, or carried in the payload of the RDMA message.
  • the second device receives the third message.
  • the second device constructs a fourth message according to the third message.
  • the second device extracts the third timestamp from the third message, and constructs a fourth message, and the fourth message carries the third timestamp.
  • Exemplarily, the manner of carrying the third timestamp in the fourth message is shown in FIG. 4 or FIG. 5. For details, refer to the foregoing related description, which is not repeated here.
  • S604 The second device sends a fourth packet to the first device.
  • the priority of the fourth message is higher than the priority of the data message.
  • the fourth message may be a message that does not carry data.
  • the fourth message may be an out-of-band message to assist in measuring RTT.
  • S605 The first device receives the fourth packet.
  • the first device subtracts the third time stamp from the fourth time stamp to obtain the second RTT.
  • the fourth time stamp is a local time stamp when the first device receives the fourth message.
  • the fourth timestamp minus the third timestamp can be understood as the total time elapsed from the first device sending the third message to the first device receiving the fourth message sent by the second device; this time is the second RTT.
  • the first device may save the second RTT in the context.
  • the value of the second RTT is basically fixed, and may vary slightly with network performance and similar factors.
  • This second RTT may be referred to as a fixed RTT.
  • the path for obtaining the first RTT in steps S301-S306 is the same as the path for obtaining the second RTT in steps S601-S606.
  • the above steps S301-S306 may be executed before steps S601-S606, or may be executed after steps S601-S606, or may also be executed simultaneously with steps S601-S606, which is not limited in the embodiment of the present application.
  • After the above steps S301-S306 and S601-S606 are performed, the adjustment in S307 of the sending rate of the data message according to the first RTT correspondingly includes: the first device adjusts the sending rate of the data message according to the first RTT and the second RTT.
  • adjusting the sending rate of the data message by the first device according to the first RTT and the second RTT includes: the first device subtracts the second RTT from the first RTT to obtain a time difference, which is used to indicate the depth of network queue congestion; the first device adjusts the sending rate of the data message according to the time difference.
  • the first device may obtain the saved second RTT from the context, and subtract the second RTT saved in the context from the first RTT to obtain the time difference.
  • the time difference between the first RTT and the second RTT may be used to indicate the depth of network queue congestion in the transmission direction from the first device to the second device. It can be understood that if the first device is a device that sends data messages, the time difference specifically represents the depth of queue congestion in the transmission direction from the first device to the second device: the smaller the time difference, the less congested the queue in that direction; the larger the time difference, the deeper the queue congestion in that direction, that is, the more serious the network congestion. If the first device is a device that receives data messages, the time difference specifically represents the depth of queue congestion in the transmission direction from the second device to the first device.
  • adjusting the sending rate of the data message by the first device according to the time difference may include: if the time difference is less than a first preset threshold, increasing the sending rate of the data message; if the time difference is greater than a second preset threshold, reducing the sending rate of the data message. The first preset threshold (T_low) is less than the second preset threshold (T_high); both thresholds may be set to empirical values and are related to factors such as link speed and device jitter.
  • Denote the time difference as T_q. If T_q is less than T_low, the depth of network queue congestion is small and the network can be considered uncongested; in this case the sending rate of data messages can be increased to make full use of the network capacity. If T_q is greater than or equal to T_high, the depth of network queue congestion is large and the network queue can be considered congested; in this case the sending rate of data messages can be reduced to reduce the depth of network queue congestion. If T_q is greater than or equal to T_low and less than T_high, the depth of network queue congestion is within the acceptable range of the network and the network can be considered lightly congested; in this case the current sending rate may be left unchanged to maintain the lightly congested state.
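The three ranges of T_q described above can be written as a small Python classifier. This is a sketch only: the `Congestion` enum and the function name `classify` are hypothetical, and the threshold values used in practice are empirical according to the text.

```python
from enum import Enum


class Congestion(Enum):
    NONE = "increase rate"
    LIGHT = "keep rate"
    HEAVY = "decrease rate"


def classify(t_q, t_low, t_high):
    """Map the time difference T_q onto the three ranges bounded by T_low and T_high."""
    if t_low >= t_high:
        raise ValueError("T_low must be less than T_high")
    if t_q < t_low:
        return Congestion.NONE
    if t_q >= t_high:
        return Congestion.HEAVY
    return Congestion.LIGHT
```

For instance, with T_low = 30 and T_high = 80 (arbitrary units chosen for this example), a T_q of 20 falls in the uncongested range, 50 in the lightly congested range, and 80 in the congested range.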
  • the foregoing adjustment of the sending rate of the data message may increase or decrease the sending rate of the data message through a preset algorithm.
  • the algorithm can be an additive-increase/multiplicative-decrease (AIMD) algorithm.
  • using the AIMD algorithm to control the sending rate of data messages includes: when the network is not congested, the sender increases its sending rate linearly; when the network is congested, the sender decreases its sending rate multiplicatively.
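A minimal AIMD update step in Python is sketched below. The increment, the decrease factor, and the rate bounds are illustrative assumptions; the source does not give concrete parameters.

```python
def aimd_update(rate, congested, increase=1.0, decrease_factor=0.5,
                min_rate=1.0, max_rate=100.0):
    """Return the next sending rate under additive increase / multiplicative decrease."""
    if congested:
        rate = rate * decrease_factor   # multiplicative decrease
    else:
        rate = rate + increase          # additive (linear) increase
    # Clamp to the assumed valid range of the sender.
    return max(min_rate, min(max_rate, rate))
```

Repeated uncongested updates raise the rate by a constant step, while a single congested update cuts it by a constant fraction, which is what makes the rate back off quickly and recover gradually.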
  • the embodiment of the present application does not limit the algorithm used to adjust the sending rate of the data message, and is only an exemplary description here.
  • the embodiment of the present application accurately obtains the time difference used to indicate the congestion depth of the network queue, and adjusts the sending rate of data messages according to the time difference, which can reduce the sending rate of data messages when the network is congested, thereby reducing the depth of network queue congestion and improving system performance.
  • the first RTT and the second RTT are obtained, and the second RTT is subtracted from the first RTT to obtain the time difference; the sending rate of the data message is then adjusted according to the time difference to reduce the congestion depth of the network queue.
  • the second RTT can be measured more accurately, and the difference between the first RTT and the second RTT can be calculated to obtain the time difference.
  • the time difference can accurately reflect the congestion depth of the network queue between the first device and the second device. Therefore, when adjusting the sending rate of data packets according to the time difference, the congestion depth of the network queue can be effectively reduced and system performance can be improved.
  • the embodiment of the present application also provides a network congestion control method. As shown in FIG. 7, the method further includes steps S701-S704 after step S307.
  • if the first device determines that a first preset number of data messages have been sent cumulatively since the first message was last sent, it acquires the third RTT; or, if the first device determines that the time interval between the current time and the last sending of the first message reaches a first preset duration, it acquires the third RTT and records the current timestamp.
  • the third RTT and the first RTT are dynamic RTTs at different moments.
  • the dynamic RTT can be detected periodically to determine the degree of current network congestion.
  • the cycle period of the cyclic detection of the dynamic RTT may be: the first device obtains the third RTT once it has cumulatively sent a first preset number of data messages since last sending the first message. For example, once the first device has cumulatively sent J data messages (J is greater than or equal to 2) since it last sent the data message carrying the first timestamp, the first device obtains the third RTT.
  • the cycle period of the cyclic detection of the dynamic RTT may alternatively be: the first device determines that the time interval between the current time and the last sending of the first message reaches the first preset duration, obtains the third RTT, and records the current timestamp. For example, when K microseconds have elapsed since the first message was last sent, the first device obtains the third RTT.
  • The specific implementation manner of obtaining the third RTT in step S701 is the same as the specific implementation manner of obtaining the first RTT in the foregoing steps S301-S306.
  • if the first device determines that a second preset number of data messages have been sent cumulatively since the third message was last sent, it acquires the fourth RTT; or, if the first device determines that the time interval between the current time and the last sending of the third message reaches a second preset duration, it acquires the fourth RTT and records the current timestamp.
  • the fourth RTT and the second RTT are fixed RTTs at different times.
  • the data transmission path between the first device and the second device may change.
  • the switch can choose a second path to reach network card B for data transmission. It is understandable that after the network path between network card A and network card B changes, the fixed RTT also changes, so the fixed RTT can be detected periodically.
  • the cycle period of the cyclic detection of the fixed RTT may be: the first device obtains the fourth RTT once it has cumulatively sent a second preset number of data messages since last sending the third message. For example, once the first device has cumulatively sent N data messages (N is greater than or equal to 2) since the third message was last sent, the first device obtains the fourth RTT.
  • the cycle period of the cyclic detection of the fixed RTT may alternatively be: the first device determines that the time interval between the current time and the last sending of the third message reaches the second preset duration, and obtains the fourth RTT. For example, when M microseconds have elapsed since the third message was last sent, the first device obtains the fourth RTT.
  • The specific implementation manner of obtaining the fourth RTT in step S702 is the same as the implementation manner of obtaining the second RTT in the foregoing steps S601-S606.
  • the cycle time for obtaining the third RTT in step S701 may be less than the cycle time for obtaining the fourth RTT in step S702, which is not limited in the embodiment of the present application, and is only an exemplary description here.
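The two trigger conditions for re-measuring an RTT (a packet count or an elapsed interval) can be sketched in Python as below. The class `ProbeTrigger` and its method names are hypothetical; one instance would track the dynamic RTT probe and a second instance, typically with a longer cycle, the fixed RTT probe.

```python
class ProbeTrigger:
    """Decides when a new RTT probe is due, by packet count and/or elapsed time."""

    def __init__(self, max_packets=None, max_interval_us=None, start_us=0):
        if max_packets is None and max_interval_us is None:
            raise ValueError("configure a packet count, an interval, or both")
        self.max_packets = max_packets
        self.max_interval_us = max_interval_us
        self.packets_since_probe = 0
        self.last_probe_us = start_us

    def on_data_sent(self):
        """Call once per data message sent since the last probe."""
        self.packets_since_probe += 1

    def due(self, now_us):
        by_count = (self.max_packets is not None
                    and self.packets_since_probe >= self.max_packets)
        by_time = (self.max_interval_us is not None
                   and now_us - self.last_probe_us >= self.max_interval_us)
        return by_count or by_time

    def mark_probe_sent(self, now_us):
        """Reset the counter and record the timestamp of the probe just sent."""
        self.packets_since_probe = 0
        self.last_probe_us = now_us
```

A sender would call `on_data_sent` for each data message, check `due` before transmitting, and call `mark_probe_sent` when it sends the timestamped message.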
  • the current fixed RTT may be the fixed RTT saved in the context; the fixed RTT saved in the context may be the second RTT or the fourth RTT.
  • the first device may obtain the saved current fixed RTT from the context, and subtract the current fixed RTT saved in the context from the third RTT to obtain the time difference. If the current fixed RTT saved in the context is the second RTT, step S703 may subtract the second RTT from the third RTT to obtain the time difference. If the current fixed RTT saved in the context is the fourth RTT, step S703 may subtract the fourth RTT from the third RTT to obtain the time difference.
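Keeping the most recent fixed RTT in the context and subtracting it from each new dynamic RTT can be sketched in Python as follows. The class `RttContext` and its methods are hypothetical names; the source only says that the fixed RTT is saved in "the context".

```python
class RttContext:
    """Holds the latest fixed RTT (the second RTT at first, later the fourth RTT)."""

    def __init__(self):
        self.fixed_rtt = None

    def save_fixed_rtt(self, rtt):
        """Overwrite the stored fixed RTT with the newest measurement."""
        self.fixed_rtt = rtt

    def time_difference(self, dynamic_rtt):
        """Dynamic RTT minus the currently stored fixed RTT."""
        if self.fixed_rtt is None:
            raise RuntimeError("no fixed RTT has been measured yet")
        return dynamic_rtt - self.fixed_rtt
```

If the path changes and a new fixed RTT is saved, the same dynamic RTT yields a different time difference, which is why the fixed RTT is re-measured periodically.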
  • S704 Adjust the sending rate of the data message according to the time difference.
  • step S704 may refer to the specific implementation manner in the foregoing step S307, which will not be repeated here.
  • the network congestion control method provided in this embodiment can repeatedly execute steps S701-S704 to control different network paths and different network congestion conditions at different times to ensure high network performance.
  • the fixed RTT and the dynamic RTT are periodically and cyclically detected, which can improve the accuracy of the fixed RTT and the dynamic RTT, thereby effectively reducing the depth of network queue congestion and improving system performance.
  • FIG. 8 is a schematic diagram of a data transmission structure provided by an embodiment of this application. For example, if network card A sends a Write/Send to network card B, network card A is the device that sends data messages, the transmission direction of the data messages is from network card A to network card B, and the direction from network card B to network card A is the direction in which no data messages are transmitted. If network card A sends a Read to network card B, network card A is the device that receives data messages, the transmission direction of the data messages is from network card B to network card A, and the direction from network card A to network card B is the direction in which no data messages are transmitted.
  • network card A is the device that sends data packets.
  • the network card A may include a fixed RTT request module 810, a fixed RTT response module 812, a dynamic RTT request module 820, a dynamic RTT response module 822, a rate control module 830, and a sending module 840.
  • the network card B includes a fixed RTT reflection module 811, a dynamic RTT reflection module 821, and a receiving module 850.
  • the fixed RTT request module 810 is used to construct a fixed RTT request message and encapsulate the local time stamp 1 of the network card A in the message.
  • the priority of the fixed RTT request message is higher than the priority of the data message sent by the network card A to the network card B.
  • the fixed RTT reflection module 811 is configured to receive the request message sent by the fixed RTT request module 810, and retrieve the time stamp 1 from the request message to construct a fixed RTT response message.
  • the priority of the fixed RTT response message is higher than the priority of the data message sent by the network card A to the network card B.
  • the fixed RTT response module 812 is used to receive the fixed RTT response message sent by the fixed RTT reflection module 811, extract timestamp 1 from the fixed RTT response message, obtain the local timestamp 2 at which network card A receives the fixed RTT response message, calculate the difference between timestamp 2 and timestamp 1, save the difference in the context, and record it as the fixed RTT.
  • the dynamic RTT request module 820 is configured to encapsulate the local time stamp 3 in the data message, and send the data message encapsulating the time stamp 3 to the network card B.
  • the receiving module 850 is configured to receive the data message sent by the dynamic RTT request module 820.
  • the dynamic RTT reflection module 821 is used to extract the time stamp 3 from the data message received by the receiving module 850 to construct a dynamic RTT response message.
  • the priority of the dynamic RTT response message is higher than the priority of the data message.
  • the dynamic RTT response module 822 is used to receive the dynamic RTT response message sent by the dynamic RTT reflection module 821, extract timestamp 3 from the dynamic RTT response message, obtain the local timestamp 4 at which network card A receives the dynamic RTT response message, calculate the difference between timestamp 4 and timestamp 3, save the difference in the context, and record it as the dynamic RTT.
  • the rate control module 830 is used to obtain the dynamic RTT from the dynamic RTT response module 822, and obtain the stored fixed RTT from the context, and calculate the time difference.
  • the time difference (T_q) is the dynamic RTT minus the fixed RTT, and is used to indicate the congestion level of the network queue.
  • if T_q is less than T_low, the network queue is not congested and the data sending rate can be increased; if T_q is greater than or equal to T_high, the network queue is congested, and the data sending rate can be reduced to reduce the congestion depth of the queue; if T_q is greater than or equal to T_low and less than T_high, the network queue is lightly congested, and the current data sending rate is not changed.
  • the sending module 840 is a data transmission module for sending data messages.
  • the sending module 840 may send data packets according to the sending rate adjusted by the rate control module 830.
  • FIG. 8 only takes the network card A as a device for sending data packets as an example for description.
  • the network card B may also be a device for sending data, which is not limited in the embodiment of the present application.
  • the functional modules included in the network card B are the same as the modules included in the network card A in FIG. 8.
  • the embodiment of the present application may divide the computer into functional modules according to the foregoing method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 9 shows a possible structural schematic diagram of a network congestion control device involved in the above embodiment.
  • the network congestion control device 900 includes: a processing module 901 and a transceiver module 902.
  • the processing module 901 can execute S301, S305-S307 in FIG. 3, or S601, S605-S606 in FIG. 6, or S701-S704 in FIG. 7 through the transceiver module 902.
  • all relevant content of the steps involved in the above method embodiments can be cited in the functional description of the corresponding functional module, which will not be repeated here.
  • FIG. 10 shows a possible structural schematic diagram of a network congestion control device involved in the foregoing embodiment.
  • the network congestion control device 1000 includes: a processing module 1001 and a transceiver module 1002.
  • the processing module 1001 can execute S302-S304 in FIG. 3 or S602-S604 in FIG. 6 through the transceiver module 1002.
  • all relevant content of the steps involved in the above method embodiments can be cited in the functional description of the corresponding functional module, which will not be repeated here.
  • FIG. 11 shows a schematic diagram of a possible structure of the network congestion control apparatus 1100 involved in the foregoing embodiment.
  • the network congestion control device 1100 includes: a processor 1101 and a transceiver 1102.
  • the processor 1101 is used to control and manage the actions of the network congestion control device 1100.
  • the processor 1101 is used to perform, through the transceiver 1102, S301 and S305-S307 in FIG. 3, or S601 and S605-S606 in FIG. 6, or S701-S704 in FIG. 7, and/or other processes used in the technology described herein.
  • the aforementioned network congestion control apparatus 1100 may further include a memory 1103 configured to store the program code and data corresponding to any of the network congestion control methods provided above by the network congestion control apparatus 1100.
  • the memory 1103 may be a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM), etc.
  • FIG. 12 shows a schematic diagram of a possible structure of the network congestion control apparatus 1200 involved in the foregoing embodiment.
  • the network congestion control device 1200 includes a processor 1201 and a transceiver 1202.
  • the processor 1201 is configured to control and manage the actions of the network congestion control device 1200.
  • the processor 1201 is configured to execute, through the transceiver 1202, S302-S304 in FIG. 3, or S602-S604 in FIG. 6, and/or other processes used in the techniques described herein.
  • the aforementioned network congestion control apparatus 1200 may further include a memory 1203 configured to store the program code and data corresponding to any of the network congestion control methods provided above by the network congestion control apparatus 1200.
  • the memory 1203 may be a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM), etc.
  • the steps of the method or algorithm described in conjunction with the disclosure of this application can be implemented in a hardware manner, or implemented in a manner in which a processor executes software instructions.
  • Software instructions can be composed of corresponding software modules, which can be stored in random access memory (Random Access Memory, RAM), flash memory, erasable programmable read-only memory (Erasable Programmable ROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and can write information to the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and the storage medium may be located in the ASIC.
  • the ASIC may be located in the core network interface device.
  • the processor and the storage medium may also exist as discrete components in the core network interface device.
  • the functions described in this application can be implemented by hardware, software, firmware or any combination thereof. When implemented by software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or codes on the computer-readable medium.
  • the computer-readable medium includes a computer storage medium and a communication medium, where the communication medium includes any medium that facilitates the transfer of a computer program from one place to another.
  • the storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

Abstract

The embodiments of the present application disclose a method and device for controlling network congestion, relate to the field of communication technology, and solve the prior-art problem that the measured RTT is inaccurate, so that the congestion depth of the network queue cannot be effectively controlled. The specific solution is applied to a first apparatus, which is the apparatus that sends data messages. The first apparatus sends a first message to a second apparatus, where the first message carries a first timestamp, and the first timestamp is the local timestamp at which the first message is sent. The first apparatus receives a second message sent by the second apparatus, where the second message carries the first timestamp. The first apparatus subtracts the first timestamp from a second timestamp (the local timestamp at which it receives the second message) to obtain a first RTT, and adjusts the sending rate of the data messages according to the first RTT. The priority of the first message is the same as the priority of the data messages, and the priority of the second message is higher than the priority of the data messages.

Description

Method and device for controlling network congestion
This application claims priority to Chinese Patent Application No. 201910295531.0, filed with the China National Intellectual Property Administration on April 12, 2019 and entitled "Network congestion control method and device", which is incorporated herein by reference in its entirety.
Technical field
The embodiments of this application relate to the field of communication technologies, and in particular to a network congestion control method and device.
Background
Currently, computers in a data center network can exchange data through remote direct memory access (RDMA), which lets a computer's network interface card (NIC) read data from, or write data to, the memory of another computer over the network without involving the computer's operating system. RDMA running over Ethernet is called RDMA over Converged Ethernet (RoCE).
To prevent packet loss caused by network congestion from degrading RoCE performance, one existing flow-control method measures the round-trip delay of a segment of messages (64 KB), calculates the round-trip time (RTT), and adjusts the sending rate according to that RTT. As shown in Figure 1, the prior-art RTT is calculated per data segment. A data segment is 64 KB and may contain multiple messages. When host A sends the first message of the data segment, it records the send timestamp t_s; when host A receives the acknowledgement (ACK) sent by host B, it records the completion time t_comp. As shown in Figure 1, RTT = t_comp - t_s - t_serial, where t_serial is the serialization time of the data segment, that is, the data segment size (64 KB) divided by the line rate.
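The prior-art calculation above can be illustrated with a minimal sketch. The function name, the use of seconds and bits per second as units, and the reading of 64 KB as 65,536 bytes are assumptions made for illustration; they are not taken from the application.

```python
SEGMENT_BYTES = 64 * 1024  # assumption: "64 KB" read as 65,536 bytes


def prior_art_rtt(t_s: float, t_comp: float, line_rate_bps: float,
                  segment_bytes: int = SEGMENT_BYTES) -> float:
    """Return RTT = t_comp - t_s - t_serial, with all times in seconds."""
    # Serialization time: segment size divided by the line rate.
    t_serial = segment_bytes * 8 / line_rate_bps
    return t_comp - t_s - t_serial


# Example: a 10 Gbit/s link where the ACK arrives 100 microseconds after
# the first message of the segment was sent.
rtt = prior_art_rtt(t_s=0.0, t_comp=100e-6, line_rate_bps=10e9)
print(f"RTT is about {rtt * 1e6:.2f} microseconds")
```

In this example the serialization time alone is roughly 52 microseconds, which shows how strongly the result depends on the assumed segment size.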
However, the data segment size in this solution is 64 KB, whereas in practice the data block size required by each request is not fixed, so one ACK per 64 KB cannot be guaranteed. If the data blocks are small, each 64 KB data segment has multiple ACKs, and calculating the RTT from the completion times of different ACKs reduces its accuracy. The RTT calculated by this solution is therefore inaccurate, so the congestion depth of the network queue cannot be effectively controlled. In addition, the RTT measured by this solution is affected by reverse-path congestion and cannot accurately indicate whether congestion occurred in the request direction or in the response direction, which may cause the control system to misjudge.
Summary of the invention
The embodiments of this application provide a network congestion control method and device that can avoid the influence of reverse-path congestion, accurately control the congestion depth of network queues, and improve system performance.
To achieve the foregoing objectives, the embodiments of this application adopt the following technical solutions:
A first aspect of the embodiments of this application provides a network congestion control method. The method is applied to a first device, which is a device that sends data messages. The method includes: the first device sends a first message to a second device, where the first message carries a first timestamp, and the first timestamp is the local timestamp at which the first message is sent; the first device receives a second message sent by the second device, where the second message carries the first timestamp; the first device subtracts the first timestamp from a second timestamp to obtain a first round-trip time (RTT), where the second timestamp is the local timestamp at which the first device receives the second message; and the first device adjusts the sending rate of the data messages according to the first RTT. The priority of the first message is the same as the priority of the data messages, and the priority of the second message is higher than the priority of the data messages. With this solution, the measurement of the first RTT is not affected by whether the reverse path (the transmission direction in which no service messages are transmitted) is congested, so the first RTT is more accurate; adjusting the sending rate of the data messages according to it can therefore reduce network queue congestion and improve system performance. It can be understood that the first RTT accounts for the queuing and processing time in the buffers of switches (or routers) while avoiding the influence of congestion on the reverse path. The first RTT is therefore related to the degree of network congestion, changes dynamically with it, and reflects the current degree of network congestion fairly accurately. For this reason, the first RTT may be called a dynamic RTT.
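The sender-side measurement in the first aspect can be sketched as follows. This is only an illustration: the dictionary message layout, the function names, and the convention that a larger integer means a higher priority are all assumptions, since the application does not prescribe a concrete data structure.

```python
import time


def stamp_first_message(data_priority: int) -> dict:
    """Build the first message: same priority as the data messages,
    stamped with the sender's local clock (hypothetical layout)."""
    return {"priority": data_priority, "timestamp_ns": time.monotonic_ns()}


def first_rtt_ns(second_message: dict, receive_timestamp_ns: int) -> int:
    """First RTT = second timestamp (local receive time) minus the
    first timestamp echoed back in the second message."""
    return receive_timestamp_ns - second_message["timestamp_ns"]


# The sender stamps the first message; the peer echoes the timestamp in a
# second message sent at a higher priority (assumed: larger = higher).
first = stamp_first_message(data_priority=3)
second = {"priority": 4, "timestamp_ns": first["timestamp_ns"]}
rtt = first_rtt_ns(second, time.monotonic_ns())
print(f"first RTT: {rtt} ns")
```

Because both timestamps come from the sender's own clock, the two devices do not need synchronized clocks for this subtraction to be meaningful.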
With reference to the first aspect, in a possible implementation, the method further includes: the first device sends a third message to the second device, where the third message carries a third timestamp, the third timestamp is the local timestamp at which the third message is sent, and the priority of the third message is higher than the priority of the data messages; the first device receives a fourth message sent by the second device, where the fourth message carries the third timestamp, and the priority of the fourth message is higher than the priority of the data messages; and the first device subtracts the third timestamp from a fourth timestamp to obtain a second RTT, where the fourth timestamp is the local timestamp at which the first device receives the fourth message. With this solution, using a third message and a fourth message whose priority is higher than that of the data messages allows the second RTT to be measured fairly accurately. It can be understood that, as long as the data transmission path between the network card of the first device and the network card of the second device does not change, the value of the second RTT is essentially fixed and may vary only slightly with factors such as network performance. The second RTT may therefore be called a fixed RTT.
With reference to the first aspect and the foregoing possible implementation, in another possible implementation, adjusting the sending rate of the data messages according to the first RTT includes: subtracting the second RTT from the first RTT to obtain a time difference, where the time difference indicates the congestion depth of the network queue; and adjusting the sending rate of the data messages according to the time difference. With this solution, the time difference between the first RTT and the second RTT accurately reflects the depth of network queue congestion between the first device and the second device, so adjusting the sending rate of the data messages according to the time difference can effectively reduce the congestion depth of the network queue and improve system performance.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, adjusting the sending rate of the data messages according to the time difference includes: if the time difference is less than a first preset threshold, increasing the sending rate of the data messages; and if the time difference is greater than a second preset threshold, decreasing the sending rate of the data messages, where the first preset threshold is less than the second preset threshold. With this solution, the sending rate of the data messages can be reduced when the network queue is more congested, which reduces the congestion depth of the network queue.
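The two implementations above (time difference, then threshold comparison) can be combined into one short sketch. The application only states the direction of the adjustment; the multiplicative `step` factor, the function name, and the parameter names below are illustrative assumptions.

```python
def adjust_rate(rate: float, dynamic_rtt: float, fixed_rtt: float,
                low_threshold: float, high_threshold: float,
                step: float = 0.1) -> float:
    """Return a new sending rate based on the time difference
    (dynamic RTT minus fixed RTT), which indicates queue congestion depth."""
    if low_threshold >= high_threshold:
        raise ValueError("first threshold must be less than second threshold")
    time_diff = dynamic_rtt - fixed_rtt
    if time_diff < low_threshold:
        return rate * (1 + step)   # queue is shallow: speed up (assumed step)
    if time_diff > high_threshold:
        return rate * (1 - step)   # queue is deep: slow down (assumed step)
    return rate                    # between the thresholds: keep the rate


# Example with RTTs and thresholds in microseconds and rate in Gbit/s.
print(adjust_rate(rate=10.0, dynamic_rtt=42.0, fixed_rtt=10.0,
                  low_threshold=5.0, high_threshold=20.0))
```

Keeping the rate unchanged between the two thresholds is also an assumption; the text above does not say what happens in that band.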
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, the method further includes: if the first device determines that a first preset number of data packets have been sent cumulatively since the first message was last sent, obtaining a third RTT; or, if the first device determines that the interval between the current time and the time at which the first message was last sent reaches a first preset duration, obtaining the third RTT and recording the current timestamp. Note that the third RTT and the first RTT are dynamic RTTs at different moments. With this solution, because the depth of network congestion changes dynamically, the dynamic RTT can be measured periodically to obtain the current degree of network congestion.
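The two trigger conditions for re-measuring the dynamic RTT (a packet count or an elapsed interval) can be sketched as a small scheduler. The class and method names are hypothetical, and the sketch checks both conditions together, whereas the text presents them as alternatives.

```python
class ProbeScheduler:
    """Decides when to send the next timestamped probe message."""

    def __init__(self, max_packets: int, max_interval_ns: int) -> None:
        self.max_packets = max_packets          # first preset number
        self.max_interval_ns = max_interval_ns  # first preset duration
        self.packets_since_probe = 0
        self.last_probe_ns = 0

    def on_data_packet_sent(self) -> None:
        self.packets_since_probe += 1

    def should_probe(self, now_ns: int) -> bool:
        # Trigger on either condition: enough packets, or enough time.
        return (self.packets_since_probe >= self.max_packets
                or now_ns - self.last_probe_ns >= self.max_interval_ns)

    def mark_probe_sent(self, now_ns: int) -> None:
        # Record the current timestamp and restart the packet count.
        self.packets_since_probe = 0
        self.last_probe_ns = now_ns


scheduler = ProbeScheduler(max_packets=3, max_interval_ns=1_000)
scheduler.mark_probe_sent(now_ns=0)
for _ in range(3):
    scheduler.on_data_packet_sent()
print(scheduler.should_probe(now_ns=10))  # packet-count condition is met
```

The same structure would apply to the fixed RTT, with its own preset number and duration.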
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, the method further includes: if the first device determines that a second preset number of data packets have been sent cumulatively since the third message was last sent, obtaining a fourth RTT; or, if the first device determines that the interval between the current time and the time at which the third message was last sent reaches a second preset duration, obtaining the fourth RTT and recording the current timestamp. Note that the fourth RTT and the second RTT are fixed RTTs at different moments. With this solution, the fixed RTT can be measured periodically, so that when the data transmission path between the first device and the second device changes, the fixed RTT of the new transmission path can be detected fairly accurately.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, the third timestamp is carried in a reserved field of the base transport header (BTH) of a remote direct memory access (RDMA) message, or in the payload of the RDMA message. With this solution, the timestamp is carried in a reserved field of an existing protocol, so, compared with the prior art, the relationship between timestamps and message sequence numbers does not need to be recorded, and fewer resources are used.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, if the first message is a data message, the first timestamp is carried in a reserved field of the BTH of the RDMA message; if the first message is not a data message, the first timestamp is carried in a reserved field of the BTH of the RDMA message, or in the payload of the RDMA message. With this solution, the timestamp is carried in a reserved field of an existing protocol, so, compared with the prior art, the relationship between timestamps and message sequence numbers does not need to be recorded, and fewer resources are used.
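For the payload option, a minimal sketch of carrying and recovering a timestamp might look as follows. The 8-byte big-endian encoding at the start of the payload is purely an assumption for illustration; the application does not specify the encoding, and the BTH reserved-field option is not modeled here.

```python
import struct
from typing import Tuple

TS_FORMAT = "!Q"                       # assumed: unsigned 64-bit, network order
TS_SIZE = struct.calcsize(TS_FORMAT)   # 8 bytes


def attach_timestamp(payload: bytes, timestamp_ns: int) -> bytes:
    """Prefix the payload with the sender's local timestamp."""
    return struct.pack(TS_FORMAT, timestamp_ns) + payload


def extract_timestamp(message: bytes) -> Tuple[int, bytes]:
    """Split a message into (timestamp, remaining payload)."""
    (timestamp_ns,) = struct.unpack_from(TS_FORMAT, message)
    return timestamp_ns, message[TS_SIZE:]


message = attach_timestamp(b"probe", 123_456_789)
timestamp, rest = extract_timestamp(message)
print(timestamp, rest)
```

Because the timestamp travels with the message and is echoed back, the sender can compute the RTT from the reply alone, without keeping a table that maps sequence numbers to send times.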
A second aspect of the embodiments of this application provides a network congestion control method. The method includes: a second device receives a first message sent by a first device, where the first message carries a first timestamp, the first timestamp is the local timestamp at which the first message is sent, and the first device is a device that sends data messages; and the second device sends a second message to the first device, where the second message carries the first timestamp. The priority of the first message is the same as the priority of the data messages, and the priority of the second message is higher than the priority of the data messages. With this solution, messages whose priority is higher than that of the data messages are used in the transmission direction in which no data is sent, so the first RTT measured by the first device is not affected by whether the reverse path (the transmission direction in which no service messages are transmitted) is congested.
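The receiver-side behavior in the second aspect amounts to echoing the received timestamp in a higher-priority reply. The sketch below assumes a simple message record and assumes that a larger integer means a higher priority; neither is specified by the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    timestamp_ns: int  # first timestamp, from the first device's local clock
    priority: int      # assumed convention: larger value = higher priority


def build_second_message(first: Message, data_priority: int) -> Message:
    """Echo the first timestamp unchanged, at a priority above the data
    messages, so reverse-path queuing does not delay the reply."""
    return Message(timestamp_ns=first.timestamp_ns,
                   priority=data_priority + 1)


first = Message(timestamp_ns=1_000, priority=3)  # same priority as data
second = build_second_message(first, data_priority=3)
print(second)
```

The second device never interprets the timestamp; it only copies it, which is why the clocks of the two devices do not need to agree.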
With reference to the second aspect, in a possible implementation, the method further includes: the second device receives a third message sent by the first device, where the third message carries a third timestamp, the third timestamp is the local timestamp at which the third message is sent, and the priority of the third message is higher than the priority of the data messages; and the second device sends a fourth message to the first device, where the fourth message carries the third timestamp, and the priority of the fourth message is higher than the priority of the data messages. With this solution, using a third message and a fourth message whose priority is higher than that of the data messages makes the second RTT measured by the first device more accurate.
A third aspect of the embodiments of this application provides a network congestion control apparatus. The apparatus is an apparatus that sends data messages and includes a processing unit and a transceiver unit. The processing unit is configured to: send a first message to a second device through the transceiver unit, where the first message carries a first timestamp, and the first timestamp is the local timestamp at which the first message is sent; receive, through the transceiver unit, a second message sent by the second device, where the second message carries the first timestamp; subtract the first timestamp from a second timestamp to obtain a first round-trip time (RTT), where the second timestamp is the local timestamp at which the apparatus receives the second message; and adjust the sending rate of the data messages according to the first RTT. The priority of the first message is the same as the priority of the data messages, and the priority of the second message is higher than the priority of the data messages.
With reference to the third aspect, in a possible implementation, the processing unit is further configured to: send a third message to the second device through the transceiver unit, where the third message carries a third timestamp, the third timestamp is the local timestamp at which the third message is sent, and the priority of the third message is higher than the priority of the data messages; receive, through the transceiver unit, a fourth message sent by the second device, where the fourth message carries the third timestamp, and the priority of the fourth message is higher than the priority of the data messages; and subtract the third timestamp from a fourth timestamp to obtain a second RTT, where the fourth timestamp is the local timestamp at which the apparatus receives the fourth message.
With reference to the third aspect and the foregoing possible implementation, in another possible implementation, the processing unit is specifically configured to: subtract the second RTT from the first RTT to obtain a time difference, where the time difference indicates the congestion depth of the network queue; and adjust the sending rate of the data messages according to the time difference.
With reference to the third aspect and the foregoing possible implementations, in another possible implementation, the processing unit is specifically configured to: if the time difference is less than a first preset threshold, increase the sending rate of the data messages; and if the time difference is greater than a second preset threshold, decrease the sending rate of the data messages, where the first preset threshold is less than the second preset threshold.
With reference to the third aspect and the foregoing possible implementations, in another possible implementation, the processing unit is further configured to: if the processing unit determines that a first preset number of data packets have been sent cumulatively since the first message was last sent, obtain a third RTT; or, if the processing unit determines that the interval between the current time and the time at which the first message was last sent reaches a first preset duration, obtain the third RTT and record the current timestamp.
With reference to the third aspect and the foregoing possible implementations, in another possible implementation, the processing unit is further configured to: if the processing unit determines that a second preset number of data packets have been sent cumulatively since the third message was last sent, obtain a fourth RTT; or, if the processing unit determines that the interval between the current time and the time at which the third message was last sent reaches a second preset duration, obtain the fourth RTT and record the current timestamp.
With reference to the third aspect and the foregoing possible implementations, in another possible implementation, the third timestamp is carried in a reserved field of the base transport header (BTH) of a remote direct memory access (RDMA) message, or in the payload of the RDMA message.
With reference to the third aspect and the foregoing possible implementations, in another possible implementation, if the first message is a data message, the first timestamp is carried in a reserved field of the BTH of the RDMA message; if the first message is not a data message, the first timestamp is carried in a reserved field of the BTH of the RDMA message, or in the payload of the RDMA message.
A fourth aspect of the embodiments of this application provides a network congestion control apparatus that includes a processing unit and a transceiver unit. The processing unit is configured to: receive, through the transceiver unit, a first message sent by a first device, where the first message carries a first timestamp, the first timestamp is the local timestamp at which the first message is sent, and the first device is a device that sends data messages; and send a second message to the first device through the transceiver unit, where the second message carries the first timestamp. The priority of the first message is the same as the priority of the data messages, and the priority of the second message is higher than the priority of the data messages; or, the priority of the first message is higher than the priority of the data messages, and the priority of the second message is the same as the priority of the data messages.
With reference to the fourth aspect, in a possible implementation, the processing unit is further configured to: receive, through the transceiver unit, a third message sent by the first device, where the third message carries a third timestamp, the third timestamp is the local timestamp at which the third message is sent, and the priority of the third message is higher than the priority of the data messages; and send a fourth message to the first device through the transceiver unit, where the fourth message carries the third timestamp, and the priority of the fourth message is higher than the priority of the data messages.
For the effects of the third aspect and its implementations, refer to the description of the corresponding effects of the first aspect and its implementations; for the effects of the fourth aspect and its implementations, refer to the description of the corresponding effects of the second aspect and its implementations. Details are not repeated here.
A fifth aspect of the embodiments of this application provides a computer storage medium that stores computer program code. When the computer program code runs on a processor, the processor is caused to perform the network congestion control method according to any of the foregoing aspects.
A sixth aspect of the embodiments of this application provides a computer program product that stores computer software instructions to be executed by the foregoing processor, where the computer software instructions include a program for carrying out the solutions described in the foregoing aspects.
A seventh aspect of the embodiments of this application provides a network congestion control apparatus that includes a transceiver, a processor, and a memory. The transceiver is configured to send and receive information, or to communicate with other network elements; the memory is configured to store computer-executable instructions; and the processor is configured to execute the computer-executable instructions to implement the network congestion control method according to any of the foregoing aspects.
An eighth aspect of the embodiments of this application provides a network congestion control apparatus that exists in the product form of a chip. The apparatus includes a processor and a memory. The memory is coupled to the processor and stores the program instructions and data necessary for the apparatus, and the processor executes the program instructions stored in the memory so that the apparatus performs the functions of the apparatus in the foregoing methods.
Description of the drawings
Figure 1 is a schematic diagram of a network congestion control solution in the prior art;
Figure 2 is a schematic diagram of a network architecture provided by an embodiment of this application;
Figure 3 is a schematic flowchart of a network congestion control method provided by an embodiment of this application;
Figure 4 is a schematic diagram of one way of carrying a timestamp provided by an embodiment of this application;
Figure 5 is a schematic diagram of another way of carrying a timestamp provided by an embodiment of this application;
Figure 6 is a schematic flowchart of another network congestion control method provided by an embodiment of this application;
Figure 7 is a schematic flowchart of another network congestion control method provided by an embodiment of this application;
Figure 8 is a schematic structural diagram of data transmission provided by an embodiment of this application;
Figure 9 is a schematic diagram of the composition of a network congestion control apparatus provided by an embodiment of this application;
Figure 10 is a schematic diagram of the composition of another network congestion control apparatus provided by an embodiment of this application;
Figure 11 is a schematic diagram of the composition of another network congestion control apparatus provided by an embodiment of this application;
Figure 12 is a schematic diagram of the composition of another network congestion control apparatus provided by an embodiment of this application.
具体实施方式detailed description
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述。在本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B的情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达, 是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b或c中的至少一项(个),可以表示:a,b,c,a-b,a-c,b-c或a-b-c,其中a、b和c可以是单个,也可以是多个。The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application. In this application, "at least one" refers to one or more, and "multiple" refers to two or more. "And/or" describes the association relationship of the associated objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A alone exists, both A and B exist, and B exists alone, where A, B can be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "The following at least one item (a)" or similar expressions refers to any combination of these items, including any combination of single item (a) or plural items (a). For example, at least one of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can be single or multiple.
需要说明的是,本申请中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其他实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。It should be noted that in this application, words such as "exemplary" or "for example" are used to indicate examples, illustrations, or illustrations. Any embodiment or design solution described as "exemplary" or "for example" in this application should not be construed as being more preferable or advantageous than other embodiments or design solutions. To be precise, words such as "exemplary" or "for example" are used to present related concepts in a specific manner.
First, some of the terms used in the embodiments of this application are explained:
RDMA: remote direct memory access. RDMA transfers data over the network directly into a computer's memory, quickly moving data from one system into the memory of a remote system without affecting the operating system. Because it consumes little of the computer's processing capacity, it eliminates the overhead of external memory copies and context switches. The RDMA protocol lets a computer's network interface card (NIC) read data from, or write data to, the memory of another computer over the network without involving the computer's operating system. RDMA over Converged Ethernet (RoCE) was proposed by InfiniBand (IB) for running RDMA over Ethernet. In RoCE, running directly over the Ethernet link layer is called RoCE version 1 (RoCEv1), and running over the User Datagram Protocol (UDP) is called RoCE version 2 (RoCEv2).
RTT: round-trip time. The RTT is the total delay from the moment the sender starts sending data until the sender receives the acknowledgement from the receiver (the receiver sends the acknowledgement immediately after receiving the data). The RTT is determined by three components: the propagation time of the link, the processing time of the end systems, and the queuing and processing time in the buffers of the switches (or routers). The link propagation time and the end-system processing time are relatively fixed, whereas the queuing and processing time in the switch (or router) buffers varies with the degree of congestion of the whole network. A change in the RTT therefore reflects, to some extent, a change in the degree of network congestion.
In the prior art, the measured round-trip time (RTT) is inaccurate and cannot accurately indicate the transmission direction in which congestion actually occurs, which causes the control system to misjudge. To solve these problems, an embodiment of this application provides a network congestion control method that avoids the influence of congestion on the reverse path, accurately controls the congestion depth of network queues, and improves system performance.
An embodiment of this application provides a network congestion control method. The method is applied to computer nodes in a data center that exchange data using the RoCE protocol. The computer nodes are interconnected through one or more switches, and the switches are connected according to a topology (for example, a CLOS topology) to form a data center network with one or more paths. The embodiments of this application do not limit the topology between the switches; the description here is only an example.
FIG. 2 shows a network architecture provided by an embodiment of this application, including computer node A and computer node B. Computer node A can be connected to computer node B through one or more switches. Computer node A includes host A and network card A, and computer node B includes host B and network card B. Network card A of computer node A and network card B of computer node B exchange data through remote direct memory access (RDMA).
As shown in FIG. 2, when computer node A and computer node B exchange data, a communication queue pair (Queue Pair, QP) is created, in which one queue is the send queue and the other is the receive queue. A QP is full-duplex: the end that sends a request is the requester, and the end that receives the request and replies is the responder. RoCE requests are issued by the application, and the main request types used to exchange data with a remote computer node are Write, Send, and Read. For Write and Send, the requester sends data, and the responder returns an acknowledgement (ACK) after receiving the data. For Read, the requester sends a read request, and the responder receives the request and returns the data that was read. In other words, for Write/Send the requester sends the data, and for Read the responder sends the data, so Write/Send and Read transfer data in opposite directions.
Exemplarily, for each QP the data exchange is bidirectional. When network card A sends a request, network card A is the requester and network card B is the responder; in this case, the data of an RDMA Write/Send is sent by network card A, and the data of an RDMA Read is sent by network card B. When network card B sends a request, network card B is the requester and network card A is the responder; in this case, the data of an RDMA Write/Send is sent by network card B, and the data of an RDMA Read is sent by network card A. To accurately indicate the transmission direction in which congestion actually occurs, the network congestion control method provided in the embodiments of this application distinguishes between data transmission directions.
With reference to FIG. 2, and as shown in FIG. 3, an embodiment of this application provides a network congestion control method applied to a first device, where the first device is the device that sends data messages. The method may include steps S301-S307. When sending data messages, the first device can perform the network congestion control method of steps S301-S307 to reduce the congestion depth of the network queue.
S301. The first device sends a first message to the second device.
The first message carries a first timestamp. The first timestamp is the local timestamp at which the first device sends the first message.
Exemplarily, the priority of the first message is the same as the priority of the data messages. The first message may be a data message, or it may be a message dedicated to measuring delay; the embodiments of this application do not limit this. For example, when the first message is a message dedicated to measuring delay, the first message and the data messages sent by the first device are different messages, but the first message has the same priority as the data messages. It can be understood that in this embodiment the first device is the device that sends data messages, and the data messages are transmitted from the first device to the second device.
Exemplarily, take the first device as computer node A and the second device as computer node B. When computer node A sends a Write or Send request message (for example, the first message is a Write or Send request message), the Write or Send request message carries data and computer node A is the device that sends data messages, so the data messages are transmitted from computer node A to computer node B.
Exemplarily, take the first device as computer node B and the second device as computer node A. When computer node A sends a Read request, computer node B receives the Read request and then sends the first message to computer node A. Computer node B is thus the device that sends data messages, so the data messages are transmitted from computer node B to computer node A. It can be understood that in this case, optionally, before step S301 the method may further include the second device sending a Read request to the first device.
Exemplarily, when the first message is a message dedicated to measuring delay that is sent by the first device, the first message carries the first timestamp. The first timestamp is carried in the manner shown in FIG. 4 or FIG. 5.
As shown in FIG. 4, the first timestamp in the first message may be carried in a reserved field (reserved, rsvd) of the Base Transport Header (BTH) of the RDMA message. The BTH has two reserved fields, a first reserved field (rsvd) and a second reserved field (rsvd). The first reserved field is the 5th byte of the BTH, and the second reserved field is the low 7 bits of the 9th byte of the BTH. The first timestamp may be carried in either one of the two reserved fields, or in the two reserved fields combined; the embodiments of this application do not limit this.
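As an illustration of the combined-field option, the following minimal sketch packs a 15-bit timestamp into the two BTH reserved fields described above. The function names, the choice to put the high 8 bits in the first reserved field, and the truncation of the timestamp to 15 bits are assumptions made for this example; the application does not specify them.

```python
BTH_LEN = 12        # length of the Base Transport Header in bytes
RSVD1_OFFSET = 4    # first reserved field: 5th byte of the BTH (0-based offset 4)
RSVD2_OFFSET = 8    # second reserved field: low 7 bits of the 9th byte (offset 8)
TS_BITS = 15        # 8 + 7 reserved bits available when both fields are combined
TS_MASK = (1 << TS_BITS) - 1


def put_timestamp(bth: bytes, timestamp: int) -> bytes:
    """Return a copy of the BTH with a truncated timestamp in the reserved bits."""
    if len(bth) != BTH_LEN:
        raise ValueError("BTH must be exactly 12 bytes")
    ts = timestamp & TS_MASK                  # assumption: keep only the low 15 bits
    out = bytearray(bth)
    out[RSVD1_OFFSET] = ts >> 7               # high 8 bits go into the first reserved field
    # Preserve the top bit of the 9th byte (the A bit) and fill the low 7 bits.
    out[RSVD2_OFFSET] = (out[RSVD2_OFFSET] & 0x80) | (ts & 0x7F)
    return bytes(out)


def get_timestamp(bth: bytes) -> int:
    """Read the 15-bit timestamp back from the two reserved fields."""
    if len(bth) != BTH_LEN:
        raise ValueError("BTH must be exactly 12 bytes")
    return (bth[RSVD1_OFFSET] << 7) | (bth[RSVD2_OFFSET] & 0x7F)
```

With only 15 bits available, the timestamp wraps frequently, so the tick resolution would have to be chosen so that one wrap period exceeds the largest RTT of interest.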
The other fields in FIG. 4 are as follows. The operation code (Operation Code, Opcode) indicates the type of the packet or the type of the higher-layer protocol in the IB payload. The solicited event flag (Solicited Event, SE) indicates that the responder should generate an event. The migration state flag (MigReq, M) identifies the migration state. The pad count (Pad Count, Pad) indicates how many extra bytes have been padded into the IB payload. The transport header version (Transport Header Version, TVer) indicates the version of the packet. The partition key (Partition Key) identifies the logical memory partition associated with the packet. The destination queue pair (Destination Queue Pair) indicates the queue pair number at the destination. The acknowledge request flag (Acknowledge Request, A) requests that the responder return an acknowledgement. The packet sequence number (Packet Sequence Number, PSN) is used to detect lost or duplicated packets.
As shown in FIG. 5, the first timestamp in the first message may instead be carried in the payload of the RDMA message, in which case the payload need not carry valid data. Carrying the first timestamp in the payload provides more space, so a more precise timestamp can be carried. The other fields in FIG. 5 have the same meanings as in FIG. 4 and are not described again here.
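A minimal sketch of the payload option follows. It assumes, purely for illustration, that the probe payload consists of a single 64-bit nanosecond timestamp in network byte order; the application does not define the payload layout, and the function names are hypothetical.

```python
import struct

TS_FORMAT = "!Q"                      # assumption: one unsigned 64-bit value, big-endian
TS_LEN = struct.calcsize(TS_FORMAT)   # size in bytes of the packed timestamp


def build_probe_payload(timestamp_ns: int) -> bytes:
    """Build a payload that carries only the timestamp and no user data."""
    return struct.pack(TS_FORMAT, timestamp_ns)


def read_probe_timestamp(payload: bytes) -> int:
    """Extract the timestamp from the start of a probe payload."""
    (timestamp_ns,) = struct.unpack_from(TS_FORMAT, payload, 0)
    return timestamp_ns
```

Compared with the 15 reserved bits of the BTH, a 64-bit field can hold a nanosecond counter without practical wrap-around, which is the precision advantage the paragraph above refers to.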
Exemplarily, when the first message is a data message, the first timestamp is carried in that data message. The first timestamp is carried in the data message in the manner shown in FIG. 4, that is, it may be carried in a reserved field of the Base Transport Header (BTH) of the RDMA message. For details, refer to the related description above, which is not repeated here.
It can be understood that this embodiment carries the timestamp in the reserved fields of the existing protocol. Compared with the prior art, there is no need to record the relationship between timestamps and message sequence numbers, so fewer resources are used.
S302. The second device receives the first message.
S303. The second device constructs a second message based on the first message.
Exemplarily, the second device extracts the first timestamp from the first message and constructs a second message that carries the first timestamp.
Exemplarily, the first timestamp is carried in the second message in the manner shown in FIG. 4 or FIG. 5, that is, it may be carried in a reserved field of the Base Transport Header (BTH) of the RDMA message or in the payload of the RDMA message. For details, refer to the related description above, which is not repeated here.
S304. The second device sends the second message to the first device.
The second message carries the first timestamp.
Exemplarily, the priority of the second message is higher than the priority of the data messages. For example, the second message may be an out-of-band message, that is, a message outside the service traffic that is used to assist in measuring the RTT. The second message is completely independent of the service traffic and can be sent by a low-level control module. It can be understood that because the second message sent by the second device to the first device has a higher priority than the data messages, it is sent ahead of the data messages. It is therefore unaffected by whether the network is congested in the direction from the second device to the first device, so the measured RTT more accurately reflects the degree of queue congestion in the direction from the first device to the second device.
Exemplarily, in this embodiment the first device sends to the second device a first message with the same priority as the data messages, and the second device sends to the first device a second message with a higher priority than the data messages. This avoids the influence of congestion on the reverse path (the direction from the second device to the first device), so the RTT is measured more accurately.
For example, take the first device as computer node A and the second device as computer node B. When computer node A sends the first message (a Write or Send request message) to computer node B, the first message has the same priority as the data messages. After receiving the first message, computer node B sends computer node A a second message with a higher priority than the data messages. This avoids the influence of network congestion in the direction from computer node B to computer node A, so the measured RTT more accurately reflects the degree of queue congestion in the direction from computer node A to computer node B.
For example, take the first device as computer node B and the second device as computer node A. After receiving the Read request sent by computer node A, computer node B replies to computer node A with the first message, which has the same priority as the data messages; the first message may carry data, or it may be a message dedicated to measuring delay. After receiving the first message, computer node A sends computer node B a second message with a higher priority than the data messages. This avoids the influence of network congestion in the direction from computer node A to computer node B, so the measured RTT more accurately reflects the degree of queue congestion in the direction from computer node B to computer node A.
It can be understood that this embodiment uses a message with a higher priority than the data messages in the direction in which no service messages are transmitted. This avoids the influence of congestion on the reverse path (the direction in which no service messages are transmitted), so the measured delay more accurately reflects the degree of queue congestion in the direction in which service messages are transmitted.
S305. The first device receives the second message.
S306. The first device subtracts the first timestamp from a second timestamp to obtain a first RTT.
The second timestamp is the local timestamp at which the first device receives the second message.
Exemplarily, the second timestamp minus the first timestamp is the total time that elapses from when the first device sends the first message until the first device receives the second message sent by the peer (the second device); this is the first RTT.
It can be understood that because the first message has the same priority as the data messages and the second message has a higher priority than the data messages, the first RTT includes the queuing and processing time in the switch (or router) buffers while avoiding the influence of congestion on the reverse path (the direction in which no service messages are transmitted). The first RTT is therefore related to the degree of network congestion, changes dynamically as the degree of congestion changes, and reflects the current degree of network congestion fairly accurately. The first RTT may be called the dynamic RTT.
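Steps S301-S306 can be sketched as follows. This is a simplified model, not the application's implementation: the `Message` class, the numeric priority values, and the assumption that timestamps come from a free-running counter of `bits` bits (so the subtraction is done modulo the counter range) are all introduced here for illustration.

```python
from dataclasses import dataclass

DATA_PRIORITY = 3     # hypothetical priority of data messages
REPLY_PRIORITY = 7    # hypothetical higher priority used for the second message


@dataclass(frozen=True)
class Message:
    timestamp: int    # first timestamp, written by the first device
    priority: int


def build_first_message(local_time: int) -> Message:
    """S301: the first device stamps its local send time, at data priority."""
    return Message(timestamp=local_time, priority=DATA_PRIORITY)


def build_second_message(first: Message) -> Message:
    """S303: the second device echoes the first timestamp at a higher priority."""
    return Message(timestamp=first.timestamp, priority=REPLY_PRIORITY)


def first_rtt(second: Message, local_time: int, bits: int = 32) -> int:
    """S306: second timestamp minus first timestamp, tolerant of counter wrap."""
    return (local_time - second.timestamp) % (1 << bits)
```

Because the second device only echoes the timestamp, both timestamps are read from the first device's clock, so no clock synchronization between the two devices is needed in this model.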
S307. The first device adjusts the sending rate of the data messages according to the first RTT.
Exemplarily, the first RTT reflects the current degree of congestion of the network queue. When the data transmission path between the first device and the second device is unchanged, a larger first RTT indicates a more congested network queue, so the sending rate of the data messages can be adjusted according to the first RTT.
For example, take the first device as computer node A and the second device as computer node B. If computer node A sends a Write or Send request message (the first message), computer node A is the device that sends data messages and the data messages are transmitted from computer node A to computer node B. The first RTT reflects the degree of congestion of the network queue in the direction from computer node A to computer node B, so computer node A can adjust the sending rate of the data messages according to the first RTT.
For example, take the first device as computer node B and the second device as computer node A. If computer node A sends a Read request, computer node B receives the Read request and then sends the first message to computer node A. Computer node B is the device that sends data messages and the data messages are transmitted from computer node B to computer node A, so computer node B can adjust the sending rate of the data messages according to the first RTT.
Exemplarily, the first device adjusting the sending rate of the data messages according to the first RTT may include the following. If the first RTT is greater than a first preset threshold, the network is determined to be relatively congested, and the sending rate of the data messages can be reduced to lower the degree of network congestion. If the first RTT is less than a second preset threshold, the network is determined not to be congested, and the sending rate of the data messages can be increased appropriately to make full use of the network capacity. The second preset threshold is less than or equal to the first preset threshold. It should be noted that the embodiments of this application do not limit the specific method by which the first device adjusts the sending rate of the data messages according to the first RTT; the description here is only an example.
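The two-threshold rule above can be sketched as follows. The fixed step size, the integer rate unit, and the function name are assumptions for this example; the application leaves the adjustment method open.

```python
def adjust_rate_by_rtt(rate_mbps: int, first_rtt_us: int,
                       first_threshold_us: int, second_threshold_us: int,
                       step_mbps: int = 100) -> int:
    """Return the new sending rate given the measured first RTT.

    first_threshold_us is the upper bound (congested above it) and
    second_threshold_us is the lower bound (not congested below it).
    """
    if second_threshold_us > first_threshold_us:
        raise ValueError("second threshold must not exceed the first threshold")
    if first_rtt_us > first_threshold_us:
        # Network is relatively congested: slow down, but never below one step.
        return max(rate_mbps - step_mbps, step_mbps)
    if first_rtt_us < second_threshold_us:
        # Network is not congested: speed up to use the available capacity.
        return rate_mbps + step_mbps
    # Between the two thresholds: keep the current rate.
    return rate_mbps
```

When the two thresholds are equal there is no hold band, and the rate changes on every measurement except when the RTT equals the threshold exactly.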
It should be noted that when the data transmission path between the first device and the second device is unchanged, the method of steps S301-S307 can be used to adjust the sending rate of the data messages and reduce the degree of congestion of the network queue.
In the network congestion control method provided by the embodiments of this application, the first device sends a first message to the second device; the second device receives the first message; the second device constructs a second message based on the first message; the second device sends the second message to the first device; the first device receives the second message; the first device subtracts the first timestamp from the second timestamp to obtain the first round-trip time (RTT); and the first device adjusts the sending rate of the data messages according to the first RTT. In this embodiment the measurement of the first RTT is not affected by congestion on the reverse path (the direction in which no service messages are transmitted), so the first RTT is determined more accurately. Adjusting the sending rate of the data messages according to the first RTT therefore reduces the degree of network queue congestion and improves system performance.
An embodiment of this application further provides a network congestion control method. As shown in FIG. 6, before step S307 the method further includes steps S601-S606.
S601. The first device sends a third message to the second device.
The third message carries a third timestamp, which is the local timestamp at which the third message is sent. The priority of the third message is higher than the priority of the data messages. Exemplarily, the third message may be a message that carries no data; for example, it may be an out-of-band message used to assist in measuring the RTT.
Exemplarily, the third timestamp is carried in the third message in the manner shown in FIG. 4 or FIG. 5, that is, it may be carried in a reserved field of the Base Transport Header (BTH) of the RDMA message or in the payload of the RDMA message. For details, refer to the related description above, which is not repeated here.
S602. The second device receives the third message.
S603. The second device constructs a fourth message based on the third message.
Exemplarily, the second device extracts the third timestamp from the third message and constructs a fourth message that carries the third timestamp.
Exemplarily, the third timestamp is carried in the fourth message in the manner shown in FIG. 4 or FIG. 5. For details, refer to the related description above, which is not repeated here.
S604. The second device sends the fourth message to the first device.
The priority of the fourth message is higher than the priority of the data messages. Exemplarily, the fourth message may be a message that carries no data; for example, the fourth message may be an out-of-band message used to assist in measuring the RTT.
S605. The first device receives the fourth message.
S606. The first device subtracts the third timestamp from a fourth timestamp to obtain a second RTT.
The fourth timestamp is the local timestamp at which the first device receives the fourth message.
Exemplarily, the fourth timestamp minus the third timestamp is the total time that elapses from when the first device sends the third message until the first device receives the fourth message sent by the second device; this is the second RTT.
(Optional) The first device may save the second RTT in a context.
It can be understood that in this embodiment the second RTT is determined by sending a third message with a higher priority than the data messages and receiving from the peer a fourth message with a higher priority than the data messages. The measurement is therefore unaffected by whether the network is congested, and the RTT between the first device and the second device is measured more accurately.
It should be noted that when the data transmission path between the network card of the first device and the network card of the second device is unchanged, the value of the second RTT is essentially fixed and may vary only slightly with factors such as network performance. The second RTT may be called the fixed RTT.
It can be understood that the path over which the first RTT is obtained in steps S301-S306 is the same as the path over which the second RTT is obtained in steps S601-S606. Steps S301-S306 may be performed before steps S601-S606, after steps S601-S606, or at the same time as steps S601-S606; the embodiments of this application do not limit this.
After steps S301-S306 and S601-S606 are performed, the first device adjusting the sending rate of the data messages according to the first RTT in S307 correspondingly includes: the first device adjusts the sending rate of the data messages according to the first RTT and the second RTT.
Exemplarily, the first device adjusting the sending rate of the data messages according to the first RTT and the second RTT includes: the first device subtracts the second RTT from the first RTT to obtain a time difference, where the time difference indicates the congestion depth of the network queue; and the first device adjusts the sending rate of the data messages according to the time difference.
(Optional) The first device may retrieve the saved second RTT from the context and subtract it from the first RTT to obtain the time difference.
Exemplarily, the time difference between the first RTT and the second RTT can represent the depth of network queue congestion in the direction from the first device to the second device. It can be understood that if the first device is the device that sends data messages, the time difference represents the depth of queue congestion in the direction from the first device to the second device: the smaller the time difference, the less congested the queue in that direction; the larger the time difference, the deeper the queue congestion in that direction, that is, the more severe the network congestion. If the first device is the device that receives data messages, the time difference represents the depth of queue congestion in the direction from the second device to the first device: the smaller the time difference, the less congested the queue in that direction; the larger the time difference, the deeper the queue congestion in that direction, that is, the more severe the network congestion.
For example, the first device adjusting the sending rate of the data message according to the time difference may include: if the time difference is less than a first preset threshold, increasing the sending rate of the data message; if the time difference is greater than a second preset threshold, decreasing the sending rate of the data message. The first preset threshold (T_low) is less than the second preset threshold (T_high). Both thresholds may be set to empirical values and depend on factors such as link rate and device jitter.
For example, if the time difference T_q is less than T_low, the network queue congestion depth is very small and the network can be considered not congested; in this case the sending rate of the data message may be increased to make full use of network capacity. If T_q is greater than or equal to T_high, the network queue congestion depth is large and the network queue can be considered congested; in this case the sending rate of the data message may be decreased to reduce the network queue congestion depth. If T_q is greater than or equal to T_low and less than T_high, the network queue congestion depth is within the range the network can tolerate and the network can be considered lightly congested; in this case the current sending rate of the data message may be left unchanged so that the lightly congested state is maintained.
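The three regions described above can be sketched as a small classifier. This is only an illustrative sketch: the names `RateAction` and `classify_queue_delay`, and the choice of microseconds as the unit, are assumptions made here and do not come from the application.

```python
from enum import Enum


class RateAction(Enum):
    """Hypothetical labels for the three outcomes described in the text."""
    INCREASE = "increase"
    HOLD = "hold"
    DECREASE = "decrease"


def classify_queue_delay(t_q_us: float, t_low_us: float, t_high_us: float) -> RateAction:
    """Map the time difference T_q onto one of the three regions.

    T_q < T_low            -> not congested, rate may be increased
    T_low <= T_q < T_high  -> lightly congested, rate is kept
    T_q >= T_high          -> congested, rate should be decreased
    """
    if t_low_us >= t_high_us:
        raise ValueError("T_low must be smaller than T_high")
    if t_q_us < t_low_us:
        return RateAction.INCREASE
    if t_q_us >= t_high_us:
        return RateAction.DECREASE
    return RateAction.HOLD
```

A caller would compute T_q as the dynamic RTT minus the fixed RTT and pass it in together with the two empirically chosen thresholds.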
For example, the sending rate of the data message may be increased or decreased by a preset algorithm. The algorithm may be, for instance, the additive increase multiplicative decrease (AIMD) algorithm. Controlling the sending rate of the data message with AIMD may include: increasing the sending rate linearly when the network is not congested, and decreasing the sending rate multiplicatively when the network is congested. The embodiments of this application do not limit the algorithm used to adjust the sending rate of the data message; AIMD is given here only as an example.
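A minimal AIMD sketch follows. The step size, the multiplicative factor, the rate limits, and the function name `aimd_adjust` are all illustrative assumptions; the application does not specify any of these values.

```python
def aimd_adjust(
    rate_mbps: float,
    action: str,
    add_step_mbps: float = 10.0,     # assumed additive step
    mult_factor: float = 0.5,        # assumed multiplicative factor, 0 < f < 1
    min_rate_mbps: float = 1.0,      # assumed lower bound
    max_rate_mbps: float = 100_000.0,  # assumed upper bound (link rate)
) -> float:
    """Return the next sending rate under additive increase, multiplicative decrease."""
    if not 0.0 < mult_factor < 1.0:
        raise ValueError("mult_factor must be between 0 and 1")
    if action == "increase":
        new_rate = rate_mbps + add_step_mbps   # linear growth when not congested
    elif action == "decrease":
        new_rate = rate_mbps * mult_factor     # multiplicative cut when congested
    elif action == "hold":
        new_rate = rate_mbps                   # lightly congested: keep the rate
    else:
        raise ValueError(f"unknown action: {action!r}")
    # Clamp to the configured range so the rate never reaches zero or exceeds the link.
    return max(min_rate_mbps, min(max_rate_mbps, new_rate))
```

The "hold" branch is not part of classic AIMD; it is included here only to mirror the lightly congested case described earlier.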
It can be understood that the embodiments of this application accurately obtain the time difference that indicates the network queue congestion depth and adjust the sending rate of the data message according to that time difference. When the network is congested, the sending rate of the data message can therefore be decreased, which reduces the network queue congestion depth and improves system performance.
In the network congestion control method provided by the embodiments of this application, the first RTT and the second RTT are obtained, the second RTT is subtracted from the first RTT to obtain the time difference, and the sending rate of the data message is adjusted according to the time difference to reduce the congestion depth of the network queue. Because the third message and the fourth message have a higher priority than the data message, this embodiment can measure the second RTT fairly accurately. The difference between the first RTT and the second RTT then accurately reflects the depth of network queue congestion between the first device and the second device, so adjusting the sending rate of the data message according to this time difference can effectively reduce the congestion depth of the network queue and improve system performance.
An embodiment of this application further provides a network congestion control method. As shown in FIG. 7, the method further includes steps S701-S704 after step S307.
S701. If the first device determines that a first preset number of data packets have been sent cumulatively since the first message was last sent, the first device obtains a third RTT; or, if the first device determines that the interval between the current time and the time the first message was last sent has reached a first preset duration, the first device obtains the third RTT and records the current timestamp.
The third RTT and the first RTT are dynamic RTTs measured at different moments.
For example, because the degree of network congestion changes dynamically, the dynamic RTT may be measured periodically to determine the current degree of network congestion.
In one implementation, the period for measuring the dynamic RTT may be defined as follows: once the first device has cumulatively sent the first preset number of data packets since sending the first message, it obtains the third RTT. For example, once the first device has cumulatively sent J data packets since it last sent the data message carrying the first timestamp, where J is greater than or equal to 2, the first device obtains the third RTT.
In another implementation, the period for measuring the dynamic RTT may be defined as follows: when the first device determines that the interval between the current time and the time the first message was last sent has reached the first preset duration, it obtains the third RTT and records the current timestamp. For example, when K microseconds have elapsed since the first message was last sent, the first device obtains the third RTT.
The third RTT is obtained in step S701 in the same way as the first RTT is obtained in steps S301-S306. For details, refer to the description of the foregoing embodiment; they are not repeated here.
S702. If the first device determines that a second preset number of data packets have been sent cumulatively since the third message was last sent, the first device obtains a fourth RTT; or, if the first device determines that the interval between the current time and the time the third message was last sent has reached a second preset duration, the first device obtains the fourth RTT and records the current timestamp.
The fourth RTT and the second RTT are fixed RTTs measured at different moments.
For example, the data transmission path between the first device and the second device may change. If there are multiple paths between network card A of computer node A and network card B of computer node B, then when the first path between network card A and network card B becomes abnormal, the switch may select a second path that can reach network card B to transmit the data. It can be understood that the fixed RTT also changes after the network path between network card A and network card B changes, so the fixed RTT may be measured periodically.
In one implementation, the period for measuring the fixed RTT may be defined as follows: once the first device has cumulatively sent the second preset number of data packets since it last sent the third message, it obtains the fourth RTT. For example, once the first device has cumulatively sent N data packets since the third message was last sent, where N is greater than or equal to 2, the first device obtains the fourth RTT.
In another implementation, the period for measuring the fixed RTT may be defined as follows: when the first device determines that the interval between the current time and the time the third message was last sent has reached the second preset duration, it obtains the fourth RTT. For example, when M microseconds have elapsed since the third message was last sent, the first device obtains the fourth RTT.
The fourth RTT is obtained in step S702 in the same way as the second RTT is obtained in steps S601-S606. For details, refer to the description of the foregoing embodiment; they are not repeated here.
For example, the period for obtaining the third RTT in step S701 may be shorter than the period for obtaining the fourth RTT in step S702. The embodiments of this application do not limit this; it is given here only as an example.
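The two triggering conditions in steps S701 and S702 (a packet count or an elapsed interval) can be sketched with one small helper that is instantiated twice, once per RTT type. The class name `ProbeTrigger`, its fields, and the idea of allowing both conditions on the same instance are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeTrigger:
    """Decides when a new RTT measurement is due (hypothetical helper)."""
    max_packets: Optional[int] = None      # e.g. J or N data packets
    max_interval_us: Optional[int] = None  # e.g. K or M microseconds
    packets_since_probe: int = 0
    last_probe_us: int = 0

    def on_packet_sent(self) -> None:
        """Count one data packet sent since the last measurement."""
        self.packets_since_probe += 1

    def should_probe(self, now_us: int) -> bool:
        """True if either configured condition has been reached."""
        by_count = (
            self.max_packets is not None
            and self.packets_since_probe >= self.max_packets
        )
        by_time = (
            self.max_interval_us is not None
            and now_us - self.last_probe_us >= self.max_interval_us
        )
        return by_count or by_time

    def mark_probed(self, now_us: int) -> None:
        """Record the current timestamp and restart the packet count."""
        self.packets_since_probe = 0
        self.last_probe_us = now_us
```

Under these assumptions, a sender would keep one trigger for the dynamic RTT with a short period and another for the fixed RTT with a longer period, which matches the example relationship between the two periods stated above.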
S703. Subtract the current fixed RTT from the third RTT to obtain the time difference.
For example, the current fixed RTT may be the fixed RTT saved in the context, and the fixed RTT saved in the context may be the second RTT or the fourth RTT.
(Optional) The first device may obtain the saved current fixed RTT from the context and subtract it from the third RTT to obtain the time difference. If the current fixed RTT saved in the context is the second RTT, step S703 subtracts the second RTT from the third RTT to obtain the time difference. If the current fixed RTT saved in the context is the fourth RTT, step S703 subtracts the fourth RTT from the third RTT to obtain the time difference.
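Step S703 can be sketched as a context object that always holds the most recently measured fixed RTT, whichever measurement produced it. The class `RttContext` and its method names are hypothetical, and returning `None` before any fixed RTT has been saved is an assumption made here, not behavior stated in the application.

```python
from typing import Optional


class RttContext:
    """Holds the current fixed RTT saved by the sender (hypothetical structure)."""

    def __init__(self) -> None:
        self.fixed_rtt_us: Optional[int] = None

    def update_fixed_rtt(self, fixed_rtt_us: int) -> None:
        """Overwrite the saved value, e.g. the second RTT and later the fourth RTT."""
        self.fixed_rtt_us = fixed_rtt_us

    def time_difference(self, dynamic_rtt_us: int) -> Optional[int]:
        """Return dynamic RTT minus the current fixed RTT.

        Returns None when no fixed RTT has been measured yet (an assumption).
        """
        if self.fixed_rtt_us is None:
            return None
        return dynamic_rtt_us - self.fixed_rtt_us
```

Because the context is simply overwritten, the subtraction in S703 automatically uses the fourth RTT once a newer fixed measurement has replaced the second RTT.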
S704. Adjust the sending rate of the data message according to the time difference.
It can be understood that, for the specific implementation of step S704, reference may be made to the implementation of step S307 described above; details are not repeated here.
It should be noted that the network congestion control method provided in this embodiment may perform steps S701-S704 repeatedly to handle different network paths and different congestion conditions at different moments, so that network performance remains high.
By periodically measuring the fixed RTT and the dynamic RTT, this embodiment improves the accuracy of both values, which effectively reduces the depth of network queue congestion and improves system performance.
FIG. 8 is a schematic structural diagram of data transmission according to an embodiment of this application. For example, if network card A sends a Write/Send to network card B, network card A is the device that sends data messages; the data messages travel from network card A to network card B, and no data messages travel in the direction from network card B to network card A. If network card A sends a Read to network card B, network card A is the device that receives data messages; the data messages travel from network card B to network card A, and no data messages travel in the direction from network card A to network card B.
Take the case in which network card A sends a Write/Send request to network card B as an example, that is, network card A is the device that sends data messages. Network card A may include a fixed RTT request module 810, a fixed RTT response module 812, a dynamic RTT request module 820, a dynamic RTT response module 822, a rate control module 830, and a sending module 840. Network card B includes a fixed RTT reflection module 811, a dynamic RTT reflection module 821, and a receiving module 850.
The fixed RTT request module 810 is configured to construct a fixed RTT request message and encapsulate local timestamp 1 of network card A in the message. The priority of the fixed RTT request message is higher than the priority of the data messages sent by network card A to network card B.
The fixed RTT reflection module 811 is configured to receive the request message sent by the fixed RTT request module 810, extract timestamp 1 from the request message, and construct a fixed RTT response message. The priority of the fixed RTT response message is higher than the priority of the data messages sent by network card A to network card B.
The fixed RTT response module 812 is configured to receive the fixed RTT response message sent by the fixed RTT reflection module 811, extract timestamp 1 from the fixed RTT response message, calculate the difference between local timestamp 2 (taken when network card A receives the fixed RTT response message) and the extracted timestamp 1, and save the difference in the context. This time difference is recorded as the fixed RTT.
The dynamic RTT request module 820 is configured to encapsulate local timestamp 3 in a data message and send the data message carrying timestamp 3 to network card B.
The receiving module 850 is configured to receive the data message sent by the dynamic RTT request module 820.
The dynamic RTT reflection module 821 is configured to extract timestamp 3 from the data message received by the receiving module 850 and construct a dynamic RTT response message. The priority of the dynamic RTT response message is higher than the priority of the data message.
The dynamic RTT response module 822 is configured to receive the dynamic RTT response message sent by the dynamic RTT reflection module 821, extract timestamp 3 from the dynamic RTT response message, calculate the difference between local timestamp 4 (taken when network card A receives the dynamic RTT response message) and the extracted timestamp 3, and save the difference in the context. This time difference is recorded as the dynamic RTT.
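The request, reflection, and response modules described above all follow the same timestamp-echo pattern, which can be sketched as follows. The `Message` structure, the numeric priority values, and the function names are assumptions chosen for illustration; the application does not define a message layout or concrete priority levels at this point.

```python
from dataclasses import dataclass

DATA_PRIORITY = 0  # hypothetical priority of ordinary data messages
HIGH_PRIORITY = 7  # hypothetical priority above that of data messages


@dataclass(frozen=True)
class Message:
    kind: str          # "fixed" or "dynamic"
    is_response: bool
    priority: int
    timestamp_us: int  # sender's local timestamp, echoed back unchanged


def build_request(kind: str, now_us: int) -> Message:
    """Sender side: stamp the request with the local time.

    A fixed RTT request is sent at high priority; a dynamic RTT request
    is an ordinary data message and so shares the data priority.
    """
    if kind not in ("fixed", "dynamic"):
        raise ValueError(f"unknown kind: {kind!r}")
    priority = HIGH_PRIORITY if kind == "fixed" else DATA_PRIORITY
    return Message(kind, False, priority, now_us)


def reflect(request: Message) -> Message:
    """Receiver side: copy the timestamp into a high-priority response."""
    return Message(request.kind, True, HIGH_PRIORITY, request.timestamp_us)


def compute_rtt(response: Message, receive_time_us: int) -> int:
    """Sender side: local receive time minus the echoed send timestamp."""
    if not response.is_response:
        raise ValueError("expected a response message")
    return receive_time_us - response.timestamp_us
```

Because only the sender's own clock is read at both ends of the measurement, the sketch needs no clock synchronization between the two network cards.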
The rate control module 830 is configured to obtain the dynamic RTT from the dynamic RTT response module 822, obtain the saved fixed RTT from the context, and calculate the time difference. The time difference (T_q) is the dynamic RTT minus the fixed RTT and indicates the degree of network queue congestion. If T_q is less than T_low, the network queue is not congested and the data sending rate may be increased. If T_q is greater than or equal to T_high, the network queue is congested and the data sending rate may be decreased to reduce the queue congestion depth. If T_q is greater than or equal to T_low and less than T_high, the network queue is lightly congested and the current data sending rate is left unchanged.
The sending module 840 is a data transmission module configured to send data messages. The sending module 840 may send data messages at the sending rate adjusted by the rate control module 830.
It should be noted that FIG. 8 uses only the case in which network card A is the device that sends data messages as an example. In practice, network card B may also be the device that sends data; the embodiments of this application do not limit this. When network card B is the end that sends data, network card B includes the same functional modules as network card A in FIG. 8.
The foregoing describes the solutions provided in the embodiments of this application mainly from the perspective of method steps. It can be understood that, to implement the foregoing functions, a computer includes hardware structures and/or software modules corresponding to each function. A person skilled in the art should readily appreciate that, in combination with the modules and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented by hardware or by a combination of hardware and computer software. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered beyond the scope of this application.
In the embodiments of this application, the computer may be divided into functional modules according to the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of this application is illustrative and is merely a logical functional division; other divisions are possible in an actual implementation.
When each functional module corresponds to one function, FIG. 9 shows a possible schematic structural diagram of a network congestion control apparatus involved in the foregoing embodiments. The network congestion control apparatus 900 includes a processing module 901 and a transceiver module 902. The processing module 901 may perform, through the transceiver module 902, S301 and S305-S307 in FIG. 3, or S601 and S605-S606 in FIG. 6, or S701-S704 in FIG. 7. All related content of the steps in the foregoing method embodiments may be cited in the functional descriptions of the corresponding functional modules; details are not repeated here.
When each functional module corresponds to one function, FIG. 10 shows a possible schematic structural diagram of a network congestion control apparatus involved in the foregoing embodiments. The network congestion control apparatus 1000 includes a processing module 1001 and a transceiver module 1002. The processing module 1001 may perform, through the transceiver module 1002, S302-S304 in FIG. 3, or S602-S604 in FIG. 6. All related content of the steps in the foregoing method embodiments may be cited in the functional descriptions of the corresponding functional modules; details are not repeated here.
When an integrated unit is used, FIG. 11 shows a possible schematic structural diagram of the network congestion control apparatus 1100 involved in the foregoing embodiments. The network congestion control apparatus 1100 includes a processor 1101 and a transceiver 1102. The processor 1101 is configured to control and manage the actions of the network congestion control apparatus 1100. For example, the processor 1101 is configured to perform, through the transceiver 1102, S301 and S305-S307 in FIG. 3, or S601 and S605-S606 in FIG. 6, or S701-S704 in FIG. 7, and/or other processes of the techniques described herein. Optionally, the network congestion control apparatus 1100 may further include a memory 1103 configured to store the program code and data used by the network congestion control apparatus 1100 to perform any of the network congestion control methods provided above. The memory 1103 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), or the like.
When an integrated unit is used, FIG. 12 shows a possible schematic structural diagram of the network congestion control apparatus 1200 involved in the foregoing embodiments. The network congestion control apparatus 1200 includes a processor 1201 and a transceiver 1202. The processor 1201 is configured to control and manage the actions of the network congestion control apparatus 1200. For example, the processor 1201 is configured to perform, through the transceiver 1202, S302-S304 in FIG. 3, or S602-S604 in FIG. 6, and/or other processes of the techniques described herein. Optionally, the network congestion control apparatus 1200 may further include a memory 1203 configured to store the program code and data used by the network congestion control apparatus 1200 to perform any of the network congestion control methods provided above. The memory 1203 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), or the like.
The steps of the methods or algorithms described in connection with the disclosure of this application may be implemented in hardware or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in a random access memory (RAM), a flash memory, an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a core network interface device. The processor and the storage medium may also exist as discrete components in the core network interface device.
A person skilled in the art should appreciate that, in one or more of the foregoing examples, the functions described in this application may be implemented by hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The specific implementations described above explain the purpose, technical solutions, and beneficial effects of this application in further detail. It should be understood that the foregoing are merely specific implementations of this application and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made on the basis of the technical solutions of this application shall fall within the scope of protection of this application.

Claims (22)

  1. A network congestion control method, characterized in that the method is applied to a first device, the first device being a device that sends data messages, and the method comprises:
    sending, by the first device, a first message to a second device, wherein the first message carries a first timestamp, and the first timestamp is a local timestamp taken when the first message is sent;
    receiving, by the first device, a second message sent by the second device, wherein the second message carries the first timestamp;
    subtracting the first timestamp from a second timestamp to obtain a first round-trip time (RTT), wherein the second timestamp is a local timestamp taken when the first device receives the second message; and
    adjusting a sending rate of the data messages according to the first RTT;
    wherein the priority of the first message is the same as the priority of the data messages, and the priority of the second message is higher than the priority of the data messages.
  2. The method according to claim 1, characterized in that the method further comprises:
    sending, by the first device, a third message to the second device, wherein the third message carries a third timestamp, the third timestamp is a local timestamp taken when the third message is sent, and the priority of the third message is higher than the priority of the data messages;
    receiving, by the first device, a fourth message sent by the second device, wherein the fourth message carries the third timestamp, and the priority of the fourth message is higher than the priority of the data messages; and
    subtracting the third timestamp from a fourth timestamp to obtain a second RTT, wherein the fourth timestamp is a local timestamp taken when the first device receives the fourth message.
  3. The method according to claim 2, characterized in that the adjusting a sending rate of the data messages according to the first RTT comprises:
    subtracting the second RTT from the first RTT to obtain a time difference, wherein the time difference indicates a network queue congestion depth; and
    adjusting the sending rate of the data messages according to the time difference.
  4. The method according to claim 3, characterized in that the adjusting the sending rate of the data messages according to the time difference comprises:
    if the time difference is less than a first preset threshold, increasing the sending rate of the data messages; and if the time difference is greater than a second preset threshold, decreasing the sending rate of the data messages, wherein the first preset threshold is less than the second preset threshold.
  5. The method according to any one of claims 2-4, characterized in that the method further comprises:
    if the first device determines that a first preset number of data packets have been sent cumulatively since the first message was last sent, obtaining a third RTT; or
    if the first device determines that the interval between the current time and the time the first message was last sent has reached a first preset duration, obtaining the third RTT and recording the current timestamp.
  6. The method according to any one of claims 2-5, characterized in that the method further comprises:
    if the first device determines that a second preset number of data packets have been sent cumulatively since the third message was last sent, obtaining a fourth RTT; or
    若所述第一设备确定当前时间与上次发送所述第三报文的时间间隔达到第二预设 时长,获取所述第四RTT,并记录当前时间戳。If the first device determines that the time interval between the current time and the last sending of the third message reaches the second preset duration, obtains the fourth RTT, and records the current timestamp.
  7. The method according to any one of claims 2-6, wherein the third timestamp is carried in a reserved field of the base transport header (BTH) of a remote direct memory access (RDMA) message, or is carried in the payload of the RDMA message.
  8. The method according to any one of claims 1-7, wherein if the first message is the data message, the first timestamp is carried in a reserved field of the BTH of an RDMA message; and if the first message is different from the data message, the first timestamp is carried in a reserved field of the base transport header (BTH) of a remote direct memory access (RDMA) message, or is carried in the payload of the RDMA message.
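The payload option in claims 7 and 8 could look like the sketch below, which prepends a timestamp to the message payload and recovers it on receipt. The encoding is an assumption: the claims do not specify the width, byte order, or position of the timestamp, and the BTH reserved-field option is not modelled here. The names `attach_timestamp` and `extract_timestamp` are hypothetical.

```python
import struct

# Assumed encoding: one 64-bit unsigned integer in network byte order.
TIMESTAMP_FORMAT = "!Q"
TIMESTAMP_SIZE = struct.calcsize(TIMESTAMP_FORMAT)


def attach_timestamp(payload: bytes, timestamp_ns: int) -> bytes:
    """Prepend the sender's local timestamp to the payload."""
    return struct.pack(TIMESTAMP_FORMAT, timestamp_ns) + payload


def extract_timestamp(message_payload: bytes):
    """Split a received payload into (timestamp_ns, remaining payload)."""
    (timestamp_ns,) = struct.unpack_from(TIMESTAMP_FORMAT, message_payload)
    return timestamp_ns, message_payload[TIMESTAMP_SIZE:]
```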
  9. A network congestion control method, wherein the method comprises:
    receiving, by a second device, a first message sent by a first device, wherein the first message carries a first timestamp, the first timestamp is the local timestamp at which the first message was sent, and the first device is a device that sends data messages; and
    sending, by the second device, a second message to the first device, wherein the second message carries the first timestamp;
    wherein the priority of the first message is the same as the priority of the data messages, and the priority of the second message is higher than the priority of the data messages.
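The behaviour of the second device in claim 9 amounts to echoing the received timestamp in a higher-priority reply. A minimal sketch follows, assuming a hypothetical `Message` type and numeric priorities in which a larger value means higher priority; neither is specified by the claim.

```python
from dataclasses import dataclass

# Assumed numeric priorities: a larger value means higher priority.
DATA_PRIORITY = 0
HIGH_PRIORITY = 1


@dataclass
class Message:
    timestamp: int   # timestamp carried in the message
    priority: int    # priority the message is sent with


def build_reply(first_message: Message) -> Message:
    """Build the second message for a received first message.

    The reply copies the first timestamp unchanged and is sent at a
    priority above that of the data messages.
    """
    return Message(timestamp=first_message.timestamp, priority=HIGH_PRIORITY)
```

Because the reply carries the sender's own timestamp back unchanged, the second device does not need to read or share its own clock.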
  10. The method according to claim 9, wherein the method further comprises:
    receiving, by the second device, a third message sent by the first device, wherein the third message carries a third timestamp, the third timestamp is the local timestamp at which the third message was sent, and the priority of the third message is higher than the priority of the data messages; and
    sending, by the second device, a fourth message to the first device, wherein the fourth message carries the third timestamp, and the priority of the fourth message is higher than the priority of the data messages.
  11. A network congestion control apparatus, wherein the network congestion control apparatus is an apparatus that sends data messages, and the apparatus comprises a processing unit and a transceiver unit;
    the processing unit is configured to:
    send a first message to a second device through the transceiver unit, wherein the first message carries a first timestamp, and the first timestamp is the local timestamp at which the first message was sent;
    receive, through the transceiver unit, a second message sent by the second device, wherein the second message carries the first timestamp;
    subtract the first timestamp from a second timestamp to obtain a first round-trip time (RTT), wherein the second timestamp is the local timestamp at which the apparatus receives the second message; and
    adjust the sending rate of data messages according to the first RTT;
    wherein the priority of the first message is the same as the priority of the data messages, and the priority of the second message is higher than the priority of the data messages.
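The RTT step in claim 11 can be sketched as a single subtraction on the sending side. The helper names `local_timestamp_ns` and `first_rtt` are hypothetical, and using a monotonic nanosecond clock is an assumption; the claim only requires local timestamps.

```python
import time


def local_timestamp_ns() -> int:
    """Read the sender's local clock (assumed: monotonic, nanoseconds)."""
    return time.monotonic_ns()


def first_rtt(echoed_first_timestamp: int, second_timestamp: int) -> int:
    """Compute the first RTT from the echoed and the receive timestamps.

    Both values come from the sender's own clock, so no clock
    synchronisation with the second device is needed.
    """
    rtt = second_timestamp - echoed_first_timestamp
    if rtt < 0:
        raise ValueError("receive timestamp precedes send timestamp")
    return rtt
```

In use, the sender would call `local_timestamp_ns()` once when the first message leaves and once when the second message arrives, then pass the echoed value and the receive value to `first_rtt`.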
  12. The apparatus according to claim 11, wherein the processing unit is further configured to:
    send a third message to the second device through the transceiver unit, wherein the third message carries a third timestamp, the third timestamp is the local timestamp at which the third message was sent, and the priority of the third message is higher than the priority of the data messages;
    receive, through the transceiver unit, a fourth message sent by the second device, wherein the fourth message carries the third timestamp, and the priority of the fourth message is higher than the priority of the data messages; and
    subtract the third timestamp from a fourth timestamp to obtain a second RTT, wherein the fourth timestamp is the local timestamp at which the apparatus receives the fourth message.
  13. The apparatus according to claim 12, wherein the processing unit is specifically configured to:
    subtract the second RTT from the first RTT to obtain a time difference, wherein the time difference indicates the network queue congestion depth; and
    adjust the sending rate of data messages according to the time difference.
  14. The apparatus according to claim 13, wherein the processing unit is specifically configured to:
    if the time difference is less than a first preset threshold, increase the sending rate of the data messages; and if the time difference is greater than a second preset threshold, decrease the sending rate of the data messages, wherein the first preset threshold is less than the second preset threshold.
  15. The apparatus according to any one of claims 12-14, wherein the processing unit is further configured to:
    if the processing unit determines that a first preset number of data packets have been sent cumulatively since the first message was last sent, obtain a third RTT; or
    if the processing unit determines that the interval between the current time and the time at which the first message was last sent has reached a first preset duration, obtain the third RTT and record the current timestamp.
  16. The apparatus according to any one of claims 12-15, wherein the processing unit is further configured to:
    if the processing unit determines that a second preset number of data packets have been sent cumulatively since the third message was last sent, obtain a fourth RTT; or
    if the processing unit determines that the interval between the current time and the time at which the third message was last sent has reached a second preset duration, obtain the fourth RTT and record the current timestamp.
  17. The apparatus according to any one of claims 12-16, wherein the third timestamp is carried in a reserved field of the base transport header (BTH) of a remote direct memory access (RDMA) message, or is carried in the payload of the RDMA message.
  18. The apparatus according to any one of claims 11-17, wherein if the first message is the data message, the first timestamp is carried in a reserved field of the BTH of an RDMA message; and if the first message is different from the data message, the first timestamp is carried in a reserved field of the base transport header (BTH) of a remote direct memory access (RDMA) message, or is carried in the payload of the RDMA message.
  19. A network congestion control apparatus, wherein the apparatus comprises a processing unit and a transceiver unit;
    the processing unit is configured to:
    receive, through the transceiver unit, a first message sent by a first device, wherein the first message carries a first timestamp, the first timestamp is the local timestamp at which the first message was sent, and the first device is a device that sends data messages; and
    send a second message to the first device through the transceiver unit, wherein the second message carries the first timestamp;
    wherein the priority of the first message is the same as the priority of the data messages, and the priority of the second message is higher than the priority of the data messages; or the priority of the first message is higher than the priority of the data messages, and the priority of the second message is the same as the priority of the data messages.
  20. The apparatus according to claim 19, wherein the processing unit is further configured to:
    receive, through the transceiver unit, a third message sent by the first device, wherein the third message carries a third timestamp, the third timestamp is the local timestamp at which the third message was sent, and the priority of the third message is higher than the priority of the data messages; and
    send a fourth message to the first device through the transceiver unit, wherein the fourth message carries the third timestamp, and the priority of the fourth message is higher than the priority of the data messages.
  21. A computer storage medium storing computer program code, wherein when the computer program code runs on a processor, the processor is caused to perform the network congestion control method according to any one of claims 1-10.
  22. A network congestion control apparatus, wherein the network congestion control apparatus comprises:
    a transceiver, configured to send and receive information or to communicate with other network elements;
    a memory, configured to store computer-executable instructions; and
    a processor, configured to execute the computer-executable instructions to implement the network congestion control method according to any one of claims 1-10.
PCT/CN2020/084260 2019-04-12 2020-04-10 Method and device for controlling network congestion WO2020207479A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910295531.0A CN111817977B (en) 2019-04-12 2019-04-12 Network congestion control method and device
CN201910295531.0 2019-04-12

Publications (1)

Publication Number Publication Date
WO2020207479A1 (en) 2020-10-15

Family

ID=72750946

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/084260 WO2020207479A1 (en) 2019-04-12 2020-04-10 Method and device for controlling network congestion

Country Status (2)

Country Link
CN (1) CN111817977B (en)
WO (1) WO2020207479A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113422704A (en) * 2021-02-05 2021-09-21 阿里巴巴集团控股有限公司 Data measurement method, data measurement device, electronic equipment and computer storage medium
CN113037859B (en) * 2021-03-24 2022-04-22 新华三技术有限公司 Session information management method, device, exchange equipment and medium
CN113364701B (en) * 2021-05-28 2022-11-25 南京大学 RTT (round trip time) -based congestion control method and equipment combining proportional-integral-derivative control
CN114938354A (en) * 2022-06-24 2022-08-23 北京有竹居网络技术有限公司 Congestion control method, device, equipment and storage medium
CN116527593B (en) * 2023-07-03 2023-09-19 珠海星云智联科技有限公司 Network traffic congestion control method and related device
CN116582492B (en) * 2023-07-14 2023-09-26 珠海星云智联科技有限公司 Congestion control method, system and storage medium for optimizing RDMA reading
CN116760779A (en) * 2023-08-21 2023-09-15 珠海星云智联科技有限公司 Network congestion control method, system, storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1802037A2 (en) * 2005-12-23 2007-06-27 Agilent Technologies, Inc. System and method for measuring network performance using real network traffic
CN108075935A (en) * 2016-11-15 2018-05-25 华为技术有限公司 Measure the method and apparatus of time delay
CN108737207A (en) * 2017-04-25 2018-11-02 华为技术有限公司 Propagation delay time detection method, equipment and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102055677A (en) * 2011-01-26 2011-05-11 杭州华三通信技术有限公司 Method and device for reducing network congestion
US9571406B2 (en) * 2011-10-25 2017-02-14 Vmware, Inc. Network congestion management based on communication delay
CN109412958B (en) * 2017-08-18 2022-04-05 华为技术有限公司 Congestion control method and device for data center
CN107896192B (en) * 2017-11-20 2020-09-25 电子科技大学 QoS control method for differentiating service priority in SDN network

Also Published As

Publication number Publication date
CN111817977B (en) 2024-04-16
CN111817977A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
WO2020207479A1 (en) Method and device for controlling network congestion
WO2021008473A1 (en) System, method, and apparatus for evaluating round-trip time
US11032205B2 (en) Flow control method and switching device
US20060203730A1 (en) Method and system for reducing end station latency in response to network congestion
US10606492B2 (en) Detecting and handling solicited IO traffic microbursts in a fibre channel storage area network
US9426080B2 (en) Data communication apparatus, data transmission method, and computer system
WO2022121469A1 (en) Flow control method, apparatus, and device, and readable storage medium
WO2016182772A1 (en) Uplink performance management
Zhang et al. Congestion detection in lossless networks
Lu et al. Dynamic ECN marking threshold algorithm for TCP congestion control in data center networks
Gangam et al. Estimating TCP latency approximately with passive measurements
US11115308B2 (en) System and method for congestion control using time difference congestion notification
Zheng et al. An effective approach to preventing TCP incast throughput collapse for data center networks
JP4930275B2 (en) Communication system, communication method, transmitter, receiver, rate calculation method, and program
CA2940077C (en) Buffer bloat control
Lu et al. EQF: An explicit queue-length feedback for TCP congestion control in datacenter networks
Le et al. SFC: Near-source congestion signaling and flow control
US9882751B2 (en) Communication system, communication controller, communication control method, and medium
US11924106B2 (en) Method and system for granular dynamic quota-based congestion management
US20230123387A1 (en) Window-based congestion control
US11824792B2 (en) Method and system for dynamic quota-based congestion management
JP4828555B2 (en) Node device and bandwidth control method
Misund Rapid acceleration in TCP Prague
JP2012044278A (en) Edge node and bandwidth control method
CEIE Network Working Group H. Zheng Internet Draft CEIE, BJTU Intended status: Experimental C. Qiao Expires: December 28, 2016 SUNY K. Chen

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20787539; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20787539; Country of ref document: EP; Kind code of ref document: A1)