WO2020001192A1 - Data transmission method, computing device, network device, and data transmission system - Google Patents

Data transmission method, computing device, network device, and data transmission system

Info

Publication number
WO2020001192A1
WO2020001192A1 (application PCT/CN2019/087382)
Authority
WO
WIPO (PCT)
Prior art keywords
data packet, RTT, sent, data, network device
Application number
PCT/CN2019/087382
Other languages
English (en)
French (fr)
Inventor
谭焜
胡水海
付斌章
陈凯
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP19826507.6A (EP3742690B1)
Publication of WO2020001192A1
Priority to US17/006,196 (US11477129B2)
Priority to US17/856,161 (US11799790B2)

Classifications

    • All classifications fall under H04L47/00 (Traffic control in data switching networks; Flow control; Congestion control):
    • H04L47/283: Flow control; Congestion control in relation to timing considerations in response to processing delays, e.g. caused by jitter or round trip time [RTT]
    • H04L47/2425: Traffic characterised by specific attributes, e.g. priority or QoS, for supporting services specification, e.g. SLA
    • H04L47/2433: Allocation of priorities to traffic types
    • H04L47/2441: Traffic characterised by specific attributes relying on flow classification, e.g. using integrated services [IntServ]
    • H04L47/11: Identifying congestion
    • H04L47/12: Avoiding congestion; Recovering from congestion
    • H04L47/27: Evaluation or update of window size, e.g. using information derived from acknowledged [ACK] packets
    • H04L47/31: Flow control; Congestion control by tagging of packets, e.g. using discard eligibility [DE] bits
    • H04L47/32: Flow control; Congestion control by discarding or delaying data units, e.g. packets or frames
    • H04L47/37: Slow start
    • H04L47/6275: Queue scheduling characterised by scheduling criteria for service slots or service orders based on priority

Definitions

  • the present application relates to the field of communications technologies, and in particular, to a data transmission method, related equipment, and systems.
  • Transmission Control Protocol is a connection-oriented, reliable, byte stream-based transport layer communication protocol.
  • TCP, defined in RFC 793 issued by the Internet Engineering Task Force (IETF), is the most widely used transport layer protocol in networks.
  • TCP assigns a sequence number (SN) to each data packet.
  • the confirmation will carry the sequence number of the received data packet. If the sender does not receive an acknowledgement within a reasonable Round-Trip Time (RTT), the corresponding data packet will be retransmitted.
  • TCP guarantees the reliable transmission of data through acknowledgement and timeout retransmission mechanisms
  • Network resources, including link bandwidth, buffers in switching nodes, and so on, are limited.
  • If too many data packets are transmitted in the network within a certain period of time, the transmission performance of the network deteriorates sharply; this situation is called network congestion. When network congestion occurs, packet loss generally occurs, transmission delays increase, throughput decreases, and in severe cases "congestion collapse" can occur.
  • TCP introduced a series of congestion control algorithms, including the "slow start” and “congestion avoidance” algorithms originally proposed by V. Jacobson in a 1988 paper, and later "Fast retransmit” and “Fast Recovery” algorithms added to the TCP Reno version.
  • What these congestion control algorithms have in common is that they adjust the data transmission rate based on a congestion window.
  • The size of the congestion window, that is, the cwnd value, represents the maximum number of data packets that can be sent without yet having received an ACK. The larger the window, the faster the data transmission rate, but the more likely network congestion becomes. If the window value is 1, the sender must wait for the receiver's acknowledgement of each data packet before sending the next one, which is clearly inefficient. Choosing the best cwnd value to maximize network throughput without causing congestion is the core of a congestion control algorithm.
  • When cwnd exceeds the slow start threshold ssthresh, the congestion avoidance phase is entered.
  • In the congestion avoidance phase, cwnd is no longer increased exponentially but linearly: cwnd is increased by only 1/cwnd for each ACK received, so cwnd grows by at most one per RTT.
  • the existing congestion control algorithms can slowly increase the data transmission rate when the network is in good condition to avoid impacting the network. At the same time, when a packet loss is detected, the data transmission rate is aggressively reduced to avoid further deterioration of the network state.
  • the embodiments of the present application provide a data transmission method and related equipment and systems, which aim to reduce network congestion while making full use of network bandwidth, increasing data transmission rate, and reducing data transmission delay.
  • an embodiment of the present application provides a data transmission method.
  • The method includes: within the first round-trip time (RTT) of the data transmission phase between a sending end and a receiving end, the sending end sends multiple data packets at a large rate (line speed or an arbitrary custom rate) and adds a first tag to the multiple data packets sent in the first RTT, so that after receiving a data packet carrying the first tag, the network device buffers the data packet in a low-priority queue or discards it. Data packets in the high-priority queue of the network device are forwarded prior to data packets in the low-priority queue, and the data packets buffered in the high-priority queue do not carry the first tag.
  • This method uses free network bandwidth to let new data flows start quickly without delay, and at the same time marks the packets sent in the first RTT so that network devices forward them at a lower priority, which reduces the impact of the fast start of new flows on old flows (non-first-RTT packets) and lowers the probability of network congestion.
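  • For illustration only, the following minimal Python sketch shows one way a sender could implement this behaviour; the Packet class, the first_rtt flag, and the send functions are hypothetical names introduced here and are not defined by this application.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    seq: int
    payload: bytes
    # Hypothetical stand-in for the "first tag"; the application only says the
    # tag may be a header field or a specific bit.
    first_rtt: bool = False

def send_first_rtt_burst(chunks, send_fn):
    """Send everything queued for the first RTT at full rate, marking each
    packet so that network devices can demote it to a low-priority queue."""
    for seq, chunk in enumerate(chunks):
        send_fn(Packet(seq=seq, payload=chunk, first_rtt=True))  # add the first tag

def send_later_rtt(chunks, send_fn, budget):
    """Send at most `budget` packets in a later RTT; these packets do not carry
    the first tag (equivalently, they carry the second tag)."""
    for seq, chunk in enumerate(chunks[:budget]):
        send_fn(Packet(seq=seq, payload=chunk, first_rtt=False))
```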
  • The sending end may adjust the sending rate or the number of data packets to be sent in the next RTT based on the number of data packets, among the multiple data packets sent in the first RTT, that were successfully received by the receiving end, and then send data packets in the next RTT at the adjusted rate or number, so that congestion control is applied in time according to the perceived network conditions and the network state does not deteriorate rapidly.
  • The sender may add a second tag to data packets sent in a non-first RTT to indicate that they were not sent in the first RTT, and the network device may, based on the second tag carried in a data packet, buffer the packet in a high-priority queue and forward it prior to the data packets sent within the first RTT, reducing the impact on old flows (non-first-RTT packets).
  • the first mark and the second mark are a field or a specific bit in a packet header.
  • the sending end establishes a communication connection with the receiving end before sending a data packet.
  • the first RTT or the first round of RTT is the first RTT after the communication connection is established.
  • the transmitting end performs data transmission during the process of establishing a communication connection with the receiving end.
  • the first RTT or the first round of RTT is the first RTT in the communication connection establishment phase.
  • The sending end confirms the number of data packets successfully received by the receiving end, among the multiple data packets sent in the first RTT, based on one or more acknowledgements received from the receiving end.
  • the upper limit of the number of data packets that the sender is allowed to send in the second RTT is linearly related to the number of data packets that have been successfully received by the receiver in the first RTT.
  • An embodiment of the present application provides a data transmission method, including: a network device receives a data packet sent by a sending end; if the data packet was sent within the first round-trip time (RTT) of the data transmission phase between the sending end and the receiving end, the network device buffers the data packet in a low-priority queue; if the data packet was not sent within the first RTT, the network device buffers the data packet in a high-priority queue; data packets in the high-priority queue are forwarded prior to data packets in the low-priority queue.
  • The network device distinguishes between data packets sent in the first RTT and data packets sent in non-first RTTs, and gives higher forwarding priority to the latter, thereby reducing the impact of the fast first-RTT transmission on old flows (non-first-RTT packets) and lowering the probability of network congestion.
  • The sender adds a specific tag to the data packets sent in the first RTT, and the network device determines whether a data packet was sent by the sender within the first RTT based on the tag carried by the received data packet.
  • The network device maintains a flow table for recording information on all active flows. If the five-tuple information of a flow cannot be found in the flow table, the flow is classified as a new flow and a new flow record is inserted into the flow table. When subsequent data packets look up the table, the newly inserted flow entry is hit, and according to the content of the flow entry it is determined that the current data packet belongs to the new flow, that is, it is a packet sent in the first RTT. When a new flow finishes its first RTT of data transmission, the flow entry is updated to "old flow", so subsequent packets of the flow are identified as non-first-RTT packets based on the updated flow entry.
  • each flow record of the flow table has a valid time. If the flow does not subsequently send any new data packets within the valid time, the flow record is deleted.
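  • As a sketch of the flow-table classification described above (all names, and the 10-second aging value, are illustrative assumptions rather than values taken from this application):

```python
import time

FLOW_TIMEOUT = 10.0  # hypothetical "valid time" of a flow record, in seconds

class FlowTable:
    """Classify packets as first-RTT (new flow) or non-first-RTT (old flow)."""

    def __init__(self):
        self._flows = {}  # five-tuple -> {"old": bool, "last_seen": float}

    def classify(self, five_tuple, now=None):
        now = time.time() if now is None else now
        entry = self._flows.get(five_tuple)
        if entry and now - entry["last_seen"] > FLOW_TIMEOUT:
            entry = None  # valid time expired: treat the flow as unknown again
        if entry is None:
            # Five-tuple not found: new flow, its packets count as first-RTT.
            self._flows[five_tuple] = {"old": False, "last_seen": now}
            return "first_rtt"
        entry["last_seen"] = now
        return "non_first_rtt" if entry["old"] else "first_rtt"

    def end_first_rtt(self, five_tuple):
        """Mark the flow as 'old' once its first RTT of transmission has ended."""
        if five_tuple in self._flows:
            self._flows[five_tuple]["old"] = True
```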
  • An embodiment of the present application provides a data transmission method, including: a network device receives a data packet sent by a sending end; if the data packet was sent within the first round-trip time (RTT) of the data transmission phase between the sending end and the receiving end and the number of data packets in the receive queue of the network device exceeds a set threshold, the network device drops the data packet; if the data packet was not sent within the first RTT and the receive queue is not full, the network device adds the data packet to the receive queue.
  • The network device selectively discards data packets sent in the first RTT based on the depth of the receive queue, thereby reducing the impact of the fast first-RTT transmission on old flows (non-first-RTT packets) and lowering the probability of network congestion.
  • the network device drops the data packet if the data packet is not sent within the first RTT and the receive queue is full.
  • Alternatively, the network device drops a data packet already in the receive queue, where the dropped data packet is one sent by the sending end within the first RTT.
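  • A minimal sketch of this selective-drop behaviour, assuming the packets expose a first_rtt flag as in the earlier sketch; the class and parameter names are illustrative:

```python
from collections import deque

class SelectiveDropQueue:
    def __init__(self, capacity, k):
        self.capacity = capacity  # maximum receive-queue depth
        self.k = k                # selective packet-loss threshold, k <= capacity
        self.queue = deque()

    def on_packet(self, pkt):
        depth = len(self.queue)
        if pkt.first_rtt:
            if depth >= self.k:
                return "dropped"          # queue too deep: drop first-RTT packets
        elif depth >= self.capacity:
            # Queue full: optionally evict a buffered first-RTT packet instead.
            for i, queued in enumerate(self.queue):
                if queued.first_rtt:
                    del self.queue[i]
                    break
            else:
                return "dropped"          # nothing to evict, drop the new packet
        self.queue.append(pkt)
        return "queued"
```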
  • The sender adds a specific tag to the data packets sent in the first RTT, and the network device determines whether a data packet was sent by the sender within the first RTT based on the tag carried by the received data packet.
  • The network device maintains a flow table for recording information on all active flows. If the five-tuple information of a flow cannot be found in the flow table, the flow is classified as a new flow and a new flow record is inserted into the flow table. When subsequent data packets look up the table, the newly inserted flow entry is hit, and according to the content of the flow entry it is determined that the current data packet belongs to the new flow, that is, it is a packet sent in the first RTT. When a new flow finishes its first RTT of data transmission, the flow entry is updated to "old flow", so subsequent packets of the flow are identified as non-first-RTT packets based on the updated flow entry.
  • each flow record of the flow table has a valid time. If the flow does not subsequently send any new data packets within the valid time, the flow record is deleted.
  • an embodiment of the present application provides a computing device having the function of implementing a transmitting end in the foregoing method example.
  • This function can be realized by hardware, and can also be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the functions described above.
  • the computing device includes a processor, a memory, and a network card.
  • the network card is used to receive data packets and send data packets.
  • The processor runs a protocol stack program in the memory to perform the functions of the sending end in the foregoing method example.
  • the structure of the computing device includes a receiving unit, a processing unit, and a sending unit, and these units can perform the corresponding functions in the foregoing method example.
  • The receiving unit and the sending unit are used for receiving and sending data packets, respectively.
  • The processing unit is used for data packet processing, such as adding the first and/or second tag.
  • an embodiment of the present application provides a network device that has a function of implementing a network device in any one of the foregoing aspects or any possible implementation manner of any aspect.
  • This function can be realized by hardware, and can also be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the functions described above.
  • the network device includes a processor, a memory, and an input / output port.
  • the input / output port is used to receive data packets and send data packets.
  • The processor runs a protocol stack program in the memory to perform the functions of the network device in the above method example, such as identifying the data packets sent in the first RTT, buffering data packets in the receive queue, and discarding data packets when the receive queue is full or the queue depth exceeds a set threshold.
  • the structure of the network device includes a receiving unit, a processing unit, and a sending unit. These units can perform the corresponding functions in the foregoing method examples.
  • The receiving unit and the sending unit are used for receiving and sending data packets, respectively.
  • the processing unit is used for data packet processing, such as identifying the packets sent by the first round of RTT, buffering the packets to the receiving queue, and discarding the packets when the receiving queue is full or the queue depth exceeds a set threshold.
  • the above-mentioned receiving unit and sending unit are transceivers, network cards or communication interfaces
  • The processing unit is a processor, or a hardware circuit such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC), or a dedicated chip.
  • An embodiment of the present application provides a network card, including an input/output port and a processor, wherein the processor is configured to send multiple data packets through the input/output port within the first round-trip time (RTT) of the data transmission phase between a sending end and a receiving end, and to add a first tag to the multiple data packets sent within the first RTT, so that after receiving a data packet carrying the first tag, the network device buffers the data packet in a low-priority queue or discards it; data packets in the high-priority queue of the network device are forwarded prior to data packets in the low-priority queue, and the data packets buffered in the high-priority queue do not carry the first tag.
  • an embodiment of the present application provides a computing device, and the computing device includes the foregoing network card.
  • an embodiment of the present application provides a data transmission system.
  • the system includes the foregoing computing device and the foregoing network device.
  • an embodiment of the present application provides a computer storage medium for storing computer software instructions used by the computing device or network device, which includes a program designed to execute the foregoing aspect.
  • In the data transmission method provided in the embodiments of the present application, the sender sends a large number of data packets in the first RTT after the TCP connection is established, thereby making full use of the available network bandwidth while trying to avoid causing network congestion.
  • That is, the data transmission method provided in the embodiments of the present application achieves a better balance between network bandwidth utilization and the probability of network congestion: while fully utilizing the network bandwidth, it avoids causing network congestion as far as possible.
  • FIG. 1 is a schematic diagram of a congestion control algorithm in the prior art.
  • FIG. 2 is an architecture diagram of a data transmission system according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a computing device according to an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a computing device according to another embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a computing device according to another embodiment of the present application.
  • FIG. 6 is a flowchart of a data transmission method according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of data transmission based on high and low priority queues according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a selective packet loss process according to an embodiment of the present application.
  • FIG. 9 is a flowchart of a data transmission method according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a format of a TCP data packet according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a network device according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of another data transmission system architecture according to an embodiment of the present application.
  • Data packet: also called a message, it is the basic unit of network transmission and consists of data organized in a certain format. Different network protocols define the data packet format differently, but in general a data packet can be divided into a header and a payload, where the header contains the information necessary during transmission, such as address information and flag bits, and the payload, also called the data portion of the packet, contains the content being sent.
  • Switch A network device that can forward data packets and can provide more connection ports for the network in order to connect more devices to the network.
  • Switch queue A functional unit in a switch that stores received data packets.
  • Server A device that provides computing services. Because the server needs to respond to service requests and process them, in general, the server should have the ability to undertake services and guarantee services.
  • In a network environment, servers are divided, according to the type of service they provide, into file servers, database servers, application servers, web servers, and so on.
  • Terminal device A device that provides users with voice and / or data connectivity, including wireless or wired terminals.
  • The wireless terminal may be a mobile terminal, such as a mobile phone (also referred to as a "cellular" phone) or a computer with a mobile terminal, for example a portable, pocket-sized, handheld, computer-integrated, or vehicle-mounted mobile device.
  • a data center is a complex set of facilities. It includes not only computer systems and other supporting equipment (such as communication and storage systems), but also redundant data communication networks, environmental control equipment, monitoring equipment, and various safety devices. These devices are placed together because they have the same environmental requirements and physical safety requirements, and they are placed for easy maintenance.
  • Data center network A network that connects all computer systems in a data center.
  • Network bandwidth refers to the amount of data that can be transmitted in a unit time (generally, 1 second).
  • Data flow refers to a group of data packets sent from one computing device (such as server A) to another computing device (such as server B) in the network.
  • The definition of a data flow can be changed according to the needs of the application. Usually a five-tuple (source IP address, destination IP address, source port number, destination port number, protocol type) is used to define a data flow.
  • Network congestion refers to the situation where network transmission performance is reduced due to limited network resources when the amount of data transmitted on the network is too large. Generally, network congestion occurs when the network performance is degraded due to excessive load on the network.
  • Congestion control: solving or alleviating network congestion through some method. Congestion control needs to ensure that the network can carry the traffic submitted by users. It is a global problem and involves many factors such as hosts and switches. The main parameters for measuring whether the network is congested are the packet loss rate, the switch queue depth (average or instantaneous), the number of packets retransmitted after timeout, the average transmission delay, and so on.
  • Congestion control algorithm An algorithm to solve network congestion.
  • Acknowledgement: a control packet sent by the receiver to the sender during data transmission to indicate that the data sent by the sender has been received.
  • Round-Trip Time An important performance indicator in network transmission, which represents the total delay experienced from the time the sender sends data until the sender receives an acknowledgement (ACK) from the receiver.
  • RTT is determined by three parts: the propagation time on the links, the processing time at the sender and receiver, and the queuing and processing time in the buffers of intermediate network devices. The values of the first two parts are relatively fixed, while the queuing and processing time in the buffers of network devices changes as the degree of network congestion changes, so the change of RTT reflects the change of network congestion to a certain extent.
  • Transmission Control Protocol is a connection-oriented, reliable, byte stream-based transport layer communication protocol, defined by IETF RFC793.
  • ECN (Explicit Congestion Notification): a mechanism by which a network device marks a data packet, instead of dropping it, when it detects congestion.
  • After detecting that a data packet has been marked by the network device, the receiving end makes a corresponding mark in the returned ACK to notify the sending end that network congestion has occurred; the sending end then reduces its transmission rate, just as traditional TCP congestion control algorithms do when packet loss is detected.
  • Remote Direct Memory Access A technique created to address server-side data processing delays in network transmission. RDMA uses the network card to offload the protocol stack to quickly move data from one server to another server's memory. The entire process does not require the involvement of the operating system (kernel bypass), which can reduce the load on the CPU.
  • DPDK (Data Plane Development Kit): a set of development platforms and interfaces for fast packet processing.
  • DPDK provides a simple and complete architecture for fast packet processing and network transmission at the application layer.
  • DPDK allows data packets encapsulated in the application layer to be sent directly to the network card, without the involvement of the operating system in the process, thus reducing the load on memory and CPU.
  • FIG. 2 shows a simplified architecture of a data transmission system 10 to which the solution of the present invention is applicable.
  • a plurality of computing devices 100 are communicatively connected through a network 200.
  • the computing device 100 may be any device capable of transmitting and receiving data, such as a server, a terminal device, or a virtual machine.
  • the network 200 may be a data center network, the Internet, a local area network, or a wide area network.
  • the network 200 includes one or more network devices 210, such as switches, gateways, routers, and other devices capable of implementing data packet forwarding.
  • the two devices use communication-related components, such as a protocol stack and a network card, to establish a communication connection, and transmit data in units of data packets according to the established connection.
  • The data packet passes through one or more network devices 210 in the network 200 during transmission.
  • The network device 210 first buffers a received data packet in a switch queue and then forwards it to other devices.
  • one of the two devices that have established a communication connection is referred to as a transmitting end, and the other is referred to as a receiving end.
  • the transmitting end and the receiving end may be any device having data transmission and reception capabilities.
  • the sending end may be a server and the receiving end is another server; or the sending end is a terminal device and the receiving end is a server; or the sending end and the receiving end are both terminal devices.
  • the sending end and the receiving end are two opposite roles, and can be converted to each other, that is, the same device may be the sending end or the receiving end in different scenarios.
  • a data transmission system may generally include fewer or more components than those shown in FIG. 2 or include components different from those shown in FIG. 2.
  • FIG. 2 shows only the components that are most relevant to the implementations disclosed in the embodiments of the present application.
  • the data transmission system 10 shown in FIG. 2 is only a typical application scenario of the present invention, and should not be construed as limiting the application scenario of the present invention.
  • the embodiment of the present application provides a data transmission method, which can reduce network congestion while making full use of network bandwidth, improving data transmission rate, and reducing data transmission delay. This method can be used for devices that perform data transmission based on TCP. It can be understood that other networks that allow packet loss can also use this method for congestion control.
  • FIG. 3 is a schematic structural diagram of a computing device 100 according to an embodiment of the present application.
  • the computing device 100 includes components such as a processor 110, a memory 130, and a network interface card (NIC) 150. These components can communicate via one or more communication buses or signal lines.
  • the computing device 100 may include more or fewer components than shown, or a combination of certain components.
  • the processor 110 is a control center of the computing device 100, and uses various interfaces and buses to connect various components of the computing device 100.
  • the processor 110 may include one or more processing units.
  • the memory 130 stores executable programs such as an operating system 131 and an application program 133 shown in FIG. 3.
  • the processor 110 is configured to execute an executable program in the memory 130 so as to implement functions defined by the program, for example, the processor 110 executes an operating system 131 to implement various functions of the operating system on the computing device 100.
  • the memory 130 also stores data other than executable programs, such as data generated during the operation of the operating system 131 and the application program 133.
  • the memory 130 may include one or more storage devices.
  • The memory 130 is a non-volatile storage medium, and generally includes internal memory and external storage.
  • the memory includes, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), or a cache.
  • External storage includes, but is not limited to, flash memory, hard disk, universal serial bus (USB) disk, and the like.
  • An executable program is usually stored on external memory, and the processor will load the program from external memory to memory before executing the executable program.
  • the memory 130 may be independent and connected to the processor 110 through a bus; the memory 130 may also be integrated with the processor 110 into a chip subsystem.
  • the network card 150 is a hardware unit that implements functions such as data packet sending and receiving, data packet encapsulation and decapsulation, media access control, and data caching.
  • the network card 150 includes a processor, a memory (including RAM and / or ROM), and an input / output port.
  • the input / output port is used to receive and send data packets, and the memory is used to buffer the received data packets or data packets to be sent.
  • the processor runs the program in the memory to process the data packet, such as encapsulation / decapsulation.
  • The operating system 131 installed on the computing device 100 may be any of a variety of operating systems; this embodiment of the present application places no restriction on this.
  • The computing device 100 may be logically divided into a hardware layer 21, an operating system 131, and an application layer 31.
  • the hardware layer 21 includes hardware resources such as the processor 110, the memory 130, and the network card 150 as described above.
  • the application layer 31 includes one or more applications, such as an application 133.
  • the operating system 131, as software middleware between the hardware layer 21 and the application layer 31, is an executable program that manages and controls hardware and software resources.
  • the operating system 131 includes a kernel 23, which is used to provide low-level system components and services, such as: power management, memory management, protocol stack 25, hardware driver 26, and the like.
  • the protocol stack 25 is a component that implements a network protocol.
  • the hardware driver 26 includes a memory driver 233 and a network card driver 235.
  • the protocol stack 25 includes a network layer protocol component 253 and a transport layer protocol component 255, which are respectively used to implement network protocol functions of the network layer and the transport layer.
  • The data transmission method provided in the embodiment of the present application may be implemented by the transport layer protocol component 255 in the kernel 23.
  • the computing device 100 may offload the functions of the related components of the protocol stack to the network card, thereby implementing kernel bypass and improving data forwarding performance.
  • the network card 150 is a programmable network card, and the transport layer protocol component 255 is implemented in the programmable network card 150.
  • the data transmission method provided in the embodiment of the present application may be implemented by the network card 150.
  • the method may be implemented by modifying the hardware or firmware of the network card 150.
  • the computing device 100 may implement the transport layer protocol component 255 at the application layer (such as DPDK technology), thereby implementing kernel bypass and improving data forwarding performance.
  • the data transmission method provided in the embodiment of the present application may be implemented by software at an application layer.
  • FIG. 6 is a schematic flowchart of a data transmission method according to an embodiment of the present application.
  • one of the two computing devices 100 that performs data transmission through the network device 210 is referred to as a transmitting end, and the other is referred to as a receiving end.
  • the data transmission method includes the following steps:
  • Step 610 The sender sends multiple data packets in the first RTT in the data transmission phase with the receiver, or sends data packets at a very large rate (line speed or any custom rate) in the first RTT.
  • RTT is the time elapsed from when the sender starts sending data until it receives an acknowledgement from the receiver.
  • RTT is not a fixed value, it is a measured value. It will change with the congestion of the entire network.
  • For the first RTT it is the time elapsed from the time when the sender sends the first packet until the sender receives the acknowledgement from the receiver.
  • the sender can dynamically estimate the RTT value based on the timestamp of the sent data packet and the timestamp of the received acknowledgement. Therefore, as the degree of network congestion changes, the RTT also changes.
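  • The application does not prescribe a particular estimator; as one common approach (an RFC 6298-style smoothed estimate, which is an assumption here, not part of this application), the sender could keep an exponentially weighted moving average of the samples taken from send and acknowledgement timestamps:

```python
class RttEstimator:
    """Smoothed RTT from (send_ts, ack_ts) samples; the 1/8 and 1/4 weights are
    the conventional choices and are used here only for illustration."""

    def __init__(self):
        self.srtt = None     # smoothed RTT
        self.rttvar = None   # RTT variation

    def on_ack(self, send_ts, ack_ts):
        sample = ack_ts - send_ts
        if self.srtt is None:
            self.srtt, self.rttvar = sample, sample / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - sample)
            self.srtt = 0.875 * self.srtt + 0.125 * sample
        return self.srtt
```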
  • the sending end may send multiple data packets without restriction, instead of sending only one data packet as in the slow start mechanism of the prior art.
  • The multiple data packets here include at least two data packets, and preferably more than two, or even far more than two, data packets.
  • the data transmission phase is relative to the connection establishment phase. In order for the sender and receiver to realize data transmission, a communication connection needs to be established. The connection establishment phase usually involves one or more message interactions.
  • the “first RTT” described in the embodiments of the present application generally refers to the first RTT for data transmission after the communication connection between the sending end and the receiving end is established.
  • The connection establishment phase and the data transmission phase may also be parallel rather than serial, that is, the sender starts data transmission during the process of establishing a connection with the receiver.
  • In this case, the first RTT refers to the first RTT in which the sender starts to transmit data to the receiver.
  • Step 630 The network device 210 receives one or more of the multiple data packets, buffers them into a receiving queue, and forwards them at an appropriate time.
  • Step 650 The sending end adjusts, according to the number of data packets successfully received by the receiving end among the data packets sent in the first RTT, the number of data packets to be sent or the transmission rate in the next RTT (the second RTT, that is, a non-first RTT), and continues to send data packets based on the adjusted number or rate. For example, if the number of data packets successfully received by the receiver in the first RTT is N, the upper limit of the data packets allowed to be sent in the next RTT can be adjusted to N, N+1, or an integer value that is linearly related to N.
  • The "number of data packets successfully received by the receiving end" in step 650 is generally determined or estimated by the transmitting end based on feedback from the receiving end, and may therefore differ somewhat from the actual number of data packets successfully received by the receiving end.
  • each time the receiving end receives a data packet it will reply a corresponding acknowledgement to the sending node, indicating that the receiving end has successfully received the message.
  • the sending end confirms the number of data packets successfully received by the receiving end among the plurality of data packets sent in the first RTT based on the received confirmation from the receiving end.
  • the receiving end may also send back information about multiple data packets that have been successfully received, or information about one or more data packets that have not been received successfully, through a confirmation carrying an extension field.
  • Based on the extension field in the acknowledgement, the sending end can determine which of the multiple data packets sent in the first RTT were successfully received by the receiving end, and then determine their number.
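  • For illustration, assuming the sender keeps the set of sequence numbers it sent in the first RTT and the set reported back as received (whether via individual ACKs or an extension field), the next-RTT budget could be computed as a linear function of the confirmed count; the slope and offset values here are arbitrary examples:

```python
def next_rtt_budget(first_rtt_seqs, acked_seqs, slope=1, offset=1):
    """Upper limit of packets allowed in the second RTT, linearly related to the
    number N of first-RTT packets confirmed by the receiver (e.g. N, N+1, ...)."""
    n_received = len(first_rtt_seqs & acked_seqs)   # N, as reported by the receiver
    return slope * n_received + offset

# Example: 100 packets sent in the first RTT, 37 confirmed -> allow 38 next RTT.
budget = next_rtt_budget(set(range(100)), {*range(30), *range(50, 57)})
```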
  • Step 670 The network device 210 receives one or more data packets sent by the transmitting end in a non-first round RTT, buffers them into a receiving queue, and forwards them at an appropriate timing.
  • the network device 210 may perform a differentiated operation on a data packet sent in the first RTT and a data packet sent in a non-first RTT. Among them, the network device 210 may adopt various methods to distinguish a data packet sent by the transmitting end in the first RTT and a data packet sent in a non-first RTT.
  • the transmitting end may add a first mark to the data packet sent in the first RTT, so that the network device 210 can quickly identify the data packet sent in the first RTT.
  • the sending end may also add a second tag to the data packet sent in the non-first RTT, so that the network device 210 can quickly identify the data packet sent in the non-first RTT.
  • the first mark and the second mark may be located in an extended field in the header of the data packet, such as an option field, or a reserved field.
  • The first mark and the second mark may be ECN bits of the packet header. When the ECN bit is set to a certain value (for example, 0 or 1), it indicates that the data packet was sent within the first RTT.
  • The network device 210 can determine whether a data packet was sent in the first RTT according to the tag carried by the data packet.
  • the network device 210 may maintain a flow table that records all active data flow information, and perform new and old flow classification based on the flow table to determine whether the data packet is sent within the first RTT.
  • An active data flow refers to a data flow that has completed its first RTT of data transmission. If the five-tuple information of a flow cannot be found in the flow table, the flow is classified as a new flow and a new flow record is inserted into the flow table. If the flow table is queried according to the five-tuple of a data packet and the newly inserted flow entry is hit, it is determined that the current data packet belongs to the new flow, that is, it is a data packet sent in the first RTT.
  • Each flow record of the flow table has a valid time, and if the flow does not transmit any new data packet within the valid time, the flow record is deleted; conversely, each time a new data packet of the flow arrives, the timing of the valid time is restarted.
  • the flow rate of the second round of RTT may be calculated according to the following method (1) or (2):
  • the network device 210 performs differentiated buffering and forwarding according to the identified data packets sent in the first RTT and data packets sent in a non-first RTT.
  • The network device 210 includes multiple priority queues, such as a high-priority queue and a low-priority queue. The network device 210 buffers data packets sent in the first RTT in the low-priority queue and buffers data packets sent in non-first RTTs in the high-priority queue. The data packets in the high-priority queue are forwarded first; when the high-priority queue is empty or the depth of the high-priority queue is less than a preset threshold, the data packets in the low-priority queue are forwarded. The depth of a queue is the number of packets buffered in the queue.
  • There may be one or more high-priority queues and one or more low-priority queues; for example, the network device 210 may have two high-priority queues and two low-priority queues, or three high-priority queues and one low-priority queue.
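  • A minimal sketch of the forwarding rule described above, with one high- and one low-priority queue; the threshold parameter and the first_rtt flag are illustrative assumptions:

```python
from collections import deque

class PriorityForwarder:
    """Forward high-priority (non-first-RTT) packets before low-priority ones."""

    def __init__(self, low_serve_threshold=0):
        # Serve the low-priority queue when the high-priority queue is empty or
        # its depth is below this threshold (0 means only when it is empty).
        self.low_serve_threshold = low_serve_threshold
        self.high = deque()   # packets without the first tag
        self.low = deque()    # packets carrying the first tag

    def enqueue(self, pkt):
        (self.low if pkt.first_rtt else self.high).append(pkt)

    def dequeue(self):
        high_idle = not self.high or len(self.high) < self.low_serve_threshold
        if high_idle and self.low:
            return self.low.popleft()
        if self.high:
            return self.high.popleft()
        return self.low.popleft() if self.low else None
```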
  • a selective packet loss threshold k is set for the receiving queue of the network device 210.
  • When the depth of the receive queue is below the threshold k, both the packets sent in the first RTT and the packets sent in non-first RTTs received by the network device 210 can be buffered in the receive queue. When the depth of the receive queue is greater than or equal to the threshold k, if the currently received data packet was sent in the first RTT, the network device 210 drops the data packet; if the currently received data packet was not sent within the first RTT and the receive queue is not full, the network device 210 buffers it in the receive queue.
  • the depth of the receiving queue refers to the number of data packets in the receiving queue.
  • When the receive queue is full, or the depth of the receive queue is greater than or equal to the threshold k, if the currently received data packet was sent in a non-first RTT, the network device 210 can also discard one or more first-RTT packets buffered in the receive queue to "make room" for the newly received packet that was not sent in the first RTT.
  • the network device 210 may also discard the currently received data packet.
  • When the depth of the receive queue is greater than or equal to a threshold s, the network device 210 may also discard the received data packets, where the threshold s is greater than or equal to the aforementioned selective packet loss threshold k.
  • A first threshold m is set for the low-priority queue used to buffer packets sent in the first RTT.
  • A second threshold n is set for the high-priority queue used to buffer packets sent in non-first RTTs.
  • m ≤ n.
  • When the low-priority queue depth is greater than or equal to m, a received packet sent in the first RTT is discarded by the network device 210; when the high-priority queue depth is greater than or equal to n, a received packet sent in a non-first RTT is discarded.
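  • Under the reading above (first-RTT packets are refused once the low-priority queue depth reaches m, non-first-RTT packets once the high-priority queue depth reaches n), a sketch could look as follows; the names and the threshold relation are illustrative:

```python
class ThresholdedPriorityQueues:
    def __init__(self, m, n):
        assert m <= n, "low-priority threshold m should not exceed high-priority threshold n"
        self.m, self.n = m, n
        self.low, self.high = [], []   # first-RTT packets, non-first-RTT packets

    def on_packet(self, pkt):
        if pkt.first_rtt:
            if len(self.low) >= self.m:
                return "dropped"       # low-priority queue reached its threshold m
            self.low.append(pkt)
        else:
            if len(self.high) >= self.n:
                return "dropped"       # high-priority queue reached its threshold n
            self.high.append(pkt)
        return "queued"
```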
  • FIG. 9 is a schematic flowchart of a data transmission method according to an embodiment of the present application when the computing device 100 performs data transmission based on TCP.
  • TCP is only a transmission protocol that a computing device may use, and other transmission protocols may also be applicable to the methods in the embodiments of the present application.
  • one of the two computing devices 100 that performs data transmission through the network device 210 is referred to as a transmitting end, and the other is referred to as a receiving end.
  • the data transmission method includes the following steps:
  • Step 910 The sender establishes a TCP connection with the receiver.
  • the establishment of the TCP connection may be initiated by an application on the sending end.
  • the application generates a socket open command, which is passed to the protocol stack of the sending end to trigger the protocol stack to establish a TCP connection with the receiving end through three message interactions (also known as "three-way handshake").
  • The protocol stack then informs the application that the connection has been established.
  • The format of the TCP data packet is shown in FIG. 10. The source port number and the destination port number are used to identify the application processes of the sender and the receiver.
  • The five-tuple (source port number, destination port number, source IP address, destination IP address, and transport layer protocol number) can uniquely identify a TCP connection.
  • The data packets transmitted on a TCP connection constitute a data flow, that is, data packets in the same data flow have the same five-tuple.
  • The sequence number (Sequence Number, seq) field of the TCP packet header indicates the sequence number of the data packet.
  • The sequence number of a data packet is the sequence number of the first data byte in the payload of the data packet.
  • After receiving a TCP data packet, the receiver sends an acknowledgement (ACK) to the sender.
  • the window size is used to indicate the size of the current receive buffer at the receiving end.
  • the Option field can be used to carry additional information; the definition of the 6 control bits is as follows:
  • PSH: the data should be delivered to the application layer immediately for processing
  • SYN: synchronization flag, set to 1 when establishing a connection
  • the sender and the receiver establish a TCP connection through a "three-way handshake" as follows:
  • The sending end first sends a SYN (Synchronize) packet to the receiving end, requesting to establish a connection; the SYN packet is a TCP packet with only the SYN control bit set to 1 (see the TCP data packet format in FIG. 10).
  • After receiving the SYN packet, the receiver returns an acknowledgement packet (SYN/ACK) to the sender, which acknowledges the first SYN packet.
  • The SYN/ACK packet is a TCP packet with only the SYN and ACK control bits set to 1.
  • After receiving the SYN/ACK packet, the sender sends an acknowledgement packet (ACK) to the receiver to notify the receiver that the connection has been established. At this point, the three-way handshake is complete and the TCP connection is established.
  • Step 930 The sender sends multiple data packets in the first RTT after the connection is established.
  • In the prior art, the size of the congestion window cwnd is initialized to 1, and the sender sends only one data packet in the first RTT after the connection is established.
  • In this embodiment, multiple data packets are sent in the first RTT after the connection is established, that is, the first RTT makes full use of the network bandwidth to send data packets at a larger rate.
  • Step 950 When multiple data packets sent in the first RTT of the transmitting end pass through the network device 210, the network device 210 buffers the multiple data packets to a receiving queue and forwards them in sequence at an appropriate timing.
  • Step 970 The sending end adjusts the congestion window size cwnd according to the number of data packets successfully received by the receiving end among the data packets sent by the first RTT, and sends a corresponding number of data packets based on the adjusted cwnd in the second RTT.
  • the second RTT refers to the next RTT after the first RTT.
  • RTT is an estimated value; for the specific estimation method, refer to the prior art, which is not described again here.
  • each time the receiver successfully receives a data packet it returns a corresponding ACK to the sender.
  • The receiving end may also use an acknowledgement carrying extended options, such as a selective acknowledgement (Selective Acknowledgment, SACK), to indicate multiple successfully received data packets.
  • the sending end can determine the data packet successfully received by the receiving end based on the received acknowledgement from the receiving end, and then adjust cwnd.
  • For example, if the number of data packets successfully received is m, cwnd may be adjusted to m+1, or to a value that has a linear relationship with m.
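  • For illustration, assuming SACK-style (start, end) byte ranges carried in the acknowledgement's options and a fixed segment size (both assumptions, not requirements of this application), the count m and the adjusted cwnd could be derived as follows:

```python
def count_sacked_segments(sack_blocks, segment_size):
    """Number of segments covered by SACK blocks given as (start, end) byte ranges."""
    return sum((end - start) // segment_size for start, end in sack_blocks)

def adjust_cwnd(sack_blocks, segment_size=1460):
    m = count_sacked_segments(sack_blocks, segment_size)  # packets confirmed received
    return m + 1                                          # or any value linear in m

# Example: two SACK blocks covering 10 and 5 full-sized segments -> cwnd = 16.
cwnd = adjust_cwnd([(0, 14600), (29200, 36500)])
```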
  • congestion control algorithms such as slow start and congestion avoidance can be used to adjust the congestion window size cwnd.
  • the process of the slow-start algorithm is as follows:
  • cwnd cannot grow indefinitely, so the sender also sets a slow start threshold (ssthresh), which indicates the upper limit of the congestion window.
  • When cwnd exceeds ssthresh, the congestion avoidance algorithm is triggered.
  • a typical congestion avoidance algorithm is as follows:
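  • The detailed algorithm steps are not preserved in this extract; the following minimal sketch of the standard per-ACK updates matches the behaviour described above (exponential growth below ssthresh, an increase of 1/cwnd per ACK afterwards) and is given only for illustration:

```python
def on_ack(cwnd, ssthresh):
    """Classic per-ACK congestion-window update."""
    if cwnd < ssthresh:
        cwnd += 1.0          # slow start: roughly doubles cwnd once per RTT
    else:
        cwnd += 1.0 / cwnd   # congestion avoidance: at most +1 per RTT
    return cwnd

def on_loss(cwnd):
    """On detecting packet loss, reduce the rate aggressively (Tahoe-style here)."""
    ssthresh = max(cwnd / 2.0, 2.0)
    return 1.0, ssthresh     # new cwnd, new ssthresh
```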
  • Step 990 After receiving the data packet sent in the second RTT, the network device 210 buffers the data packet to the receiving queue and forwards it in order.
  • Step 1000 After the data transmission is completed, the sending end disconnects the connection with the receiving end.
  • the connection between the sending end and the receiving end is a TCP connection
  • the sending end and the receiving end may specifically disconnect through a "four-way handshake".
  • For the specific process, refer to the related description in the prior art; it is not repeated here.
  • the network device 210 may perform a differentiated operation on a data packet sent in the first RTT and a data packet sent in a non-first RTT.
  • the network device 210 may adopt various methods to distinguish a data packet sent by the transmitting end in the first RTT and a data packet sent in a non-first RTT.
  • The sender can add a first tag to the packets sent in the first RTT, or add a second tag to the packets sent in non-first RTTs, or the sender can mark both the packets sent in the first RTT and the packets sent outside the first RTT.
  • the first mark and the second mark may be located in a certain field of the data packet header, such as an option field, a reserved field, or a control bit field.
  • the first mark and the second mark may be ECN bits of a packet header.
  • When the ECN bit is set to a certain value (for example, 0 or 1), it indicates that the data packet was sent within the first RTT.
  • The network device 210 can determine, according to the tag carried by a data packet, whether the data packet was sent in the first RTT. Alternatively, the network device 210 may also maintain a flow table for recording data flow information and classify new and old flows based on the flow table: the flow table is queried to determine whether the data packet was sent within the first RTT.
  • the network device 210 performs differentiated buffering and forwarding according to the identified data packets sent in the first RTT and data packets sent in a non-first RTT.
  • the network device 210 may use a high- and low-priority queue to differentiately buffer the data packets sent in the first RTT and the non-first RTT.
  • the network device 210 may use the selective packet loss scheme shown in FIG. 8 to discard one or more data packets sent by the first round of RTT.
  • The sender sends a large number of data packets during the first RTT of the data transmission phase with the receiver, thereby making full use of free network bandwidth so that a new data flow can start quickly without delay. At the same time, data flows are classified according to whether they are new flows, and different network transmission priorities are set for different flows, to prevent the data packets of new flows from interfering with the transmission of old flows and causing network congestion. That is to say, the data transmission method provided in the embodiments of the present application achieves a better balance between network bandwidth utilization and the probability of network congestion: while making full use of the network bandwidth, it also avoids network congestion as much as possible.
  • the functions of the sending end described in the related embodiments of FIGS. 6 to 9 are implemented by the protocol stack 25 of the computing device 100, as shown in FIG. 3. Accordingly, the functions of the network device 210 described in the above embodiments may also be implemented by a protocol stack of the network device 210.
  • the protocol stack 25 of the computing device 100 may be executed by a suitable combination of software, hardware, and / or firmware on the computing device 100.
  • The protocol stack 25 is stored in the memory 130 in the form of an executable program; the processor 110 of the computing device 100 runs the executable program corresponding to the protocol stack to execute some or all of the steps of the sending end described in the foregoing method embodiments.
  • the protocol stack 25 is an independent executable program, and the operating system 131 calls the protocol stack through an interface for data packet processing and transmission.
  • the protocol stack 25 may also be included in the operating system 131 as a part of the operating system kernel 23.
  • the protocol stack may be divided into multiple protocol components or modules by protocol layer or by function, each component implementing the functions of one protocol layer; for example, a network layer protocol component implements network layer protocols (such as IP), and a transport layer protocol component implements transport layer protocols (such as TCP or UDP), and so on.
  • the term "executable program" used in the embodiments of the present application should be interpreted broadly to include, but not be limited to: instructions, instruction sets, code, code segments, subroutines, software modules, applications, software packages, threads, processes, functions, firmware, middleware, and the like.
  • the network card 150 is a programmable network card.
  • the function of the transport layer protocol component 255 may be implemented by hardware circuits or special-purpose chips such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • the hardware circuits or special-purpose chips can be integrated in the programmable network card 150.
  • the network card 150 includes: a processor, a memory, and an input / output port.
  • the input/output port is used to receive and send data packets, and the memory is used to buffer received data packets or data packets to be sent.
  • the processor runs a program in the memory to implement the functions of the sending end described in the related embodiments in FIG. 6 to FIG. 9.
  • the data transmission method described in the embodiments of the present application applies to data transmission between a transmitting end and a receiving end in a network such as a data network, the Internet, or a local area network.
  • the transmitting end and the receiving end are devices that establish a communication connection and have data transmission and reception capabilities, such as computers, terminal devices, servers, switches, routers, and so on.
  • An embodiment of the present application further provides a network device 400.
  • the network device 400 includes: a processing circuit 402, a communication interface 404 and a storage medium 406 connected to the processing circuit 402.
  • the processing circuit 402 is used to process data, control data access and storage, issue commands, and control other components to perform operations.
  • Processing circuit 402 may be implemented as one or more processors, one or more controllers, and / or other structures that may be used to execute programs.
  • the processing circuit 402 may specifically include at least one of a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or other programmable logic components.
  • a general purpose processor may include a microprocessor, as well as any conventional processor, controller, microcontroller, or state machine.
  • the processing circuit 402 may also be implemented as a computing component, such as a combination of a DSP and a microprocessor.
  • the storage medium 406 may include a computer-readable storage medium such as a magnetic storage device (for example, a hard disk, a floppy disk, a magnetic stripe), an optical storage medium (for example, a digital versatile disk (DVD)), a smart card, a flash memory device, a random access memory (RAM), read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), registers, and any combination thereof.
  • the storage medium 406 may be coupled to the processing circuit 402 such that the processing circuit 402 can read information and write information to the storage medium 406.
  • the storage medium 406 may be integrated into the processing circuit 402, or the storage medium 406 and the processing circuit 402 may be separate.
  • the communication interface 404 may include circuits and/or programs to enable two-way communication between the network device 400 and one or more wireless network devices (e.g., routers, switches, access points, etc.).
  • the communication interface 404 includes at least one receiving circuit 416 and / or at least one transmitting circuit 418.
  • the communication interface 404 may be implemented in whole or in part by a wireless modem.
  • the protocol stack program 420 is stored in the storage medium 406, and the processing circuit 402 is adapted to execute the protocol stack program 420 stored in the storage medium 406, so as to implement some or all of the functions of the network device in the embodiments related to FIGS. 6 to 9 described above.
  • an embodiment of the present application further provides a data transmission system
  • the system includes a plurality of servers (for example, servers 103, 105, and 107 shown in FIG. 12) connected through a physical switch 230.
  • take server 103 as an example.
  • the hardware of server 103 includes processor 102, memory 104, and network card 106.
  • the software of server 103 includes a virtual switch 250 and virtual machines (VMs) 205 and 206.
  • VM 205 and VM 206 on server 103 communicate with each other through the virtual switch 250, and communicate with VMs on other servers through the virtual switch 250 and the network card 106.
  • the server 103 is configured to implement the functions of the sending end described in the related embodiments of FIG. 6 to FIG. 9, and the physical switch 230 is configured to implement the functions of the network device described in those embodiments.
  • the server 103 may implement the functions of the sending end described above by the processor 102 executing an executable program in the memory 104. It can be understood that, during execution of the program, if a data packet needs to be sent externally (for example, to another server), the driver of the network card 106 needs to be called to drive the network card 106 to perform the sending operation.
  • the server 103 may also implement the functions of the sending end described above solely through the network card 106, for example, by an FPGA, an ASIC, or another hardware circuit or dedicated chip integrated in the network card 106.
  • the servers may also run containers or other virtualization software.
  • the number of servers may differ from that shown.
  • the hardware included in each server is likewise not limited to the hardware shown in FIG. 12.
  • the disclosed methods and devices may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the unit is only a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a terminal device, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
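The following sketches illustrate, in simplified form, the mechanisms summarized in the list above: marking and flow-table classification of first-RTT packets, differentiated queuing, and selective dropping. They are illustrative sketches only; the field names, threshold values, and data structures are assumptions made for the examples and are not mandated by the embodiments.

A hypothetical sender-side mark plus a flow-table fallback on the network device might look like this:

```python
import time
from dataclasses import dataclass

FIRST_RTT = 1       # assumed single-bit mark: 1 = sent in the first RTT
NON_FIRST_RTT = 0

@dataclass
class Packet:
    five_tuple: tuple               # (src_ip, dst_ip, src_port, dst_port, proto)
    seq: int
    first_rtt_flag: int = NON_FIRST_RTT

def mark_first_rtt(packets):
    """Sender side: mark every packet sent in the first RTT."""
    for pkt in packets:
        pkt.first_rtt_flag = FIRST_RTT
    return packets

class FlowTable:
    """Network-device side: classify packets as first-RTT (new flow) or not."""
    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self.entries = {}           # five_tuple -> {"state": ..., "last_seen": ...}

    def is_first_rtt(self, pkt: Packet) -> bool:
        now = time.monotonic()
        entry = self.entries.get(pkt.five_tuple)
        if entry is None or now - entry["last_seen"] > self.ttl:
            # Unknown or expired flow: record it as a new flow.
            self.entries[pkt.five_tuple] = {"state": "new", "last_seen": now}
            return True
        entry["last_seen"] = now
        return entry["state"] == "new"

    def finish_first_rtt(self, five_tuple):
        """Called once the flow's first RTT of data transmission has completed."""
        if five_tuple in self.entries:
            self.entries[five_tuple]["state"] = "old"
```

Differentiated buffering with high- and low-priority queues could then be sketched as follows, where low-priority (first-RTT) packets are forwarded only when the high-priority queue has drained to a configurable threshold:

```python
from collections import deque

class PriorityForwarder:
    """Two-queue forwarder: non-first-RTT packets are forwarded preferentially."""
    def __init__(self, high_drain_threshold=0):
        self.high = deque()                 # non-first-RTT (old-flow) packets
        self.low = deque()                  # first-RTT (new-flow) packets
        self.high_drain_threshold = high_drain_threshold

    def enqueue(self, pkt, sent_in_first_rtt: bool):
        (self.low if sent_in_first_rtt else self.high).append(pkt)

    def dequeue(self):
        # Serve the high-priority queue until it drains to the threshold,
        # then let low-priority (first-RTT) packets through.
        if len(self.high) > self.high_drain_threshold:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        if self.high:
            return self.high.popleft()
        return None
```

Finally, the selective-drop scheme of FIG. 8 can be pictured with a single receive queue, a drop threshold k for first-RTT packets, and a total capacity for all packets (both values are illustrative):

```python
from collections import deque

class SelectiveDropQueue:
    """Receive queue that drops first-RTT packets once the depth reaches k."""
    def __init__(self, k=32, capacity=64):
        self.k = k                  # selective-drop threshold for first-RTT packets
        self.capacity = capacity    # total queue capacity
        self.queue = deque()

    def on_receive(self, pkt, sent_in_first_rtt: bool) -> bool:
        """Return True if the packet was enqueued, False if it was dropped."""
        depth = len(self.queue)
        if sent_in_first_rtt:
            if depth >= self.k:         # queue already deep: drop the new-flow packet
                return False
        elif depth >= self.capacity:    # queue full: the packet cannot be admitted
            return False
        self.queue.append(pkt)
        return True
```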

Abstract

Embodiments of this application provide a data transmission method and related device and system. The sending end sends a large number of data packets within the first RTT, thereby making full use of idle network bandwidth so that a new data flow can start quickly without delay; at the same time, flows are classified according to whether a data flow is a new flow, and different network transmission priorities are set for different flows, so that the data packets of a new flow do not interfere with the transmission of old-flow data packets and cause network congestion. The data transmission method provided in the embodiments of this application thus strikes a better balance between network bandwidth utilization and the probability of network congestion, making full use of network bandwidth while avoiding network congestion as much as possible.

Description

一种数据传输方法、计算设备、网络设备及数据传输系统 技术领域
本申请涉及通信技术领域,尤其涉及一种数据传输方法、相关设备及系统。
背景技术
在网络中,设备之间基于各种类型的通信协议进行数据传输,例如,传输控制协议(Transmission Control Protocol,TCP)是一种面向连接的、可靠的、基于字节流的传输层通信协议,由互联网工程任务小组(Internet Engineering Task Force,IETF)发布的RFC 793定义,它是目前网络中使用最广泛的传输层协议。TCP为了保证数据包的可靠传输,给每个数据包分配一个序列号(Sequence Number,SN),对于已经成功收到的数据包,接收端会向发送端回复一个相应的确认(Acknowledgement,ACK),该确认会携带该接收到的数据包的序列号。如果发送端在合理的往返时延(Round-Trip Time,即RTT)内未收到确认,那么对应的数据包将会被重传,这种机制通常也被称为超时重传。虽然TCP通过确认和超时重传机制保障了数据的可靠性传输,但网络资源(包括链路带宽、交换节点中的缓存等)通常是有限的,如果在某段时间内,在网络中传送的数据包过多,网络的传输性能就会急剧恶化,这种情况就叫做网络拥塞。当网络发生拥塞时,一般会出现数据包丢失,传输时延增加,吞吐量下降,严重时甚至会导致“拥塞崩溃”(congestion collapse)。
为了防止网络拥塞,TCP引入了一系列的拥塞控制算法,包括最初由V.Jacobson在1988年的论文中提出的“慢启动(slow start)”和“拥塞避免(congestion avoidance)”算法,以及后来在TCP Reno版本中加入的“快速重传(Fast retransmit)”和“快速恢复(Fast Recovery)”算法。这些拥塞控制算法的共同点在于基于一个拥塞窗口来调整数据发送的速率。拥塞窗口的大小,即cwnd值,代表能够发送出去的但还没有收到ACK的最大数据数据包个数,窗口越大那么数据发送的速率也就越快,但是也有越可能使得网络出现拥塞,如果窗口值为1,那么每发送一个数据,都要等到对方的确认才能发送第二个数据包,显然数据传输效率低下。选取最佳的cwnd值,从而使得网络吞吐量最大化且不产生拥塞是拥塞控制算法的核心。
图1示出了现有技术中TCP拥塞控制的主要过程,包含慢启动,拥塞避免、快速重传和快速恢复阶段,其中有两个重要的参数cwnd(拥塞窗口大小)和ssthresh(慢启动阈值),这几个阶段都是通过改变这两个参数来控制数据的发送速率。如图1所示,在慢启动阶段,发送端先发送1个数据包(cwnd=1),如果接收端成功接收该数据包,发送端就开始发送2个数据包(cwnd=2),如果接收端成功接收这2个数据包,然后再发送4个数据包(cwnd=4),即拥塞窗口大小呈指数增长,直到达到设定的慢启动阈值ssthresh。当cwnd=ssthresh之后,便进入了拥塞避免阶段,拥塞避免阶段发送端不再以上面的指数方式增长cwnd,而是以线性增长,即每次收到接收端ACK后只增加1/cwnd个数据包。这样在一个RTT内cwnd最多增加1。在cwnd=24的时候,假如发生了超时,则将cwnd重置为1,并减小慢启动阈值ssthresh,比如ssthresh=当前cwnd/2。
可以看出,现有的拥塞控制算法在网络状况良好的时候通过缓慢增加数据发送速率, 避免对网络造成冲击,同时在检测到丢包时激进地降低数据发送速率,以避免网络状态的进一步恶化,是一种以“预防拥塞”为主的拥塞控制算法。这种算法虽然能一定程度上抑制网络拥塞,但也可能会不合理地限制数据传输速率,增加数据传输的时延,降低网络带宽利用率。尤其是在无线网络、数据中心网络和远程直接内存访问(remote direct memory access,RDMA)网络等环境中,由现有的拥塞控制算法导致的吞吐率降低,数据传输时延大以及网络带宽浪费的情况普遍存在。
发明内容
本申请实施例提供一种数据传输方法及相关的设备和系统,旨在减少网络拥塞的同时,能够充分利用网络带宽,提高数据传输的速率,降低数据传输时延。
为达到上述发明目的,一方面,本申请实施例提供了一种数据传输方法,该方法包括:发送端在与接收端的数据传输阶段的第一个RTT内以很大的速率(线速或者任意自定义速率)发送多个数据包,并为所述第一个RTT内发送的所述多个数据包添加第一标记,以使得网络设备在接收到携带所述第一标记的数据包后,将所述携带所述第一标记的数据包缓存至低优先级队列或丢弃,其中,所述网络设备的高优先级队列中的数据包优先于所述低优先级队列中的数据包被转发,所述高优先级队列中缓存的数据包未携带所述第一标记。该方法利用网络空余带宽来使新数据流无延迟快速启动,同时对首轮RTT发送的数据包进行标记,使得网络设备对首轮RTT发送的包以较低的优先级转发,以减少新流快速启动对旧流(非首轮RTT包)的冲击,降低网络拥塞发生的概率。
在一个可能的设计中,发送端可以基于第一个RTT内发送的所述多个数据包中被所述接收端成功接收的数据包的个数,调整下一个RTT内的发送速率或者发送数据包的个数,并基于调整后的发送速率或数据包个数在下一个RTT内发送数据包,从而实现基于感知到的网络状况及时进行拥塞控制,以免造成网络状况的急速恶化。
在一个可能的设计中,发送端可以为非首个RTT内发送的数据包添加第二标记,以指示该数据包为非首个RTT内发送的,网络设备进而基于数据包中携带的第二标记,将该数据包缓存至高优先级队列,并优先于首个RTT内发送的数据包转发,以减少对旧流(非首轮RTT包)的影响。
在一个可能的设计中,第一标记和第二标记为数据包头部的字段或特定比特位。
在一个可能的设计中,发送端在发送数据包之前,先建立与接收端的通信连接,上述第一个RTT或首轮RTT为通信连接建立后的第一个RTT。
在一个可能的设计中,发送端在与接收端建立通信连接的过程中进行数据传输,上述第一个RTT或首轮RTT为通信连接建立阶段第一个RTT。
在一个可能的设计中，发送端基于接收到的来自接收端的一个或多个确认，确认所述第一个RTT内发送的所述多个数据包中被所述接收端成功接收的数据包的个数。
在一个可能的设计中,发送端在第二个RTT内允许发送的数据包的数量上限与第一个RTT发送的数据包中已被接收端成功接收的数据包个数呈线性关系。
第二方面,本申请实施例提供了一种数据传输方法,包括:网络设备接收发送端发送的数据包;若所述数据包是所述发送端在与接收端的数据传输阶段的第一个往返时间 (RTT)内发送的,则所述网络设备将所述数据包缓存至低优先级队列;若所述数据包不是在所述第一个RTT内发送的,则所述网络设备将所述数据包缓存至高优先级队列;其中,所述高优先级队列中的数据包优先于所述低优先级队列中的数据包被转发。
采用上述方法,网络设备对首轮RTT发送的数据包和非首轮RTT发送的数据包进行区分,并给予非首轮RTT发送的数据包更高的转发优先级,从而减少首轮RTT快速发包对旧流(非首轮RTT包)的冲击,降低网络拥塞发生的概率。
在一个可能的设计中,发送端给首轮RTT发送的数据包添加有特定标记,网络设备基于接收到的数据包携带的标记来判断该数据包是否为发送端在首轮RTT内发送的。
在一个可能的设计中,网络设备维护一个流表,用于记录所有活跃流信息。如果一条流的五元组信息无法在流表里查找到,则将该流分类为新流,并在流表里插入一条新流记录。后续数据包在查表时,会命中上述新插入流表项,并根据流表项内容确定当前数据包属于新流,即首轮RTT发送的包。当一条新流结束第一轮RTT的数据传输后,更新流表项为“旧流”,所以该流的后续数据包均会根据更新后的流表项被识别为非首轮RTT包。
在一个可能的设计中,流表的每条流记录都有一个有效时间,如果在有效时间里该流后续没有发送任何新的数据包,则删除该流记录。
第三方面,本申请实施例提供一种数据传输方法,包括:网络设备接收发送端发送的数据包;若所述数据包是所述发送端在与所述接收端的数据传输阶段的第一个往返时间(RTT)内发送的,且所述网络设备的接收队列中数据包的个数超出设定阈值,则所述网络设备将所述数据包丢弃;若所述数据包不是在所述第一个RTT内发送的,且所述接收队列未满,则所述网络设备将所述数据包加入所述接收队列。
采用上述方法,网络设备基于接收队列的深度,对首轮RTT发送的数据包进行选择性丢包,从而减少首轮RTT快速发包对旧流(非首轮RTT包)的冲击,降低网络拥塞发生的概率。
在一个可能的设计中,若所述数据包不是在所述第一个RTT内发送的,且所述接收队列已满,则所述网络设备将所述数据包丢弃。
在一个可能的设计中,若所述数据包不是在所述第一个RTT内发送的,且所述接收队列已满,则所述网络设备将所述接收队列中的一个数据包丢弃,其中,丢弃的所述数据包为所述发送端在所述第一个RTT内发送的数据包。
在一个可能的设计中,发送端给首轮RTT发送的数据包添加有特定标记,网络设备基于接收到的数据包携带的标记来判断该数据包是否为发送端在首轮RTT内发送的。
在一个可能的设计中,网络设备维护一个流表,用于记录所有活跃流信息。如果一条流的五元组信息无法在流表里查找到,则将该流分类为新流,并在流表里插入一条新流记录。后续数据包在查表时,会命中上述新插入流表项,并根据流表项内容确定当前数据包属于新流,即首轮RTT发送的包。当一条新流结束第一轮RTT的数据传输后,更新流表项为“旧流”,所以该流的后续数据包均会根据更新后的流表项被识别为非首轮RTT包。
在一个可能的设计中,流表的每条流记录都有一个有效时间,如果在有效时间里该流后续没有发送任何新的数据包,则删除该流记录。
第四方面,本申请实施例提供一种计算设备,该计算设备具有实现上述方法示例中发 送端的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。
在一种可能的设计中,该计算设备包括处理器、存储器和网卡,该网卡用于接收数据包以及发送数据包,该处理器运行存储器中的协议栈程序,以用于执行上述方法示例中发送端的功能。
在另一种可能的设计中,计算设备的结构中包括接收单元、处理单元以及发送单元,这些单元可以执行上述方法示例中的相应功能,例如,接收单元和发送单元分别用于数据包的接收和发送,处理单元用于数据包的处理,如添加第一和/或第二标记。
第五方面,本申请实施例提供一种网络设备,该网络设备具有实现上述任一方面或任一方面的任一可能的实现方式中网络设备的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。
在一种可能的设计中,该网络设备包括处理器、存储器和输入/输出端口,该输入/输出端口用于接收数据包以及发送数据包,该处理器运行存储器中的协议栈程序,以用于执行上述方法示例中网络的功能,例如识别首轮RTT发送的数据包,缓存数据包至接收队列,当接收队列已满或队列深度超过设定阈值时丢弃数据包等。
在一种可能的设计中,网络设备的结构中包括接收单元、处理单元以及发送单元,这些单元可以执行上述方法示例中的相应功能,例如,接收单元和发送单元分别用于数据包的接收和发送,处理单元用于数据包的处理,例如识别首轮RTT发送的数据包,缓存数据包至接收队列,当接收队列已满或队列深度超过设定阈值时丢弃数据包等。
在一种可能的设计中,上述接收单元和发送单元为收发器、网卡或通信接口,处理单元为处理器,现场可编程门阵列(FPGA)或特定用途集成电路(ASIC)等硬件电路或专用芯片。
第六方面,本申请实施例提供一种网卡,包括:输入/输出端口,处理器,其中,处理器用于,在发送端与接收端的数据传输阶段的第一个往返时间RTT内通过所述输入/输出端口发送多个数据包;为所述第一个RTT内发送的所述多个数据包添加第一标记,以使得网络设备在接收到携带所述第一标记的数据包后,将所述携带所述第一标记的数据包缓存至低优先级队列或丢弃,其中,所述网络设备的高优先级队列中的数据包优先于所述低优先级队列中的数据包被转发,所述高优先级队列中缓存的数据包未携带所述第一标记。
又一方面,本申请实施例提供了一种计算设备,该计算设备包括上述网卡。
又一方面,本申请实施例提供了一种数据传输系统,该系统包括上述计算设备和上述网络设备。
再一方面,本申请实施例提供了一种计算机存储介质,用于储存为上述计算设备或网络设备所用的计算机软件指令,其包含用于执行上述方面所设计的程序。
相较于现有技术,本申请实施例提供的方案中,本申请实施例提供的数据传输方法,发送端在建立TCP连接后的首个RTT内发大量的数据包,从而充分利用网络空余带宽来使新数据流无延迟快速启动,同时以一条数据流是否新流为标准进行流分类,并对不同流设置不同的网络传输优先级,以免新流的数据包干扰旧流数据包传输,造成网络拥塞。也就是说,本申请实施例提供的数据传输方法,在网络带宽利用率和网络拥塞发生概率之间取 得了更好的平衡,在充分利用网络带宽的同时也尽量避免造成网络拥塞。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例中所需要使用的附图作简单地介绍。
图1是现有技术中一种拥塞控制算法示意图。
图2是本申请实施例的数据传输系统的架构图。
图3是本申请一实施例的计算设备的结构示意图。
图4是本申请另一实施例的计算设备的结构示意图。
图5是本申请另一实施例的计算设备的结构示意图。
图6是本申请实施例的数据传输方法的流程图。
图7是本申请实施例的基于高低优先级队列的数据传输示意图。
图8是本申请实施例的选择性丢包过程示意图。
图9是本申请实施例的数据传输方法的流程图。
图10是本申请实施例的TCP数据包的格式示意图。
图11是本申请实施例的网络设备的结构示意图。
图12是本申请实施例的另一数据传输系统架构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行详细描述,显然,所描述的实施例是本申请的一部分实施例,而不是全部实施例。
在开始描述本申请具体实施例之前,先介绍在本申请以下实施例中将会被提及的术语及其含义。可以理解的是,无如其他说明,本申请各个实施例中的这些术语及其含义均可以是相同的。
数据包:也称为报文,是网络传输的基本单位,以一定格式组织起来的数据。不同类型的网络协议对数据包的格式有不同的定义,但通常而言,一个数据包可分为头部(header)和净荷(payload),其中,头部包含了数据包传输过程中必需的信息,比如地址信息、标志位等等,净荷也称为数据包的数据部分,包含了被发送的数据内容。
交换机:能够转发数据包的一种网络设备,能为网络提供更多的连接端口,以便连接更多的设备到网络中。
交换机队列:交换机中用于存储接收到的数据包的功能单元。
服务器:提供计算服务的设备。由于服务器需要响应服务请求,并进行处理,因此一般来说服务器应具备承担服务并且保障服务的能力。在网络环境下,根据服务器提供的服务类型不同,分为文件服务器,数据库服务器,应用程序服务器,WEB服务器等。
终端设备:向用户提供语音和/或数据连通性的设备,包括无线终端或有线终端。无线终端可以是移动终端,如移动电话(或称为“蜂窝”电话)或具有移动终端的计算机,例如,可以是便携式、袖珍式、手持式、计算机内置的或者车载的移动装置。
数据中心:数据中心是一整套复杂的设施。它不仅仅包括计算机系统和其它与之配套 的设备(例如通信和存储系统),还包含冗余的数据通信网络、环境控制设备、监控设备以及各种安全装置。这些设备被放置在一起是因为它们具有相同的对环境的要求以及物理安全上的需求,并且这样放置便于维护。
数据中心网络:连接数据中心里所有计算机系统的网络。
网络带宽:网络带宽是指在单位时间(一般指的是1秒钟)内能传输的数据量。
数据流:指网络中从一个计算设备(如服务器A)发往另外一个计算设备(如服务器B)的一组数据包,数据流的定义方式可以随应用的需求不同而改变,通常由五元组(源IP地址,目的IP地址,源端口号,目的端口号,协议类型)来定义一条数据流。
网络拥塞:网络拥塞是指在网络中传送数据量太大时,由于网络资源有限而造成网络传输性能下降的情况。通常情况下,当网络中负载过度增加致使网络性能下降时,就会发生网络拥塞。
拥塞控制:即通过某种方法来解决或缓解网络拥塞。拥塞控制需要确保网络能够承载用户提交的通信量,是一个全局性问题,涉及主机、交换机等很多因素。衡量网络是否拥塞的参数主要有:数据包丢失率、交换机队列深度(平均或瞬时)、超时重传的数据包数目、平均传输延迟等。
拥塞控制算法:解决网络拥塞的算法。
确认(Acknowledgement,ACK):数据传输过程中由接收端发给发送端的一种控制包,用于表示发送端发来的数据已确认接收。
往返时间(Round-Trip Time,RTT):网络传输中一个重要的性能指标,表示从发送端发送数据开始,到发送端收到来自接收端的确认(ACK),总共经历的时延。RTT由三个部分决定:即链路的传播时间、发送端和接收端的处理时间以及中间网络设备缓存中的排队和处理时间。其中,前面两个部分的值相对固定,网络设备缓存中的排队和处理时间会随着整个网络拥塞程度的变化而变化。所以RTT的变化在一定程度上反映了网络拥塞程度的变化。
传输控制协议(Transmission Control Protocol,TCP):是一种面向连接的、可靠的、基于字节流的传输层通信协议,由IETF的RFC 793定义。
显式拥塞通知(Explicit Congestion Notification,ECN):是一个对TCP协议的扩展,定义于IETF的RFC 3168。ECN允许拥塞控制的端对端通知而避免丢包。ECN为一项可选功能,如果底层网络设施支持,则可能被启用ECN的两个端点使用。通常来说,网络通过丢弃数据包来表明信道阻塞。在ECN成功协商的情况下,支持ECN的交换机可以在数据包包头中设置一个标记来代替丢弃数据包,以标明拥塞即将发生。数据包的接收端在检测到数据包被网络设备标记后,在返回的ACK中做相应标记以通知发送端网络发生拥塞;相应地,发送端降低其传输速率,就如同在传统TCP拥塞控制算法中检测到丢包那样。
远程直接内存访问(Remote Direct Memory Access,RDMA):为了解决网络传输中服务器端数据处理的延迟而产生的一种技术。RDMA通过网卡卸载协议栈的方式,将数据从一台服务器快速移动到另一台服务器内存中,整个过程不需要经过操作系统的参与(内核旁路),因而能减少CPU的负载。
数据平面开发套件(Data Plane Development Kit,DPDK):是一组快速处理数据包的开 发平台及接口。DPDK在应用层为快速的数据包处理及网络传输提供一个简单而完善的架构。DPDK允许应用层封装好的数据包直接发送给网卡,过程中不需要经过操作系统的参与,因而能减少内存和CPU的负载。
在本申请实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
图2示出了本发明方案适用的一种数据传输系统10的简易架构。如图2所示,多个计算设备100通过网络200通信连接。计算设备100可以是具有数据收发能力的任何设备,比如服务器、终端设备、或虚拟机等等。网络200可以为数据中心网络,因特网、局域网或广域网等。网络200包括一个或多个网络设备210,比如交换机、网关、路由器等能实现数据包转发的设备。两个设备利用通信相关的组件,比如协议栈、网卡等建立通信连接,并根据建立的连接,以数据包为单位传输数据。数据包在传输的过程中会经过网络200中的一个或多个网络设备200,网络设备200先将接收到的数据包缓存在交换机队列中,然后再转发给其它设备。为了简化描述,本申请实施例中将已建立通信连接的两个设备中的一个称为发送端,另一个称为接收端。可以理解的是,发送端和接收端可以是具有数据收发能力的任何设备。比如,发送端可以是一台服务器,接收端是另一台服务器;或者发送端是终端设备,接收端是服务器;或者发送端和接收端均为终端设备。另外,发送端和接收端是两个相对的角色,且可以互相转换,即同一设备在不同的场景下可能是发送端,也有可能是接收端。
所属领域的技术人员可以理解一个数据传输系统通常可包括比图2中所示的部件更少或更多的部件,或者包括与图2中所示部件不同的部件,图2仅仅示出了与本申请实施例所公开的多个实现方式更加相关的部件。
图2所示的数据传输系统10仅作为本发明的一种典型应用场景,不应理解为对本发明应用场景的限定。
在数据传输系统中传输数据不可避免地会存在数据包丢失现象,因此为了保证数据的可靠传输,数据传输系统通常会引入了慢启动、拥塞避免等拥塞控制算法来抑制网络拥塞,但这些拥塞控制算法同时也会极大的限制数据的数据传输速率,增加传输时延,降低带宽利用率。本申请实施例提供一种数据传输方法,可以在减少网络拥塞的同时,能够充分利用网络带宽,提高数据传输速率,降低数据传输时延。该方法可以用于基于TCP进行数据传输的设备。可以理解的是,其它允许丢包的网络也可以使用该方法进行拥塞控制。
图3示出了本申请实施例的计算设备100的一个示例性的结构示意图。如图3所示,计算设备100包括:处理器110、存储器130和网卡(network interface card,NIC)150等部件。这些部件可通过一根或多根通信总线或信号线进行通信。本领域技术人员可以理解,计算设备100可以包括比图示更多或更少的部件,或者组合某些部件。
处理器110是计算设备100的控制中心,利用各种接口和总线连计算设备100的各个部件。在一些实施例中,处理器110可包括一个或多个处理单元。
存储器130中存储有可执行程序,诸如图3所示的操作系统131和应用程序133。处理器110被配置用于执行存储器130中的可执行程序,从而实现该程序定义的功能,例如处理器110执行操作系统131从而在计算设备100上实现操作系统的各种功能。存储器130还存储有除可执行程序之外的其他数据,诸如操作系统131和应用程序133运行过程中产 生的数据。存储器130可以包括一个或多个存储设备。在一些实施例中,存储器130为非易失性存储介质,一般包括内存和外存。内存包括但不限于随机存取存储器(Random Access Memory,RAM),只读存储器(Read-Only Memory,ROM),或高速缓存(cache)等。外存包括但不限于闪存(flash memory)、硬盘、通用串行总线(universal serial bus,USB)盘等。可执行程序通常被存储在外存上,处理器在执行可执行程序前会将该程序从外存加载到内存。
存储器130可以是独立的,通过总线与处理器110相连接;存储器130也可以和处理器110集成到一个芯片子系统。
网卡150是实现数据包发送与接收、数据包的封装与解封装、介质访问控制以及数据缓存等功能的硬件单元。网卡150包括处理器、存储器(包括RAM和/或ROM)以及输入/输出端口,其中,输入/输出端口用于接收和发送数据包,存储器用于缓存接收到的数据包或待发送的数据包,处理器通过运行存储器中的程序,以对数据包进行处理,比如封装/解封装等。
计算设备100搭载的操作系统131可以为
Figure PCTCN2019087382-appb-000001
或者其它操作系统,本申请实施例对此不作任何限制。
进一步地,在一个实施例中,如图3所示,计算设备100从逻辑上可划分为硬件层21、操作系统131以及应用层27。硬件层21包括如上所述的处理器110、存储器130、网卡150等硬件资源。应用层31包括一个或多个应用程序,比如应用程序133。操作系统131作为硬件层21和应用层31之间的软件中间件,是管理和控制硬件与软件资源的可执行程序。
操作系统131包括内核23,内核23用于提供底层系统组件和服务,例如:电源管理、内存管理、协议栈25、硬件驱动程序26等。协议栈25是实现网络协议的组件。在一个实施例中,硬件驱动程序26包括存储器驱动233、网卡驱动235。在一个实施例中,协议栈25包括网络层协议组件253和传输层协议组件255,分别用于实现网络层和传输层的网络协议功能。本申请实施例提供的数据传输方法可以由内核12中的传输层协议组件255来实施。
在一个实施例中,计算设备100可以将协议栈相关组件的功能卸载到网卡,从而实现内核旁路,提升数据转发性能。如图4所示,网卡150为可编程网卡,传输层协议组件255实现在可编程网卡150中。相应地,本申请实施例提供的数据传输方法可以由网卡150来实施,例如,可以通过修改网卡150的硬件或固件来实现该方法。
在一个实施例中,如图5所示,计算设备100可以将传输层协议组件255实现在应用层(如DPDK技术),从而实现内核旁路,提升数据转发性能。相应地,本申请实施例提供的数据传输方法可以由应用层的软件来实施。
图6示出了本申请实施例提供的数据传输方法的流程示意图。为了便于描述,本申请实施例将通过网络设备210进行数据传输的两台计算设备100中的一台称为发送端,另一台称为接收端。如图6所示,该数据传输方法包括如下步骤:
步骤610:发送端在与接收端的数据传输阶段的首个RTT内发送多个数据包,或者在首个RTT内以很大的速率(线速或者任意自定义速率)发送数据包。RTT是发送端从发送数据开始,到收到来自接收端的确认所经历的时间。RTT不是一个固定值,是一个测量值, 它会随着整个网络拥塞程度的变化而变化。对于首个RTT来说,它是从发送端发送首个数据包开始,直至发送端收到来自接收端的确认所经历的时长。在数据传输过程中,发送端可以根据发送的数据包的时间戳以及接收到的确认的时间戳动态估算RTT值,因此,随着网络拥塞程度的变化,RTT也会变化。
步骤610中,发送端在收到来自接收端的确认之前,可以无限制地发送多个数据包,而不必像现有技术的慢启动机制那样,仅发送一个数据包。这里的多个数据包至少包括二个数据包,优选地,包括两个以上,甚至远多于两个数据包。数据传输阶段是相对连接建立阶段而言的。发送端和接收端要能实现数据传输,需要建立通信连接,连接建立阶段通常涉及一次或多次消息交互。本申请各个实施例中所描述的“首个RTT”,一般是指发送端和接收端之间的通信连接建立后,进行数据传输的第一个RTT。在某一些特殊的情形下,连接建立阶段和数据传输阶段可能不是串行,而并行的,即发送端在与接收端建立连接的过程中就开始进行数据传输。在这种情况下,“首个RTT”是指发送端开始向接收端传输数据的第一个RTT。
步骤630:网络设备210接收该多个数据包中的一个或多个,并缓存至接收队列,并在合适的时机转发;
步骤650:发送端根据首个RTT内发送的数据包中,被接收端成功接收的数据包的个数,调整下一个RTT(非首轮RTT),即第二个RTT内发送数据包的个数或发送速率,并基于调整后的个数或发送速率继续发送数据包。例如,若首个RTT内发送的数据包中被接收端成功接收的数据包个数为N,则可以调整下一个RTT内允许发送数据包的上限为N,N+1,或者与N具有线性关系的整数值。
需要说明的是,步骤650中“被接收端成功接收的数据包的个数”通常是发送端基于接收端的反馈来确定或者估算的,与实际被接收端成功接收的数据包的个数可能会存在误差。在一个实施例中,接收端每接收到一个数据包,就会向发送节点回复一个对应的确认(acknowledgement),表明接收端已成功接收该报文。相应地,发送端基于接收到的来自接收端的确认,确认所述第一个RTT内发送的所述多个数据包中被所述接收端成功接收的数据包的个数。在一个实施例中,接收端也可以通过一个携带扩展字段的确认,反馈已经成功接收的多个数据包的信息,或者未成功接收的一个或多个数据包的信息,发送端基于接收到的确认中的扩展字段,即可确定所述第一个RTT内发送的所述多个数据包中被所述接收端成功接收的数据包,进而确定数据包的个数。
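作为上述发送端调整逻辑的一个示意性草图（其中线性系数与偏移量仅为举例，文中提到的 N、N+1 或与 N 呈线性关系的其它整数值均可），下面的 Python 片段根据首个RTT内被确认成功接收的数据包个数，计算第二个RTT内允许发送的数据包个数：

```python
def packets_allowed_in_second_rtt(first_rtt_seqs, acked_seqs, slope=1, offset=1):
    # first_rtt_seqs: sequence numbers of packets sent in the first RTT
    # acked_seqs:     sequence numbers confirmed by ACKs from the receiver
    # The second-RTT budget is a linear function of the number N of packets
    # successfully received in the first RTT, e.g. N, N + 1, or slope*N + offset.
    n_acked = len(set(first_rtt_seqs) & set(acked_seqs))
    return slope * n_acked + offset

# Example: 10 packets sent in the first RTT, 7 of them acknowledged -> budget of 8.
budget = packets_allowed_in_second_rtt(range(10), [0, 1, 2, 3, 5, 6, 8])
```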
步骤670:网络设备210接收发送端在非首轮RTT发送的一个或多个数据包,并缓存至接收队列,并在合适的时机转发。
为了减少网络拥塞,网络设备210可以对首个RTT内发送的数据包和非首个RTT内发送的数据包进行差异化的操作。其中,网络设备210可以采用多种方式来区分发送端在首个RTT内发送的数据包以及非首个RTT内发送的数据包。
在一个实施例中,如图6中步骤640所示,发送端可以对首个RTT内发送的数据包添加第一标记,以便于网络设备210快速识别首个RTT内发送的数据包。
可选地,如步骤680所示,发送端也可以为非首个RTT内发送的数据包添加第二标记,以便于网络设备210快速识别非首个RTT内发送的数据包。其中,第一标记和第二标 记可位于数据包头部的某一扩展字段,比如选项(Option)字段,或保留(reserved)字段等等。在一个实施例中,第一标记和第二标记可以为数据包报报头的ECN比特位,ECN比特位置为某个特定值(比如,0或1等)时用于表示数据包为首个RTT内发送的数据包。相应地,网络设备210在接收到任一数据包后,根据该数据包携带的的标记即可确定该数据包是否为首个RTT内发送的数据包。
在另一个实施例中,网络设备210可以维护一个记录所有活跃数据流信息的流表,并基于流表进行新旧流分类以确定数据包是否是首个RTT内发送的。其中,活跃数据流是指已经完成首轮RTT数据传输的数据流。如果一条流的五元组信息无法在流表里查找到,则将该流分类为新流,并在流表里插入一条新流记录。若根据数据包的五元组查询该流表表,命中上述新插入流表项,则确定当前数据包属于新流,即首个RTT发送的数据包。当一条新流结束第一轮RTT的数据传输后,更新流表项为“旧流”,所以后续数据包均会根据更新后的流表项被识别为旧流,即非首个RTT发送的数据包。可选地,流表的每条流记录都有一个有效时间,如果在有效时间里该流没有传输任何新的数据包,则删除该流记录。反之,则等有效时间到期后重新开始计算有效时间。
在一个实施例中,在步骤650中,第一轮RTT的数据传输之后,第二轮RTT的流速率可以根据以下方法(1)或(2)计算:
(1)根据第一个RTT成功传输的数据包数量来确定第二个RTT的流速率。假设拥塞窗口初始化为1(cwnd=1),则该拥塞窗口在第一个RTT不起作用。第二个RTT内,每收到一个确认则将cwnd的值增加1,并且第二个RTT内允许发送的数据包数量由cwnd的值来确定。
(2)主动式拥塞控制算法:从第二个轮RTT开始,发送速率的使用现有的拥塞控制算法来计算。
进一步地,网络设备210根据识别出的首个RTT内发送的数据包和非首个RTT内发送的数据包,进行差异化的缓存和转发。
在一个实施例中,如图7所示,网络设备210包含多个优先级队列,比如高优先级队列和低优先级队列,网络设备210将首个RTT内发送的数据包缓存至低优先级队列,将非首个RTT内发送的数据包缓存至高优先级队列。高优先级队列中的数据包优先被转发,当高优先级队列为空,或者高优先级队列深度小于预设阈值时,低优先级队列中的数据包才会被转发。队列的深度是指队列中缓存的数据包的数目。其中,高优先级队列和低优先级队列分别可以有一个或多个,比如网络设备210有2个高优先级队列和2个低优先级队列,或者有3个高优先级队列和1个低优先级队列。
在另一个实施例中,如图8所示,为网络设备210的接收队列设置一个选择性丢包阈值k,当接收队列的深度小于阈值k时,网络设备210接收到的首个RTT内发送的数据包和非首个RTT内发送的数据包均可缓存至接收队列;而当接收队列的深度大于或等于阈值k时,若当前接收到的数据包为首个RTT内发送的数据包,则网络设备210会将该数据包丢弃。若当前接收到的数据包为非首个RTT内发送的数据包,且接收队列未满,则网络设备210才会将其缓存至接收队列。其中,接收队列的深度是指接收队列中的数据包的个数。
作为一种可选的实施例方式,当接收队列已满,或者接收队列的深度大于或等于阈值 k时,若当前接收到的数据包为非首个RTT内发送的数据包,网络设备210也可以将接收队列中已缓存的首个RTT内的数据包丢弃一个或多个,以给新接收到的非首个RTT内发送的数据包“腾出空间”。
作为一种可选的实施例方式,当接收队列已满,若当前接收到的数据包为非首个RTT内发送的数据包,网络设备210也可以将当前接收到的数据包丢弃。
作为一种可选的实施例方式,当接收队列的深度大于或等于另一阈值s时,若当前接收到的数据包为非首个RTT内发送的数据包,网络设备210也可以将当前接收到的数据包丢弃,其中,这里的阈值s大于或等于前述选择性丢包阈值k。
作为一种可选的实施例方式,当网络设备210有多个接收队列时,可以为不同的接收队列设置不同的选择性丢包阈值。例如,为用于缓存首个RTT内发送的数据包的低优先级队列设置第一阈值m,为用于缓存非首个RTT内发送的数据包的高优先级队列设置第一阈值n,其中,m<n。这样当低优先级队列深度大于或等于m时,若网络设备210接收到首个RTT内发送的数据包,则丢弃;当高优先级队列深度大于或等于n时,若网络设备210接收到非首个RTT内发送的数据包,则丢弃。
图9示出了当计算设备100基于TCP进行数据传输时,应用本申请实施例的数据传输方法的流程示意图。应当理解,TCP只是计算设备可能使用的一种传输协议,其它的传输协议也可适用于本申请实施例的方法。为了便于描述,本申请实施例将通过网络设备210进行数据传输的两台计算设备100中的一台称为发送端,另一台称为接收端。如图9所示,该数据传输方法包括如下步骤:
步骤910:发送端与接收端建立TCP连接。
在一个实施例中,该TCP连接的建立可由发送端上的应用程序发起。应用程序生成套接字开启(socket open)命令,该命令被传递给发送端的协议栈,以触发协议栈通过三次消息交互(也称为“三次握手”)与接收端建立TCP连接,然后,协议栈通知应用程序连接已经建立。TCP数据包的格式如图10所示,其中,源端口号和目的端口号用于确定发送端和接收端应用进程,通过五元组(源端口号、目的端口号、源IP地址、目的IP地址和传输层协议号)可以唯一确定一个TCP连接,该TCP连接上传输的数据包构成一条数据流,即同一条数据流中的数据包具有相同的五元组;TCP数据包头部的序列号(Sequence Number,seq)字段用于指示数据包的序号,通常情况下,数据包的序号为数据包净荷中第一个数据字节的序列号。接收端在接收到TCP数据包后,发送确认(Acknowledgement,ACK)至发送端。窗口大小用于指示接收端当前接收缓冲区的大小。另外,TCP数据包头部还有6个控制位以及一个可自定义的选项(Option)字段,选项字段可以用于携带额外的信息;其中6个控制位的定义如下:
URG:紧急指针有效;
ACK:确认号有效;
PSH:立即上送应用层处理;
RST:异常复位;
SYN:同步标志,置1建立连接;
FIN:终止标志,请求释放连接。
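作为对照，上述6个控制位在TCP报文头标志字节中对应的标准比特值可以用如下 Python 片段表示（仅为示意，便于理解下文 SYN、SYN/ACK、ACK 等组合）：

```python
# Standard TCP header flag bits (values from the TCP specification), listed
# here only for reference alongside the definitions above.
TCP_FIN = 0x01   # terminate: request to release the connection
TCP_SYN = 0x02   # synchronize: set to 1 to establish a connection
TCP_RST = 0x04   # abnormal reset
TCP_PSH = 0x08   # push data to the application layer immediately
TCP_ACK = 0x10   # acknowledgement number is valid
TCP_URG = 0x20   # urgent pointer is valid

def has_flag(flags_byte: int, flag: int) -> bool:
    """Check whether a given control bit is set in the TCP flags byte."""
    return bool(flags_byte & flag)

# Example: a SYN/ACK packet carries both SYN and ACK.
syn_ack = TCP_SYN | TCP_ACK
assert has_flag(syn_ack, TCP_SYN) and has_flag(syn_ack, TCP_ACK)
```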
参照图9,发送端与接收端通过“三次握手”建立TCP连接,具体如下:
(1)发送端首先向接收端发一个SYN(Synchronize)包,告诉接收端请求建立连接;其中,SYN包就是仅SYN控制位设为1的TCP包(参见图10TCP数据包格式)。
(2)接收端收到SYN包后会向发送端返回一个对SYN包的确认包(SYN/ACK),表示对第一个SYN包的确认;其中,SYN/ACK包是仅SYN和ACK标记为1的包。
(3)发送端收到SYN/ACK包后,向接收端发一个确认包(ACK),通知接收端连接已建立。至此,三次握手完成,TCP连接建立。
步骤930:发送端在连接建立后的首个RTT内发送多个数据包。
根据现有的拥塞控制算法中的慢启动机制,拥塞窗口大小cwnd初始化为1,发送端在连接建立后的首个RTT只发送1个数据包。而本申请实施例在连接建立后的首个RTT就发送多个数据包,即在首个RTT充分利用网络带宽,以较大的速率发送数据包。具体地,为了实现首个RTT发送多个数据包的目的,发送端可以将拥塞窗口大小cwnd初始化为一个较大的值(cwnd=n,n>1),然后基于初始化的cwnd发送相应数量的数据包。可以理解的是,发送端也可以仍然按照现有方式,将拥塞窗口大小初始化为1(cwnd=1),但通过其他手段使该拥塞窗口在首个RTT不起作用。
步骤950:发送端首个RTT内发送的多个数据包经过网络设备210时,网络设备210将该多个数据包缓存至接收队列,并在合适的时机依序转发。
步骤970:发送端根据首个RTT发送的数据包中,被接收端成功接收的数据包个数,调整拥塞窗口大小cwnd,并在第二个RTT内基于调整后的cwnd发送相应数量的数据包,这里的第二个RTT是指上述首个RTT后的下一个RTT。RTT是一个估计值,具体的估算方法可以参照现有技术,不再赘述。
在一个实施例中,接收端每成功接收一个数据包,便返回一个对应的ACK给发送端。在另一个实施例中,接收端也可以通过一个数据包携带扩展选项,比如选择性确认(Selective Acknowledgment,SACK)来指示多个被成功接收的数据包。发送端可以基于接收到的来自接收端的确认来确定被接收端成功接收的数据包,进而调整cwnd。
在一个实施例中,若首个RTT发送的数据包中,被接收端成功接收的数据包个数为m,则可以将cwnd调整为m+1,或与m具有某种线性约束关系的值。
在一个实施例中,可以采用慢启动、拥塞避免等拥塞控制算法来调整拥塞窗口大小cwnd。其中,慢启动的算法的过程如下:
1)初始化拥塞窗口cwnd=1,表明可以发送一个数据包;
2)每当收到一个ACK,cwnd++;呈线性上升;
3)每当过了一个RTT,cwnd=cwnd*2;
当然,cwnd不可能无限制增长,因此,发送端还设置了慢启动阈值(slow start threshold)ssthresh,表示拥塞窗口的上限,当cwnd>=ssthresh时,就会触发拥塞避免算法。一种典型的拥塞避免算法如下:
1)收到一个ACK时,cwnd=cwnd+1/cwnd
2)当每过一个RTT时,cwnd=cwnd+1
这样就可以避免数据包增长过快导致网络拥塞,慢慢的增加调整到网络的最佳值。
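上述慢启动与拥塞避免的步骤可以用下面的 Python 草图概括（仅为经典算法的简化示意，并非本申请实施例的具体实现）：

```python
class CwndController:
    # Simplified slow start + congestion avoidance, following the steps above.
    def __init__(self, ssthresh=64):
        self.cwnd = 1.0            # congestion window, in packets
        self.ssthresh = ssthresh   # slow-start threshold

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1                # slow start: +1 per ACK (roughly doubles each RTT)
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: about +1 per RTT

    def on_timeout(self):
        # On a timeout-detected loss, lower the threshold (e.g. to cwnd/2)
        # and restart from a window of 1.
        self.ssthresh = max(self.cwnd / 2, 2)
        self.cwnd = 1.0
```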
步骤990:网络设备210接收到第二个RTT内发送的数据包后,将数据包缓存至接收队列,并依序转发。
步骤1000:数据传输完成后,发送端断开与接收端之间的连接。在一个实施例中,若发送端和接收端之间的连接为TCP连接,则发送端和接收端具体可以通过“四次握手”断开连接,具体过程可以参考现有技术的相关描述,在此不再赘述。
与图6所示的实施例类似,网络设备210可以对首个RTT内发送的数据包和非首个RTT内发送的数据包进行差异化的操作。
其中,网络设备210可以采用多种方式来区分发送端在首个RTT内发送的数据包以及非首个RTT内发送的数据包。例如,发送端可以对首个RTT内发送的数据包添加第一标记,或者可以为非首个RTT内发送的数据包添加第二标记,或者也发送端可以对首个RTT内发送的数据包和非首个RTT内发送的数据包都进行标记。其中,第一标记和第二标记可位于数据包头部的某一字段,比如选项(Option)字段,保留(reserved)字段或者控制位字段。在一个实施例中,第一标记和第二标记可以为数据包报报头的ECN比特位,ECN比特位置为某个特定值(比如,0或1等)时用于表示数据包为首个RTT内发送的数据包。相应地,网络设备210在接收到任一数据包后,根据该数据包携带的的标记即可确定该数据包是否为首个RTT内发送的数据包。再例如,网络设备210也可以维护一个记录数据流信息的流表,并基于流表进行新旧流分类以确定数据包是否是首个RTT内发送的,通过查询流表以判断数据包是否为首个RTT发送的相关细节可以参见前面的实施例,不再赘述。
进一步地,网络设备210根据识别出的首个RTT内发送的数据包和非首个RTT内发送的数据包,进行差异化的缓存和转发。例如,网络设备210可以使用高低优先级队列来对首个RTT内发送和非首个RTT内发送的数据包进行差异化的缓存,具体细节可以参照图7相关的实施例。又例如,当接收队列的深度超过选择性丢包阈值时,网络设备210可以使用图8所示的选择性丢包方案来丢弃首轮RTT发送的一个或多个数据包。
本申请实施例提供的数据传输方法,发送端在与接收端的数据传输阶段的首个RTT内发大量的数据包,从而充分利用网络空余带宽来使新数据流无延迟快速启动,同时以一条数据流是否新流为标准进行流分类,并对不同流设置不同的网络传输优先级,以免新流的数据包干扰旧流数据包传输,造成网络拥塞。也就是说,本申请实施例提供的数据传输方法,在网络带宽利用率和网络拥塞发生概率之间取得了更好的平衡,在充分利用网络带宽的同时也尽量避免造成网络拥塞。
在一个实施例中,图6至图9相关实施例所描述的发送端的功能,由计算设备100的协议栈25来实现,如图3所示。相应地,以上实施例描述的网络设备210的功能也可由网络设备210的协议栈来实现。计算设备100的协议栈25可由计算设备100上的软件、硬件和/或固件的适当组合执行。例如,在一种可能的实现方式中,协议栈25以可执行程序的形式存储于存储器130中;计算设备100的处理器110运行协议栈对应的可执行程序,以执行上述各方法实施例所描述的发送端的部分或全部步骤。
在一种可能的实现方式中,协议栈25为一个独立的可执行程序,操作系统131通过接口调用协议栈以进行数据包处理和传输。在另一种可能的实现方式中,协议栈25也可以被包含在操作系统131中,作为操作系统内核23的一部分。其中,协议栈按照协议层级或 功能又可以分为多个协议组件或模块,每一个组件实现一层协议的功能,比如网络层协议组件用于实现网络层协议(比如IP协议),传输层协议组件用于实现传输层协议(比如TCP或者UDP协议),等等。
需要说明的是,本申请实施例所使用的术语“可执行程序”应被广泛地解释为包括但不限于:指令,指令集,代码,代码段,子程序,软件模块,应用,软件包,线程,进程,函数,固件,中间件等。
图6至图9相关实施例所描述的发送端的功能,也可以由计算设备100的网卡150来实现。在一种可能的实现方式中,如图4所示,该网卡150为可编程网卡,传输层协议组件255的功能可以由现场可编程门阵列(Field Programmable Gate Array,FPGA)或特定用途集成电路(Application Specific Integrated Circuit,ASIC)等硬件电路或专用芯片来实现,该硬件电路或专用芯片可以集成在可编程网卡150中。在另一种可能的实现方式中,网卡150包括:处理器、存储器以及输入/输出端口,其中,输入/输出端口用于接收和发送数据包,存储器用于缓存接收到的数据包或待发送的数据包,处理器通过运行存储器中的程序,以实现图6至图9相关实施例所描述的发送端的功能。
可以理解的是,本申请实施例所描述的数据传输方法应用于数据网络、因特网、局域网等网络中发送端和接收端之间数据传输,发送端和接收端为建立通信连接且具有数据收发能力的设备,比如计算机、终端设备、服务器、交换机、路由器等等。
本申请实施例还提供一种网络设备400,如图11所示,该网络设备400包括:处理电路402,以及与其连接的通信接口404和存储介质406。
处理电路402用于处理数据,控制数据访问和存储,发出命令以及控制其它组件执行操作。处理电路402可以被实现为一个或多个处理器,一个或多个控制器和/或可用于执行程序的其它结构。处理电路402具体可以包括通用处理器,数字信号处理器(DSP),专用集成电路(ASIC),现场可编程门阵列(FPGA)或其它可编程逻辑组件中的至少一种。通用处理器可以包括微处理器,以及任何常规的处理器,控制器,微控制器,或状态机。处理电路302也可以实现为计算组件,例如DSP和微处理器的组合。
存储介质406可以包括计算机可读存储介质,如磁存储设备(例如,硬盘,软盘,磁条),光存储介质(例如,数字多功能盘(DVD)),智能卡,闪存设备,随机存取存储器(RAM),只读存储器(ROM),可编程ROM(PROM),可擦除PROM(EPROM),寄存器,以及它们的任意组合。存储介质406可以耦合到处理电路402以使得处理电路402可读取信息和将信息写入到存储介质406。具体地,存储介质406可以集成到处理电路402,或者存储介质406和处理电路302可以是分开的。
通信接口404可包括电路和/或程序以实现网络设备400与一个或多个无线网络设备(例如,路由器、交换机、接入点等等)之间的双向通信。通信接口404包括至少一个接收电路416和/或至少一个发射电路418。在一个实施例中,通信接口404可以是全部或部分由无线调制解调器来实现。
在一个实施例中,存储介质406中存储有协议栈程序420,处理电路402被适配为执行存储在存储介质406中的协议栈程序420,以实现上述图6至9相关的实施例中网络设备的部分或全部功能。
基于以上各个实施例描述的数据传输方法,本申请实施例还提供一种数据传输系统,
如图12所示,该系统包括通过物理交换机230连接的多台服务器(例如图12所示的服务器103、105和107)。以服务器103为例,服务器103的硬件包括处理器102、存储器104和网卡106,服务器103的软件包括虚拟交换机250、虚拟机(virtual machine,VM)205和206,服务器103上的VM 205和VM 206通过虚拟交换机250通信,VM 205和VM206通过虚拟交换机250和网卡103与其它服务器上的VM通信。其中,服务器103用于实现图6至图9相关实施例所描述的发送端的功能,物理交换机230用于实现图6至图9相关实施例所描述的网络设备的功能。具体地,在一个实施例中,服务器103可以通过处理器102执行存储器104中的可执行程序来实现以上描述的发送端的功能,可以理解的是,在程序执行过程中,若涉及对外发送数据包(比如向另一服务器发送数据包),则需要调用网卡106的驱动程序以驱动网卡106执行数据包的发送操作。在另一个实施例中,服务器103也可以通过网卡106来单独实现以上描述的发送端的功能,比如可以由集成在网卡106中的FPGA或ASIC等硬件电路或专用芯片来实现。
上述数据传输系统仅是举例说明,适用于本发明技术方案的数据传输统不限于此,例如,服务器还可以安装容器或者其它虚拟操作系统软件,服务器的数量还可以是其它数量,每个服务器所包括的硬件也都不限于图12所示的硬件。
在本申请所提供的几个实施例中,应该理解到,所揭露的方法和装置,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是终端设备,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储可执行程序的介质。

Claims (39)

  1. 一种数据传输方法,其特征在于,包括:
    发送端在与接收端的数据传输阶段的第一个往返时间RTT内发送多个数据包;
    所述发送端为所述第一个RTT内发送的所述多个数据包添加第一标记,以使得网络设备在接收到携带所述第一标记的数据包后,将所述携带所述第一标记的数据包缓存至低优先级队列或丢弃,其中,所述网络设备的高优先级队列中的数据包优先于所述低优先级队列中的数据包被转发,所述高优先级队列中缓存的数据包未携带所述第一标记。
  2. 根据权利要求1所述的方法,其特征在于,还包括:
    所述发送端确认所述第一个RTT内发送的所述多个数据包中被所述接收端成功接收的数据包的个数;
    所述发送端根据确定的所述成功接收的数据包的个数,确定第二个RTT内发送数据包的个数或发送速率;
    所述发送端在第二个RTT内,基于确定的所述发送数据包的个数或发送速率发送一个或多个数据包。
  3. 根据权利要求2所述的方法,其特征在于,还包括:
    所述发送端为所述第二个RTT内发送的所述一个或多个数据包添加第二标记,以使得网络设备在接收到携带所述第二标记的数据包后,将所述携带所述第二标记的所述数据包缓存至高优先级队列,其中,所述低优先级队列中缓存的数据包未携带所述第二标记。
  4. 根据权利要求2所述的方法,其特征在于,还包括:
    所述发送端为所述第二个RTT内发送的所述一个或多个数据包添加第二标记,以使得网络设备在接收到携带所述第二标记的数据包后,若接收队列已满或所述接收队列中数据包的个数超出设定阈值,则将所述接收队列中未携带所述第二标记的一个或多个数据包丢弃。
  5. 根据权利要求1所述的方法,其特征在于,所述第一标记用于指示所述第一个RTT内发送的数据包。
  6. 根据权利要求3或4所述的方法,其特征在于,所述第二标记用于指示非所述第一个RTT内发送的数据包。
  7. 根据权利要求1至6任一项所述的方法,所述第一个RTT为所述发送端与所述接收端建立通信连接后的首个RTT。
  8. 一种数据传输方法,其特征在于,包括:
    网络设备接收发送端发送的数据包;
    若所述数据包是所述发送端在与接收端的数据传输阶段的第一个往返时间RTT内发送的,则所述网络设备将所述数据包缓存至低优先级队列;
    若所述数据包不是在所述第一个RTT内发送的,则所述网络设备将所述数据包缓存至高优先级队列;其中,所述高优先级队列中的数据包优先于所述低优先级队列中的数据包被转发。
  9. 根据权利要求8所述的方法,其特征在于,还包括:
    所述网络设备确定所述数据包是否是所述发送端在所述第一个RTT内发送的。
  10. 根据权利要求9所述的方法,其特征在于,所述数据包携带有第一标记,所述第一标记是由所述发送端添加的,用于指示所述数据包为所述第一个RTT内发送的;所述确定步骤包括:
    所述网络设备根据所述数据包携带的所述第一标记,确定所述数据包是所述发送端在所述第一个RTT内发送的。
  11. 根据权利要求9或10所述的方法,其特征在于,所述数据包携带有第二标记,所述第二标记是由所述发送端添加的,用于指示所述数据包不是在所述第一个RTT内发送的;所述确定步骤包括:
    所述网络设备根据所述数据包携带的所述第二标记,确定所述数据包不是所述发送端在所述第一个RTT内发送的。
  12. 根据权利要求9所述的方法,其特征在于,所述确定步骤包括:
    所述网络设备根据所述数据包的特征信息查询流表,以确定所述数据包是否是所述发送端在所述第一个RTT内发送的,其中,所述流表存储有一条或多条数据流的特征信息。
  13. 一种数据传输方法,其特征在于,包括:
    网络设备接收发送端发送的数据包;
    若所述数据包是所述发送端在与所述接收端的数据传输阶段的第一个往返时间RTT内发送的,且所述网络设备的接收队列中数据包的个数超出设定阈值,则所述网络设备将所述数据包丢弃;
    若所述数据包不是在所述第一个RTT内发送的,且所述接收队列未满,则所述网络设备将所述数据包加入所述接收队列。
  14. 根据权利要求13所述的方法,其特征在于,还包括:
    若所述数据包不是在所述第一个RTT内发送的,且所述接收队列已满,则所述网络设备将所述数据包丢弃。
  15. 根据权利要求14所述的方法,其特征在于,还包括:
    若所述数据包不是在所述第一个RTT内发送的,且所述接收队列已满,则所述网络设备将所述接收队列中的一个数据包丢弃,其中,丢弃的所述数据包为所述发送端在所述第一个RTT内发送的数据包。
  16. 根据权利要求14或15所述的方法,其特征在于,还包括:所述网络设备确定所述数据包是否是所述发送端在所述第一个RTT内发送的。
  17. 根据权利要求16所述的方法,其特征在于,所述数据包携带有第一标记,所述第一标记是由所述发送端添加的,用于指示所述数据包为所述第一个RTT内发送的;所述确定步骤包括:
    所述网络设备根据所述数据包携带的所述第一标记,确定所述数据包是所述发送端在所述第一个RTT内发送的。
  18. 根据权利要求16或17所述的方法,其特征在于,所述数据包携带有第二标记,所述第二标记是由所述发送端添加的,用于指示所述数据包不是在所述第一个RTT内发送的;所述确定步骤包括:
    所述网络设备根据所述数据包携带的所述第二标记,确定所述数据包不是所述发送端在所述第一个RTT内发送的。
  19. 根据权利要求16所述的方法,其特征在于,所述确定步骤包括:
    所述网络设备根据所述数据包的特征信息查询流表,以确定所述数据包是否是所述发送端在所述第一个RTT内发送的,其中,所述流表存储有一条或多条数据流的特征信息。
  20. 一种计算设备,包括:处理器、存储器及存储在所述存储器上的可执行程序,其特征在于,所述处理器执行所述程序时实现权利要求1至7中任一项所述方法的步骤。
  21. 一种网络设备,包括:处理器、存储器及存储在所述存储器上的可执行程序,其特征在于,所述处理器执行所述程序时实现权利要求8至12中任一项所述方法的步骤。
  22. 一种网络设备,包括:处理器、存储器及存储在所述存储器上的可执行程序,其特征在于,所述处理器执行所述程序时实现权利要求13至19中任一项所述方法的步骤。
  23. 一种网卡,包括:输入/输出端口,处理器,其特征在于,所述处理器用于,在发送端与接收端的数据传输阶段的第一个往返时间RTT内通过所述输入/输出端口发送多个数据包;为所述第一个RTT内发送的所述多个数据包添加第一标记,以使得网络设备在接收到携带所述第一标记的数据包后,将所述携带所述第一标记的数据包缓存至低优先级队列或丢弃,其中,所述网络设备的高优先级队列中的数据包优先于所述低优先级队列中的数据包被转发,所述高优先级队列中缓存的数据包未携带所述第一标记。
  24. 根据权利要求23所述的网卡,其特征在于,所述处理器还用于,确认所述第一个RTT内发送的所述多个数据包中被所述接收端成功接收的数据包的个数;根据确定的所述成功接收的数据包的个数,确定第二个RTT内发送数据包的个数或发送速率;在第二个RTT内,基于确定的所述发送数据包的个数或发送速率发送一个或多个数据包。
  25. 根据权利要求24所述的网卡,其特征在于,所述处理器还用于,为所述第二个RTT内发送的所述一个或多个数据包添加第二标记,以使得网络设备在接收到携带所述第二标记的数据包后,将所述携带所述第二标记的所述数据包缓存至高优先级队列,其中,所述低优先级队列中缓存的数据包未携带所述第二标记。
  26. 一种计算设备,其特征在于,包括如权利要求23至25任一项所述的网卡。
  27. 一种网络设备,其特征在于,包括:
    接收器,用于接收发送端向接收端发送的数据包;
    处理器,用于当所述数据包是所述发送端在与所述接收端的数据传输阶段的第一个往返时间RTT内发送时,将所述数据包加入低优先级队列;当所述数据包不是在所述第一个RTT内发送时,将所述数据包缓存至高优先级队列;其中,所述高优先级队列中的数据包优先于低优先级队列中的数据包被转发;
    存储器,用于缓存所述高优先级队列和所述低优先级队列。
  28. 根据权利要求27所述的网络设备,其特征在于,所述处理器还用于,确定所述数据包是否是所述发送端在所述第一个RTT内发送的。
  29. 根据权利要求28所述的网络设备,其特征在于,所述数据包携带有第一标记,所述第一标记是由所述发送端添加的,用于指示所述数据包为所述第一个RTT内发送的;所述处理器具体用于,根据所述数据包携带的所述第一标记,确定所述数据包是所述发送端 在所述第一个RTT内发送的。
  30. 根据权利要求28所述的网络设备,其特征在于,所述存储器中存储有流表,所述流表包含有一条或多条数据流的特征信息;所述处理器具体用于,根据所述数据包的特征信息查询所述流表,以确定所述数据包是否是所述发送端在所述第一个RTT内发送的。
  31. 一种网络设备,其特征在于,包括:
    接收器,用于接收发送端向接收端发送的数据包;
    处理器,用于当所述数据包是所述发送端在与所述接收端的数据传输阶段的第一个往返时间RTT内发送的,且所述网络设备的接收队列中数据包的个数超出设定阈值时,将所述数据包丢弃;当所述数据包不是在所述第一个RTT内发送的,且所述接收队列未满时,将所述数据包加入所述接收队列;
    存储器,用于缓存所述接收队列。
  32. 根据权利要求31所述的网络设备,其特征在于,所述处理器还用于,若所述数据包不是在所述第一个RTT内发送的,且所述接收队列已满,将所述数据包丢弃。
  33. 根据权利要求31或32所述的网络设备,其特征在于,所述处理器还用于,确定所述数据包是否是所述发送端在所述第一个RTT内发送的。
  34. 根据权利要求33所述的网络设备,其特征在于,所述数据包携带有第一标记,所述第一标记是由所述发送端添加的,用于指示所述数据包为所述第一个RTT内发送的;所述处理器具体用于,根据所述数据包携带的所述第一标记,确定所述数据包是所述发送端在所述第一个RTT内发送的。
  35. 根据权利要求33所述的网络设备,其特征在于,所述存储器中存储有流表,所述流表包含有一条或多条数据流的特征信息;所述处理器具体用于,根据所述数据包的特征信息查询所述流表,以确定所述数据包是否是所述发送端在所述第一个RTT内发送的。
  36. 一种计算机可读存储介质,其上存储有可执行程序(指令),其特征在于,该程序(指令)被处理器执行时实现权利要求1至7中任一项所述方法的步骤。
  37. 一种计算机可读存储介质,其上存储有可执行程序(指令),其特征在于,该程序(指令)被处理器执行时实现权利要求8至12中任一项所述方法的步骤。
  38. 一种计算机可读存储介质,其上存储有可执行程序(指令),其特征在于,该程序(指令)被处理器执行时实现权利要求13至19中任一项所述方法的步骤。
  39. 一种数据传输系统,其特征在于,包括:如权利要求20所述的计算设备,以及权利要求21或22所述的网络设备。
PCT/CN2019/087382 2018-06-29 2019-05-17 一种数据传输方法、计算设备、网络设备及数据传输系统 WO2020001192A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP19826507.6A EP3742690B1 (en) 2018-06-29 2019-05-17 Data transmission method, computing device, network device and data transmission system
US17/006,196 US11477129B2 (en) 2018-06-29 2020-08-28 Data transmission method, computing device, network device, and data transmission system
US17/856,161 US11799790B2 (en) 2018-06-29 2022-07-01 Data transmission method, computing device, network device, and data transmission system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810711997.X 2018-06-29
CN201810711997.XA CN110661723B (zh) 2018-06-29 2018-06-29 一种数据传输方法、计算设备、网络设备及数据传输系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/006,196 Continuation US11477129B2 (en) 2018-06-29 2020-08-28 Data transmission method, computing device, network device, and data transmission system

Publications (1)

Publication Number Publication Date
WO2020001192A1 true WO2020001192A1 (zh) 2020-01-02

Family

ID=68984678

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/087382 WO2020001192A1 (zh) 2018-06-29 2019-05-17 一种数据传输方法、计算设备、网络设备及数据传输系统

Country Status (4)

Country Link
US (2) US11477129B2 (zh)
EP (1) EP3742690B1 (zh)
CN (1) CN110661723B (zh)
WO (1) WO2020001192A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113824777A (zh) * 2021-09-06 2021-12-21 武汉中科通达高新技术股份有限公司 数据管理方法和数据管理装置
CN113890852A (zh) * 2021-08-24 2022-01-04 北京旷视科技有限公司 数据发送方法、装置、设备及介质

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110661723B (zh) * 2018-06-29 2023-08-22 华为技术有限公司 一种数据传输方法、计算设备、网络设备及数据传输系统
EP3949680B1 (en) * 2019-04-01 2023-11-01 Telefonaktiebolaget Lm Ericsson (Publ) Methods and arrangements for managing round trip time associated with provision of a data flow via a multi-access communication network
CN111464374B (zh) * 2020-02-21 2021-09-21 中国电子技术标准化研究院 网络延迟控制方法、设备及装置
CN111464839A (zh) * 2020-03-17 2020-07-28 南京创维信息技术研究院有限公司 一种能显示辅助信息的主副屏系统
CN111404783B (zh) * 2020-03-20 2021-11-16 南京大学 一种网络状态数据采集方法及其系统
US11374858B2 (en) 2020-06-30 2022-06-28 Pensando Systems, Inc. Methods and systems for directing traffic flows based on traffic flow classifications
US11818022B2 (en) * 2020-06-30 2023-11-14 Pensando Systems Inc. Methods and systems for classifying traffic flows based on packet processing metadata
FR3112262B1 (fr) * 2020-07-01 2022-06-10 Sagemcom Energy & Telecom Sas Procede de regulation destine a resorber un engorgement d’un reseau maille de communication par courants porteurs en ligne
CN112543129B (zh) * 2020-11-27 2022-06-21 北京经纬恒润科技股份有限公司 队列深度的确认方法、系统及报文模拟器
CN113572582B (zh) * 2021-07-15 2022-11-22 中国科学院计算技术研究所 数据发送、重传控制方法及系统、存储介质及电子设备
US11916791B2 (en) * 2021-12-22 2024-02-27 Cloudbrink Inc. Modifying data packet transmission strategy based on transmission control protocol stage
CN115884229B (zh) * 2023-01-29 2023-05-12 深圳开鸿数字产业发展有限公司 传输时延的管理方法、电子设备和存储介质
CN117112044B (zh) * 2023-10-23 2024-02-06 腾讯科技(深圳)有限公司 基于网卡的指令处理方法、装置、设备和介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020161914A1 (en) * 1999-10-29 2002-10-31 Chalmers Technology Licensing Ab Method and arrangement for congestion control in packet networks
CN101115013A (zh) * 2006-06-30 2008-01-30 阿尔卡特朗讯 提供资源准入控制的方法
CN106027416A (zh) * 2016-05-23 2016-10-12 北京邮电大学 一种基于时空结合的数据中心网络流量调度方法及系统
CN106953742A (zh) * 2017-02-16 2017-07-14 广州海格通信集团股份有限公司 一种基于sdn的无线异构网带宽保障方法

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6023368B2 (ja) * 1977-08-29 1985-06-07 川崎重工業株式会社 アクチユエ−タの異常検出方式
US8051197B2 (en) * 2002-03-29 2011-11-01 Brocade Communications Systems, Inc. Network congestion management systems and methods
US7706394B2 (en) * 2003-07-23 2010-04-27 International Business Machines Corporation System and method for collapsing VOQ's of a packet switch fabric
US20080037420A1 (en) * 2003-10-08 2008-02-14 Bob Tang Immediate ready implementation of virtually congestion free guaranteed service capable network: external internet nextgentcp (square waveform) TCP friendly san
DE602004014036D1 (de) * 2004-04-07 2008-07-03 France Telecom Verfahren und einrichtung zum senden von datenpaketen
CN101023455A (zh) * 2004-08-17 2007-08-22 加州理工大学 使用排队控制和单向延迟测量实现网络拥塞控制的方法和设备
EP1829321A2 (en) * 2004-11-29 2007-09-05 Bob Tang Immediate ready implementation of virtually congestion free guaranteed service capable network: external internet nextgentcp (square wave form) tcp friendly san
CA2679951A1 (en) * 2007-03-12 2008-09-18 Citrix Systems, Inc. Systems and methods for dynamic bandwidth control by proxy
JP4888515B2 (ja) * 2009-04-16 2012-02-29 住友電気工業株式会社 動的帯域割当装置及び方法とponシステムの局側装置
WO2012019080A1 (en) * 2010-08-06 2012-02-09 Acquire Media Ventures Inc. Method and system for pacing, ack'ing, timing, and handicapping (path) for simultaneous receipt of documents
US20140164640A1 (en) * 2012-12-11 2014-06-12 The Hong Kong University Of Science And Technology Small packet priority congestion control for data center traffic
US9628406B2 (en) * 2013-03-13 2017-04-18 Cisco Technology, Inc. Intra switch transport protocol
CN103457871B (zh) * 2013-09-18 2016-03-30 中南大学 Dcn中基于延迟约束的拥塞避免阶段的增窗方法
EP2869514A1 (en) * 2013-10-30 2015-05-06 Alcatel Lucent Method and system for queue management in a packet-switched network
US10999012B2 (en) * 2014-11-07 2021-05-04 Strong Force Iot Portfolio 2016, Llc Packet coding based network communication
US9185045B1 (en) * 2015-05-01 2015-11-10 Ubitus, Inc. Transport protocol for interactive real-time media
CN105471750A (zh) * 2016-01-21 2016-04-06 清华大学深圳研究生院 一种内容中心网络下提高pit表性能的方法
CN106059951B (zh) * 2016-06-08 2019-03-01 中南大学 一种用于dcn中基于多级拥塞反馈的传输控制方法
CN112866127B (zh) * 2018-02-14 2022-12-30 华为技术有限公司 一种分组网络中控制流量的方法及装置
CN110661723B (zh) * 2018-06-29 2023-08-22 华为技术有限公司 一种数据传输方法、计算设备、网络设备及数据传输系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020161914A1 (en) * 1999-10-29 2002-10-31 Chalmers Technology Licensing Ab Method and arrangement for congestion control in packet networks
CN101115013A (zh) * 2006-06-30 2008-01-30 阿尔卡特朗讯 提供资源准入控制的方法
CN106027416A (zh) * 2016-05-23 2016-10-12 北京邮电大学 一种基于时空结合的数据中心网络流量调度方法及系统
CN106953742A (zh) * 2017-02-16 2017-07-14 广州海格通信集团股份有限公司 一种基于sdn的无线异构网带宽保障方法

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113890852A (zh) * 2021-08-24 2022-01-04 北京旷视科技有限公司 数据发送方法、装置、设备及介质
CN113824777A (zh) * 2021-09-06 2021-12-21 武汉中科通达高新技术股份有限公司 数据管理方法和数据管理装置
CN113824777B (zh) * 2021-09-06 2023-12-19 武汉中科通达高新技术股份有限公司 数据管理方法和数据管理装置

Also Published As

Publication number Publication date
CN110661723A (zh) 2020-01-07
CN110661723B (zh) 2023-08-22
US11799790B2 (en) 2023-10-24
US11477129B2 (en) 2022-10-18
EP3742690A4 (en) 2021-03-10
EP3742690B1 (en) 2024-03-20
US20200396169A1 (en) 2020-12-17
EP3742690A1 (en) 2020-11-25
US20220337528A1 (en) 2022-10-20

Similar Documents

Publication Publication Date Title
WO2020001192A1 (zh) 一种数据传输方法、计算设备、网络设备及数据传输系统
US11329920B2 (en) Method and apparatus for network congestion control based on transmission rate gradients
CN109936510B (zh) 多路径rdma传输
KR102203509B1 (ko) 패킷 전송 방법, 단말, 네트워크 디바이스 및 통신 시스템
EP3370376B1 (en) Data transfer method, sending node, receiving node and data transfer system
US8605590B2 (en) Systems and methods of improving performance of transport protocols
EP2154857B1 (en) Data sending control method and data transmission device
WO2018210117A1 (zh) 一种拥塞控制方法、网络设备及其网络接口控制器
WO2017050216A1 (zh) 一种报文传输方法及用户设备
US20060203730A1 (en) Method and system for reducing end station latency in response to network congestion
CN109714267B (zh) 管理反向队列的传输控制方法及系统
JP2007534194A (ja) パケットを再配列する際のtcp性能の改善
Tam et al. Preventing TCP incast throughput collapse at the initiation, continuation, and termination
US20230059755A1 (en) System and method for congestion control using a flow level transmit mechanism
US8054847B2 (en) Buffer management in a network device
CN111224888A (zh) 发送报文的方法及报文转发设备
EP3108631B1 (en) Buffer bloat control
WO2018133784A1 (zh) 报文处理方法、设备及系统
Molia et al. A conceptual exploration of TCP variants
TW202335470A (zh) 網路流壅塞管理裝置及其方法
Kulkarni et al. Addressing TCP Incast

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19826507

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019826507

Country of ref document: EP

Effective date: 20200820

NENP Non-entry into the national phase

Ref country code: DE