WO2018219100A1 - Method and device for data transmission - Google Patents

Method and device for data transmission

Info

Publication number
WO2018219100A1
Authority
WO
WIPO (PCT)
Prior art keywords
data stream
sent
time interval
data
duration
Prior art date
Application number
PCT/CN2018/085942
Other languages
English (en)
French (fr)
Inventor
沈利 (Shen Li)
周洪 (Zhou Hong)
吴涛 (Wu Tao)
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP18809629.1A (patent EP3637704B1)
Publication of WO2018219100A1
Priority to US16/699,352 (patent US11140082B2)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/19 Flow control; Congestion control at layers above the network layer
    • H04L47/196 Integration of transport layer protocols, e.g. TCP and UDP
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/28 Flow control; Congestion control in relation to timing considerations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/09 Mapping addresses
    • H04L61/25 Mapping addresses of the same type
    • H04L61/2503 Translation of Internet protocol [IP] addresses
    • H04L61/2517 Translation of Internet protocol [IP] addresses using port numbers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/164 Adaptation or special uses of UDP protocol
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/60 Types of network addresses
    • H04L2101/618 Details of network addresses
    • H04L2101/663 Transport layer addresses, e.g. aspects of transmission control protocol [TCP] or user datagram protocol [UDP] ports
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00 Arrangements affording multiple use of the transmission path
    • H04L5/14 Two-way operation using the same type of signal, i.e. duplex
    • H04L5/1469 Two-way operation using the same type of signal, i.e. duplex using time-sharing

Definitions

  • The present application relates to the field of communications, and more specifically, to a method and device for data transmission.
  • Data center (DC) networks are increasingly being built by cloud service providers and enterprises.
  • How well a data center network balances its traffic load directly affects the user experience.
  • Packet loss caused by network congestion falls mainly into two cases. The first is local load imbalance, for example under Equal-Cost Multi-Path (ECMP) and VLB: because data flows differ in size, the traffic cannot be balanced, and large flows may be mapped to the same link so that the aggregate traffic exceeds the port capacity and causes congestion. The second is that load balancing may cause multiple large flows from multiple leaf switches that are destined for the same leaf to pass through the same spine switch, causing congestion on the downstream traffic.
  • ECMP: Equal-Cost Multi-Path
  • VLB: Valiant Load Balancing
  • Network congestion may cause a large flow to occupy the bandwidth of a link so that small flows cannot be forwarded; and when several large flows are scheduled onto the same link, the large flows themselves lose packets because of insufficient bandwidth.
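The hash-collision problem described above can be illustrated with a minimal Python sketch. It assumes a hypothetical switch that picks an uplink as the CRC32 of the five-tuple modulo the number of uplinks; real switches use vendor-specific hash functions and seeds, so `FiveTuple` and `ecmp_uplink` are illustrative names only, not any switch's API.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int  # IP protocol number, e.g. 17 for UDP
    src_port: int
    dst_port: int


def ecmp_uplink(flow: FiveTuple, num_uplinks: int) -> int:
    """Hypothetical ECMP selector: CRC32 of the five-tuple modulo the uplink count."""
    key = f"{flow.src_ip}|{flow.dst_ip}|{flow.protocol}|{flow.src_port}|{flow.dst_port}"
    return zlib.crc32(key.encode()) % num_uplinks


if __name__ == "__main__":
    UPLINKS = 4
    base = FiveTuple("10.0.0.1", "10.0.1.3", 17, 3123, 4791)
    # Every packet of a flow carries the same five-tuple, so the whole flow,
    # however large, is pinned to a single uplink.
    assert ecmp_uplink(base, UPLINKS) == ecmp_uplink(base, UPLINKS)
    # Five large flows hashed onto four uplinks: the hash ignores flow size,
    # and by the pigeonhole principle at least two flows share an uplink.
    flows = [base._replace(src_port=p) for p in range(1000, 1005)]
    links = [ecmp_uplink(f, UPLINKS) for f in flows]
    print("uplink per flow:", links)
```

Because the selection depends only on header fields, nothing in this scheme reacts to how much traffic each flow carries, which is the imbalance the method targets.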
  • The present application provides a data transmission method and device that periodically insert a delay between two groups of packets in a data stream, thereby actively constructing substreams and breaking up large flows. This eliminates persistent congestion at ports in the switching network, achieves a good load balancing effect, and is easy to implement. It thus overcomes the congestion and packet loss caused by unbalanced network traffic load and increases the reliability of data transmission.
  • According to a first aspect, an embodiment of the present application provides a data transmission method, including: calculating a first duration according to at least one data stream to be sent and a first time interval, where the first time interval is a preset value and different data streams in the at least one data stream to be sent have different five-tuples;
  • The data transmission method may be performed by a remote direct memory access (RDMA) network card, or by an access top-of-rack (TOR) switch.
  • RDMA: Remote Direct Memory Access
  • When the at least one data stream is sent, substreams are actively constructed for the data stream, which disperses the data stream and alleviates persistent congestion at ports in the switching network.
  • The load balancing effect is good, and the method is easy to implement.
  • The method further includes:
  • Packets of the first data stream are not sent during the second time interval.
  • The packets sent in the second time interval belong to different data streams.
  • The method further includes:
  • UDP: User Datagram Protocol
  • TCP: Transmission Control Protocol
  • The method further includes:
  • Calculating the first duration according to the at least one data stream to be sent and the first time interval includes:
  • the first duration ≥ the first time interval / (the number of data streams to be sent - 1).
  • Determining the number of data streams to be sent includes:
  • updating the number of data streams to be sent when the sending of a data stream is completed and/or a new data stream is added.
  • The method further includes:
  • Sending the first data stream includes:
  • The at least one data stream to be sent is an RDMA over Converged Ethernet version 2 (RoCEv2) stream.
  • An embodiment of the present application provides a data transmission device that includes modules or units capable of executing the method in the first aspect or any optional implementation of the first aspect.
  • A data transmission device includes a memory, a transceiver, and a processor. The memory stores program code that instructs execution of the method in the first aspect or any optional implementation thereof; the transceiver performs signal transmission and reception under the control of the processor; and when the code is executed, the processor can implement the operations performed by the device in the method.
  • A computer storage medium stores program code, and the program code includes instructions for instructing a computer to perform the method in the first aspect or any optional implementation of the first aspect.
  • A computer program product containing instructions is provided; when the instructions are run on a computer, the computer is caused to perform the methods described in the foregoing aspects.
  • FIG. 1 is a schematic diagram of a data center network using the method of data transmission of the present application.
  • FIG. 2 is a schematic diagram of another data center network using the method of data transmission of the present application.
  • FIG. 3 is a schematic diagram of a method of data transmission in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a RoCEv2 protocol in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of data transmission in an embodiment of the present application.
  • FIG. 6 is another schematic diagram of data transmission in the embodiment of the present application.
  • FIG. 7 is still another schematic diagram of data transmission in the embodiment of the present application.
  • FIG. 8 is still another schematic diagram of data transmission in the embodiment of the present application.
  • FIG. 9 is a schematic diagram of port number setting in a data transmission process according to an embodiment of the present application.
  • FIG. 10 is another schematic diagram of port number setting in a data transmission process according to an embodiment of the present application.
  • FIG. 11 is still another schematic diagram of port number setting in a data transmission process according to an embodiment of the present application.
  • FIG. 12 is a schematic block diagram of an apparatus for data transmission according to an embodiment of the present application.
  • FIG. 13 is a schematic block diagram of an apparatus for data transmission provided by an embodiment of the present application.
  • The data transmission method of the present application can be applied to a switch (for example, an access TOR switch in a CLOS network) or to a network card (for example, a remote direct memory access network card integrated in a server); specifically, the method can be implemented on a chip of the switch or the network card.
  • The following uses a CLOS network as an example to describe a data center network that uses the data transmission method of the present application; the present application imposes no limitation thereon.
  • The data center network 100 is a two-level CLOS network that includes two types of switches: access TOR switches, whose downlink ports connect to servers and whose uplink ports connect to core SPINE switches; and SPINE switches.
  • The SPINE switches are used to connect the TOR switches.
  • When servers under different TOR switches communicate with each other, the traffic passes through a SPINE switch.
  • Data transmission between servers can therefore be implemented by the TOR switches and the SPINE switches.
  • For example, server Server#1 sends packets of a first data stream to Server#3.
  • Server#1 first sends the packets of the first data stream to TOR#1; TOR#1 can then send them to TOR#3 through SPINE#1, or through SPINE#2; finally, TOR#3 sends the packets of the first data stream to Server#3.
  • The data transmission method described in this application may be implemented at TOR#1 and TOR#3, or at Server#1 and Server#3.
  • The data center network 200 is a three-level CLOS network that includes three types of switches: TOR switches, whose downlink ports connect to servers and whose uplink ports connect to aggregation (AGG) switches; AGG switches, whose downlink ports connect to the TOR switches and whose uplink ports connect to the SPINE switches; and SPINE switches, which are used to connect the AGG switches.
  • When servers under different TOR switches communicate with each other, the traffic passes through the AGG switches and the SPINE switches.
  • Data transmission between servers can therefore be implemented by the TOR, AGG, and SPINE switches.
  • In one example, server Server#1 sends packets of a first data stream to Server#2.
  • Server#1 sends the packets of the first data stream to TOR#1; TOR#1 can then send them to SPINE#1 through AGG#1, or to SPINE#2 through AGG#2. When the packets reach SPINE#1, SPINE#1 can send them to TOR#2 through AGG#1 or through AGG#2; likewise, when they reach SPINE#2, SPINE#2 can send them to TOR#2 through AGG#1 or through AGG#2. Finally, TOR#2 sends the packets of the first data stream to Server#2.
  • In another example, Server#1 sends the packets of the first data stream to TOR#1; TOR#1 can then send them directly to TOR#2 through AGG#1, or through AGG#2; finally, TOR#2 sends the packets of the first data stream to Server#2.
  • The data transmission method described in this application may be implemented at TOR#1 and TOR#2, or at Server#1 and Server#2.
  • The data center network 100 and the data center network 200 shown in FIG. 1 and FIG. 2 are only simple examples of a two-level and a three-level CLOS network. In an actual deployment, the numbers of servers, TOR switches, AGG switches, and SPINE switches can be determined according to factors such as network size and application type.
  • FIG. 3 is a schematic diagram of a method 300 for data transmission according to an embodiment of the present application.
  • The method 300 may be executed by an RDMA network card integrated in a server, or by a TOR switch.
  • The following description uses an RDMA network card as the executing entity.
  • The method 300 includes:
  • S310: Calculate a first duration according to the at least one data stream to be sent and a first time interval.
  • The first time interval is a preset value, and different data streams in the at least one data stream to be sent have different five-tuples.
  • The at least one data stream to be sent is an RDMA over Converged Ethernet version 2 (RoCEv2) stream.
  • RoCE: RDMA over Converged Ethernet
  • The RoCEv2 data stream may be a flow that is load-balanced by five-tuple hash-based ECMP.
  • The five-tuple consists of the source IP address (src IP), the destination IP address (dst IP), the IP protocol number (IP Protocol), the source port number, and the destination port number.
  • src IP: source Internet Protocol address
  • dst IP: destination Internet Protocol address
  • IP Protocol: IP protocol number
  • In a RoCEv2 packet, the EtherType indicates that the packet carries IP, the IP protocol number indicates that it carries UDP, and the UDP destination port number indicates that the next header is the InfiniBand Base Transport Header (IB.BTH).
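The header chain just described (EtherType, then IP protocol number, then UDP port) can be checked with a short Python sketch. It is a simplification under stated assumptions: IPv4 only, no VLAN tag, and the function names `rocev2_source_port` and `build_test_frame` are hypothetical. It returns the UDP source port because that field is the one the method later varies.

```python
import struct
from typing import Optional

ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17
ROCEV2_UDP_DST_PORT = 4791  # well-known UDP destination port for RoCEv2
ETH_HEADER_LEN = 14


def rocev2_source_port(frame: bytes) -> Optional[int]:
    """Return the UDP source port if the frame is Ethernet/IPv4/UDP/IB.BTH, else None.

    Sketch only: IPv4 without VLAN tags; IPv6 is not handled.
    """
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None  # EtherType says the payload is not IPv4
    ihl_bytes = (frame[ETH_HEADER_LEN] & 0x0F) * 4  # IPv4 header length in bytes
    if frame[ETH_HEADER_LEN + 9] != IPPROTO_UDP:
        return None  # IP protocol number says the payload is not UDP
    src_port, dst_port = struct.unpack_from("!HH", frame, ETH_HEADER_LEN + ihl_bytes)
    if dst_port != ROCEV2_UDP_DST_PORT:
        return None  # UDP destination port does not announce an IB.BTH next header
    return src_port


def build_test_frame(src_port: int, dst_port: int = ROCEV2_UDP_DST_PORT,
                     ethertype: int = ETHERTYPE_IPV4) -> bytes:
    """Hypothetical helper: zeroed MACs, minimal IPv4 header, UDP header, no payload."""
    eth = bytes(12) + struct.pack("!H", ethertype)
    ipv4 = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, IPPROTO_UDP, 0,
                       bytes(4), bytes(4))
    udp = struct.pack("!HHHH", src_port, dst_port, 8, 0)
    return eth + ipv4 + udp


if __name__ == "__main__":
    print(rocev2_source_port(build_test_frame(3123)))              # 3123
    print(rocev2_source_port(build_test_frame(3123, dst_port=53)))  # None
```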
  • Different data streams in the at least one data stream to be sent have different five-tuples.
  • Two data streams whose five-tuples differ in any element are different data streams.
  • For example, two data streams whose five-tuples differ only in the source port number are different data streams.
  • The first time interval is a preset value that is greater than or equal to the maximum path delay difference, and may be denoted the Flowlet Gap.
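Why the first time interval must be at least the maximum path delay difference can be shown with a small Python sketch. It assumes fixed per-path delays (queueing folded into the delay) and instantaneous serialization; the names and delay values are hypothetical. If the next substream starts at least that long after the previous one ends, its first packet cannot overtake the previous substream's last packet, even when it is hashed onto a faster path.

```python
from typing import Sequence


def min_flowlet_gap(path_delays_us: Sequence[int]) -> int:
    """Smallest first time interval satisfying gap >= maximum path delay difference."""
    return max(path_delays_us) - min(path_delays_us)


def may_reorder(gap_us: int, old_path_delay_us: int, new_path_delay_us: int) -> bool:
    """True if the next substream could overtake the previous one.

    The last packet of the previous substream leaves at t = 0 on the old path and
    arrives at old_path_delay_us; the next substream starts at t = gap_us on the
    new path, so its first packet arrives at gap_us + new_path_delay_us.
    """
    return gap_us + new_path_delay_us < old_path_delay_us


if __name__ == "__main__":
    delays = [12, 15, 20]  # hypothetical end-to-end delays of three paths, in µs
    gap = min_flowlet_gap(delays)  # 8
    # With gap >= max delay difference, no pair of paths can reorder packets.
    print(all(not may_reorder(gap, old, new) for old in delays for new in delays))  # True
    print(may_reorder(5, 20, 12))  # True: a 5 µs gap is too short
```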
  • The first duration ≥ the first time interval / (the number of data streams to be sent - 1). In this case, the number of data streams to be sent is greater than or equal to two.
  • The first duration is greater than or equal to the first time interval.
  • The number of data streams to be sent may be updated when the sending of a data stream is completed and/or a new data stream is added.
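The lower bound on the first duration can be read as follows: if the other N - 1 streams each send for one first duration while a given stream pauses, that stream's pause lasts (N - 1) × first duration, and requiring this pause to be at least the first time interval gives first duration ≥ first time interval / (N - 1). A minimal Python sketch of that reasoning, assuming this round-robin sharing (as in FIG. 7); the names are hypothetical, and the single-stream branch is an assumption based on the "first duration ≥ first time interval" statement. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def min_first_duration(first_time_interval: int, num_streams: int) -> Fraction:
    """Lower bound on the first duration: interval / (N - 1) for N >= 2."""
    if num_streams < 1:
        raise ValueError("need at least one data stream to send")
    if num_streams == 1:
        # Assumption: with a single stream, use first duration >= first time interval.
        return Fraction(first_time_interval)
    return Fraction(first_time_interval, num_streams - 1)


def round_robin_idle(first_duration: Fraction, num_streams: int) -> Fraction:
    """Pause of one stream while the other N - 1 streams each send one first duration."""
    return (num_streams - 1) * first_duration


if __name__ == "__main__":
    gap = 100  # hypothetical Flowlet Gap, e.g. in µs
    for n in (2, 3, 4):
        d = min_first_duration(gap, n)
        # At the bound, each stream's pause is exactly one Flowlet Gap.
        print(n, d, round_robin_idle(d, n))
```

Since the number of data streams changes as streams complete or are added, the bound would be recomputed whenever that count is updated.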
  • After determining the first duration, the RDMA network card periodically sends the packets of the first data stream.
  • The RDMA network card may send a group of packets of the first data stream, pause for a second time interval, and then repeat.
  • The first time period and the second time period merely denote any two adjacent time periods in which the first data stream is sent; this embodiment imposes no limitation thereon.
  • The RDMA network card can send the same number of packets in each time period, for example, 5 packets in the first time period, 5 packets in the second time period, ..., and 5 packets in the last time period.
  • Alternatively, the RDMA network card may send the same number of packets in every time period except the last, and fewer packets in the last time period, for example, 5 packets in the first time period, 5 packets in the second time period, ..., and 2 packets in the last time period (because only 2 packets remain to be sent).
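The per-period grouping in the two examples above (equal groups, with a possibly shorter last group) amounts to chunking the packet sequence. A minimal Python sketch; `split_into_periods` is a hypothetical name and packets are represented by plain integers.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_periods(packets: Sequence[T], per_period: int) -> List[List[T]]:
    """Group packets into consecutive periods of per_period packets; the last may be shorter."""
    if per_period <= 0:
        raise ValueError("per_period must be positive")
    return [list(packets[i:i + per_period]) for i in range(0, len(packets), per_period)]


if __name__ == "__main__":
    periods = split_into_periods(list(range(12)), 5)
    print([len(p) for p in periods])  # [5, 5, 2]
```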
  • As shown in FIG. 5, there is one data stream (the first data stream) to be sent. Six packets of the first data stream are sent within a first duration, and after the second time interval, packets of the first data stream are sent again within another first duration; the first data stream is subsequently sent periodically in this manner.
  • In this case, the first duration is greater than the first time interval, and the second time interval is equal to the first time interval (Flowlet Gap).
  • Packets of the first data stream are not sent in the second time interval.
  • In the second time interval, some feedback frames (for example, ACK frames) may be sent, or no packet may be sent at all.
  • Packets of the at least one data stream to be sent other than the first data stream may be sent in the second time interval.
  • As shown in FIG. 6, there are two data streams to be sent (a first data stream and a second data stream). Six packets of the first data stream are sent within a first duration, and after a second time interval, six packets of the first data stream are sent within another first duration. Six packets of the second data stream are sent within the second time interval, and after an interval equal to the first time interval, six packets of the second data stream are sent within the next second time interval. The first data stream and the second data stream are subsequently sent periodically in this manner.
  • In this case, the first duration is equal to the first time interval, and the second time interval is equal to the first time interval (Flowlet Gap).
  • As shown in FIG. 7, there are three data streams to be sent (a first, a second, and a third data stream). Six packets of the first data stream are sent within a first duration, and six packets of the first data stream are sent again within another first duration after a second time interval (twice the first time interval). Within the second time interval, six packets of the second data stream are sent during one first time interval, and six packets of the third data stream are sent during the other. The first, second, and third data streams are subsequently sent periodically in this manner.
  • In this case, the first duration is equal to the first time interval.
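The FIG. 6 and FIG. 7 timelines can be reproduced with a small round-robin schedule in Python. This is a sketch under assumptions: every stream gets exactly one first duration per turn, time is in abstract integer units, and `round_robin_schedule` and `idle_gaps` are hypothetical names. The pause between two sending periods of the same stream corresponds to its second time interval.

```python
from typing import List, Tuple

Slot = Tuple[int, int, int]  # (stream index, start time, end time)


def round_robin_schedule(num_streams: int, first_duration: int, rounds: int) -> List[Slot]:
    """Give each stream one first duration in turn, repeated for the given number of rounds."""
    slots: List[Slot] = []
    t = 0
    for _ in range(rounds):
        for stream in range(num_streams):
            slots.append((stream, t, t + first_duration))
            t += first_duration
    return slots


def idle_gaps(slots: List[Slot], stream: int) -> List[int]:
    """Time between consecutive sending periods of one stream (its second time interval)."""
    own = [(start, end) for s, start, end in slots if s == stream]
    return [nxt_start - cur_end for (_, cur_end), (nxt_start, _) in zip(own, own[1:])]


if __name__ == "__main__":
    gap = 10  # hypothetical Flowlet Gap; first duration set equal to it, as in FIG. 6 and FIG. 7
    print(idle_gaps(round_robin_schedule(2, gap, 3), 0))  # [10, 10]: one gap, as in FIG. 6
    print(idle_gaps(round_robin_schedule(3, gap, 3), 0))  # [20, 20]: two gaps, as in FIG. 7
```

With a single stream this schedule leaves no pause at all, which is why the single-stream case of FIG. 5 inserts an explicit idle second time interval instead of relying on other streams to fill it.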
  • Packets belonging to different data streams may be sent in the second time interval.
  • As shown in FIG. 8, there are three data streams to be sent (a first, a second, and a third data stream). Six packets of the first data stream are sent within a first duration, and six packets of the first data stream are sent again within another first duration after a second time interval. Within the second time interval, three packets of the second data stream are sent, followed by three packets of the third data stream. The first, second, and third data streams are subsequently sent periodically in this manner.
  • In this case, the first duration is equal to the first time interval, and the second time interval is equal to the first time interval (Flowlet Gap).
  • The numbers of packets sent in the first duration and in the second time interval are merely examples; the embodiments of the present application impose no limitation thereon.
  • The number of packets sent in a first duration may be calculated according to the first duration.
  • The number of packets sent in a first duration = the first duration × port rate / (8 × maximum transmission unit (MTU)), where the port rate is in kbps and the MTU is in bytes.
  • For example, the MTU in the Ethernet protocol may be 1500 bytes.
  • PPPoE: Point-to-Point Protocol over Ethernet
  • The port rate may be the rate of the port through which the RDMA network card sends packets, or of the port through which the TOR switch sends packets.
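The packet-count formula above can be evaluated with integer arithmetic. The document does not state the unit of the first duration, so this Python sketch assumes microseconds (converted to milliseconds, since ms × kbit/s gives bits) and rounds down; `packets_per_first_duration` is a hypothetical name and the 12 µs / 100 Gbit/s figures are illustrative only.

```python
def packets_per_first_duration(first_duration_us: int, port_rate_kbps: int,
                               mtu_bytes: int = 1500) -> int:
    """first duration x port rate / (8 x MTU), rounded down.

    Assumes the first duration is given in microseconds: us / 1000 = ms, and
    ms x kbit/s = bits, so dividing by 8 gives bytes and by the MTU gives packets.
    """
    return (first_duration_us * port_rate_kbps) // (8 * mtu_bytes * 1000)


if __name__ == "__main__":
    # Hypothetical example: 12 µs first duration on a 100 Gbit/s port.
    print(packets_per_first_duration(12, 100_000_000))  # 100
    print(packets_per_first_duration(12, 25_000_000))   # 25
```

Rounding down keeps each burst within the first duration; a real sender might instead round up or allow a partial packet, which the formula itself does not specify.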
  • The RDMA network card may continuously send as many packets of the first data stream as the calculated number, and, after a second time interval, continue sending packets of the first data stream.
  • The RDMA network card sets different UDP source port numbers or TCP source port numbers for the packets sent in two consecutive first durations.
  • The following examples use the UDP source port number.
  • For example, the UDP source port number set for the packets sent in one first duration is 3123, and the UDP source port number set for the packets sent in the next first duration is 269.
  • The RDMA network card sets different UDP source port numbers or TCP source port numbers for consecutively sent packets that belong to different data streams.
  • For example, the RDMA network card sets UDP source port number 3123 for the packets of the first data stream sent in a first duration, 62320 for the packets of the second data stream sent in the second time interval, and 269 for the packets of the first data stream sent in the next first duration.
  • As another example, the RDMA network card sets UDP source port number 4890 for the packets of the first data stream sent in a first duration, 269 for the packets of the second data stream sent in the second time interval, 62320 for the packets of the third data stream sent in the second time interval, and 3123 for the packets of the first data stream sent in the next first duration.
  • The RDMA network card sets the same UDP source port number or TCP source port number for all packets sent within one first duration.
  • The UDP or TCP port in the embodiments of the present application is a logical port whose number ranges from 0 to 65535.
  • The specific port numbers used by the RDMA network card above are merely examples; the embodiments of the present application impose no limitation thereon.
  • The RDMA network card may allocate a port number randomly whenever a port number needs to be set.
  • The RDMA network card may set the same UDP destination port number or TCP destination port number for the at least one data stream to be sent.
  • The destination port number of the at least one data stream to be sent may be a well-known port number.
  • For example, the UDP destination port number of the at least one data stream to be sent may be set to the well-known port 4791.
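The port rules above (one source port per burst, a new random port for consecutive bursts and for consecutive bursts of the same stream, a shared destination port 4791) can be sketched in Python. Assumptions: a "burst" is the group of packets a stream sends in one first duration or in its share of a second interval, ports are drawn uniformly from the 0 to 65535 range stated above, and `assign_source_ports` is a hypothetical name, not the card's actual allocator.

```python
import random
from typing import Dict, List, Optional, Tuple

ROCEV2_DST_PORT = 4791  # same well-known destination port for every stream


def assign_source_ports(bursts: List[int], rng: random.Random) -> List[Tuple[int, int, int]]:
    """Pick a UDP source port for each burst, in transmission order.

    bursts[i] is the stream index of the i-th consecutively sent burst; all packets
    of a burst share one source port. A new port must differ from the previous
    burst's port and from the last port used by the same stream.
    Returns (stream, src_port, dst_port) per burst.
    """
    last_for_stream: Dict[int, int] = {}
    prev_port: Optional[int] = None
    result: List[Tuple[int, int, int]] = []
    for stream in bursts:
        forbidden = {prev_port, last_for_stream.get(stream)}
        port = rng.randint(0, 65535)  # logical port range 0-65535, both ends inclusive
        while port in forbidden:
            port = rng.randint(0, 65535)
        result.append((stream, port, ROCEV2_DST_PORT))
        last_for_stream[stream] = port
        prev_port = port
    return result


if __name__ == "__main__":
    # FIG. 7-like order: streams 0, 1, 2 take turns.
    for row in assign_source_ports([0, 1, 2, 0, 1, 2], random.Random(42)):
        print(row)
```

Because each burst gets a fresh source port while the other four tuple fields stay fixed, a five-tuple hash treats every burst as a separate flow and may route it over a different path.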
  • The method 300 described in the embodiments of the present application may be implemented on a server or on a TOR switch.
  • For example, the RDMA network card integrated in Server#1 can send multiple packets of the first data stream in a first time period, send multiple packets of the first data stream in a second time period after a second time interval, send multiple packets of the first data stream in a third time period after another second time interval, and so on, sending the packets of the first data stream periodically and thereby actively constructing substreams for the first data stream.
  • As another example, when the first data stream needs to be sent from Server#1 to Server#3, TOR#1 can send multiple packets of the first data stream in a first time period, send multiple packets of the first data stream in a second time period after a second time interval, send multiple packets of the first data stream in a third time period after another second time interval, and so on, sending the packets of the first data stream periodically and thereby constructing substreams for the first data stream.
  • As a further example, when the first data stream needs to be sent from Server#1 to Server#3, TOR#1 can send the packets of the first data stream periodically in the same way and set a different UDP source port number or TCP source port number for the packets sent in each time period, so that the packets sent in different time periods can travel over different paths.
  • For instance, the packets of the first time period can be sent from TOR#1 to SPINE#1, which sends the received packets to TOR#3 and finally to Server#3; the packets of the second time period can be sent from TOR#1 to SPINE#2, which sends the received packets to TOR#3 and finally to Server#3.
  • Thus, when at least one data stream is sent, substreams are actively constructed for the data stream, which disperses the data stream, eliminates persistent congestion at ports in the switching network, and achieves a good load balancing effect while remaining easy to implement.
  • Moreover, a different UDP source port number or TCP source port number is set for each substream of a data stream, so that when the switch supports five-tuple hash-based ECMP load balancing, each substream can be load-balanced. This overcomes the congestion and packet loss caused by unbalanced network traffic load and increases the reliability of data transmission.
  • FIG. 12 is a schematic block diagram of an apparatus 500 for data transmission according to an embodiment of the present application. As shown in FIG. 12, the device 500 includes:
  • The processing unit 510 is configured to calculate a first duration according to the at least one data stream to be sent and a first time interval, where the first time interval is a preset value and different data streams in the at least one data stream to be sent have different five-tuples;
  • the sending unit 520 is configured to send a first data stream, where the first data stream belongs to the at least one data stream to be sent, where
  • The sending unit 520 is further configured to send, in the second time interval, packets of the at least one data stream to be sent other than the first data stream.
  • The packets sent in the second time interval belong to different data streams.
  • The processing unit 510 is further configured to set different UDP source port numbers for the packets sent in the first time period and in the second time period, respectively.
  • The processing unit 510 is further configured to set the same UDP source port number for the packets sent within one first duration.
  • The processing unit 510 is further configured to determine the number of data streams to be sent;
  • the first duration ≥ the first time interval / (the number of data streams to be sent - 1).
  • The processing unit 510 is further configured to update the number of data streams to be sent when the sending of a data stream is completed and/or a data stream is added.
  • The processing unit 510 is further configured to calculate, according to the first duration, the number of packets sent in a first duration;
  • the sending unit 520 is further configured to continuously send as many packets of the first data stream as the calculated number, and, after the second time interval, continue sending packets of the first data stream.
  • The at least one data stream to be sent is an RDMA over Converged Ethernet version 2 (RoCEv2) stream.
  • FIG. 13 is a schematic block diagram of an apparatus 600 for data transmission provided by an embodiment of the present application, where the apparatus 600 includes:
  • a memory 610, configured to store a program, where the program includes code;
  • a transceiver 620, configured to communicate with other devices; and
  • a processor 630, configured to execute the program code in the memory 610.
  • The processor 630 can implement the operations performed by the RDMA network card or the TOR switch in the method 300 in FIG. 3; for brevity, details are not described herein again.
  • The device 600 may be an RDMA network card integrated in a server, or a TOR switch.
  • The transceiver 620 is configured to perform signal transmission and reception under the control of the processor 630.
  • The processor 630 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor, or any conventional processor or the like.
  • The memory 610 may include a read-only memory and a random access memory, and provides instructions and data to the processor 630. A part of the memory 610 may further include a non-volatile random access memory. For example, the memory 610 may further store device type information.
  • The transceiver 620 may implement signal transmission and reception functions, such as frequency modulation and demodulation or up-conversion and down-conversion.
  • The device 600 for data transmission may be a chip or a chipset.
  • The steps of the methods disclosed in the embodiments of the present application may be directly performed by a hardware processor, or by a combination of hardware and software modules in the processor.
  • The software module may be located in a conventional storage medium such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register.
  • The storage medium is located in the memory; the processor 630 reads information from the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, details are not described herein again.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present application.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • the computer-readable storage medium can be any usable medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

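This application provides a data transmission method and device. When at least one data stream is sent, sub-flows are actively constructed for the data stream. This breaks up the data stream, eliminates persistent congestion of ports in the switching network, and achieves a good load-balancing effect. The method includes: calculating a first duration based on at least one data stream to be sent and a first time interval, where the first time interval is a preset value, and different data streams among the at least one data stream to be sent have different five-tuples; and sending a first data stream, where the first data stream belongs to the at least one data stream to be sent. A plurality of packets of the first data stream are sent within a first time period, and packets of the first data stream are sent within a second time period that follows a second time interval. The first time period and the second time period each last the first duration, and the second time interval is greater than or equal to the first time interval.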

Description

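Data Transmission Method and Device
This application claims priority to Chinese Patent Application No. 201710403008.6, filed with the Chinese Patent Office on June 1, 2017 and entitled "DATA TRANSMISSION METHOD AND DEVICE", which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
This application relates to the communications field, and more specifically, to a data transmission method and device.
BACKGROUND
In communication networks, cloud service providers and enterprises are building data center (DC) networks more and more widely. How well a data center network balances the load of network traffic directly affects user experience.
During network traffic load balancing, network congestion causes packet loss. Congestion-induced packet loss falls mainly into two cases. First, local load imbalance: schemes such as equal-cost multi-path (ECMP) routing and VLB can only balance the number of flows. Because flows differ in size, they cannot balance traffic volume, and several large flows may be mapped to the same link, so that the aggregated traffic exceeds the port capacity and causes congestion. Second, because each Leaf switch performs load balancing independently, several large flows on different Leaf switches that are destined for the same Leaf may be sent to the same Spine switch, congesting the point where the downlink traffic converges. In both cases, a large flow may take up all the bandwidth of a link so that small flows cannot be forwarded, and several large flows scheduled onto the same link may themselves lose packets because of insufficient bandwidth.
Therefore, overcoming the congestion and packet loss caused by unbalanced network traffic load, and thereby improving the reliability of data transmission, is an urgent problem to be solved.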
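SUMMARY
This application provides a data transmission method and device that periodically insert a delay between two consecutive groups of packets of a data stream. This actively constructs sub-flows and breaks up large flows, eliminating persistent congestion of ports in the switching network. The approach achieves a good load-balancing effect and is easy to implement. It thereby overcomes the congestion and packet loss caused by unbalanced network traffic load and improves the reliability of data transmission.
According to a first aspect, an embodiment of this application provides a data transmission method, including: calculating a first duration based on at least one data stream to be sent and a first time interval, where the first time interval is a preset value, and different data streams among the at least one data stream to be sent have different five-tuples;
sending a first data stream, where the first data stream belongs to the at least one data stream to be sent, and
a plurality of packets of the first data stream are sent within a first time period, and packets of the first data stream are sent within a second time period that follows a second time interval, where the first time period and the second time period each last the first duration, and the second time interval is greater than or equal to the first time interval.
Optionally, the data transmission method may be performed by a remote direct memory access (RDMA) network card, or by an access top-of-rack (TOR) switch.
Therefore, in the data transmission method of the embodiments of this application, when at least one data stream is sent, sub-flows are actively constructed for the data stream. This breaks up the data stream and relieves persistent congestion of ports in the switching network.
Further, when the switches support flowlet load balancing, the load-balancing effect is good and the method is easy to implement.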
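Optionally, in an implementation of the first aspect, the method further includes:
not sending packets of the first data stream within the second time interval.
Optionally, in an implementation of the first aspect, the method further includes:
sending, within the second time interval, packets of data streams other than the first data stream among the at least one data stream to be sent.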
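Optionally, in an implementation of the first aspect, the packets sent within the second time interval belong to different data streams.
Optionally, in an implementation of the first aspect, the method further includes:
setting different User Datagram Protocol (UDP) source port numbers or Transmission Control Protocol (TCP) source port numbers for the packets that are sent within the second time interval and belong to different data streams.
Optionally, in an implementation of the first aspect, the method further includes:
setting different UDP source port numbers for the packets sent within the first time period and the packets sent within the second time period.
Optionally, in an implementation of the first aspect, the method further includes:
setting the same UDP source port number for the packets sent within one first duration.
Therefore, in the data transmission method of the embodiments of this application, a different UDP source port number is set for each sub-flow of a data stream. When the switches support ECMP load balancing based on five-tuple hashing, each sub-flow is then load-balanced on its own. This overcomes the congestion and packet loss caused by unbalanced network traffic load and improves the reliability of data transmission.
Optionally, in an implementation of the first aspect, the calculating a first duration based on the at least one data stream to be sent and a first time interval includes:
determining the number of data streams to be sent, where
first duration ≥ first time interval / (number of data streams to be sent - 1).
Optionally, in an implementation of the first aspect, the determining the number of data streams to be sent includes:
updating the number of data streams to be sent when sending of a data stream is completed and/or a data stream is added.
Optionally, in an implementation of the first aspect, the method further includes:
calculating, based on the first duration, the number of packets sent within one first duration; and
the sending a first data stream includes:
consecutively sending a number of packets of the first data stream equal to the calculated number of packets, and consecutively sending packets of the first data stream again after the second time interval.
Optionally, in an implementation of the first aspect, the at least one data stream to be sent is an RDMA over Converged Ethernet version 2 (RoCEv2) stream.
According to a second aspect, an embodiment of this application provides a data transmission device, including modules or units that can perform the method in the first aspect or in any optional implementation of the first aspect.
According to a third aspect, a data transmission device is provided, including a memory, a transceiver, and a processor. The memory stores program code that can be used to instruct execution of the first aspect or any optional implementation thereof. The transceiver is configured to perform specific signal transmission and reception under the control of the processor. When the code is executed, the processor can implement the operations performed by the device in the method.
According to a fourth aspect, a computer storage medium is provided. The computer storage medium stores program code, and the program code includes instructions for instructing a computer to perform the method in the first aspect or in any possible implementation of the first aspect.
According to a fifth aspect, a computer program product including instructions is provided. When the computer program product runs on a computer, the computer performs the methods described in the foregoing aspects.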
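BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a schematic diagram of a data center network that uses the data transmission method of this application.
FIG. 2 is a schematic diagram of another data center network that uses the data transmission method of this application.
FIG. 3 is a schematic diagram of a data transmission method according to an embodiment of this application.
FIG. 4 is a schematic diagram of the RoCEv2 protocol in an embodiment of this application.
FIG. 5 is a schematic diagram of data transmission according to an embodiment of this application.
FIG. 6 is another schematic diagram of data transmission according to an embodiment of this application.
FIG. 7 is still another schematic diagram of data transmission according to an embodiment of this application.
FIG. 8 is yet another schematic diagram of data transmission according to an embodiment of this application.
FIG. 9 is a schematic diagram of port number setting during data transmission according to an embodiment of this application.
FIG. 10 is another schematic diagram of port number setting during data transmission according to an embodiment of this application.
FIG. 11 is still another schematic diagram of port number setting during data transmission according to an embodiment of this application.
FIG. 12 is a schematic block diagram of a data transmission device according to an embodiment of this application.
FIG. 13 is a schematic block diagram of a data transmission device according to an embodiment of this application.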
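DESCRIPTION OF EMBODIMENTS
The following describes the technical solutions in this application with reference to the accompanying drawings.
It should be understood that the data transmission method of this application can be applied to a switch (for example, an access TOR switch in a CLOS network) or to a network card (for example, an RDMA network card integrated in a server). Specifically, the method of this application may be implemented on the chip of the switch or network card. The following uses a CLOS network as an example to describe a data center network that uses the data transmission method of this application. This application is not limited thereto.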
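FIG. 1 is a schematic diagram of a data center network that uses the data transmission method of this application. As shown in FIG. 1, the data center network 100 is a two-tier CLOS network that includes two types of switches. The first is the access TOR switch, whose downlink ports connect to servers and whose uplink ports connect to core SPINE switches. The second is the SPINE switch, which connects the TOR switches. When servers under different TOR switches communicate, the traffic passes through the SPINE switches.
Specifically, data transmission between servers can be implemented through the TOR switches and SPINE switches. For example, when server Server#1 sends packets of a first data stream to Server#3, Server#1 first sends the packets of the first data stream to TOR#1. TOR#1 then sends the packets to TOR#3, either through SPINE#1 or through SPINE#2. Finally, TOR#3 sends the packets of the first data stream to Server#3. While the first data stream is being sent, the data transmission method described in this application may be implemented at TOR#1 and TOR#3, or at Server#1 and Server#3.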
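FIG. 2 is a schematic diagram of another data center network that uses the data transmission method of this application. As shown in FIG. 2, the data center network 200 is a three-tier CLOS network that includes three types of switches. The first is the TOR switch, whose downlink ports connect to servers and whose uplink ports connect to aggregation (AGG) switches. The second is the AGG switch, whose downlink ports connect to TOR switches and whose uplink ports connect to SPINE switches. The third is the SPINE switch, which connects the AGG switches. When servers under different TOR switches communicate, the traffic passes through the AGG switches and SPINE switches.
Specifically, data transmission between servers can be implemented through the TOR switches, AGG switches, and SPINE switches. For example, when server Server#1 sends packets of a first data stream to Server#2, Server#1 first sends the packets of the first data stream to TOR#1. TOR#1 then sends the packets to SPINE#1 through AGG#1, to SPINE#2 through AGG#1, to SPINE#1 through AGG#2, or to SPINE#2 through AGG#2. Next, when the packets reach SPINE#1, SPINE#1 sends them to TOR#2 through AGG#1 or through AGG#2. When the packets reach SPINE#2, SPINE#2 sends them to TOR#2 through AGG#1 or through AGG#2. Finally, TOR#2 sends the packets of the first data stream to Server#2. As another example, Server#1 again sends packets of a first data stream to Server#2 and first sends them to TOR#1. TOR#1 then sends the packets directly to TOR#2, through AGG#1 or through AGG#2. Finally, TOR#2 sends the packets to Server#2. While the packets of the first data stream are being sent, the data transmission method described in this application may be implemented at TOR#1 and TOR#2, or at Server#1 and Server#2.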
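It should be understood that the data center network 100 and the data center network 200 shown in FIG. 1 and FIG. 2 are merely simple examples of a two-tier CLOS network and a three-tier CLOS network. In an actual deployment, the numbers of servers, TOR switches, AGG switches, and SPINE switches may be determined based on factors such as network scale and application type.
It should also be understood that the embodiments of this application may also be applied to other CLOS networks, such as a four-tier CLOS network or a CLOS network with more tiers. This application imposes no limitation on this.
FIG. 3 is a schematic diagram of a data transmission method 300 according to an embodiment of this application. Method 300 may be performed by an RDMA network card integrated in a server, or by a TOR switch. The following description uses the RDMA network card as the executing entity. As shown in FIG. 3, method 300 includes the following steps.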
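S310: Calculate a first duration based on at least one data stream to be sent and a first time interval, where the first time interval is a preset value, and different data streams among the at least one data stream to be sent have different five-tuples.
Optionally, the at least one data stream to be sent is an RDMA over Converged Ethernet version 2 (RoCEv2) stream.
Optionally, the RoCEv2 data stream may be a stream that is load-balanced by ECMP based on a five-tuple hash.
Optionally, the five-tuple consists of the source Internet Protocol address (src IP), the destination Internet Protocol address (dst IP), the IP protocol, the source port (src Port), and the destination port (dst Port).
Optionally, as shown in FIG. 4, in a data stream based on the RoCEv2 protocol, the EtherType indicates that the packet is an IP packet, the IP protocol number indicates that the packet is a UDP packet, and the UDP port number indicates that the next header is the InfiniBand Base Transport Header (IB BTH).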
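The header chain in FIG. 4 can be illustrated with the following Python sketch, which checks whether a raw frame carries RoCEv2. It assumes an untagged Ethernet II frame with an IPv4 header. The function name `is_rocev2` is hypothetical, and VLAN tags, IPv6 (EtherType 0x86DD), and the BTH itself are not handled. The constants are standard values: EtherType 0x0800 for IPv4, IP protocol number 17 for UDP, and UDP destination port 4791 for RoCEv2.

```python
import struct

ETHERTYPE_IPV4 = 0x0800   # EtherType value for IPv4
IPPROTO_UDP = 17          # IP protocol number for UDP
ROCEV2_UDP_PORT = 4791    # UDP destination port that marks RoCEv2 (next header is IB BTH)


def is_rocev2(frame: bytes) -> bool:
    """Return True if an untagged Ethernet II / IPv4 frame carries a RoCEv2 packet.

    Sketch only: no VLAN tag, IPv4 only, and the BTH is not parsed.
    """
    if len(frame) < 14 + 20 + 8:           # Ethernet + minimal IPv4 + UDP headers
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return False
    if frame[14] >> 4 != 4:                # IP version field must be 4
        return False
    ihl = (frame[14] & 0x0F) * 4           # IPv4 header length in bytes
    if frame[14 + 9] != IPPROTO_UDP:       # IPv4 protocol field
        return False
    udp = 14 + ihl                         # offset of the UDP header
    if len(frame) < udp + 8:
        return False
    (dst_port,) = struct.unpack("!H", frame[udp + 2:udp + 4])
    return dst_port == ROCEV2_UDP_PORT
```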
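Optionally, different data streams among the at least one data stream to be sent have different five-tuples.
It should be understood that two data streams that differ in any element of the five-tuple are different data streams. For example, two data streams whose source port numbers differ are different data streams.
It should also be understood that different data streams may produce the same result when a five-tuple hash is computed over them.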
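The following minimal sketch of five-tuple hashing shows why distinct flows can still share a path. CRC32 over a string key is an assumption standing in for a switch's hash function, which is vendor-specific. `FiveTuple` and `ecmp_path` are hypothetical names.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int   # IP protocol number, e.g. 17 for UDP
    src_port: int
    dst_port: int


def ecmp_path(ft: FiveTuple, num_paths: int) -> int:
    """Pick an uplink index by hashing the five-tuple (CRC32 is a stand-in hash)."""
    key = f"{ft.src_ip}|{ft.dst_ip}|{ft.protocol}|{ft.src_port}|{ft.dst_port}"
    return zlib.crc32(key.encode()) % num_paths
```

Flows that differ only in source port are different flows. However, with three flows and two uplinks, at least two flows must land on the same uplink whatever hash is used. This kind of collision lets several large flows converge on one link.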
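It should be understood that when packets of the at least one data stream are transmitted over multiple paths, there may be delay differences between the paths.
Optionally, the first time interval is a preset value that is greater than or equal to the maximum delay difference between paths. It may be denoted as the Flowlet Gap.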
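The reasoning behind this bound can be sketched as follows. Suppose packets of one burst take the slowest path and packets of the next burst take the fastest path. The later burst cannot overtake the earlier one as long as the pause between them is at least the difference between the largest and smallest path delays. The helper names are hypothetical, and microseconds are an assumed unit.

```python
from typing import Sequence


def min_flowlet_gap(path_delays_us: Sequence[float]) -> float:
    """Smallest pause that covers the largest delay difference between any two paths."""
    return max(path_delays_us) - min(path_delays_us)


def bursts_may_reorder(gap_us: float, path_delays_us: Sequence[float]) -> bool:
    """True if a burst sent gap_us after the previous one could arrive before it.

    The earlier burst arrives at t0 + max delay at worst; the later one at
    t0 + gap + min delay at best. No overtaking when gap >= max - min.
    """
    return gap_us < min_flowlet_gap(path_delays_us)
```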
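Optionally, first duration ≥ first time interval / (number of data streams to be sent - 1). It should be understood that in this case the number of data streams to be sent is greater than or equal to 2.
Optionally, when the number of data streams to be sent is 1, the first duration is greater than or equal to the first time interval.
Optionally, the number of data streams to be sent may be updated when sending of a data stream is completed and/or a data stream is added.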
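The bound can be read as a round-robin argument. If each of the n streams in turn sends for one first duration D, every stream pauses while the other n - 1 streams send, that is, for (n - 1)·D. Requiring that pause to be at least the first time interval gives D ≥ first time interval / (n - 1). The Python sketch below recomputes the bound whenever a stream is added or finishes. The class and method names are hypothetical, and microseconds are an assumed unit.

```python
from typing import Set


class BurstPlanner:
    """Hypothetical helper: tracks the data streams to be sent and derives the
    lower bound on the first duration from the preset first time interval."""

    def __init__(self, flowlet_gap_us: float) -> None:
        self.flowlet_gap_us = flowlet_gap_us   # preset first time interval (Flowlet Gap)
        self.active: Set[str] = set()          # IDs of streams still to be sent

    def add_stream(self, stream_id: str) -> None:
        self.active.add(stream_id)             # a data stream was added

    def finish_stream(self, stream_id: str) -> None:
        self.active.discard(stream_id)         # a data stream finished sending

    def min_first_duration_us(self) -> float:
        n = len(self.active)
        if n == 0:
            raise ValueError("no data streams to send")
        if n == 1:
            return self.flowlet_gap_us         # single stream: duration >= gap
        return self.flowlet_gap_us / (n - 1)   # n >= 2: duration >= gap / (n - 1)
```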
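S320: Send a first data stream, where the first data stream belongs to the at least one data stream to be sent, and
a plurality of packets of the first data stream are sent within a first time period, and packets of the first data stream are sent within a second time period that follows a second time interval, where the first time period and the second time period each last the first duration, and the second time interval is greater than or equal to the first time interval.
It should be understood that, after the first duration is determined, the RDMA network card sends packets of the first data stream periodically. Optionally, the RDMA network card may send a batch of packets of the first data stream after every second time interval.
It should also be understood that the first time period and the second time period are merely any two adjacent time periods in which the first data stream is sent. This is not limited in the embodiments of this application.
Optionally, the RDMA network card may send the same number of packets in every time period, for example, 5 packets in the first time period, 5 packets in the second time period, ..., and 5 packets in the last time period.
Optionally, the RDMA network card may send the same number of packets in every time period except the last one, and fewer packets in the last time period. For example, it may send 5 packets in the first time period, 5 packets in the second time period, ..., and 2 packets in the last time period (only 2 packets remain to be sent in the last time period).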
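The two options above amount to cutting a stream's packet sequence into fixed-size bursts, with a possibly shorter final burst. Here is a minimal Python sketch (the function name is hypothetical), where each inner list would be sent within one time period:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_bursts(packets: Sequence[T], per_burst: int) -> List[List[T]]:
    """Group packets into consecutive bursts of per_burst packets; the last
    burst holds whatever remains and may be shorter."""
    if per_burst <= 0:
        raise ValueError("per_burst must be positive")
    return [list(packets[i:i + per_burst]) for i in range(0, len(packets), per_burst)]
```

For 12 packets and 5 packets per burst, `[len(b) for b in split_into_bursts(list(range(12)), 5)]` gives `[5, 5, 2]`, which matches the second option.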
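For example, as shown in FIG. 5, there is one data stream to be sent (the first data stream). Six packets of the first data stream are sent within one first duration, and another six packets of the first data stream are sent within the next first duration after the second time interval. The first data stream is subsequently sent periodically in this manner. In this case, the first duration is greater than the first time interval, and the second time interval is equal to the first time interval (Flowlet Gap).
Optionally, no packets of the first data stream are sent within the second time interval. Optionally, in this case, some feedback frames, for example, ACK frames, may be sent within the second time interval, or no packets may be sent at all.
Optionally, packets of data streams other than the first data stream among the at least one data stream to be sent are sent within the second time interval.
For example, as shown in FIG. 6, there are two data streams to be sent (a first data stream and a second data stream). Six packets of the first data stream are sent within one first duration, and another six packets of the first data stream are sent within the next first duration after the second time interval. Six packets of the second data stream are sent within one second time interval, and another six packets of the second data stream are sent within the next second time interval after one first duration. The first data stream and the second data stream are subsequently sent periodically in this manner. In this case, the first duration is equal to the first time interval, and the second time interval is equal to the first time interval (Flowlet Gap).
For another example, as shown in FIG. 7, there are three data streams to be sent (a first, a second, and a third data stream). Six packets of the first data stream are sent within one first duration, and another six packets of the first data stream are sent within the next first duration after the second time interval (twice the first time interval). Within the second time interval, six packets of the second data stream are sent in one first time interval, and six packets of the third data stream are sent in the other first time interval. The three data streams are subsequently sent periodically in this manner. In this case, the first duration is equal to the first time interval.
Optionally, packets belonging to different data streams may be sent within the second time interval.
For example, as shown in FIG. 8, there are three data streams to be sent (a first, a second, and a third data stream). Six packets of the first data stream are sent within one first duration, and another six packets of the first data stream are sent within the next first duration after the second time interval. Within the second time interval, three packets of the second data stream are sent first, followed by three packets of the third data stream. The three data streams are subsequently sent periodically in this manner. In this case, the first duration is equal to the first time interval, and the second time interval is equal to the first time interval (Flowlet Gap).
It should be understood that, in the examples of FIG. 5 to FIG. 8, the numbers of packets sent within the first duration and within the second time interval are merely examples and are not limited in the embodiments of this application.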
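FIG. 6 and FIG. 7 follow a simple round-robin pattern: the first duration equals the first time interval, and while one stream pauses, the other streams take their turns. The Python sketch below builds such a timeline and measures each stream's pause. The names are hypothetical and the time units are abstract. With two streams the pause equals one first duration, as in FIG. 6; with three streams it equals two, as in FIG. 7. The sketch does not model FIG. 8, where two streams share a single pause.

```python
from typing import List, Sequence, Tuple

Slot = Tuple[float, float, str]  # (start time, end time, stream ID)


def round_robin_schedule(streams: Sequence[str], first_duration: float, rounds: int) -> List[Slot]:
    """Give each stream one slot of first_duration per round, in a fixed order."""
    slots: List[Slot] = []
    t = 0.0
    for _ in range(rounds):
        for stream in streams:
            slots.append((t, t + first_duration, stream))
            t += first_duration
    return slots


def pauses_of(slots: Sequence[Slot], stream: str) -> List[float]:
    """Idle time between consecutive slots of one stream (its second time interval)."""
    own = [(start, end) for start, end, s in slots if s == stream]
    return [nxt[0] - cur[1] for cur, nxt in zip(own, own[1:])]
```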
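Optionally, after the first duration is determined, the number of packets sent within one first duration may be calculated based on the first duration.
Specifically, number of packets sent within one first duration = first duration × port rate / (8 × maximum transmission unit). Here the port rate is in kbps and the maximum transmission unit (MTU) is in bytes. For the units to be consistent, the first duration is then expressed in milliseconds, because 1 kbps sustained for 1 ms is 1 bit. For example, the MTU in the Ethernet protocol may be 1500 bytes, and the MTU in the Point-to-Point Protocol over Ethernet (PPPoE) may be 1492 bytes.
It should be understood that the port rate may be the rate of the port through which the RDMA network card sends packets, or of the port through which the TOR switch sends packets.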
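Here is a worked version of the formula, as a Python sketch (the function name is hypothetical). Take a 10 Gbit/s port (10,000,000 kbps), a 1 ms first duration, and the 1500-byte Ethernet MTU. Then 10,000,000 / (8 × 1500) ≈ 833.3, so 833 full-size packets fit. Rounding down and sending at least one packet per burst are assumptions; the text specifies neither.

```python
def packets_per_first_duration(first_duration_ms: int, port_rate_kbps: int, mtu_bytes: int) -> int:
    """Packets sent within one first duration: duration * rate / (8 * MTU).

    kbps * ms = bits; dividing by 8 gives bytes, dividing by the MTU gives
    full-size packets. Floor rounding and the minimum of one packet are
    assumptions made for this sketch.
    """
    bits = first_duration_ms * port_rate_kbps
    return max(1, bits // (8 * mtu_bytes))
```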
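Optionally, after calculating the number of packets sent within one first duration, the RDMA network card may consecutively send that number of packets of the first data stream, and consecutively send packets of the first data stream again after the second time interval.
Optionally, the RDMA network card sets different UDP source port numbers or TCP source port numbers for the packets sent within two consecutive first durations.
The following example uses UDP source port numbers for illustration.
For example, as shown in FIG. 9, the RDMA network card sets UDP source port number 3123 for packets sent within one first duration, and UDP source port number 269 for packets sent within the next first duration.
Optionally, the RDMA network card sets different UDP source port numbers or TCP source port numbers for consecutively sent packets that belong to different data streams.
The following examples use UDP source port numbers for illustration.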
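For example, as shown in FIG. 10, the RDMA network card sets UDP source port number 3123 for packets of the first data stream sent within one first duration. It sets 62320 for packets of the second data stream subsequently sent within the second time interval, and 269 for packets of the first data stream subsequently sent within the next first duration.
For another example, as shown in FIG. 11, the RDMA network card sets UDP source port number 4890 for packets of the first data stream sent within one first duration. It then sets 269 for packets of the second data stream sent within the second time interval, and 62320 for packets of the third data stream sent within the same second time interval. Finally, it sets 3123 for packets of the first data stream sent within the next first duration.
Optionally, the RDMA network card sets the same UDP source port number or TCP source port number for the packets sent within one first duration.
It should be understood that the UDP or TCP ports in the embodiments of this application are logical ports, and port numbers may range from 0 to 65535.
It should also be understood that, in the examples of FIG. 9 to FIG. 11, the specific port numbers set by the RDMA network card are merely examples and are not limited in the embodiments of this application. Optionally, when a port number needs to be set, the RDMA network card may allocate one randomly.
Optionally, the RDMA network card may set the same UDP destination port number or TCP destination port number for the at least one data stream to be sent.
Optionally, in the RoCEv2 protocol, the destination port number of the at least one data stream to be sent may be a well-known port number. For example, the UDP destination port number of the at least one data stream to be sent may be set to the well-known port number 4791.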
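FIG. 9 to FIG. 11 can be realized in software along the following lines, sketched in Python under stated assumptions. A single allocator shared by all streams gives each burst a random UDP source port that differs from the previous burst's port. This covers both the consecutive-bursts case and the consecutive-streams case, while the destination port stays fixed at 4791. Drawing source ports from the IANA dynamic range 49152 to 65535 is an assumption (the text only states the 0 to 65535 range). `SourcePortAllocator` and `label_bursts` are hypothetical names.

```python
import random
from typing import List, Optional, Sequence, Tuple

ROCEV2_UDP_DST_PORT = 4791  # well-known RoCEv2 destination port, same for every stream


class SourcePortAllocator:
    """Draws a random UDP source port per burst, never repeating the port of the
    immediately preceding burst (whichever stream that burst belonged to)."""

    def __init__(self, low: int = 49152, high: int = 65535, seed: Optional[int] = None) -> None:
        if high <= low:
            raise ValueError("need at least two candidate ports")
        self._rng = random.Random(seed)
        self._low, self._high = low, high
        self._last: Optional[int] = None

    def next_port(self) -> int:
        while True:
            port = self._rng.randint(self._low, self._high)  # inclusive bounds
            if port != self._last:
                self._last = port
                return port


def label_bursts(bursts: Sequence[Sequence[object]],
                 alloc: SourcePortAllocator) -> List[Tuple[object, int, int]]:
    """Tag each packet as (packet, UDP source port, UDP destination port).
    All packets of one burst share a source port; consecutive bursts differ."""
    labeled: List[Tuple[object, int, int]] = []
    for burst in bursts:
        sport = alloc.next_port()
        for pkt in burst:
            labeled.append((pkt, sport, ROCEV2_UDP_DST_PORT))
    return labeled
```

A different source port only changes the hash input. Two ports can still map to the same uplink, so this spreads sub-flows statistically rather than guaranteeing distinct paths.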
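Optionally, as shown in FIG. 1, method 300 described in the embodiments of this application may be implemented on the servers and the TOR switches.
For example, suppose a first data stream needs to be sent from Server#1 to Server#3. The RDMA network card integrated in Server#1 may send a plurality of packets of the first data stream within a first time period. After one second time interval, it sends a plurality of packets within a second time period, and after another second time interval, a plurality of packets within a third time period, and so on. Sending the packets of the first data stream periodically in this way actively constructs sub-flows for the first data stream.
For another example, suppose a first data stream needs to be sent from Server#1 to Server#3. After receiving packets of the first data stream, TOR#1 may send a plurality of packets of the first data stream within a first time period. After one second time interval, it sends a plurality of packets within a second time period, and after another second time interval, a plurality of packets within a third time period, and so on. Sending the packets of the first data stream periodically in this way actively constructs sub-flows for the first data stream.
For yet another example, suppose a first data stream needs to be sent from Server#1 to Server#3. After receiving packets of the first data stream, TOR#1 may send a plurality of packets of the first data stream within a first time period. After one second time interval, it sends a plurality of packets within a second time period, and after another second time interval, a plurality of packets within a third time period, and so on. In addition, TOR#1 sets a different UDP source port number or TCP source port number for the packets sent in each time period, so that packets sent in different time periods can take different paths. In FIG. 1, for instance, the packets of the first time period may be sent from TOR#1 to SPINE#1, which forwards them to TOR#3 and finally to Server#3. The packets of the second time period may be sent from TOR#1 to SPINE#2, which forwards them to TOR#3 and finally to Server#3.
Therefore, in the data transmission method of the embodiments of this application, when at least one data stream is sent, sub-flows are actively constructed for the data stream. This breaks up the data stream, eliminates persistent congestion of ports in the switching network, and achieves a good load-balancing effect that is easy to implement.
Further, a different UDP source port number or TCP source port number is set for each sub-flow of a data stream. When the switches support ECMP load balancing based on five-tuple hashing, each sub-flow is then load-balanced on its own. This overcomes the congestion and packet loss caused by unbalanced network traffic load and improves the reliability of data transmission.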
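FIG. 12 is a schematic block diagram of a data transmission device 500 according to an embodiment of this application. As shown in FIG. 12, device 500 includes:
a processing unit 510, configured to calculate a first duration based on at least one data stream to be sent and a first time interval, where the first time interval is a preset value, and different data streams among the at least one data stream to be sent have different five-tuples; and
a sending unit 520, configured to send a first data stream, where the first data stream belongs to the at least one data stream to be sent, and
a plurality of packets of the first data stream are sent within a first time period, and packets of the first data stream are sent within a second time period that follows a second time interval, where the first time period and the second time period each last the first duration, and the second time interval is greater than or equal to the first time interval.
Optionally, the sending unit 520 is further configured to send, within the second time interval, packets of data streams other than the first data stream among the at least one data stream to be sent.
Optionally, the packets sent within the second time interval belong to different data streams.
Optionally, the processing unit 510 is further configured to set different UDP source port numbers for the packets sent within the first time period and the packets sent within the second time period.
Optionally, the processing unit 510 is further configured to set the same UDP source port number for the packets sent within one first duration.
Optionally, the processing unit 510 is further configured to determine the number of data streams to be sent, where
first duration ≥ first time interval / (number of data streams to be sent - 1).
Optionally, the processing unit 510 is further configured to update the number of data streams to be sent when sending of a data stream is completed and/or a data stream is added.
Optionally, the processing unit 510 is further configured to calculate, based on the first duration, the number of packets sent within one first duration; and
the sending unit 520 is further configured to consecutively send a number of packets of the first data stream equal to the calculated number of packets, and consecutively send packets of the first data stream again after the second time interval.
Optionally, the at least one data stream to be sent is an RDMA over Converged Ethernet version 2 (RoCEv2) stream.
It should be understood that the foregoing and other operations and/or functions of the units in the data transmission device 500 according to the embodiments of this application respectively implement the corresponding procedures of the RDMA network card or the TOR switch in method 300 in FIG. 3. For brevity, details are not described herein again.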
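FIG. 13 is a schematic block diagram of a data transmission device 600 according to an embodiment of this application. Device 600 includes:
a memory 610, configured to store a program, where the program includes code;
a transceiver 620, configured to communicate with other devices; and
a processor 630, configured to execute the program code in the memory 610.
Optionally, when the code is executed, the processor 630 may implement the operations performed by the RDMA network card or the TOR switch in method 300 in FIG. 3. For brevity, details are not described herein again. In this case, device 600 may be an RDMA network card integrated in a server, or a TOR switch. The transceiver 620 is configured to perform specific signal transmission and reception under the control of the processor 630.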
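It should be understood that, in the embodiments of this application, the processor 630 may be a central processing unit (CPU). The processor 630 may alternatively be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or may be any conventional processor or the like.
The memory 610 may include a read-only memory and a random access memory, and provides instructions and data to the processor 630. Part of the memory 610 may further include a non-volatile random access memory. For example, the memory 610 may further store device type information.
The transceiver 620 may be configured to implement signal transmission and reception functions, such as frequency modulation and demodulation functions, or up-conversion and down-conversion functions.
In an implementation process, at least one step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 630, or the integrated logic circuit may complete the at least one step when driven by instructions in the form of software. Therefore, the data transmission device 600 may be a chip or a chipset. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor 630 reads information in the memory and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described herein again.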
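A person of ordinary skill in the art may be aware that the example units and algorithm steps described with reference to the embodiments disclosed in this specification can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenient and brief description, for the detailed working processes of the foregoing system, apparatus, and units, reference may be made to the corresponding processes in the foregoing method embodiments. Details are not described herein again.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiments are merely illustrative. The division into units is merely a division by logical function, and in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of this application.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, all or some of the embodiments may be implemented in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are wholly or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.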

Claims (18)

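  1. A data transmission method, comprising:
    calculating a first duration based on at least one data stream to be sent and a first time interval, wherein the first time interval is a preset value, and different data streams among the at least one data stream to be sent have different five-tuples; and
    sending a first data stream, wherein the first data stream belongs to the at least one data stream to be sent, wherein
    a plurality of packets of the first data stream are sent within a first time period, and packets of the first data stream are sent within a second time period that follows a second time interval, the first time period and the second time period each last the first duration, and the second time interval is greater than or equal to the first time interval.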
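  2. The method according to claim 1, wherein the method further comprises:
    sending, within the second time interval, packets of data streams other than the first data stream among the at least one data stream to be sent.
  3. The method according to claim 2, wherein the packets sent within the second time interval belong to different data streams.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises:
    setting different User Datagram Protocol (UDP) source port numbers for the packets sent within the first time period and the packets sent within the second time period.
  5. The method according to any one of claims 1 to 4, wherein the method further comprises:
    setting a same UDP source port number for the packets sent within one first duration.
  6. The method according to any one of claims 1 to 5, wherein the calculating a first duration based on at least one data stream to be sent and a first time interval comprises:
    determining a number of data streams to be sent, wherein
    the first duration ≥ the first time interval / (the number of data streams to be sent - 1).
  7. The method according to claim 6, wherein the determining the number of data streams to be sent comprises:
    updating the number of data streams to be sent when sending of a data stream is completed and/or a data stream is added.
  8. The method according to any one of claims 1 to 7, wherein the method further comprises:
    calculating, based on the first duration, a number of packets sent within one first duration; and
    the sending a first data stream comprises:
    consecutively sending a plurality of packets of the first data stream whose quantity equals the calculated number of packets, and consecutively sending packets of the first data stream after the second time interval.
  9. The method according to any one of claims 1 to 8, wherein the at least one data stream to be sent is an RDMA over Converged Ethernet version 2 (RoCEv2) stream.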
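  10. A data transmission device, comprising:
    a processing unit, configured to calculate a first duration based on at least one data stream to be sent and a first time interval, wherein the first time interval is a preset value, and different data streams among the at least one data stream to be sent have different five-tuples; and
    a sending unit, configured to send a first data stream, wherein the first data stream belongs to the at least one data stream to be sent, wherein
    a plurality of packets of the first data stream are sent within a first time period, and packets of the first data stream are sent within a second time period that follows a second time interval, the first time period and the second time period each last the first duration, and the second time interval is greater than or equal to the first time interval.
  11. The device according to claim 10, wherein the sending unit is further configured to send, within the second time interval, packets of data streams other than the first data stream among the at least one data stream to be sent.
  12. The device according to claim 11, wherein the packets sent within the second time interval belong to different data streams.
  13. The device according to any one of claims 10 to 12, wherein the processing unit is further configured to set different User Datagram Protocol (UDP) source port numbers for the packets sent within the first time period and the packets sent within the second time period.
  14. The device according to any one of claims 10 to 13, wherein
    the processing unit is further configured to set a same UDP source port number for the packets sent within one first duration.
  15. The device according to any one of claims 10 to 14, wherein the processing unit is further configured to:
    determine the number of data streams to be sent, wherein
    the first duration ≥ the first time interval / (the number of data streams to be sent - 1).
  16. The device according to claim 15, wherein the processing unit is further configured to update the number of data streams to be sent when sending of a data stream is completed and/or a data stream is added.
  17. The device according to any one of claims 10 to 16, wherein
    the processing unit is further configured to calculate, based on the first duration, a number of packets sent within one first duration; and
    the sending unit is further configured to consecutively send a plurality of packets of the first data stream whose quantity equals the calculated number of packets, and consecutively send packets of the first data stream after the second time interval.
  18. The device according to any one of claims 10 to 17, wherein the at least one data stream to be sent is an RDMA over Converged Ethernet version 2 (RoCEv2) stream.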
PCT/CN2018/085942 2017-06-01 2018-05-08 Data transmission method and device WO2018219100A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18809629.1A EP3637704B1 (en) 2017-06-01 2018-05-08 Data transmission method and device
US16/699,352 US11140082B2 (en) 2017-06-01 2019-11-29 Data transmission method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710403008.6 2017-06-01
CN201710403008.6A CN108989237B (zh) 2017-06-01 2017-06-01 Data transmission method and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/699,352 Continuation US11140082B2 (en) 2017-06-01 2019-11-29 Data transmission method and device

Publications (1)

Publication Number Publication Date
WO2018219100A1 true WO2018219100A1 (zh) 2018-12-06

Family

ID=64455161

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/085942 WO2018219100A1 (zh) 2017-06-01 2018-05-08 Data transmission method and device

Country Status (4)

Country Link
US (1) US11140082B2 (zh)
EP (1) EP3637704B1 (zh)
CN (1) CN108989237B (zh)
WO (1) WO2018219100A1 (zh)

Also Published As

Publication number Publication date
EP3637704A1 (en) 2020-04-15
CN108989237B (zh) 2021-03-23
US20200099620A1 (en) 2020-03-26
US11140082B2 (en) 2021-10-05
CN108989237A (zh) 2018-12-11
EP3637704A4 (en) 2020-04-15
EP3637704B1 (en) 2023-07-26

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18809629

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018809629

Country of ref document: EP

Effective date: 20191202