WO2023207521A1 - 流量拥塞控制方法、装置、计算机可读介质及电子设备 - Google Patents

流量拥塞控制方法、装置、计算机可读介质及电子设备 Download PDF

Info

Publication number
WO2023207521A1
WO2023207521A1 PCT/CN2023/085795 CN2023085795W WO2023207521A1 WO 2023207521 A1 WO2023207521 A1 WO 2023207521A1 CN 2023085795 W CN2023085795 W CN 2023085795W WO 2023207521 A1 WO2023207521 A1 WO 2023207521A1
Authority
WO
WIPO (PCT)
Prior art keywords
bandwidth
transmission protocol
traffic
computer program
virtual machine
Prior art date
Application number
PCT/CN2023/085795
Other languages
English (en)
French (fr)
Inventor
张志明
姜立东
江卓
廖惟博
王剑
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司 filed Critical 北京有竹居网络技术有限公司
Publication of WO2023207521A1 publication Critical patent/WO2023207521A1/zh

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08Configuration management of networks or network elements
    • H04L41/0896Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/12Avoiding congestion; Recovering from congestion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/19Flow control; Congestion control at layers above the network layer
    • H04L47/196Integration of transport layer protocols, e.g. TCP and UDP

Definitions

  • the present disclosure relates to the field of data transmission technology, and specifically, to a traffic congestion control method, device, computer-readable medium electronic equipment, computer program and computer program product.
  • Cloud computing has developed for more than ten years since its introduction and has become an important part of the Internet. More and more enterprises are willing to deploy servers on public cloud platforms. Enterprises expect to use the powerful computing power of the cloud platform to complete a large amount of background processing within the cloud platform and provide simple terminal services through the Internet. However, cloud computing platforms face a challenge. With the failure of Moore's Law, the performance of the central processing unit (CPU) of cloud computing servers is slowly improving, and network bandwidth is growing much faster than the growth rate of CPU processing performance, causing the CPU to be used too much for network communications. In order to speed up business processing, especially for big data processing, users expect network bandwidth to be as high as possible and latency to be as low as possible. This results in a large part of the computing resources purchased by users being used for network communication and wasted.
  • CPU central processing unit
  • RDMA Remote Direct Memory Access
  • RoCE RDMA over Converged Ethernet
  • the present disclosure provides a traffic congestion control method, which includes: allocating network card bandwidth to virtual machines on the same physical machine; the network interface of the virtual machine is used to transmit first transmission protocol traffic and second transmission protocol traffic ;
  • the first transmission protocol and the second transmission protocol are different types of transmission protocols that share the same physical network; for any virtual machine on the physical machine, the total bandwidth of the virtual machine is divided into the first bandwidth, the third Two bandwidths and buffer bandwidths; the first bandwidth is used to transmit the first transmission protocol traffic, the second bandwidth is used to transmit the second transmission protocol traffic; and the actual bandwidth used according to the first transmission protocol and the actual usage bandwidth of the second transmission protocol, and reallocate the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol.
  • the present disclosure provides a traffic congestion control device, including: a processing module for allocating network card bandwidth to virtual machines on the same physical machine; the network interface of the virtual machine is used for transmitting first transmission protocol traffic and The second transmission protocol traffic; the first transmission protocol and the second transmission protocol are different types of transmission protocols sharing the same physical network; the processing module is also used to transfer the said The total bandwidth of the virtual machine is divided into a first bandwidth, a second bandwidth and a buffer bandwidth; the first bandwidth is used to transmit the first transmission protocol traffic, and the second bandwidth is used to transmit the second transmission protocol traffic; An allocation module, configured to redistribute the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol according to the actual usage bandwidth of the first transmission protocol and the actual usage bandwidth of the second transmission protocol.
  • the present disclosure provides a computer-readable medium on which a computer program is stored, which implements the steps of the aforementioned traffic congestion control method when executed by a processing device.
  • the present disclosure provides an electronic device, including: a storage device on which at least one computer program is stored; and at least one processing device for executing the at least one computer program in the storage device to implement The steps of the aforementioned traffic congestion control method.
  • the present disclosure provides a computer program, the computer program comprising program code executable by a processing device, and when the processing device executes the computer program, the steps of the method of the first aspect are implemented.
  • the present disclosure provides a computer program product.
  • the computer program product includes a computer program carried on a non-transitory computer-readable medium.
  • the computer program includes program code executable by a processing device. When the processing device executes The computer program implements the steps of the method described in the first aspect.
  • Figure 1 is a schematic diagram of bandwidth allocation of a virtual machine provided by an exemplary embodiment of the present disclosure.
  • Figure 2 is a flow chart of a traffic congestion control method provided by an exemplary embodiment of the present disclosure.
  • Figure 3 is a flow chart of processing incoming traffic of a virtual machine provided by an exemplary embodiment of the present disclosure.
  • Figure 4 is a block diagram of a traffic congestion control device provided by an exemplary embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
  • 120-switch 120-switch; 140-cloud computing server; 20-traffic congestion device; 201-processing module; 203-distribution module; 600-electronic equipment; 601-processing device; 602-ROM; 603-RAM; 604-bus; 605- I/O interface; 606-input device; 607-output device; 608-storage device; 609-communication device.
  • the term “include” and its variations are open-ended, ie, “including but not limited to.”
  • the term “based on” means “based at least in part on.” It should be noted that concepts such as “first” and “second” mentioned in this disclosure are only used to distinguish different devices, modules or units, and are not used to limit the order of functions performed by these devices, modules or units. Or interdependence.
  • a prompt message is sent to the user to clearly remind the user that the operation requested will require the acquisition and use of the user's personal information. Therefore, users can autonomously choose whether to provide personal information to software or hardware such as electronic devices, applications, servers or storage media that perform the operations of the technical solution of the present disclosure based on the prompt information.
  • the method of sending prompt information to the user may be, for example, a pop-up window, and the prompt information may be presented in the form of text in the pop-up window.
  • the pop-up window can also contain a selection control for the user to choose "agree” or "disagree” to provide personal information to the electronic device.
  • Cloud computing services have become an important part of the Internet.
  • the growth rate of network bandwidth is much faster than the growth rate of CPU processing performance.
  • the network bandwidth resources configured for the same cloud computing server are much larger than Computing resources, causing the CPU to be used too much for network communication. As a result, a large part of the computing resources purchased by users are used for network communication and wasted.
  • RDMA technology solves the above problems very well.
  • Ethernet as the most important access standard for the Internet, has become very popular. Almost all data centers use Ethernet.
  • the RoCE standard was proposed to run RDMA technology on Ethernet.
  • the RoCE standard enables RDMA technology to be seamlessly connected to Ethernet, reducing deployment resistance.
  • the RoCE standard can only run within the data center. When providing external services, it still relies on the traditional TCP/IP protocol.
  • the RoCE standard With the RoCE standard, RDMA technology can be deployed using Ethernet equipment. However, the RoCE standard is a different protocol from TCP/IP. How to allocate bandwidth between RoCE traffic and TCP traffic on the same physical network is a problem. In the multi-tenant scenario of cloud computing platform, some special Quality of Service (QoS) mechanisms are needed to ensure that both can run smoothly.
  • QoS Quality of Service
  • TC Linux kernel traffic control
  • qdisc queuing rule
  • class category
  • filter(filter) Classifiable qdiscs include algorithms such as CBQ, HTB and PRIO.
  • CBQ is the abbreviation of Class Based Queueing. It implements a rich connection sharing class structure, with the ability to limit (shaping) bandwidth and bandwidth priority management.
  • HTB is the abbreviation of Hierarchy Token Bucket. It has a rich connection sharing category system, which can guarantee the bandwidth of each transmission protocol category, and also allows it to break through the bandwidth limit of its own category and occupy the bandwidth of other transmission protocols.
  • PRIO is the abbreviation of Priority. It cannot limit bandwidth, but it can prioritize traffic well.
  • Each interface of the Ethernet switch has 8 QoS queues.
  • the traffic type can be identified through the Differentiated Services Code Point (DSCP) field in the IP protocol header of each data packet, and RoCE traffic and non-RoCE traffic can be classified. Finally, it is sent to different queues for rate limiting and scheduling.
  • DSCP Differentiated Services Code Point
  • the switch can identify RoCE and TCP traffic (that is, TCP/IP traffic), and perform rate limiting and scheduling.
  • TCP traffic that is, TCP/IP traffic
  • the QoS queue on the switch is limited, and the RoCE traffic of different virtual machines uses the same DSCP value, making it impossible to limit and schedule the traffic of multiple virtual machines at the same time.
  • RoCE traffic and TCP traffic share the same physical network.
  • the same network interface in the user virtual machine can transmit both RDMA traffic and TCP traffic. Because the reliable transmission and congestion control functions of RoCE traffic are on the network card, it is necessary to limit the speed of RoCE traffic and TCP traffic on the network card.
  • an exemplary embodiment of the present disclosure provides a traffic congestion control method.
  • the network card bandwidth is allocated to a virtual machine on the same physical machine (such as a switch or cloud computing server), and the network interface of the virtual machine is used to transmit the first transmission protocol traffic and the second transmission protocol traffic, where The first transmission protocol and the second transmission protocol are different types of transmission protocols sharing the same physical network.
  • the first transmission protocol is RoCE
  • the second transmission protocol is TCP/IP
  • the network card is an RDMA network card.
  • FIG. 1 shows a schematic diagram of bandwidth allocation of a virtual machine provided by an exemplary embodiment of the present disclosure.
  • the switch 120 includes a switch 120 and a cloud computing server 140.
  • the network card bandwidth is 200Gbps
  • the cloud computing server 140 hosts 5 virtual machines (Virtual Machine, VM), namely VM-1, VM-2, VM-3, VM-4, and VM-5.
  • the network of the virtual machine The interface is used to transmit RoCE traffic and TCP traffic.
  • 40G bandwidth can be allocated to each virtual machine.
  • the switch 120 can identify the traffic type of the data packet through the DSCP field of the IP protocol header of each data packet, classify the RoCE traffic and TCP traffic, and send Limit and schedule the different speed limiters (Meters) of the RDMA network card.
  • Methoders Different speed limiters
  • the RDMA network card supports the rte_flow interface of the Data Plane Development Kit (DPDK).
  • the rte_flow interface includes some rules, such as classifying traffic, speed limiting and scheduling of traffic through Meter, and based on registers. The value is used to perform traffic rate limiting and scheduling. Therefore, by cascading Meter and these rules, the rate limiting and bandwidth borrowing functions of RoCE traffic and TCP traffic can be realized.
  • FIG 2 is a flow chart of a traffic congestion control method provided by an exemplary embodiment of the present disclosure.
  • the method is performed by an electronic device, for example, by a switch or a cloud computing server shown in Figure 1 .
  • the traffic congestion control method shown in Figure 2 includes the following steps:
  • step S101 for any virtual machine on the physical machine, the total bandwidth of the virtual machine is divided into a first bandwidth, a second bandwidth and a buffer bandwidth.
  • the first bandwidth is used to transmit the first transmission protocol traffic
  • the second bandwidth is used to transmit the second transmission protocol traffic.
  • the first bandwidth is used to transmit RoCE traffic
  • the second bandwidth is used to transmit TCP traffic.
  • each virtual machine can be allocated a total bandwidth of 40G, and RoCE traffic and TCP traffic can be allocated 30G and 10G bandwidth respectively.
  • 5G is used as a buffer bandwidth to reduce the probability of transitional preemption of RoCE traffic and TCP traffic.
  • the value of buffer bandwidth can also be adjusted specifically according to online operation conditions, and is not limited here. For example, when the buffer bandwidth is 5G, if both RoCE and TCP are trying their best to send data packets, RoCE traffic can reach 30G and TCP traffic can reach 10G; if TCP traffic is 0, RoCE can reach up to 35G by preempting TCP bandwidth. (30+10-5).
  • step S102 the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol are reallocated according to the actual usage bandwidth of the first transmission protocol and the actual usage bandwidth of the second transmission protocol.
  • the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol may be reallocated through an RDMA network card.
  • the RDMA network card is located on the physical machine and can limit the speed and schedule any virtual machine on the physical machine.
  • the RDMA network card includes a classifier, a first register, a second register, a third register, a first speed limiter and a second speed limiter. speed limiter, wherein the classifier is used to identify the type of traffic passing through the network interface of the virtual machine, the first register records the type of traffic passing through the network interface of the virtual machine, the first speed limiter and the second speed limiter implement RoCE traffic and For rate limiting and scheduling of TCP traffic, the second register and the third register are used to record the results of flow control.
  • FIG. 3 is a flow chart of processing incoming traffic of a virtual machine provided by an exemplary embodiment of the present disclosure.
  • incoming traffic passing through the network interface of the virtual machine first identify the type of the incoming traffic through the classifier of the RDMA network card, whether it is RoCE traffic or TCP traffic, and then record the type of incoming traffic through the first register R1;
  • the speed limiter controls RoCE traffic and TCP traffic for the first time.
  • the first speed limiter can control the RoCE traffic speed limit to 30G and the TCP traffic speed limit to 10G.
  • the result of the first traffic control and the value of R2 are stored in the second register R2.
  • G or Y where G indicates that RoCE traffic or TCP traffic has not reached the maximum speed limit of the first speed limiter, and Y indicates that RoCE traffic or TCP traffic has reached the maximum speed limit of the first speed limiter; through the second speed limiter Control RoCE traffic and TCP traffic for the second time.
  • G indicates that RoCE traffic or TCP traffic has not reached the maximum speed limit of the first speed limiter
  • Y indicates that RoCE traffic or TCP traffic has reached the maximum speed limit of the first speed limiter
  • the bandwidth allocation settings for several scenarios are as follows: the virtual machine is allocated a total bandwidth of 40G, the buffer bandwidth is 0, RoCE traffic is allocated a bandwidth of 30G, and TCP traffic is allocated a bandwidth of 10G.
  • Scenario 1 When the total bandwidth is remaining, the first transmission protocol or the second transmission protocol is controlled to occupy the remaining portion of the total bandwidth.
  • the bandwidth actually used by RoCE traffic and TCP traffic may be remaining, or one of them may be remaining.
  • the party without remaining bandwidth can be controlled to borrow the bandwidth of the remaining party, that is, borrowing the total bandwidth. remaining portion of the bandwidth.
  • Scenario 2 When the bandwidth actually used by the first transmission protocol is less than the first bandwidth and there is no remaining total bandwidth, the second transmission protocol is controlled to occupy part of the traffic of the first bandwidth.
  • Scenario 3 When the bandwidth actually used by the second transmission protocol is less than the second bandwidth and there is no remaining total bandwidth, the first transmission protocol is controlled to occupy part of the second bandwidth.
  • Scenario 4 When the bandwidth actually used by the first transmission protocol is equal to the first bandwidth and there is no remaining bandwidth in the total, a congestion flag is set in the data packet of the second transmission protocol to reduce the bandwidth actually used by the second transmission protocol.
  • the actual bandwidth used by RoCE traffic gradually increases and reaches 30G. At this time, the total bandwidth is no longer remaining, indicating that the actual bandwidth used by TCP traffic has also reached 10G. Since RoCE traffic is more sensitive to packet loss, packet loss is directly used. Rate limiting in this way will seriously affect the performance of the RoCE protocol. Therefore, the congestion flag is set in the data packets of TCP traffic to reduce the actual bandwidth used by TCP traffic and avoid traffic congestion.
  • the specific method of setting the congestion flag in the data packet of TCP traffic includes: after the intermediate device (such as a switch) that receives the TCP traffic data packet discovers that the network is congested, it then sets the congestion flag in the data packet, and the receiving end receives the congestion flag set. After the data packet, a speed reduction instruction packet is sent to the sending end. The speed reduction indication packet is used to instruct the sending end to send fewer data packets to achieve the purpose of reducing traffic.
  • Scenario 5 When the bandwidth actually used by the second transmission protocol is equal to the second bandwidth and there is no remaining bandwidth in the total, the second transmission protocol is controlled to adopt a packet loss fallback mechanism to reduce the actual usage of the first transmission protocol and the second transmission protocol. bandwidth.
  • the actual bandwidth used by TCP traffic gradually increases and reaches 10G. At this time, the total bandwidth is no longer remaining, indicating that the actual bandwidth used by RoCE traffic has also reached 30G. Since RoCE traffic is more sensitive to packet loss, packet loss is used directly. Rate limiting in this way will seriously affect the performance of the RoCE protocol. Therefore, a packet loss fallback mechanism is used to control TCP traffic to reduce the actual bandwidth used by TCP traffic and avoid traffic congestion.
  • the packet loss fallback mechanism refers to discarding the data packet currently being transmitted (for the above example, the data packet is a TCP data packet). After waiting for a predetermined time, when it is detected that the total bandwidth is remaining, the previously discarded data packet will be retransmitted. .
  • Scenario 6 When the sum of the bandwidth actually used by the first transmission protocol and the bandwidth actually used by the second transmission protocol is greater than the predetermined threshold, both the first transmission protocol and the second transmission protocol are controlled to adopt a packet loss fallback mechanism.
  • the predetermined threshold is greater than the total bandwidth, and may be, for example, but not limited to 45G.
  • the virtual machine is allocated a total bandwidth of 40G. For example, the sum of the actual bandwidth used by RoCE traffic and TCP traffic has reached 45G. At this time, the network is overwhelmed and very congested. The congestion situation at this time can be recorded through the third register, and the third register can be used to record the congestion situation at this time. The value of the third register is set to red. After the RDMA network card detects that the value of the third register is red, it controls both RoCE traffic and TCP traffic to use the packet loss fallback mechanism to avoid traffic congestion.
  • TCP traffic can borrow the bandwidth of RoCE traffic; when the actual bandwidth used by TCP traffic does not reach 10G, RoCE traffic can borrow the bandwidth of TCP traffic; when RoCE traffic When both TCP and TCP traffic are competing for 40G bandwidth, the available bandwidth of the two can be controlled to achieve a balance of 30:10.
  • the traffic congestion control method provided by the present disclosure can also be used for traffic allocation and scheduling.
  • the traffic congestion control method includes allocating network card bandwidth to virtual machines on the same physical machine.
  • the network interface of the virtual machine is used to transmit the first transmission protocol traffic and the second transmission protocol traffic.
  • the transmission protocol and the second transmission protocol are different types of transmission protocols that share the same physical network; for any virtual machine on the physical machine, the total bandwidth of the virtual machine is divided into the first bandwidth, the second bandwidth and the buffer bandwidth.
  • One bandwidth is used to transmit the first transmission protocol traffic, and the second bandwidth is used to transmit the second transmission protocol traffic; the bandwidth of the first transmission protocol is reallocated according to the actual usage bandwidth of the first transmission protocol and the actual usage bandwidth of the second transmission protocol. and the bandwidth of the second transport protocol. It can effectively control the bandwidth ratio between the first transmission protocol and the second transmission protocol, allocate a certain proportion of buffer bandwidth to avoid traffic congestion, and ensure the performance of different transmission protocols running on the same physical network.
  • FIG. 4 is a block diagram of a traffic congestion control device according to an exemplary embodiment of the present disclosure.
  • the device 20 includes a processing module 201 and a distribution module 203 .
  • the processing module 201 is used to allocate network card bandwidth to virtual machines on the same physical machine; the network interface of the virtual machine is used to transmit the first transmission protocol traffic and the second transmission protocol traffic; the first transmission protocol and the second transmission protocol For different types of transmission protocols sharing the same physical network;
  • the processing module 201 is also used to divide the total bandwidth of the virtual machine into a first bandwidth, a second bandwidth and a buffer bandwidth for any virtual machine on the physical machine; the first bandwidth is used to transmit the First transmission protocol traffic, the second bandwidth is used to transmit the second transmission protocol traffic;
  • the allocation module 203 is configured to reallocate the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol according to the actual usage bandwidth of the first transmission protocol and the actual usage bandwidth of the second transmission protocol.
  • the allocation module 203 is also configured to reallocate the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol through a remote direct data access RDMA network card.
  • the RDMA network card includes a classifier, a first register, a second register, a third register, a first speed limiter and a second speed limiter; the processing module 201 is also used to
  • the second register stores the result of the first flow control through the first speed limiter
  • the third register stores the result of the second flow control through the first speed limiter.
  • the allocation module 203 is also configured to: if the total bandwidth remains, control the first transmission protocol or the second transmission protocol to occupy the remaining portion of the total bandwidth;
  • a congestion flag is set in the data packet of the second transmission protocol to reduce the risk of the second transmission.
  • the second transmission protocol is controlled to adopt a packet loss fallback mechanism to reduce the cost of the first transmission protocol. and the bandwidth actually used by the second transmission protocol.
  • the allocation module 203 is also configured to control the first transmission protocol and the sum of the bandwidth actually used by the first transmission protocol and the bandwidth actually used by the second transmission protocol is greater than a predetermined threshold.
  • the second transmission protocol all adopts a packet loss fallback mechanism; the predetermined threshold is greater than the total bandwidth.
  • the first transmission protocol is RoCE
  • the second transmission protocol is TCP
  • the allocation module 203 is also used to control inbound traffic and/or outbound traffic of the virtual machine through the RDMA network card.
  • FIG. 5 a schematic structural diagram of an electronic device (such as the switch or cloud computing server in FIG. 1 ) 600 suitable for implementing embodiments of the present disclosure is shown.
  • the electronic device shown in FIG. 5 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 600 may include a processing device (such as a central processing unit, a graphics processor, etc.) 601 , which may process data according to a program stored in a read-only memory (ROM, Read Only Memory) 602 or from a storage device 608
  • the program loaded into the random access memory (RAM, Random Access Memory) 603 executes various appropriate actions and processes.
  • RAM random access memory
  • various programs and data required for the operation of the electronic device 600 are also stored.
  • the processing device 601, ROM 602 and RAM 603 are connected to each other via a bus 604.
  • An input/output (I/O, Input/Output) interface 605 is also connected to bus 604.
  • input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; including, for example, a Liquid Crystal Display (LCD) , an output device 607 such as a speaker, a vibrator, etc.; a storage device 608 including a magnetic tape, a hard disk, etc.; and a communication device 609.
  • Communication device 609 may allow electronic device 600 to communicate wirelessly or wiredly with other devices to exchange data.
  • FIG. 5 illustrates electronic device 600 with various means, it should be understood that implementation or availability of all illustrated means is not required. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via communication device 609, or from storage device 608, or from ROM 602.
  • the processing device 601 When the computer program is executed by the processing device 601, the above functions defined in the method of the embodiment of the present disclosure are performed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof.
  • Computer readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read only memory (ROM), removable Programmable read-only memory (EPROM (Erasable Programmable Read-only Memory) or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM, Compact Disc Read-only Memory), optical storage device, magnetic storage device, or the above Any suitable combination.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device .
  • Program code contained on a computer-readable medium can be transmitted using any appropriate medium, including but not limited to: wires, optical cables, radio frequency (RF, Radio Frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
  • the computer-readable medium carries one or more programs.
  • the electronic device allocates the network card bandwidth to the virtual machine on the same physical machine;
  • the network interface is used to transmit the first transmission protocol traffic and the second transmission protocol traffic;
  • the first transmission protocol and the second transmission protocol are different types of transmission protocols sharing the same physical network; for any virtual machine on the physical machine , the total bandwidth of the virtual machine is divided into a first bandwidth, a second bandwidth and a buffer bandwidth;
  • the first bandwidth is used to transmit the first transmission protocol traffic, and the second bandwidth is used to transmit the second Transmission protocol traffic: reallocate the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol according to the actual usage bandwidth of the first transmission protocol and the actual usage bandwidth of the second transmission protocol.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages—such as Java, Smalltalk, C++, and Includes conventional procedural programming languages - such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (such as an Internet service provider). connected via the Internet).
  • LAN local area network
  • WAN wide area network
  • Internet service provider such as an Internet service provider
  • each block in the flowchart or block diagram may represent a module, segment, or portion of code that contains one or more logic functions that implement the specified executable instructions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block of the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration can be implemented by special purpose hardware-based systems that perform the specified functions or operations. , or can be implemented using a combination of specialized hardware and computer instructions.
  • the modules involved in the embodiments of the present disclosure can be implemented in software or hardware. Among them, the name of the module does not constitute a limitation on the module itself under certain circumstances.
  • exemplary types of hardware logic components include: Field Programmable Gate Array (FPGA, Field Programmable Gate Array), Application Specific Integrated Circuit (ASIC, Application Specific Integrated Circuit), Application Specific Standard Product (ASSP, Application Specific Standard Product), System on Chip (SOC, System on Chop), Complex Programmable Logic Device (CPLD, Complex Programmable Logic Device), etc.
  • FPGA Field Programmable Gate Array
  • ASIC Application Specific Integrated Circuit
  • ASSP Application Specific Standard Product
  • SOC System on Chip
  • CPLD Complex Programmable Logic Device
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, laptop disks, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • RAM random access memory
  • ROM read only memory
  • EPROM or flash memory erasable programmable read only memory
  • CD-ROM portable compact disk read-only memory
  • magnetic storage device or any suitable combination of the above.
  • Example 1 provides a traffic congestion control method, including: allocating network card bandwidth to virtual machines on the same physical machine; the network interface of the virtual machine is used to transmit the first transmission Protocol traffic and second transmission protocol traffic; the first transmission protocol and the second transmission protocol are different types of transmission protocols sharing the same physical network;
  • the total bandwidth of the virtual machine is divided into a first bandwidth, a second bandwidth and a buffer bandwidth; the first bandwidth is used to transmit the first transmission protocol traffic, so The second bandwidth is used to transmit the second transmission protocol traffic; and
  • the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol are reallocated according to the actual usage bandwidth of the first transmission protocol and the actual usage bandwidth of the second transmission protocol.
  • Example 2 provides the method of Example 1, and the step of reallocating the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol includes:
  • the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol are reallocated through the remote direct data access RDMA network card.
  • Example 3 provides the method of Example 2.
  • the RDMA network card includes a classifier, a first register, a second register, a third register, a first speed limiter and a second speed limiter. device;
  • the result of the first flow control performed by the first speed limiter is stored in the second register;
  • the third register stores the result of the second flow control through the second speed limiter.
  • Example 4 provides the method of Examples 1-3, which redistributes all the bandwidth according to the actual usage bandwidth of the first transmission protocol and the actual usage bandwidth of the second transmission protocol.
  • the steps of determining the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol include:
  • control the first transmission protocol or the second transmission protocol to occupy the remaining portion of the total bandwidth
  • a congestion flag is set in the data packet of the second transmission protocol to reduce the risk of the second transmission.
  • the second transmission protocol is controlled to adopt a packet loss fallback mechanism to reduce the cost of the first transmission protocol. and the bandwidth actually used by the second transmission protocol.
  • Example 5 provides the method of Examples 1-4, which redistributes all the bandwidth according to the actual usage bandwidth of the first transmission protocol and the actual usage bandwidth of the second transmission protocol.
  • the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol also include:
  • both the first transmission protocol and the second transmission protocol are controlled to use packet loss return.
  • the predetermined threshold is greater than the total bandwidth.
  • Example 6 provides the method of Examples 1-5, the first transmission protocol is RoCE, and the second transmission protocol is TCP.
  • Example 7 provides the method of Examples 1-6, further comprising:
  • Example 8 provides a traffic congestion control device, including: a processing module for allocating network card bandwidth to virtual machines on the same physical machine; the network interface of the virtual machine is In transmitting the first transmission protocol traffic and the second transmission protocol traffic; the first transmission protocol and the second transmission protocol are different types of transmission protocols sharing the same physical network;
  • the processing module is also configured to divide the total bandwidth of the virtual machine into a first bandwidth, a second bandwidth and a buffer bandwidth for any virtual machine on the physical machine; the first bandwidth is used to transmit the First transmission protocol traffic, the second bandwidth is used to transmit the second transmission protocol traffic;
  • An allocation module configured to redistribute the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol according to the actual usage bandwidth of the first transmission protocol and the actual usage bandwidth of the second transmission protocol.
  • Example 9 provides a computer-readable medium having a computer program stored thereon, which implements the steps of the aforementioned traffic congestion control method when executed by a processing device.
  • Example 10 provides an electronic device, including: a storage device on which at least one computer program is stored; and at least one processing device for executing all the programs in the storage device.
  • the at least one computer program is used to implement the steps of the aforementioned traffic congestion control method.
  • Example 11 provides a computer program.
  • the computer program includes program code executable by a processing device.
  • the processing device executes the computer program, the aforementioned traffic congestion control method is implemented. A step of.
  • Example 12 provides a computer program product, the computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program comprising a program executable by a processing device Code, when the processing device executes the computer program, implements the steps of the aforementioned traffic congestion control method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

本公开涉及一种流量拥塞控制方法、装置、计算机可读介质及电子设备,该方法包括:将网卡带宽分给同一台物理机上的虚拟机,虚拟机的网络接口用于传输第一传输协议流量及第二传输协议流量;对于物理机上的任意一台虚拟机,将虚拟机的总带宽分为第一带宽、第二带宽及缓冲带宽,第一带宽用于传输第一传输协议流量,第二带宽用于传输第二传输协议流量;以及根据第一传输协议的实际使用带宽及第二传输协议的实际使用带宽,重新分配第一传输协议的带宽和第二传输协议的带宽。能够有效控制第一传输协议与第二传输协议之间的带宽比例,分配一定比例的缓冲带宽以避免流量拥塞,保证运行在同一个物理网络上的不同传输协议的性能。

Description

流量拥塞控制方法、装置、计算机可读介质及电子设备
本公开要求于2022年4月29日提交中国专利局、申请号为202210476371.1、申请名称为“流量拥塞控制方法、装置、计算机可读介质及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本公开中。
技术领域
本公开涉及数据传输技术邻域,具体地,涉及一种流量拥塞控制方法、装置、计算机可读介质电子设备、计算机程序以及计算机程序产品。
背景技术
云计算从提出到现在发展了十多年,已经成为互联网的一个重要组成部分,越来越多的企业愿意将服务器部署到公共云平台上。企业期望利用云平台的强大算力,在云平台内部完成大量的后台处理,并通过互联网提供简单的终端服务。然而云计算平台面临一个挑战。随着摩尔定律的失效,云计算服务器的中央处理器(CPU,Central Processing Unit)性能提升缓慢,网络带宽的增长速度远快于CPU处理性能的增长速度,导致CPU过多的用于网络通信。为加快业务的处理速度,尤其是对于大数据处理来说,用户期望网络带宽尽量高,延迟尽量低,这导致用户购买的计算资源中,有很大一部分被用于网络通信而浪费掉了。
远程直接数据存取(Remote Direct Memory Access,RDMA)技术很好地解决了云平台面临的这一挑战。这一技术把网络通信相关的任务卸载到网卡上,节省了用户的计算开销,同时降低了延迟。RDMA网卡为支持这一技术所增加的成本远低于浪费的计算资源成本。因此逐渐成为云计算平台和大数据处理业务的标配。
为了节省成本,将RDMA技术运行在以太网上,基于融合以太网的RDMA(RDMA over Converged Ethernet,RoCE)标准就被提了出来,基于RoCE标准,RDMA技术就可以利用以太网设备进行部署,但毕竟RoCE标准是一个与TCP/IP不同的协议,RoCE流量与TCP流量(即TCP/IP流量)在同一物理网络上如何和平共处是个问题,因此亟需一种方法来控制RoCE流量与TCP流量之间的带宽。
发明内容
提供该发明内容部分以便以简要的形式介绍构思,这些构思将在后面的具体实施方式部分被详细描述。该发明内容部分并不旨在标识要求保护的技术方案的关键特征或必要特征,也不旨在用于限制所要求的保护的技术方案的范围。
第一方面,本公开提供一种流量拥塞控制方法,包括:将网卡带宽分给同一台物理机上的虚拟机;所述虚拟机的网络接口用于传输第一传输协议流量及第二传输协议流量;第一传输协议与第二传输协议为共享同一个物理网络的不同类型的传输协议;对于所述物理机上的任意一台虚拟机,将所述虚拟机的总带宽分为第一带宽、第二带宽及缓冲带宽;所述第一带宽用于传输所述第一传输协议流量,所述第二带宽用于传输所述第二传输协议流量;以及根据所述第一传输协议的实际使用带宽及所述第二传输协议的实际使用带宽,重新分配所述第一传输协议的带宽和所述第二传输协议的带宽。
第二方面,本公开提供一种流量拥塞控制装置,包括:处理模块,用于将网卡带宽分给同一台物理机上的虚拟机;所述虚拟机的网络接口用于传输第一传输协议流量及第二传输协议流量;第一传输协议与第二传输协议为共享同一个物理网络的不同类型的传输协议;所述处理模块还用于对于所述物理机上的任意一台虚拟机,将所述虚拟机的总带宽分为第一带宽、第二带宽及缓冲带宽;所述第一带宽用于传输所述第一传输协议流量,所述第二带宽用于传输所述第二传输协议流量;分配模块,用于根据所述第一传输协议的实际使用带宽及所述第二传输协议的实际使用带宽,重新分配所述第一传输协议的带宽和所述第二传输协议的带宽。
第三方面,本公开提供一种计算机可读介质,其上存储有计算机程序,该计算机程序被处理装置执行时实现前述的流量拥塞控制方法的步骤。
第四方面,本公开提供一种电子设备,包括:存储装置,其上存储有至少一个计算机程序;以及至少一个处理装置,用于执行所述存储装置中的所述至少一个计算机程序,以实现前述的流量拥塞控制方法的步骤。
第五方面,本公开提供一种计算机程序,该计算机程序包含处理装置可执行的程序代码,当所述处理装置执行所述计算机程序时实现第一方面所述方法的步骤。
第六方面,本公开提供一种计算机程序产品,该计算机程序产品包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含处理装置可执行的程序代码,当所述处理装置执行所述计算机程序时实现第一方面所述方法的步骤。
本公开的其他特征和优点将在随后的具体实施方式部分予以详细说明。
附图说明
结合附图并参考以下具体实施方式,本公开各实施例的上述和其他特征、优点及方面将变得更加明显。贯穿附图中,相同或相似的附图标记表示相同或相似的元素。应当理解附图是示意性的,元件和元素不一定按照比例绘制。在附图中:
图1是本公开一个示例性实施例提供的虚拟机的带宽分配示意图。
图2是本公开一个示例性实施例提供的流量拥塞控制方法的流程图。
图3是本公开一个示例性实施例提供的虚拟机入向流量的处理流程图。
图4是本公开一个示例性实施例提供的流量拥塞控制装置的框图。
图5是本公开一个示例性实施例提供的电子设备的结构示意图。
附图标记说明
120-交换机;140-云计算服务器;20-流量拥塞装置;201-处理模块;203-分配模块;600-电子设备;601-处理装置;602-ROM;603-RAM;604-总线;605-I/O接口;606-输入装置;607-输出装置;608-存储装置;609-通信装置。
具体实施方式
下面将参照附图更详细地描述本公开的实施例。虽然附图中显示了本公开的某些实施例,然而应当理解的是,本公开可以通过各种形式来实现,而且不应该被解释为限于这里阐述的实施例,相反提供这些实施例是为了更加透彻和完整地理解本公开。应当理解的是,本公开的附图及实施例仅用于示例性作用,并非用于限制本公开的保护范围。
应当理解,本公开的方法实施方式中记载的各个步骤可以按照不同的顺序执行,和/或并行执行。此外,方法实施方式可以包括附加的步骤和/或省略执行示出的步骤。本公开的范围在此方面不受限制。
本文使用的术语“包括”及其变形是开放性包括,即“包括但不限于”。术语“基于”是“至少部分地基于”。需要注意,本公开中提及的“第一”、“第二”等概念仅用于对不同的装置、模块或单元进行区分,并非用于限定这些装置、模块或单元所执行的功能的顺序或者相互依存关系。
可以理解的是,在使用本公开各实施例公开的技术方案之前,均应当依据相关法律法规通过恰当的方式对本公开所涉及个人信息的类型、使用范围、使用场景等告知用户并获得用户的授权。
例如,在响应于接收到用户的主动请求时,向用户发送提示信息,以明确地提示用户,其请求执行的操作将需要获取和使用到用户的个人信息。从而,使得用户可以根据提示信息来自主地选择是否向执行本公开技术方案的操作的电子设备、应用程序、服务器或存储介质等软件或硬件提供个人信息。
作为一种可选的但非限定性的实现方式,响应于接收到用户的主动请求,向用户发送提示信息的方式例如可以是弹窗的方式,弹窗中可以以文字的方式呈现提示信息。此外,弹窗中还可以承载供用户选择“同意”或者“不同意”向电子设备提供个人信息的选择控件。
可以理解的是,上述通知和获取用户授权过程仅是示意性的,不对本公开的实现方式构成限定,其它满足相关法律法规的方式也可应用于本公开的实现方式中。
同时,可以理解的是,本技术方案所涉及的数据(包括但不限于数据本身、数据的获取或使用)应当遵循相应法律法规及相关规定的要求。
“尽力而为”的互联网服务适应了早期的发展需求,促使互联网在人们的日常生活中得到了普及,并进一步渗透到了各行各业。但随着应用的展开,“尽力而为”的服务质量逐渐暴露出短板。一些工业生产领域对网络的带宽、时延、抖动等有很高的要求。因此互联网服务质量相关的研究也得到了进一步的重视。
云计算服务已经成为互联网的一个重要组成部分,然而云计算服务器的CPU性能提升缓慢,网络带宽的增长速度远快于CPU处理性能的增长速度,同一台云计算服务器配置的网络带宽资源要远大于计算资源,导致CPU过多的用于网络通信。进而导致用户购买的计算资源中,有很大一部分被用于网络通信而浪费掉了。
RDMA技术很好的解决了上述问题。为了使RDMA技术与传统以太网兼容,以太网作为互联网最重要的接入标准已经非常普及,几乎所有的数据中心内部都是以太网。为了节省成本,将RDMA技术运行在以太网上,RoCE标准就被提了出来,RoCE标准使得RDMA技术能与以太网无缝对接,减小了部署阻力。但RoCE标准只能运行在数据中心内部,对外提供服务时,还要依赖传统的TCP/IP协议。
有了RoCE标准,RDMA技术就可以利用以太网设备进行部署,然而RoCE标准是一个与TCP/IP不同的协议,RoCE流量与TCP流量在同一物理网络上如何分配带宽是个问题。在云计算平台多租户场景下,更是需要一些特殊的服务质量(Quality of Service,QoS)机制来保证两者都能平稳运行。
云计算服务器上的虚拟机使用Linux操作系统,Linux内核流量控制(Traffic Control,TC)的QoS机制提供了一套框架,通过三种对象对流量进行处理:qdisc(排队规则)、class(类别)和filter(过滤器)。其中可分类qdisc包括CBQ、HTB和PRIO等算法。
CBQ是Class Based Queueing(基于类别排队)的缩写。它实现了一个丰富的连接共享类别结构,具有限制(shaping)带宽和带宽优先级管理的能力。
HTB是Hierarchy Token Bucket(层次令牌桶)的缩写。它有丰富的连接共享类别体系,可以保证每个传输协议类别的带宽,也允许突破自己类别的带宽上限,占用别的传输协议的带宽。
PRIO是Priority(优先级)的缩写,它不能限制带宽,但可以很好地对流量进行优先级管理。
然而Linux的TC框架比较复杂,使用了很多锁,性能比较差。如果在网卡上使用HTB qdisc来对RoCE流量和TCP流量进行限速,无法满足现有需求,而且RoCE流量对丢包非常敏感,直接使用丢包的方式进行限速会严重影响RoCE协议的性能。
以太网交换机每个接口都有8个QoS队列,可以通过每个数据包的IP协议头的差分服务代码点(Differentiated Services Code Point,DSCP)字段识别流量类型,对RoCE流量和非RoCE流量进行分类后,送给不同的队列进行限速和调度。
对于从交换机到云计算服务器的入向流量来说,交换机可以对RoCE和TCP流量(即TCP/IP流量)进行识别,并进行限速和调度。但交换机上的QoS队列有限,且不同虚拟机的RoCE流量使用的是同一个DSCP值,无法同时对多个虚拟机的流量进行限速和调度。
对于云计算服务器的虚拟网络来说,RoCE流量和TCP流量共享同一个物理网络,用户虚拟机中的同一个网络接口既可以传输RDMA流量,也可以传输TCP流量。因为RoCE流量的可靠传输和拥塞控制功能都在网卡上,所以需要在网卡上对RoCE流量和TCP流量进行限速
为了解决上述问题,本公开一个示例性实施例提供一种流量拥塞控制方法。本公开实施方式中,将网卡带宽分给同一台物理机(如交换机或云计算服务器)上的虚拟机,该虚拟机的网络接口用于传输第一传输协议流量及第二传输协议流量,其中第一传输协议与第二传输协议为共享同一个物理网络的不同类型的传输协议。
示例性的,第一传输协议为RoCE,第二传输协议为TCP/IP,网卡为RDMA网卡。请参阅图1,图1示出了本公开一个示例性实施例提供的虚拟机的带宽分配示意图。
如图1所示,包括交换机120及云计算服务器140。假设网卡带宽为200Gbps,云计算服务器140上承载5个虚拟机(Virtual Machine,VM),分别为VM-1、VM-2、VM-3、VM-4、VM-5,该虚拟机的网络接口用于传输RoCE流量和TCP流量,在一种实施方式中,可以给每个虚拟机分配40G的带宽。对于交换机120传输至云计算服务器140的入向流量来说,交换机120可以通过每个数据包的IP协议头的DSCP字段识别该数据包的流量类型,对RoCE流量和TCP流量进行分类后,送给RDMA网卡的不同限速器(Meter)进行限速和调度。
需要说明的是,RDMA网卡支持数据平面开发工具包(Data Plane Development Kit,DPDK)的rte_flow接口,rte_flow接口包括一些规则,如对流量进行分类,通过Meter对流量进行限速和调度,及根据寄存器的值来执行流量的限速和调度,因此通过级联Meter和这些规则可以实现RoCE流量和TCP流量的限速和带宽借用功能。
请参阅图2,图2为本公开一个示例性实施例提供的流量拥塞控制方法的流程图。该方法由电子设备来执行,例如,由图1所示的交换机或云计算服务器来执行。图2所示的流量拥塞控制方法包括以下步骤:
在步骤S101中,对于物理机上的任意一台虚拟机,将虚拟机的总带宽分为第一带宽、第二带宽及缓冲带宽。
第一带宽用于传输第一传输协议流量,第二带宽用于传输第二传输协议流量。第一带宽用于传输RoCE流量,第二带宽用于传输TCP流量。
请继续参阅图1,如前述可以给每个虚拟机分配40G的总带宽,给RoCE流量和TCP流量分别分配30G和10G的带宽,5G作为缓冲带宽,以降低RoCE流量和TCP流量过渡抢占的概率;缓冲带宽的取值还可以根据线上运行情况具体调整,在此不作限制。例如,当缓冲带宽为5G时,如果RoCE和TCP都在尽力发数据包,RoCE流量可以达到30G,TCP流量可以达到10G;如果TCP流量为0时,RoCE通过抢占TCP的带宽,最多可以达到35G(30+10-5)。
在步骤S102中,根据第一传输协议的实际使用带宽及第二传输协议的实际使用带宽,重新分配第一传输协议的带宽和第二传输协议的带宽。
示例性的,可以通过RDMA网卡重新分配所述第一传输协议的带宽和所述第二传输协议的带宽。
RDMA网卡位于物理机上,可以对物理机上的任意一台虚拟机进行限速和调度。RDMA网卡包括分类器、第一寄存器、第二寄存器、第三寄存器、第一限速器及第二限 速器,其中,分类器用于识别经过虚拟机的网络接口的流量的类型,第一寄存器记录经过虚拟机的网络接口的流量的类型,第一限速器及第二限速器实现RoCE流量和TCP流量的限速和调度,第二寄存器及第三寄存器用于记录流量控制的结果。
请参阅图3,图3为本公开一个示例性实施例提供的虚拟机入向流量的处理流程图。对于经过虚拟机的网络接口的入向流量,先通过RDMA网卡的分类器识别该入向流量的类型,是RoCE流量还是TCP流量,然后通过第一寄存器R1记录入向流量的类型;通过第一限速器第一次控制RoCE流量和TCP流量,第一限速器可以控制RoCE流量限速30G,控制TCP流量限速10G;通过第二寄存器R2存储第一次流量控制的结果,R2的值为G或Y,其中G表示RoCE流量或TCP流量未达到第一限速器的最高限速,Y表示RoCE流量或TCP流量已达到第一限速器的最高限速;通过第二限速器第二次控制RoCE流量和TCP流量,例如可以将RoCE流量与TCP流量之和限速为35G,通过第三寄存器R3存储第二限速器的流量控制的结果,R3的值为G或Y,其中G表示RoCE流量与TCP流量之和未达到第二限速器的最高限速,Y表示RoCE流量与TCP流量已达到第二限速器的最高限速。
需要说明的是,根据第一传输协议的实际使用带宽及第二传输协议的实际使用带宽,重新分配第一传输协议的带宽和第二传输协议的带宽有以下几种情形,为了示例说明,这几种情形的带宽分配情形设置为:虚拟机分配有40G的总带宽,缓冲带宽为0,RoCE流量分配有30G的带宽,TCP流量分配有10G的带宽。
情形一:在总带宽有剩余的情况下,控制第一传输协议或第二传输协议占用总带宽的剩余部分。
总带宽还有剩余的情形,此时可能RoCE流量和TCP流量实际使用的带宽均有剩余,或者其中之一有剩余,此时可以控制没有剩余的一方借用有剩余的一方的带宽,即借用总带宽的剩余部分。
情形二:在第一传输协议实际使用的带宽小于第一带宽且总带宽没有剩余的情况下,控制第二传输协议占用第一带宽的部分流量。
示例性的,在RoCE流量实际使用的带宽小于30G,且虚拟机的总带宽没有剩余的情况下,此时RoCE流量自己的带宽未用完,总带宽已用完,表明TCP流量带宽已用完,可以控制TCP流量占用RoCE流量的部分带宽,以避免流量拥塞。
情形三:在第二传输协议实际使用的带宽小于第二带宽且总带宽没有剩余的情况下,控制第一传输协议占用第二带宽的部分流量。
示例性的,在TCP流量实际使用的带宽小于10G,且虚拟机的总带宽没有剩余的情况下,此时TCP流量自己的带宽未用完,总带宽已用完,表明RoCE流量带宽已用完,可以控制RoCE流量占用TCP流量的部分带宽,以避免流量拥塞。
情形四:在第一传输协议实际使用的带宽等于第一带宽且总带宽没有剩余的情况下,在第二传输协议的数据包中设置拥塞标志,以降低第二传输协议实际使用的带宽。
示例性的,RoCE流量实际使用的带宽逐渐增加,并达到30G,此时总带宽没有剩余,表明TCP流量实际使用的带宽也已经达到10G,由于RoCE流量对丢包更加敏感,直接使用丢包的方式进行限速会严重影响RoCE协议的性能,因此在TCP流量的数据包中设置拥塞标志,以降低TCP流量实际使用的带宽,进而避免流量拥塞。
在TCP流量的数据包中设置拥塞标志的具体方式包括:接收TCP流量数据包的中间设备(如交换机)发现网络拥塞后,进而在该数据包中设置拥塞标志,接收端接收到设置有拥塞标志的数据包后,发送降速指示包至发送端,该降速指示包用于指示发送端减少发送数据包,以达到减低流量的目的。
情形五:在第二传输协议实际使用的带宽等于第二带宽且总带宽没有剩余的情况下,控制第二传输协议采用丢包回退机制,以降低第一传输协议及第二传输协议实际使用的带宽。
示例性的,TCP流量实际使用的带宽逐渐增加,并达到10G,此时总带宽没有剩余,表明RoCE流量实际使用的带宽也已经达到30G,由于RoCE流量对丢包更加敏感,直接使用丢包的方式进行限速会严重影响RoCE协议的性能,因此控制TCP流量采用丢包回退机制,以降低TCP流量实际使用的带宽,进而避免流量拥塞。
丢包回退机制是指,丢弃当前正在传输的数据包(对于上述示例内容,该数据包为TCP数据包),等待预定时间后,检测到总带宽有剩余时,重新传输之前丢弃的数据包。
情形六:在第一传输协议实际使用的带宽与第二传输协议实际使用的带宽之和大于预定阈值的情况下,控制第一传输协议及第二传输协议均采用丢包回退机制。
该预定阈值大于总带宽,例如可以是但不限于45G。
虚拟机分配有40G的总带宽,例如,RoCE流量和TCP流量实际使用的带宽之和已经达到45G,此时网络已经不堪重负,十分拥塞,可以通过第三寄存器记录此时的拥塞情况,将第三寄存器的值置为red,RDMA网卡检测到第三寄存器的值为red后,控制RoCE流量和TCP流量均采用丢包回退机制,进而避免流量拥塞。
综上所述,当RoCE流量实际使用的带宽不到30G时,TCP流量可以借用RoCE流量的带宽;当TCP流量实际使用的带宽未达到10G时,RoCE流量可以借用TCP流量的带宽;当RoCE流量和TCP流量都在争抢40G带宽时,可以控制二者的可用带宽达到30:10的平衡。
需要说明的是,对于虚拟机的网络接口的出向流量,也可以采用本公开提供的流量拥塞控制方法来进行流量分配和调度。
综上所述,本公开提供的流量拥塞控制方法,包括将网卡带宽分给同一台物理机上的虚拟机,虚拟机的网络接口用于传输第一传输协议流量及第二传输协议流量,第一传输协议与第二传输协议为共享同一个物理网络的不同类型的传输协议;对于物理机上的任意一台虚拟机,将虚拟机的总带宽分为第一带宽、第二带宽及缓冲带宽,第一带宽用于传输第一传输协议流量,第二带宽用于传输第二传输协议流量;根据第一传输协议的实际使用带宽及第二传输协议的实际使用带宽,重新分配第一传输协议的带宽和第二传输协议的带宽。能够有效控制第一传输协议与第二传输协议之间的带宽比例,分配一定比例的缓冲带宽以避免流量拥塞,保证运行在同一个物理网络上的不同传输协议的性能。
图4是本公开一个示例性实施例示出的一种流量拥塞控制装置的框图。参照图4,该装置20包括处理模块201和分配模块203。
该处理模块201用于将网卡带宽分给同一台物理机上的虚拟机;所述虚拟机的网络接口用于传输第一传输协议流量及第二传输协议流量;第一传输协议与第二传输协议为共享同一个物理网络的不同类型的传输协议;
该处理模块201还用于对于所述物理机上的任意一台虚拟机,将所述虚拟机的总带宽分为第一带宽、第二带宽及缓冲带宽;所述第一带宽用于传输所述第一传输协议流量,所述第二带宽用于传输所述第二传输协议流量;
该分配模块203用于根据所述第一传输协议的实际使用带宽及所述第二传输协议的实际使用带宽,重新分配所述第一传输协议的带宽和所述第二传输协议的带宽。
可选地,该分配模块203还用于通过远程直接数据存取RDMA网卡重新分配所述第一传输协议的带宽和所述第二传输协议的带宽。
可选地,所述RDMA网卡包括分类器、第一寄存器、第二寄存器、第三寄存器、第一限速器及第二限速器;该处理模块201还用于
通过所述分类器识别经过所述虚拟机的网络接口的流量的类型;
通过所述第一寄存器记录经过所述虚拟机的网络接口的流量的类型;
通过所述第一限速器第一次控制所述第一传输协议流量及所述第二传输协议流量;
通过所述第二限速器第二次控制所述第一传输协议流量及所述第二传输协议流量;
通过所述第二寄存器存储通过第一限速器进行第一次流量控制的结果;
通过所述第三寄存器存储通过第一限速器进行第二次流量控制的结果。
可选地,该分配模块203还用于:在所述总带宽有剩余的情况下,控制所述第一传输协议或所述第二传输协议占用所述总带宽的剩余部分;
在所述第一传输协议实际使用的带宽小于所述第一带宽且所述总带宽没有剩余的情况下,控制所述第二传输协议占用所述第一带宽的部分流量;
在所述第二传输协议实际使用的带宽小于所述第二带宽且所述总带宽没有剩余的情况下,控制所述第一传输协议占用所述第二带宽的部分流量;
在所述第一传输协议实际使用的带宽等于所述第一带宽且所述总带宽没有剩余的情况下,在所述第二传输协议的数据包中设置拥塞标志,以降低所述第二传输协议实际使用的带宽;
在所述第二传输协议实际使用的带宽等于所述第二带宽且所述总带宽没有剩余的情况下,控制所述第二传输协议采用丢包回退机制,以降低所述第一传输协议及所述第二传输协议实际使用的带宽。
可选地,该分配模块203还用于在所述第一传输协议实际使用的带宽与所述第二传输协议实际使用的带宽之和大于预定阈值的情况下,控制所述第一传输协议及所述第二传输协议均采用丢包回退机制;所述预定阈值大于所述总带宽。
可选地,所述第一传输协议为RoCE,所述第二传输协议为TCP。
可选地,该分配模块203还用于通过所述RDMA网卡控制所述虚拟机的入向流量和/或出向流量。
下面参考图5,其示出了适于用来实现本公开实施例的电子设备(例如图1中的交换机或云计算服务器)600的结构示意图。图5示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图5所示,电子设备600可以包括处理装置(例如中央处理器、图形处理器等)601,其可以根据存储在只读存储器(ROM,Read Only Memory)602中的程序或者从存储装置608加载到随机访问存储器(RAM,Random Access Memory)603中的程序而执行各种适当的动作和处理。在RAM 603中,还存储有电子设备600操作所需的各种程序和数据。 处理装置601、ROM 602以及RAM 603通过总线604彼此相连。输入/输出(I/O,Input/Output)接口605也连接至总线604。
通常,以下装置可以连接至I/O接口605:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置606;包括例如液晶显示器(LCD,Liquid Crystal Display)、扬声器、振动器等的输出装置607;包括例如磁带、硬盘等的存储装置608;以及通信装置609。通信装置609可以允许电子设备600与其他设备进行无线或有线通信以交换数据。虽然图5示出了具有各种装置的电子设备600,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置609从网络上被下载和安装,或者从存储装置608被安装,或者从ROM 602被安装。在该计算机程序被处理装置601执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM(Erasable Programmable Read-only Memory)或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM,Compact Disc Read-only Memory)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、射频(RF,Radio Frequency)等等,或者上述的任意合适的组合。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:将网卡带宽分给同一台物理机上的虚拟机;所述虚拟机的网络接口用于传输第一传输协议流量及第二传输协议流量;第一传输协议与第二传输协议为共享同一个物理网络的不同类型的传输协议;对于所述物理机上的任意一台虚拟机,将所述虚拟机的总带宽分为第一带宽、第二带宽及缓冲带宽;所述第一带宽用于传输所述第一传输协议流量,所述第二带宽用于传输所述第二传输协议流量;根据所述第一传输协议的实际使用带宽及所述第二传输协议的实际使用带宽,重新分配所述第一传输协议的带宽和所述第二传输协议的带宽。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括但不限于面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言——诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)——连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
本公开实施例中所涉及到的模块可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,模块的名称在某种情况下并不构成对该模块本身的限定。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA,Field Programmable Gate Array)、专用集成电路(ASIC,Application Specific Integrated Circuit)、专用标准产品(ASSP,Application Specific Standard Product)、片上系统(SOC,System on Chop)、复杂可编程逻辑设备(CPLD,Complex Programmable Logic Device)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读存储介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
According to one or more embodiments of the present disclosure, Example 1 provides a traffic congestion control method, including: allocating the bandwidth of a network interface card to virtual machines on the same physical machine, where a network interface of each virtual machine is used to transmit first transmission protocol traffic and second transmission protocol traffic, and the first transmission protocol and the second transmission protocol are transmission protocols of different types that share the same physical network;
for any virtual machine on the physical machine, dividing the total bandwidth of the virtual machine into a first bandwidth, a second bandwidth and a buffer bandwidth, where the first bandwidth is used to transmit the first transmission protocol traffic and the second bandwidth is used to transmit the second transmission protocol traffic; and
reallocating the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol according to the bandwidth actually used by the first transmission protocol and the bandwidth actually used by the second transmission protocol.
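A minimal sketch of the bandwidth division in Example 1 is given below. It assumes an even split of the NIC bandwidth across the virtual machines and illustrative default shares (60% first bandwidth, 10% buffer bandwidth), none of which is fixed by the disclosure; the names VmBandwidthPlan, split_vm_bandwidth and allocate_nic_bandwidth are likewise invented for the illustration.

```python
from dataclasses import dataclass

@dataclass
class VmBandwidthPlan:
    """Per-virtual-machine split of the total bandwidth (values in Gbit/s)."""
    total_bw: float   # total bandwidth allocated to this virtual machine
    first_bw: float   # reserved for the first transmission protocol traffic (e.g. RoCE)
    second_bw: float  # reserved for the second transmission protocol traffic (e.g. TCP)
    buffer_bw: float  # buffer bandwidth kept as headroom for reallocation

def split_vm_bandwidth(total_bw: float,
                       first_share: float = 0.6,
                       buffer_share: float = 0.1) -> VmBandwidthPlan:
    """Divide a VM's total bandwidth into first, second and buffer bandwidth."""
    buffer_bw = total_bw * buffer_share
    first_bw = total_bw * first_share
    second_bw = total_bw - first_bw - buffer_bw
    return VmBandwidthPlan(total_bw, first_bw, second_bw, buffer_bw)

def allocate_nic_bandwidth(nic_bw: float, vm_ids: list[str]) -> dict[str, VmBandwidthPlan]:
    """Evenly allocate the NIC bandwidth to the virtual machines on one physical machine."""
    per_vm = nic_bw / len(vm_ids)
    return {vm_id: split_vm_bandwidth(per_vm) for vm_id in vm_ids}

# Example: a 100 Gbit/s NIC shared by two virtual machines.
plans = allocate_nic_bandwidth(100.0, ["vm-a", "vm-b"])
```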
According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, where the step of reallocating the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol includes:
reallocating the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol through a remote direct memory access (RDMA) network interface card.
According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 2, where the RDMA network interface card includes a classifier, a first register, a second register, a third register, a first rate limiter and a second rate limiter;
identifying, through the classifier, the type of the traffic passing through the network interface of the virtual machine;
recording, through the first register, the type of the traffic passing through the network interface of the virtual machine;
performing, through the first rate limiter, first-stage control on the first transmission protocol traffic and the second transmission protocol traffic;
performing, through the second rate limiter, second-stage control on the first transmission protocol traffic and the second transmission protocol traffic;
storing, through the second register, the result of the first-stage traffic control performed by the first rate limiter;
storing, through the third register, the result of the second-stage traffic control performed by the second rate limiter.
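A sketch of how the components listed in Example 3 might cooperate per packet is shown below. It assumes token-bucket rate limiters and the class names Packet, TokenBucket and RdmaNicPipeline, none of which is specified by the disclosure; treating the first stage as the per-protocol limit and the second stage as the per-VM total limit is one plausible reading of the two-stage control, not a statement of the actual hardware design.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Packet:
    size_bytes: int
    is_roce: bool  # True: first transmission protocol (RoCE); False: second (e.g. TCP)

@dataclass
class TokenBucket:
    """Illustrative rate limiter; the disclosure does not mandate token buckets."""
    rate_bps: float
    burst_bytes: float
    tokens: float = 0.0
    last: float = field(default_factory=time.monotonic)

    def __post_init__(self):
        self.tokens = self.burst_bytes  # start with a full bucket

    def allow(self, size_bytes: int) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst_bytes,
                          self.tokens + (now - self.last) * self.rate_bps / 8)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False

class RdmaNicPipeline:
    """Classifier, two-stage rate limiting and three result registers, per Example 3."""

    def __init__(self, first_bw_bps: float, second_bw_bps: float, total_bw_bps: float):
        self.reg_traffic_type = None    # first register: type of the traffic seen last
        self.reg_stage1_result = None   # second register: result of the first-stage control
        self.reg_stage2_result = None   # third register: result of the second-stage control
        # Assumed split of duties: stage 1 enforces per-protocol bandwidth, stage 2 the VM total.
        self.stage1 = {"roce": TokenBucket(first_bw_bps, 64 * 1024),
                       "tcp": TokenBucket(second_bw_bps, 64 * 1024)}
        self.stage2 = TokenBucket(total_bw_bps, 256 * 1024)

    def handle(self, pkt: Packet) -> bool:
        kind = "roce" if pkt.is_roce else "tcp"          # classifier identifies the traffic type
        self.reg_traffic_type = kind                     # first register records the type
        ok1 = self.stage1[kind].allow(pkt.size_bytes)    # first-stage control
        self.reg_stage1_result = ok1                     # second register stores the first result
        ok2 = ok1 and self.stage2.allow(pkt.size_bytes)  # second-stage control
        self.reg_stage2_result = ok2                     # third register stores the second result
        return ok2
```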
According to one or more embodiments of the present disclosure, Example 4 provides the method of any one of Examples 1-3, where the step of reallocating the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol according to the bandwidth actually used by the first transmission protocol and the bandwidth actually used by the second transmission protocol includes:
when the total bandwidth has a surplus, controlling the first transmission protocol or the second transmission protocol to occupy the surplus of the total bandwidth;
when the bandwidth actually used by the first transmission protocol is less than the first bandwidth and the total bandwidth has no surplus, controlling the second transmission protocol to occupy part of the first bandwidth;
when the bandwidth actually used by the second transmission protocol is less than the second bandwidth and the total bandwidth has no surplus, controlling the first transmission protocol to occupy part of the second bandwidth;
when the bandwidth actually used by the first transmission protocol is equal to the first bandwidth and the total bandwidth has no surplus, setting a congestion flag in packets of the second transmission protocol to reduce the bandwidth actually used by the second transmission protocol; and
when the bandwidth actually used by the second transmission protocol is equal to the second bandwidth and the total bandwidth has no surplus, controlling the second transmission protocol to adopt a packet-loss backoff mechanism to reduce the bandwidth actually used by the first transmission protocol and the second transmission protocol.
According to one or more embodiments of the present disclosure, Example 5 provides the method of any one of Examples 1-4, where the reallocating the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol according to the bandwidth actually used by the first transmission protocol and the bandwidth actually used by the second transmission protocol further includes:
when the sum of the bandwidth actually used by the first transmission protocol and the bandwidth actually used by the second transmission protocol is greater than a predetermined threshold, controlling both the first transmission protocol and the second transmission protocol to adopt the packet-loss backoff mechanism, where the predetermined threshold is greater than the total bandwidth.
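To make the case analysis of Examples 4 and 5 easier to follow, the sketch below encodes the five reallocation rules and the over-threshold rule as a single decision function. The action labels, the Usage structure and the evaluation order are choices made for the illustration, and the `plan` argument is assumed to expose the fields of the earlier VmBandwidthPlan sketch; in the disclosure the reallocation is carried out by the RDMA network interface card rather than by host software.

```python
from dataclasses import dataclass

@dataclass
class Usage:
    first_used: float    # bandwidth actually used by the first transmission protocol
    second_used: float   # bandwidth actually used by the second transmission protocol

def reallocation_action(plan: "VmBandwidthPlan", use: Usage, threshold: float) -> str:
    """Return an action label for the rules of Examples 4 and 5.

    `plan` is assumed to expose total_bw, first_bw and second_bw; `threshold`
    is the predetermined threshold, which must exceed plan.total_bw.
    """
    used = use.first_used + use.second_used
    if used > threshold:
        # Example 5: both protocols adopt the packet-loss backoff mechanism.
        return "both_protocols_packet_loss_backoff"
    surplus = plan.total_bw - used
    if surplus > 0:
        return "either_protocol_may_occupy_surplus"
    if use.first_used < plan.first_bw:
        return "second_protocol_occupies_part_of_first_bandwidth"
    if use.second_used < plan.second_bw:
        return "first_protocol_occupies_part_of_second_bandwidth"
    if use.first_used == plan.first_bw:
        # Set a congestion flag (e.g. ECN) in second-protocol packets to slow it down.
        return "mark_congestion_flag_on_second_protocol"
    # Remaining case: the second protocol sits exactly at its reserved bandwidth.
    return "second_protocol_packet_loss_backoff"
```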
According to one or more embodiments of the present disclosure, Example 6 provides the method of any one of Examples 1-5, where the first transmission protocol is RoCE and the second transmission protocol is TCP.
According to one or more embodiments of the present disclosure, Example 7 provides the method of any one of Examples 1-6, further including:
controlling ingress traffic and/or egress traffic of the virtual machine through an RDMA network interface card.
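As a usage note for Example 7, one pipeline instance could be kept per traffic direction. The fragment reuses the RdmaNicPipeline and Packet classes from the Example 3 sketch above and uses illustrative bandwidth figures; the disclosure only states that ingress and/or egress traffic is controlled through the RDMA network interface card.

```python
# Reuses RdmaNicPipeline and Packet from the Example 3 sketch; bandwidths are illustrative.
GBPS = 1e9
per_direction = {
    "ingress": RdmaNicPipeline(first_bw_bps=6 * GBPS, second_bw_bps=3 * GBPS, total_bw_bps=10 * GBPS),
    "egress": RdmaNicPipeline(first_bw_bps=6 * GBPS, second_bw_bps=3 * GBPS, total_bw_bps=10 * GBPS),
}

def on_packet(direction: str, pkt: "Packet") -> bool:
    """Admit or drop a packet on the given direction of the virtual machine's interface."""
    return per_direction[direction].handle(pkt)
```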
According to one or more embodiments of the present disclosure, Example 8 provides a traffic congestion control apparatus, including: a processing module configured to allocate the bandwidth of a network interface card to virtual machines on the same physical machine, where a network interface of each virtual machine is used to transmit first transmission protocol traffic and second transmission protocol traffic, and the first transmission protocol and the second transmission protocol are transmission protocols of different types that share the same physical network;
the processing module is further configured to, for any virtual machine on the physical machine, divide the total bandwidth of the virtual machine into a first bandwidth, a second bandwidth and a buffer bandwidth, where the first bandwidth is used to transmit the first transmission protocol traffic and the second bandwidth is used to transmit the second transmission protocol traffic; and
an allocation module configured to reallocate the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol according to the bandwidth actually used by the first transmission protocol and the bandwidth actually used by the second transmission protocol.
According to one or more embodiments of the present disclosure, Example 9 provides a computer-readable medium having a computer program stored thereon, where the computer program, when executed by a processing apparatus, implements the steps of the foregoing traffic congestion control method.
According to one or more embodiments of the present disclosure, Example 10 provides an electronic device, including: a storage apparatus having at least one computer program stored thereon; and at least one processing apparatus configured to execute the at least one computer program in the storage apparatus to implement the steps of the foregoing traffic congestion control method.
According to one or more embodiments of the present disclosure, Example 11 provides a computer program containing program code executable by a processing apparatus, where the steps of the foregoing traffic congestion control method are implemented when the processing apparatus executes the computer program.
According to one or more embodiments of the present disclosure, Example 12 provides a computer program product including a computer program carried on a non-transitory computer-readable medium, where the computer program contains program code executable by a processing apparatus, and the steps of the foregoing traffic congestion control method are implemented when the processing apparatus executes the computer program.
The above description is merely a description of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of the disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combinations of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by mutually replacing the above features with technical features having similar functions disclosed in the present disclosure (but not limited thereto).
In addition, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments, separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims. With regard to the apparatus in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments of the method, and will not be elaborated here.

Claims (12)

  1. A traffic congestion control method, comprising:
    allocating the bandwidth of a network interface card to virtual machines on the same physical machine, wherein a network interface of each virtual machine is used to transmit first transmission protocol traffic and second transmission protocol traffic, and the first transmission protocol and the second transmission protocol are transmission protocols of different types that share the same physical network;
    for any virtual machine on the physical machine, dividing the total bandwidth of the virtual machine into a first bandwidth, a second bandwidth and a buffer bandwidth, wherein the first bandwidth is used to transmit the first transmission protocol traffic and the second bandwidth is used to transmit the second transmission protocol traffic; and
    reallocating the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol according to the bandwidth actually used by the first transmission protocol and the bandwidth actually used by the second transmission protocol.
  2. The method according to claim 1, wherein the step of reallocating the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol comprises:
    reallocating the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol through a remote direct memory access (RDMA) network interface card.
  3. The method according to claim 2, wherein the RDMA network interface card comprises a classifier, a first register, a second register, a third register, a first rate limiter and a second rate limiter;
    identifying, through the classifier, the type of the traffic passing through the network interface of the virtual machine;
    recording, through the first register, the type of the traffic passing through the network interface of the virtual machine;
    performing, through the first rate limiter, first-stage control on the first transmission protocol traffic and the second transmission protocol traffic;
    performing, through the second rate limiter, second-stage control on the first transmission protocol traffic and the second transmission protocol traffic;
    storing, through the second register, the result of the first-stage traffic control performed by the first rate limiter;
    storing, through the third register, the result of the second-stage traffic control performed by the second rate limiter.
  4. The method according to any one of claims 1-3, wherein the step of reallocating the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol according to the bandwidth actually used by the first transmission protocol and the bandwidth actually used by the second transmission protocol comprises:
    when the total bandwidth has a surplus, controlling the first transmission protocol or the second transmission protocol to occupy the surplus of the total bandwidth;
    when the bandwidth actually used by the first transmission protocol is less than the first bandwidth and the total bandwidth has no surplus, controlling the second transmission protocol to occupy part of the first bandwidth;
    when the bandwidth actually used by the second transmission protocol is less than the second bandwidth and the total bandwidth has no surplus, controlling the first transmission protocol to occupy part of the second bandwidth;
    when the bandwidth actually used by the first transmission protocol is equal to the first bandwidth and the total bandwidth has no surplus, setting a congestion flag in packets of the second transmission protocol to reduce the bandwidth actually used by the second transmission protocol; and
    when the bandwidth actually used by the second transmission protocol is equal to the second bandwidth and the total bandwidth has no surplus, controlling the second transmission protocol to adopt a packet-loss backoff mechanism to reduce the bandwidth actually used by the first transmission protocol and the second transmission protocol.
  5. The method according to any one of claims 1-4, wherein the reallocating the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol according to the bandwidth actually used by the first transmission protocol and the bandwidth actually used by the second transmission protocol further comprises:
    when the sum of the bandwidth actually used by the first transmission protocol and the bandwidth actually used by the second transmission protocol is greater than a predetermined threshold, controlling both the first transmission protocol and the second transmission protocol to adopt the packet-loss backoff mechanism, wherein the predetermined threshold is greater than the total bandwidth.
  6. The method according to any one of claims 1-5, wherein the first transmission protocol is RoCE and the second transmission protocol is TCP.
  7. The method according to any one of claims 1-6, further comprising:
    controlling ingress traffic and/or egress traffic of the virtual machine through an RDMA network interface card.
  8. A traffic congestion control apparatus, comprising:
    a processing module configured to allocate the bandwidth of a network interface card to virtual machines on the same physical machine, wherein a network interface of each virtual machine is used to transmit first transmission protocol traffic and second transmission protocol traffic, and the first transmission protocol and the second transmission protocol are transmission protocols of different types that share the same physical network;
    wherein the processing module is further configured to, for any virtual machine on the physical machine, divide the total bandwidth of the virtual machine into a first bandwidth, a second bandwidth and a buffer bandwidth, the first bandwidth being used to transmit the first transmission protocol traffic and the second bandwidth being used to transmit the second transmission protocol traffic; and
    an allocation module configured to reallocate the bandwidth of the first transmission protocol and the bandwidth of the second transmission protocol according to the bandwidth actually used by the first transmission protocol and the bandwidth actually used by the second transmission protocol.
  9. A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1-7.
  10. An electronic device, comprising:
    a storage apparatus having at least one computer program stored thereon; and
    at least one processing apparatus configured to execute the at least one computer program in the storage apparatus to implement the steps of the method according to any one of claims 1-7.
  11. A computer program comprising program code executable by a processing apparatus, wherein the steps of the method according to any one of claims 1-7 are implemented when the processing apparatus executes the computer program.
  12. A computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program comprising program code executable by a processing apparatus, wherein the steps of the method according to any one of claims 1-7 are implemented when the processing apparatus executes the computer program.
PCT/CN2023/085795 2022-04-29 2023-03-31 Traffic congestion control method and apparatus, computer-readable medium, and electronic device WO2023207521A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210476371.1 2022-04-29
CN202210476371.1A CN114884823B (zh) 2022-04-29 2022-04-29 Traffic congestion control method and apparatus, computer-readable medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2023207521A1 true WO2023207521A1 (zh) 2023-11-02

Family

ID=82674590

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/085795 WO2023207521A1 (zh) 2022-04-29 2023-03-31 Traffic congestion control method and apparatus, computer-readable medium, and electronic device

Country Status (2)

Country Link
CN (1) CN114884823B (zh)
WO (1) WO2023207521A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114884823B (zh) * 2022-04-29 2024-03-22 北京有竹居网络技术有限公司 流量拥塞控制方法、装置、计算机可读介质及电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9608917B1 (en) * 2013-10-21 2017-03-28 Google Inc. Systems and methods for achieving high network link utilization
CN109756429A (zh) * 2017-11-06 2019-05-14 Alibaba Group Holding Limited Bandwidth allocation method and device
CN110661654A (zh) * 2019-09-19 2020-01-07 Beijing Inspur Data Technology Co., Ltd. Network bandwidth resource allocation method, apparatus and device, and readable storage medium
CN112436982A (zh) * 2020-11-23 2021-03-02 Inspur Suzhou Intelligent Technology Co., Ltd. Automatic mixed-traffic network test method, system, terminal and storage medium
CN114079638A (zh) * 2020-08-17 2022-02-22 China Telecom Corporation Limited Data transmission method and apparatus for a multi-protocol hybrid network, and storage medium
CN114884823A (zh) * 2022-04-29 2022-08-09 Beijing Youzhuju Network Technology Co., Ltd. Traffic congestion control method and apparatus, computer-readable medium, and electronic device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103036803A (zh) * 2012-12-21 2013-04-10 Nanjing University of Posts and Telecommunications Traffic control method based on application-layer detection
CN113746744A (zh) * 2020-05-30 2021-12-03 Huawei Technologies Co., Ltd. Network congestion control method, apparatus, device, system and storage medium
CN113726690B (zh) * 2021-07-31 2023-08-08 Inspur Suzhou Intelligent Technology Co., Ltd. Protocol packet uploading method and system, electronic device and storage medium
CN114039926B (zh) * 2021-11-05 2023-10-03 Beijing ByteDance Network Technology Co., Ltd. Transmission control protocol determination method and apparatus, readable medium and electronic device

Also Published As

Publication number Publication date
CN114884823A (zh) 2022-08-09
CN114884823B (zh) 2024-03-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23794956

Country of ref document: EP

Kind code of ref document: A1