CN115051953B - Programmable data plane distributed load balancing method based on switch queue behavior - Google Patents

Programmable data plane distributed load balancing method based on switch queue behavior

Info

Publication number
CN115051953B
CN115051953B (application CN202210681089.7A)
Authority
CN
China
Prior art keywords
queue
switch
port
load balancing
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210681089.7A
Other languages
Chinese (zh)
Other versions
CN115051953A (en)
Inventor
刘外喜
蔡君
凌森
沈湘平
陈庆春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN202210681089.7A
Publication of CN115051953A
Application granted
Publication of CN115051953B
Legal status: Active
Anticipated expiration


Classifications

    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/10 Flow control; Congestion control
    • H04L 47/12 Avoiding congestion; Recovering from congestion
    • H04L 47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H04L 47/50 Queue scheduling
    • H04L 47/56 Queue scheduling implementing delay-aware scheduling
    • H04L 47/62 Queue scheduling characterised by scheduling criteria
    • H04L 47/625 Queue scheduling characterised by scheduling criteria for service slots or service orders
    • H04L 49/00 Packet switching elements
    • H04L 49/25 Routing or path finding in a switch fabric
    • H04L 49/30 Peripheral units, e.g. input or output ports
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention relates to the technical field of data center networks and discloses a programmable data plane distributed load balancing method based on switch queue behavior, comprising a state collection and return module, a congestion index estimation module, and a probabilistic forwarding module in the data plane. The state collection and return module of the switch's egress processing part periodically returns the queue behavior data of the egress ports to the switch's own ingress processing part. When a data packet arrives at the switch, the congestion index estimation module of the ingress processing part calculates a congestion index for each available egress port from its queue behavior, and the probabilistic forwarding module calculates, from these congestion indexes, the probability that each egress port is selected to forward the data packet. The invention takes the queue occupancy, the dequeue time interval, and the queue change trend as the decision basis for load balancing; it has small decision latency, can realize load balancing at any granularity, and solves the problem that current load balancing mechanisms route all traffic to the optimal path.

Description

Programmable data plane distributed load balancing method based on switch queue behavior
Technical Field
The invention relates to the technical field of data center networks, in particular to a programmable data plane distributed load balancing method based on switch queue behaviors.
Background
ECMP, which is widely used in current data center networks, performs load balancing at flow granularity: it allocates the available paths between each pair of switches in units of flows. However, because of hash collisions, the randomness of its path selection, and the lack of global state awareness, this method generally performs poorly under high traffic load. CONGA implements congestion awareness of the global network through feedback between leaf switches and performs load balancing at packet cluster (flowlet) granularity, but this approach needs RTT-scale time to update its congestion state and requires custom hardware. Presto splits traffic into fixed-size flowcells and forwards the flowcells onto different paths with a round-robin algorithm. DRILL further explores a load balancing strategy at packet granularity: it uses a small amount of local load information to achieve packet-granularity load balancing, which keeps routing-decision latency on the order of microseconds; however, a packet-granularity routing strategy tends to cause packet reordering.
Disclosure of Invention
The invention aims to provide a programmable data plane distributed load balancing method based on switch queue behaviors, so as to solve the problems in the background art.
In order to achieve the above purpose, the present invention provides the following technical solution: a programmable data plane distributed load balancing method based on switch queue behavior, which can realize load balancing at any granularity and comprises a state collection and return module, a congestion index estimation module, and a probabilistic forwarding module in the programmable data plane, characterized in that the processing steps when a data packet arrives at the switch are as follows:
the congestion index estimation module of the ingress processing part of the switch calculates the congestion index of each available egress port according to the queue behavior of the egress ports of the switch;
the probabilistic forwarding module of the ingress processing part of the switch calculates, from the congestion indexes, the probability that each egress port of the switch is selected to forward the data packet, and completes the forwarding of the data packet inside the switch;
while the above steps are being performed, the state collection and return module of the egress processing portion of the switch periodically returns the queue behavior data of the egress port of the switch to the congestion index estimation module of the ingress processing portion, providing the required data for the forwarding decision of the ingress processing portion.
Preferably, the switch queue behavior is queue occupancy, dequeue time interval, and queue change trend.
Preferably, the queue occupancy is the ratio of the egress port queue's depth to the total queue length when a data packet enters the queue; the dequeue time interval is the time difference between two consecutive packets leaving the egress port queue; the queue change trend indicates whether the queue is growing or shrinking.
Preferably, the congestion index estimation module estimates the congestion degree of each egress port, and the calculation formula is as follows:
C_i = L_i * (τ - α*T_i) * V_i    (1)
where C_i is the congestion index of egress port i, L_i is the queue occupancy of egress port i, T_i is the dequeue time interval of the queue of egress port i, and V_i is the queue change trend of egress port i: when the queue occupancy is rising, V_i takes the value β1; when the queue occupancy is falling, V_i takes the value β2; τ is a time constant whose value is on the order of the network's end-to-end delay, and α is an adjustable factor controlling the degree to which T_i influences C_i.
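For illustration only, a minimal Python sketch of formula (1) follows; the parameter names and default values (beta1, beta2, tau, alpha) are assumptions for demonstration rather than values fixed by the claim.

```python
def congestion_index(occupancy, dequeue_interval, rising,
                     tau=10_000, alpha=1, beta1=2, beta2=1):
    """Formula (1): C_i = L_i * (tau - alpha * T_i) * V_i.

    occupancy        -- queue occupancy L_i of the egress port (e.g. percent)
    dequeue_interval -- dequeue time interval T_i of the egress port queue
    rising           -- True if the queue occupancy is trending upward
    tau, alpha       -- time constant and adjustable factor (assumed values)
    beta1, beta2     -- V_i for a rising / falling queue (assumed values)
    """
    v = beta1 if rising else beta2
    return occupancy * (tau - alpha * dequeue_interval) * v
```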
Preferably, the probabilistic forwarding module calculates, from the congestion indexes, the probability that each egress port is selected to forward the data packet; the probability P_i that egress port i is selected is calculated as:
P_i = W_i / (ΣW_i)    (2)
where W_i is the traffic forwarding weight of egress port i, and W_i decreases as the congestion index increases. Formula (3) is used to determine the traffic forwarding weight of each port; the traffic forwarding weight W_i of egress port i is:
W_i = [C_max - C_i]    (3)
where [·] denotes a rounding operation, C_i is the congestion index of port i, and C_max is the maximum value of the congestion index, corresponding to the most severe congestion level, i.e., C_max is the value of C_i when the queue occupancy L_i of egress port i is 100%, the dequeue time interval T_i is 0, and the queue occupancy is rising (V_i takes the value β1).
Preferably, the probabilistic forwarding of formula (2) is implemented with a uniformly distributed random function provided by the programmable data plane: the weight of egress port i is represented by W_i cells, so one switch has ΣW_i cells in total, numbered 1, 2, 3, ..., ΣW_i; the random function generates a cell number uniformly in the range [1, ΣW_i], and the data packet is forwarded from the egress port to which that cell belongs.
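The following Python sketch models formulas (2) and (3) together with the cell-based selection just described; it is a behavioral model only, since an actual programmable data plane would realize the same logic with integer registers and a hardware random number generator, and the example congestion indexes are assumed.

```python
import random

def forwarding_weights(congestion, c_max):
    """Formula (3): W_i = round(C_max - C_i) for each egress port."""
    return {port: max(0, round(c_max - c)) for port, c in congestion.items()}

def pick_port(weights):
    """Cell-based realization of formula (2): P_i = W_i / sum(W).

    Port i owns W_i cells out of sum(W) cells numbered 1..sum(W);
    a uniform random cell number determines the forwarding port.
    """
    total = sum(weights.values())
    cell = random.randint(1, total)          # uniform in [1, sum(W)]
    for port, w in weights.items():
        if cell <= w:
            return port
        cell -= w

# Example with assumed congestion indexes for three egress ports
congestion = {1: 120, 2: 300, 3: 60}
weights = forwarding_weights(congestion, c_max=400)
print(weights, pick_port(weights))
```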
Preferably, the state collection and return module of the egress processing portion of the switch periodically returns queue behavior data of the egress port to the congestion index estimation module of the ingress processing portion, and when the data packet arrives at the egress processing portion of the switch, the process is as follows:
when a data packet arrives at the egress processing part, if the elapsed time has not reached the period T_b, the state collection module continues to collect the queue behavior data of the current port in the egress processing part, and the collected queue behavior data are stored in registers of the switch, indexed by port number;
if the elapsed time exceeds the period T_b, the state return module reads the local queue behavior data stored in the registers, adds the data to the custom header of a cloned data packet, and returns it to the ingress processing part of the switch; when the cloned packet reaches the ingress processing part, the ingress processing part updates the queue behavior data of the port according to the port number index and then discards the cloned packet.
Preferably, the custom header of the cloned data packet is composed of four fields: the queue occupancy, the dequeue time interval, the queue change trend, and the egress port number index;
the queue occupancy, dequeue time interval, and queue change trend fields carry the queue behavior data; the egress port number index indicates which egress port the returned queue behavior data belongs to; the ingress processing part updates the queue behavior data of that egress port according to the index, and when subsequent data packets arrive, the ingress processing part makes forwarding decisions based on the queue behavior data.
Preferably, load balancing at any granularity can be realized; the granularity includes four types, namely data packet granularity, packet cluster (flowlet) granularity, flowcell granularity, and flow granularity, and can be determined according to actual requirements. When a data packet arrives at the switch, the switch has the following four forwarding strategies (an illustrative sketch of how each granularity maps to a forwarding key follows this list):
for each data packet, a forwarding port is independently selected according to the probability shown in the formula (2), so that the load balancing of the granularity of the data packet is realized;
for each packet cluster, a forwarding port is independently selected according to the probability shown in formula (2), and all data packets in one packet cluster are forwarded through the same port, realizing load balancing at packet cluster granularity; a new packet cluster begins when the sending interval between adjacent data packets in a flow exceeds a certain threshold;
for each flowcell, a forwarding port is independently selected according to the probability shown in formula (2), and all data packets in one flowcell are forwarded through the same port, realizing load balancing at flowcell granularity; flowcells are obtained by cutting a flow into segments of a fixed size, so that adjacent data packets in a flow form one flowcell and all flowcells have equal size;
for each flow, a forwarding port is independently selected according to the probability shown in formula (2), and all data packets in one flow are forwarded through the same port, realizing load balancing at flow granularity; a flow refers to the collection of data packets with the same five-tuple, and flow granularity completely avoids packet reordering.
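A minimal sketch of how the four granularities differ: packet granularity makes a fresh probabilistic choice per packet, while flowlet, flowcell, and flow granularity make one choice per forwarding key and reuse it. The packet attribute names (five_tuple, flowlet_id, byte_offset), the flowcell size, and the simple weighted pick used as a stand-in for the cell-based scheme are illustrative assumptions.

```python
import random

_chosen_port = {}                 # forwarding unit (key) -> chosen egress port

def pick_port(weights):
    """Weighted choice over egress ports (stand-in for the cell-based scheme)."""
    ports, w = zip(*weights.items())
    return random.choices(ports, weights=w)[0]

def forwarding_key(pkt, granularity):
    """Key identifying the unit whose packets must share one egress port."""
    if granularity == "flowlet":
        return (pkt.five_tuple, pkt.flowlet_id)
    if granularity == "flowcell":
        FLOWCELL_BYTES = 64 * 1024                 # assumed fixed flowcell size
        return (pkt.five_tuple, pkt.byte_offset // FLOWCELL_BYTES)
    return pkt.five_tuple                          # flow granularity

def forward(pkt, granularity, weights):
    """Packet granularity picks afresh per packet; the other three reuse
    the port chosen for the first packet with the same key."""
    if granularity == "packet":
        return pick_port(weights)
    key = forwarding_key(pkt, granularity)
    if key not in _chosen_port:
        _chosen_port[key] = pick_port(weights)
    return _chosen_port[key]
```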
Compared with the prior art, the invention has the following beneficial effects: the programmable data plane distributed load balancing method based on switch queue behavior can realize load balancing at any granularity, and the granularity can be determined according to actual requirements; it takes the queue occupancy, the dequeue time interval, and the queue change trend as the decision basis for load balancing; the probabilistic forwarding module distributes traffic evenly over the available forwarding ports according to the computed probabilities, achieving a good load balancing effect and avoiding the problem that current load balancing mechanisms route all traffic to the optimal path and thereby congest it; the invention is a fully distributed mechanism in which each switch independently makes load balancing decisions, without requiring a controller, coordination between switches, or human involvement, and therefore has low cost; decisions are made locally in the switch, so the decision latency is small and micro-burst traffic in the network can be handled quickly; the invention is applicable to any topology or traffic pattern.
Drawings
FIG. 1 is a diagram of a system architecture of the present invention;
FIG. 2 is a diagram of the occupancy and dequeue time intervals of the present invention;
FIG. 3 is a schematic diagram of a probability selection port according to the present invention;
FIG. 4 is a schematic diagram of the header format of a clone packet returned from the egress processing portion to the ingress processing portion according to the present invention;
FIG. 5 is a schematic diagram of average FCT in Data-mining traffic in a symmetric network topology according to the present invention;
FIG. 6 is a schematic diagram of average FCT in Data-mining traffic in an asymmetric network topology according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIGS. 1-6, the present invention provides a technical solution: a programmable data plane distributed load balancing method based on switch queue behavior. The invention can be used for load balancing at the granularity of data packets, packet clusters (flowlets), flowcells, flows, and the like.
Load balancing of packet granularity
The processing steps when the data packet arrives at the switch are as follows:
(1) The congestion index estimation module of the ingress processing part calculates the congestion index of each available egress port according to the queue behavior of the egress ports of the switch (in the present invention, queue behavior refers to the queue occupancy, the dequeue time interval, and the queue change trend);
(2) The probabilistic forwarding module of the ingress processing part of the switch calculates, from the congestion indexes, the probability that each egress port of the switch is selected to forward the data packet, and completes the forwarding of the data packet inside the switch.
(3) While steps (1) and (2) are performed, the state collection and return module of the egress processing part periodically returns the queue behavior data of the egress ports of the switch to the ingress processing part, providing the data required for the forwarding decisions of the ingress processing part.
Load balancing of packet cluster granularity
The processing steps when the data packet arrives at the switch are as follows:
(1) In the ingress processing part, the five-tuple of the data packet is hashed to generate a packet cluster (flowlet) ID, and a timestamp T1 is recorded;
(2) The congestion index estimation module of the ingress processing part calculates the congestion index of each available egress port according to the queue behavior of the egress ports (queue occupancy, dequeue time interval, and queue change trend);
(3) The probabilistic forwarding module of the ingress processing part converts the congestion indexes into the probability of forwarding the data packet from each available egress port, completes the forwarding of the data packet inside the switch, and records the forwarding port of the packet cluster.
(4) When a subsequent data packet arrives, its timestamp is T2. If ΔT = T2 - T1 exceeds the packet cluster threshold Ts (for example, Ts = 10 ms), a new packet cluster ID is created and the procedure returns to step (2); otherwise, the packet is forwarded through the forwarding port corresponding to the current packet cluster ID (an illustrative sketch of this packet-cluster logic follows step (5));
(5) While steps (1)-(4) are performed, the state collection and return module of the egress processing part periodically returns the queue behavior data of the egress ports to the ingress processing part, providing the data required for the forwarding decisions of the ingress processing part.
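The following Python sketch models the packet cluster steps above with a plain dictionary standing in for the switch's hash/register tables. The 10 ms threshold Ts follows the example in step (4); refreshing the timestamp on every packet (so the gap is measured between adjacent packets) follows the definition of a packet cluster given earlier and is one possible interpretation; the weighted pick_port() is a stand-in for the cell-based scheme.

```python
import random
import time

TS = 0.010                        # packet cluster threshold Ts = 10 ms (example from step (4))
flowlet_table = {}                # five-tuple -> [cluster ID, last timestamp, egress port]

def pick_port(weights):
    """Weighted choice over egress ports (stand-in for the cell-based scheme)."""
    ports, w = zip(*weights.items())
    return random.choices(ports, weights=w)[0]

def handle_packet(five_tuple, weights, now=None):
    """Flowlet-granularity forwarding: start a new packet cluster when the
    gap since the previous packet of the flow exceeds Ts, otherwise reuse
    the port already chosen for the current cluster."""
    now = time.monotonic() if now is None else now
    entry = flowlet_table.get(five_tuple)
    if entry is None or now - entry[1] > TS:
        cluster_id = 0 if entry is None else entry[0] + 1
        entry = [cluster_id, now, pick_port(weights)]    # steps (1)-(3)
    else:
        entry[1] = now            # same cluster: refresh timestamp, keep port
    flowlet_table[five_tuple] = entry
    return entry[2]
```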
Queue behavior in the present invention refers to the queue occupancy, dequeue time interval, queue change trend, and the like of a port. The queue occupancy is the ratio of the egress port queue's depth to the total queue length when a data packet enters the queue; the dequeue time interval is the time difference between two consecutive packets leaving the egress port queue; the queue change trend indicates whether the queue is growing or shrinking, that is, whether the current queue occupancy is increasing or decreasing. For example, with a current queue occupancy of 50%, the occupancy may have decreased from 60% to 50% or increased from 40% to 50%.
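For illustration only, the following minimal sketch derives the queue change trend by comparing the current occupancy with the previously recorded value; the patent does not prescribe this exact mechanism, so it is an assumption.

```python
def update_trend(prev_occupancy, curr_occupancy):
    """Classify the queue change trend from two successive occupancy samples.

    For example, 40% -> 50% is rising, 60% -> 50% is falling.
    Returns the trend and the value to remember for the next sample.
    """
    trend = "rising" if curr_occupancy > prev_occupancy else "falling"
    return trend, curr_occupancy
```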
the congestion index estimation module is crucial to the effect of the load balancing strategy and is the basis of routing decisions, so the accuracy of the module directly influences the performance of the load balancing strategy.
A load balancing policy is, in effect, a choice of ports, and this choice determines the traffic leaving each port. Through analysis of large volumes of traffic data, it is observed that (1) egress port traffic is positively correlated with queue occupancy, and (2) egress port traffic is negatively correlated with the dequeue time interval. Meanwhile, considering that programmable switches do not support floating-point operations or division, the congestion index estimation module estimates the congestion degree of each egress port with the following formula:
C_i = L_i * (τ - α*T_i) * V_i    (1)
where C_i is the congestion index of egress port i; L_i is the queue occupancy (%) of egress port i; T_i is the dequeue time interval (us) of the queue of egress port i; V_i is the queue change trend of egress port i: when the queue occupancy is rising, V_i takes the value 2, and when it is falling, V_i takes the value 1; τ is a normalization time constant whose value is on the order of the network's end-to-end delay; and α is an adjustable factor controlling the influence of T_i on C_i, for example τ = 10 ms and α = 1.
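For intuition only, the following worked example plugs assumed numbers into formulas (1)-(3); the two ports and their queue states are hypothetical, and τ = 10 ms is written as 10,000 µs so that it is on the same scale as the dequeue interval T_i.

```python
tau, alpha = 10_000, 1          # tau = 10 ms expressed in microseconds, alpha = 1

def c(l_pct, t_us, rising):
    """Formula (1) with V = 2 for a rising queue and V = 1 for a falling one."""
    return l_pct * (tau - alpha * t_us) * (2 if rising else 1)

c_a = c(60, 100, True)          # busy, filling port:   60 * 9900 * 2 = 1,188,000
c_b = c(20, 500, False)         # idle, draining port:  20 * 9500 * 1 =   190,000
c_max = c(100, 0, True)         # worst case:          100 * 10000 * 2 = 2,000,000

w_a, w_b = c_max - c_a, c_max - c_b      # formula (3): 812,000 and 1,810,000
p_b = w_b / (w_a + w_b)                  # formula (2): ~0.69
print(w_a, w_b, round(p_b, 2))
```

Under these assumed numbers, the idler port would carry roughly 69% of the traffic.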
The congestion index estimation module obtains the congestion index of each port from the network state information returned by the egress processing part according to formula (1); the larger the congestion index, the more severe the congestion of the port.
Compared with other methods, which only judge the queue occupancy to indirectly infer the congestion degree, the congestion index estimation module reflects the congestion condition of the network more accurately and therefore achieves a better load balancing effect.
the probability forwarding module determines the probability of port selection according to the congestion index so as to realize uniform dispatching of the traffic to each available port. In short, the more idle the ports (i.e., the lower the queue occupancy of the ports, the greater the dequeue time interval), the greater the probability of being selected, giving priority to traffic going to more idle ports. In the present invention, probability P of output port i being selected i The calculation formula of (2) is as follows:
P_i = W_i / (ΣW_i)    (2)
where W_i is the traffic forwarding weight of egress port i; W_i decreases as the congestion index increases. Formula (3) is used to determine the traffic forwarding weight of each port; the forwarding weight W_i of egress port i is:
W_i = [C_max - C_i]    (3)
where [·] denotes a rounding operation, C_i is the congestion index of port i, and C_max is the maximum value of the congestion index, corresponding to the most severe congestion level, i.e., C_max is the value of C_i when the queue occupancy L_i of egress port i is 100%, the dequeue time interval T_i is 0, and the queue occupancy is rising.
Formula (2) uses division, and the probability P_i may be a floating-point number; however, programmable data planes have limited support for division and floating-point data. The invention therefore uses a uniformly distributed random function provided by the programmable data plane to implement the probabilistic forwarding described by formula (2). That is, the forwarding port is selected in a cell-based manner: the weight of egress port i is represented by W_i cells, so one switch has ΣW_i cells in total, numbered 1, 2, 3, ..., ΣW_i. The random function generates a cell number uniformly in the range [1, ΣW_i], and the data packet is forwarded from the egress port to which that cell belongs.
the state collection module is responsible for collecting network states of the switch exit processing part, including queue occupancy rate, queue-out time interval, queue change trend and the like.
When the packet arrives at the egress processing part, if the time interval does not reach period T b (e.g., 1 ms), the state collection module continues to collect the queue state of the current port at the egress processing portion. The collected network state information will be saved to the registers of the switch in the form of port numbers as indexes. If the time interval exceeds the period T b The state return module reads the local queue state information stored on the register and adds this information to the custom header of the clone packet, returning to the ingress processing portion of the switch. When the clone packet reaches the ingress processing portion, the ingress processing portion updates the queue behavior data for the port according to the port number index, and then discards the clone packet.
The network status format carried by the clone data packet is shown in fig. 4, and the entire custom header occupies 10 bytes. The queue occupancy rate, the queue-out time interval and the queue change trend are carried network states; the output port number index is used for indicating the returned network state of which output port, the entrance processing part updates the network state of the port according to the index, and when the subsequent data packet arrives at the entrance processing part, the forwarding decision is made according to the state.
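The following Python sketch models the periodic state collection and return described above (period T_b = 1 ms, per-port registers, and a 10-byte custom header). The individual field widths, integer encodings, and helper names are assumptions chosen only to match the 10-byte total and the four fields named in FIG. 4, not a concrete P4 implementation.

```python
import struct
import time

T_B = 0.001                       # return period T_b = 1 ms (example value)
HEADER_FMT = "!IIBB"              # occupancy, dequeue interval, trend, port index
                                  # 4 + 4 + 1 + 1 = 10 bytes (assumed field widths)

egress_registers = {}             # port -> (occupancy, dequeue_interval, trend)
last_return = {}                  # port -> time of the last clone-packet return

def on_egress_packet(port, occupancy, dequeue_interval, trend, now=None):
    """Collect egress-side queue behavior and, once per T_b, emit the 10-byte
    custom header that a cloned packet would carry back to ingress.
    Values are assumed to be non-negative integers (e.g. occupancy in %)."""
    now = time.monotonic() if now is None else now
    egress_registers[port] = (occupancy, dequeue_interval, trend)
    if now - last_return.get(port, 0.0) > T_B:
        last_return[port] = now
        occ, dq, tr = egress_registers[port]
        return struct.pack(HEADER_FMT, occ, dq, tr, port)   # clone-packet header
    return None                    # keep collecting until the period elapses

def on_ingress_clone(header, ingress_state):
    """Ingress side: update the per-port queue behavior, then 'drop' the clone."""
    occ, dq, tr, port = struct.unpack(HEADER_FMT, header)
    ingress_state[port] = (occ, dq, tr)
```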
Experimental example: compared with several current mainstream load balancing strategies, namely flow-level ECMP, flowlet-level LetFlow, CONGA, and HULA, and packet-level DRILL, experiments show that the method has a clear advantage in reducing the flow completion time (FCT), and the network load-sharing deviation of the method stays below 3% under various traffic loads, more than 50% lower than that of ECMP.
From the FCT of each mechanism under the Data-mining workload in a symmetric network topology, it can be seen that:
the FCT of packet-granularity Roll-pkt is only 50% of that of flow-granularity ECMP;
at data packet granularity, the FCT of Roll-pkt is 21.9% lower than that of DRILL;
at packet cluster granularity, the FCT of Roll-flowlet is 11.1% lower than that of CONGA.
From the FCT of each mechanism under the Data-mining workload in an asymmetric network topology, it can be seen that:
the FCT of packet-granularity Roll-pkt is reduced by 121% compared with flow-granularity ECMP;
at data packet granularity, the FCT of Roll-pkt is 23.1% lower than that of DRILL;
at packet cluster granularity, the FCT of Roll-flowlet is 10.4% lower than that of CONGA.
In use, the programmable data plane distributed load balancing method based on switch queue behavior can realize load balancing at any granularity, and the granularity can be determined according to actual requirements; it comprehensively considers queue behaviors such as the queue occupancy, the dequeue time interval, and the queue change trend and takes them as the decision basis for load balancing; the probabilistic forwarding module distributes traffic evenly over the available forwarding ports according to the computed probabilities, achieving a good load balancing effect and avoiding the problem that current load balancing mechanisms route all traffic to the optimal path and thereby congest it.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A programmable data plane distributed load balancing method based on switch queue behavior, which can realize load balancing at any granularity and comprises a state collection and return module, a congestion index estimation module, and a probabilistic forwarding module in the programmable data plane, characterized in that the processing steps when a data packet arrives at the switch are as follows:
the congestion index estimation module of the ingress processing part of the switch calculates the congestion index of each available egress port according to the queue behavior of the egress ports of the switch;
the probabilistic forwarding module of the ingress processing part of the switch calculates, from the congestion indexes, the probability that each egress port of the switch is selected to forward the data packet, and completes the forwarding of the data packet inside the switch;
while the above steps are performed, the state collection and return module of the egress processing part of the switch periodically returns the queue behavior data of the egress ports of the switch to the congestion index estimation module of the ingress processing part, providing the data required for the forwarding decisions of the ingress processing part, wherein the switch queue behavior is the queue occupancy, the dequeue time interval, and the queue change trend; the queue occupancy is the ratio of the egress port queue's depth to the total queue length when a data packet enters the queue; the dequeue time interval is the time difference between two consecutive packets leaving the egress port queue; the queue change trend indicates whether the queue is growing or shrinking; the congestion index estimation module estimates the congestion degree of each egress port with the following formula:
C_i = L_i * (τ - α*T_i) * V_i    (1)
where C_i is the congestion index of egress port i, L_i is the queue occupancy of egress port i, T_i is the dequeue time interval of the queue of egress port i, and V_i is the queue change trend of egress port i: when the queue occupancy is rising, V_i takes the value β1; when the queue occupancy is falling, V_i takes the value β2; τ is a time constant whose value is on the order of the network's end-to-end delay, and α is an adjustable factor controlling the degree to which the dequeue time interval influences the congestion index.
2. The programmable data plane distributed load balancing method based on switch queue behavior according to claim 1, wherein: the probabilistic forwarding module calculates, from the congestion indexes, the probability that each egress port is selected to forward the data packet, and the probability P_i that egress port i is selected is calculated as:
P_i = W_i / (ΣW_i)    (2)
where W_i is the traffic forwarding weight of egress port i, and W_i decreases as the congestion index increases; formula (3) is used to determine the traffic forwarding weight of each port, and the traffic forwarding weight W_i of egress port i is:
W_i = [C_max - C_i]    (3)
where [·] denotes a rounding operation, C_i is the congestion index of port i, and C_max is the maximum value of the congestion index, corresponding to the most severe congestion level, i.e., C_max is the value of C_i when the queue occupancy L_i of egress port i is 100%, the dequeue time interval T_i is 0, and the queue occupancy is rising.
3. The programmable data plane distributed load balancing method based on switch queue behavior according to claim 2, wherein: a uniformly distributed random function provided by the programmable data plane is used to implement the probabilistic forwarding of formula (2); the weight of egress port i is represented by W_i cells, so one switch has ΣW_i cells in total, numbered 1, 2, 3, ..., ΣW_i; the random function randomly generates a cell number in the range [1, ΣW_i], and the data packet is forwarded from the egress port to which that cell belongs.
4. The programmable data plane distributed load balancing method based on switch queue behavior according to claim 1, wherein: the state collection and return module of the egress processing part of the switch periodically returns the queue behavior data of the egress ports to the congestion index estimation module of the ingress processing part, and when a data packet arrives at the egress processing part of the switch, the process is as follows:
when the data packet arrives at the egress processing part, if the elapsed time has not reached the period T_b, the state collection module continues to collect the queue behavior data of the current port in the egress processing part, and the collected queue behavior data are stored in registers of the switch, indexed by port number;
if the elapsed time exceeds the period T_b, the state return module reads the local queue behavior data stored in the registers, adds the data to the custom header of a cloned data packet, and returns it to the ingress processing part of the switch; when the cloned packet reaches the ingress processing part, the ingress processing part updates the queue behavior data of the port according to the port number index and then discards the cloned packet.
5. The programmable data plane distributed load balancing method based on switch queue behavior according to claim 4, wherein: the custom header of the cloned data packet consists of four fields, namely the queue occupancy, the dequeue time interval, the queue change trend, and the egress port number index;
the queue occupancy, dequeue time interval, and queue change trend fields carry the queue behavior data; the egress port number index indicates which egress port the returned queue behavior data belongs to; the ingress processing part updates the queue behavior data of that egress port according to the index, and when subsequent data packets arrive, the ingress processing part makes forwarding decisions based on the queue behavior data.
6. The programmable data plane distributed load balancing method based on switch queue behavior according to claim 2, wherein: load balancing at any granularity can be realized; the granularity includes four types, namely data packet granularity, packet cluster (flowlet) granularity, flowcell granularity, and flow granularity, and can be determined according to actual requirements; when a data packet arrives at the switch, the switch has the following four forwarding strategies:
for each data packet, a forwarding port is independently selected according to the probability shown in the formula (2), so that the load balancing of the granularity of the data packet is realized;
for each packet cluster, a forwarding port is independently selected according to the probability shown in formula (2), and all data packets in one packet cluster are forwarded through the same port, realizing load balancing at packet cluster granularity; a new packet cluster begins when the sending interval between adjacent data packets in a flow exceeds a certain threshold;
for each flowcell, a forwarding port is independently selected according to the probability shown in formula (2), and all data packets in one flowcell are forwarded through the same port, realizing load balancing at flowcell granularity; flowcells are obtained by cutting a flow into segments of a fixed size, so that adjacent data packets in a flow form one flowcell and all flowcells have equal size;
for each flow, forwarding ports are independently selected according to the probability shown in the formula (2), and all data packets in one flow are forwarded by using the same port, so that load balancing of flow granularity is realized; the flow refers to a collection of packets having the same five-tuple.
7. A switch queue behavior-based programmable data plane distributed load balancing system for implementing the switch queue behavior-based programmable data plane distributed load balancing method according to any one of claims 1 to 6, comprising:
the state collection module, which collects queue behavior data of the egress processing part of the switch, including the queue occupancy, dequeue time interval, and queue change trend;
a return module, which periodically returns the queue behavior data of the egress ports to the congestion index estimation module of the ingress processing part;
the congestion index estimation module, which calculates the congestion index of each available egress port according to the queue behavior of the egress ports;
and the probabilistic forwarding module, which calculates, from the congestion indexes, the probability that each egress port is selected to forward the data packet.
8. A readable storage medium having stored thereon a computer program, which when executed by a processor implements a programmable data plane distributed load balancing method based on switch queue behaviour as claimed in any one of claims 1-6.
9. An apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the switch queue behavior based programmable data plane distributed load balancing method of any one of claims 1-6 when the program is executed by the processor.
CN202210681089.7A 2022-06-16 2022-06-16 Programmable data plane distributed load balancing method based on switch queue behavior Active CN115051953B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210681089.7A CN115051953B (en) 2022-06-16 2022-06-16 Programmable data plane distributed load balancing method based on switch queue behavior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210681089.7A CN115051953B (en) 2022-06-16 2022-06-16 Programmable data plane distributed load balancing method based on switch queue behavior

Publications (2)

Publication Number Publication Date
CN115051953A CN115051953A (en) 2022-09-13
CN115051953B (en) 2023-07-28

Family

ID=83161036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210681089.7A Active CN115051953B (en) 2022-06-16 2022-06-16 Programmable data plane distributed load balancing method based on switch queue behavior

Country Status (1)

Country Link
CN (1) CN115051953B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116155819B (en) * 2023-04-20 2023-07-14 北京邮电大学 Method and device for balancing load in intelligent network based on programmable data plane

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111526089A (en) * 2020-04-14 2020-08-11 北京交通大学 Data fusion transmission and scheduling device based on variable-length granularity
CN112311685A (en) * 2019-07-24 2021-02-02 华为技术有限公司 Method and related device for processing network congestion
CN114567598A (en) * 2022-02-25 2022-05-31 重庆邮电大学 Load balancing method and device based on deep learning and cross-domain cooperation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9667518B2 (en) * 2015-09-11 2017-05-30 Telefonaktiebolaget L M Ericsson (Publ) Method and system for delay measurement of a traffic flow in a software-defined networking (SDN) system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112311685A (en) * 2019-07-24 2021-02-02 华为技术有限公司 Method and related device for processing network congestion
CN111526089A (en) * 2020-04-14 2020-08-11 北京交通大学 Data fusion transmission and scheduling device based on variable-length granularity
CN114567598A (en) * 2022-02-25 2022-05-31 重庆邮电大学 Load balancing method and device based on deep learning and cross-domain cooperation

Also Published As

Publication number Publication date
CN115051953A (en) 2022-09-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant