WO2018014569A1 - Load balancing method, apparatus, and device - Google Patents

Load balancing method, apparatus, and device

Info

Publication number
WO2018014569A1
Authority
WO
WIPO (PCT)
Prior art keywords
path
flowlet
equal
packet
switch
Prior art date
Application number
PCT/CN2017/076987
Other languages
English (en)
French (fr)
Inventor
沈利
袁峰
蒋玲
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP17830212.1A (patent EP3468119B1)
Publication of WO2018014569A1
Priority to US16/239,353 (patent US11134014B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H04L45/03 Topology update or discovery by updating link state protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/74 Address processing for routing
    • H04L45/745 Address table lookup; Address filtering
    • H04L45/7453 Address table lookup; Address filtering using hashing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/25 Routing or path finding in a switch fabric
    • H04L49/253 Routing or path finding in a switch fabric using establishment or release of connections between ports
    • H04L49/254 Centralised controller, i.e. arbitration or scheduling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/24 Multipath
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/11 Identifying congestion

Definitions

  • The present application relates to the field of network technologies, and in particular, to a load balancing method, apparatus, and device.
  • Because a traditional load balancing method, such as the Equal-Cost MultiPath (ECMP) method, can only balance the number of flows, the situation with multiple elephant flows shown in FIG. 1A may occur.
  • Referring to FIG. 1B, because each Leaf switch performs load balancing on its own and has no global information, multiple elephant flows that multiple Leaf switches send to the same Leaf switch (as in FIG. 1A) may be sent to the same transit switch (such as a Spine switch), so that downstream traffic is congested. This congestion is called a downstream collision. Therefore, to prevent the two congestion situations above, how to load balance network traffic has become a focus of attention of those skilled in the art.
  • The source Leaf switch detects flowlets, where the packets sent within one sliding window are treated as one flowlet. For each new flowlet, the source Leaf switch always chooses the path with the minimum load.
  • The load condition of a path is obtained by the following steps: (1) The source Leaf switch encapsulates a congestion information field (including LBTag and CE) into the overlay header of each packet. LBTag indicates the port number of the source Leaf switch, and CE indicates the path congestion metric.
  • The CE is updated when the packet passes through the Spine switch, and is temporarily stored in the Congestion-From-Leaf table after the packet reaches the destination Leaf switch.
  • After receiving the packet, the destination Leaf switch sends a reverse packet to the source Leaf switch.
  • Specifically, the destination Leaf switch encapsulates a congestion information field (including FB_LBTag and FB_Metric) into the overlay header of the reverse packet.
  • FB_LBTag indicates the port number of the destination Leaf switch.
  • FB_Metric indicates the congestion metric.
  • After receiving the reverse packet, the source Leaf switch stores its congestion information in the Congestion-To-Leaf table. In this way, the source Leaf switch can traverse its egress ports and, according to the local uplink congestion and the fed-back downlink congestion, find the egress port of the minimum-load path, which it takes as the best path for the flowlet.
  • Because the destination switch needs to return all the downlink load information to the source switch, and the source switch needs to traverse its egress ports to find the one corresponding to the minimum-load path, the complexity of this load balancing method is too high and the load balancing effect is poor.
  • The embodiments of the present application provide a load balancing method, apparatus, and device.
  • The technical solution is as follows:
  • The weights of the equal-cost paths are calculated according to the network topology, the port state, and the link bandwidth. Specifically, for a source switch, the controller calculates the weight values of the multiple equal-cost paths between the source switch and each destination switch, and obtains an equal-cost path weight table of the source switch, where the table stores the correspondence between the multiple equal-cost paths and the weight values. The controller then delivers the equal-cost path weight table to the source switch. In this way, each source switch stores the equal-cost path weight table that matches itself.
  • When calculating the weight values of the multiple equal-cost paths between the source switch and each destination switch, the controller generally proceeds as follows: for a destination switch, it determines the equal-cost paths between the source switch and the destination switch, where each equal-cost path includes a first link path between the source switch and the transit switch and a second link path between the transit switch and the destination switch; for each equal-cost path, it calculates the weight value of the path according to the link states of the first link path and the second link path.
  • After receiving a packet sent by the server, a source switch first performs a Flowlet check to determine whether the packet is the first packet of a Flowlet.
  • Determining whether the packet is the first packet of a Flowlet may be implemented in the following manner:
  • If the packet is the first packet of the Flowlet, the source switch determines the destination switch according to the destination address of the packet, determines, in the stored equal-cost path weight table, the weight value of at least one equal-cost path associated with the destination switch, and schedules the packet onto the corresponding equal-cost path for transmission according to the weight value of the at least one equal-cost path.
  • Scheduling the packet onto the corresponding equal-cost path according to the weight value of the at least one equal-cost path includes:
  • The source switch further uses the path identifier of the first specified equal-cost path as the egress port information and saves it in the corresponding Flowlet entry of the Flowlet table.
  • The source switch also saves the quintuple information of the packet, and the current time as the most recent active time, into the Flowlet entry, and updates the valid bit information from the first value to the second value.
  • The source switch sends a link state change message to the controller, where the message indicates a third specified equal-cost path whose link state has changed. According to the link state change message, the controller recalculates the weight values of the multiple equal-cost paths between each source switch and each destination switch, obtains new equal-cost path weight tables, and sends the matching new table to each source switch. After receiving the new equal-cost path weight table sent by the controller, each source switch stores it in place of the previously stored table.
  • The source switch periodically calculates the difference between the current time and the recorded most recent active time; if the difference is greater than the preset time threshold, the source switch sets the Flowlet entry to an invalid state to facilitate Flowlet detection.
  • After receiving a packet sent by the server, the switch performs Flowlet detection and then directly performs Flowlet-based load balancing according to the locally stored equal-cost path weight table, which stores the correspondence between at least one equal-cost path and its weight value. For example, after detecting that the currently received packet is the first packet of a Flowlet, the switch determines the destination switch according to the destination address of the packet, determines in the stored equal-cost path weight table the weight value of the at least one equal-cost path associated with the destination switch, and then schedules the packet onto the corresponding equal-cost path according to that weight value. The switch therefore achieves load balancing without acquiring the load of each path in real time, which greatly reduces the complexity of the load balancing algorithm and gives a better effect.
  • FIG. 1A is a schematic diagram of a load balancing method provided by the background art of the present application.
  • FIG. 1B is a schematic diagram of a load balancing method provided by the background art of the present application.
  • FIG. 2 is a schematic diagram of a load balancing method provided by the background art of the present application.
  • FIG. 3 is a schematic diagram of a flow transmission process provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a flow of a Flowlet transmitted in a data center network of a Leaf-Spine architecture according to an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of a switch according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a controller according to an embodiment of the present disclosure.
  • FIG. 7 is a flowchart of a load balancing method according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a system for a load balancing method according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a three-level Clos architecture data center network according to an embodiment of the present application.
  • FIG. 11 is a flowchart of a load balancing method according to an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a load balancing apparatus according to an embodiment of the present disclosure.
  • FIG. 13 is a schematic structural diagram of a load balancing apparatus according to an embodiment of the present application.
  • When a Transmission Control Protocol (TCP) stream is transmitted, if the transmission time interval between two packets is greater than the difference between the delays of two paths, the two packets can be sent over different paths without causing reordering. Specifically, after packet 1 is sent over the first path at the diverging point, packet 2 is sent over the second path only after waiting for the delay difference between the two paths. This ensures that packet 2 reaches the converging point after packet 1, so the packets are not reordered. A TCP stream is naturally bursty: it always sends the packets within one sliding window as a burst and then waits for an Acknowledgement (ACK) before sending the packets of the next window, so there is naturally a large time interval between two bursts. This interval ensures that two bursts can be transmitted over different paths without arriving out of order at the converging point, so such a burst is called a Flowlet.
  • A Flowlet is not necessarily composed of the packets in one sliding window. It may also be composed of the packets in multiple sliding windows, and the packets in one sliding window may even be divided into several Flowlets, depending on the transmission time interval chosen to delimit Flowlets. The shorter the interval, the more Flowlets the original stream is divided into; the longer the interval, the fewer. The interval should be neither as long as possible nor as short as possible: there is a critical value. Delimiting Flowlets with this threshold must yield enough Flowlets while still ensuring that Flowlets sent over different paths do not arrive out of order at the converging point.
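  • The effect of the interval threshold on the number of Flowlets can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_into_flowlets`, the microsecond timestamps, and the threshold values are all assumptions chosen for the example.

```python
def split_into_flowlets(arrival_times_us, gap_threshold_us):
    """Group packet arrival times (microseconds, ascending) into flowlets.

    A new flowlet starts whenever the idle gap since the previous packet
    is greater than gap_threshold_us.
    """
    flowlets = []
    for t in arrival_times_us:
        if flowlets and t - flowlets[-1][-1] <= gap_threshold_us:
            flowlets[-1].append(t)   # same burst: extend the current flowlet
        else:
            flowlets.append([t])     # idle gap exceeded: start a new flowlet
    return flowlets


# Hypothetical stream: two bursts separated by a 50 ms idle period.
times = [0, 1000, 2000, 52000, 53000]
for threshold in (500, 10000, 100000):
    print(threshold, len(split_into_flowlets(times, threshold)))
```

  • A very short threshold turns every packet into its own Flowlet, while a very long one leaves the whole stream as a single Flowlet, which matches the trade-off described above.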
  • Figure 4 illustrates the basic principle of a Flowlet in a data center network with a Leaf-Spine architecture.
  • The first Leaf switch is identified as Leaf1.
  • The fourth Leaf switch is identified as Leaf4.
  • Packet 1 and packet 2 are sent from Leaf1 to Leaf4 over the two paths shown by thick lines in the figure.
  • The delays of the two paths are denoted d1 and d2 respectively, and the transmission interval is denoted Gap. Consistent with the condition stated earlier, the two packets do not arrive out of order when Gap is greater than the delay difference |d1 - d2|.
  • FIG. 5 is a schematic structural diagram of a switch according to an embodiment of the present disclosure.
  • The device includes a transmitter 501, a receiver 502, a memory 503, and a processor 504.
  • The memory 503, the transmitter 501, and the receiver 502 are each connected to the processor 504. The memory 503 stores program code, and the processor 504 is configured to call the program code and perform the following operations:
  • After a packet sent by the server is received through the receiver 502, determine whether the packet is the first packet of a Flowlet; if the packet is the first packet of the Flowlet, determine a destination switch according to the destination address of the packet; determine, in the stored equal-cost path weight table, the weight value of at least one equal-cost path associated with the destination switch, where the table stores the correspondence between the at least one equal-cost path and the weight value; and schedule the packet onto the corresponding equal-cost path for transmission according to the weight value of the at least one equal-cost path.
  • processor 504 is configured to invoke program code to perform the following operations:
  • If the valid bit information in the Flowlet entry is the second value, determine that the packet is a non-first packet of the Flowlet, determine a second specified equal-cost path according to the egress port information in the Flowlet entry, schedule the packet onto the second specified equal-cost path for transmission, and update the most recent active time in the Flowlet entry to the current time.
  • processor 504 is configured to invoke program code to perform the following operations:
  • the path identifier of the first specified equivalent path is used as the outbound port information, and is saved in the corresponding Flowlet entry of the Flowlet table;
  • processor 504 is configured to invoke program code to perform the following operations:
  • Send a link state change message to the controller through the transmitter 501, where the message indicates a third specified equal-cost path whose link state has changed, so that the controller recalculates the weight value of the at least one equal-cost path according to the link state change message and obtains a new equal-cost path weight table; and receive, through the receiver 502, the new equal-cost path weight table sent by the controller, and store it.
  • processor 504 is configured to invoke program code to perform the following operations:
  • the difference between the current time and the most recent active time is periodically calculated; if the difference is greater than the preset time threshold, the Flowlet entry is set to an invalid state.
  • The switch provided by the embodiment of the present application receives and stores the equal-cost path weight table sent by the controller. When it then receives a packet sent by the server, it performs Flowlet detection and directly performs Flowlet-based load balancing according to the locally stored equal-cost path weight table and Flowlet table, selecting the equal-cost path on which to transmit the currently received packet.
  • The switch does not need to obtain the load of each path in real time; the equal-cost path weights only need to be recalculated when a link state changes. This greatly reduces the complexity of the load balancing algorithm and gives a better effect.
  • FIG. 6 is a schematic structural diagram of a controller according to an embodiment of the present application.
  • The controller includes a transmitter 601, a receiver 602, a memory 603, and a processor 604.
  • The memory 603, the transmitter 601, and the receiver 602 are each connected to the processor 604. The memory 603 stores program code, and the processor 604 is configured to call the program code to perform the following operations:
  • For a source switch, calculate the weight values of the multiple equal-cost paths between the source switch and each destination switch, and obtain an equal-cost path weight table of the source switch; send the equal-cost path weight table to the source switch through the transmitter 601, so that when the source switch receives a packet sent by the server and determines that the packet is the first packet of a Flowlet, the source switch determines the destination switch according to the destination address of the packet, determines in the equal-cost path weight table the weight value of the at least one equal-cost path associated with the destination switch, and schedules the packet onto the corresponding equal-cost path for transmission according to that weight value.
  • The controller provided by the embodiment of the present application sets a weight value for each equal-cost path between every two switches according to the link information of the entire network, and sends the corresponding equal-cost path weight table to each switch. When a switch receives a packet sent by the server, it performs Flowlet detection and directly performs Flowlet-based load balancing according to the locally stored equal-cost path weight table and Flowlet table, selecting the equal-cost path on which to transmit the currently received packet.
  • The switch does not need to obtain the load condition of each path in real time; the equal-cost path weights only need to be recalculated when a link state changes, which greatly reduces the complexity of the load balancing algorithm and gives a better effect.
  • FIG. 7 is a flowchart of a load balancing method according to an embodiment of the present application.
  • The embodiment of the present application rests on two main ideas. One is to use Flowlets to break up elephant flows; the other is to perform Flowlet-based load balancing using the Weighted Cost MultiPath (WCMP) method. Taking a data center network with a two-level Clos Leaf-Spine architecture as an example, and referring to FIG. 7, the method flow provided by the embodiment of the present application includes:
  • 701. The controller calculates the weight of each equal-cost path between every two Leaf switches in the Leaf-Spine architecture, generates the equal-cost path weight tables, and sends an equal-cost path weight table to each Leaf switch.
  • The controller calculates the weight of each equal-cost path between two Leaf switches based on link state information such as the network topology, port status, and link bandwidth.
  • Calculating the equal-cost paths between every two Leaf switches means that, for each source Leaf switch, the equal-cost paths between that source Leaf switch and each destination Leaf switch are calculated.
  • The equal-cost paths between every two Leaf switches include three equal-cost paths between Leaf#1 and Leaf#2, three between Leaf#1 and Leaf#3, three between Leaf#2 and Leaf#1, three between Leaf#2 and Leaf#3, three between Leaf#3 and Leaf#1, and three between Leaf#3 and Leaf#2.
  • Take the source Leaf switch Leaf#1 and the destination Leaf switch Leaf#2 as an example.
  • As shown in Figure 8, there are three equal-cost paths between the two: Leaf#1 → Spine#1 → Leaf#2, Leaf#1 → Spine#2 → Leaf#2, and Leaf#1 → Spine#3 → Leaf#2.
  • For Leaf#1 → Spine#1 → Leaf#2, because the link bandwidth of both links Leaf#1 → Spine#1 and Spine#1 → Leaf#2 is 40G, the weight of this equal-cost path is set to 4.
  • The weights of the remaining two equal-cost paths are also set to 4.
  • Take the source Leaf switch Leaf#1 and the destination Leaf switch Leaf#3 as an example.
  • As shown in Figure 8, there are three equal-cost paths between the two: Leaf#1 → Spine#1 → Leaf#3, Leaf#1 → Spine#2 → Leaf#3, and Leaf#1 → Spine#3 → Leaf#3.
  • For Leaf#1 → Spine#1 → Leaf#3, because the Spine#1 → Leaf#3 link is in an invalid state (the link is down), the weight of this equal-cost path is set to 0.
  • In this way, the equal-cost path weight table associated with Leaf#1 shown in FIG. 8 is obtained. The horizontal axis represents the path number and the vertical axis represents the destination Leaf switch number.
  • For Leaf#2 and Leaf#3, the weight values of the equal-cost paths can be calculated in the same way, yielding the equal-cost path weight tables associated with Leaf#2 and with Leaf#3 shown in FIG. 8.
  • After obtaining the above equal-cost path weight tables, the controller sends the table associated with Leaf#1 to the Leaf switch Leaf#1, the table associated with Leaf#2 to the Leaf switch Leaf#2, and the table associated with Leaf#3 to the Leaf switch Leaf#3.
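  • The weight calculation for one source Leaf switch could be sketched as below. The text only states that a 40G path gets weight 4 and a path with a down link gets weight 0, so the rule used here (bottleneck bandwidth divided by 10G) is an assumption, as are the names `path_weight` and `build_weight_table` and the 10G bandwidth given to the Spine#2 → Leaf#3 link.

```python
def path_weight(first_link, second_link, unit_gbps=10):
    """Weight of one Leaf -> Spine -> Leaf path from its two links.

    Each link is an (is_up, bandwidth_gbps) tuple. Assumed rule: weight 0 if
    either link is down, otherwise the bottleneck bandwidth in unit_gbps units.
    """
    (up1, bw1), (up2, bw2) = first_link, second_link
    if not (up1 and up2):
        return 0
    return min(bw1, bw2) // unit_gbps


def build_weight_table(source, leaves, spines, links):
    """Return {destination: [weight per spine]} for one source Leaf switch."""
    return {
        dst: [path_weight(links[(source, sp)], links[(sp, dst)]) for sp in spines]
        for dst in leaves
        if dst != source
    }


# Hypothetical link states: all links 40G and up, except Spine#1 -> Leaf#3
# (down) and Spine#2 -> Leaf#3 (assumed to be a 10G link).
spines = ["Spine#1", "Spine#2", "Spine#3"]
links = {}
for sp in spines:
    links[("Leaf#1", sp)] = (True, 40)
    links[(sp, "Leaf#2")] = (True, 40)
    links[(sp, "Leaf#3")] = (True, 40)
links[("Spine#1", "Leaf#3")] = (False, 40)
links[("Spine#2", "Leaf#3")] = (True, 10)

table = build_weight_table("Leaf#1", ["Leaf#1", "Leaf#2", "Leaf#3"], spines, links)
print(table)  # {'Leaf#2': [4, 4, 4], 'Leaf#3': [0, 1, 4]}
```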
  • 702. A source Leaf switch receives packets sent by the server and performs Flowlet detection: each time a packet is received, it determines whether the packet is the first packet of a Flowlet. If the packet is a non-first packet of the Flowlet, the following step 703 is performed; if the packet is the first packet of the Flowlet, the following step 704 is performed.
  • Flowlets are used to break up elephant flows.
  • The source switch can detect Flowlets in different ways.
  • One implementation is to maintain a flow table with an entry for each flow. After receiving a packet, the switch calculates the time difference between the arrival time of the current packet and that of the previous packet; if the time difference exceeds the preset time threshold used to delimit Flowlets, the currently received packet is the first packet of a new Flowlet.
  • The other implementation is to maintain a Flowlet table and use a timeout mechanism for Flowlet entries. If a Flowlet entry has been idle for longer than the preset time threshold, its valid bit is set to 0; when the next packet arrives and the valid bit of the corresponding Flowlet entry is found to be 0, the received packet is the first packet of a new Flowlet.
  • The embodiment of the present application adopts the second method, whose advantage is that the number of entries to maintain is far smaller than in the first method.
  • Each time a source Leaf switch receives a packet, it performs a hash calculation on the quintuple information of the packet and determines, in the stored Flowlet table, the Flowlet entry that matches the resulting hash value. If the hash value calculated from the quintuple information of the packet is the same as the hash value calculated from the quintuple information of a Flowlet entry in the Flowlet table, that Flowlet entry is the one matching the resulting hash value.
  • The quintuple information usually includes the source IP address (Src IP), destination IP address (Dst IP), source port (Src Port), destination port (Dst Port), and protocol (Protocol).
  • One Flowlet corresponds to one entry.
  • The first five columns of an entry are the quintuple information.
  • The Last Active Time records the last time the Flowlet entry was used.
  • The Outport indicates the equal-cost path information used to transmit the packet.
  • The Valid Bit has a value of 0 or 1 and indicates whether the Flowlet recorded by the current Flowlet entry is in a valid state. If the value of the valid bit is 0, the Flowlet recorded by the current Flowlet entry is in an invalid state, that is, the Flowlet has timed out.
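  • One possible in-memory layout for the Flowlet table described above is sketched here. The fixed table size, the use of CRC32 as the hash, and the names `FlowletEntry` and `FlowletTable` are assumptions for illustration; the text does not specify the hash function or how entries are indexed.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Tuple

# (Src IP, Dst IP, Src Port, Dst Port, Protocol)
FiveTuple = Tuple[str, str, int, int, str]


@dataclass
class FlowletEntry:
    five_tuple: Optional[FiveTuple] = None
    last_active_time: float = 0.0
    outport: Optional[int] = None
    valid_bit: int = 0  # 0 = invalid (unused or timed out), 1 = valid


class FlowletTable:
    def __init__(self, size=1024):
        self.entries = [FlowletEntry() for _ in range(size)]

    def index_of(self, five_tuple):
        # Hash the quintuple and fold the hash into the table size.
        key = "|".join(map(str, five_tuple)).encode()
        return zlib.crc32(key) % len(self.entries)

    def lookup(self, five_tuple):
        """Return the entry whose slot matches the quintuple's hash."""
        return self.entries[self.index_of(five_tuple)]


flow_table = FlowletTable(size=16)
flow = ("10.0.0.1", "10.0.1.1", 40000, 80, "TCP")
print(flow_table.lookup(flow).valid_bit)  # 0: no Flowlet recorded yet
```

  • With a hash-indexed table, two flows whose quintuples hash to the same slot share one entry, which is what keeps the number of entries far below one per flow.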
  • 703. The source Leaf switch determines equal-cost path A according to the egress port information in the associated Flowlet entry of the stored Flowlet table, schedules the packet onto equal-cost path A for transmission, and updates the most recent active time in the associated Flowlet entry to the current time.
  • Because the source Leaf switch already established the associated Flowlet entry in the Flowlet table when it received the first packet of the Flowlet, by the time a non-first packet of the Flowlet is received the Flowlet entry already contains at least the quintuple information, the most recent active time, the egress port information, and the valid bit information, and the value of the valid bit is 1. Because the Flowlet entry includes the egress port information, the equal-cost path A indicated by that information is directly taken as the path for transmitting the packet, the packet is scheduled onto equal-cost path A for transmission, and the Last Active Time in the Flowlet entry is updated to the current time.
  • 704. The source Leaf switch determines the destination Leaf switch according to the destination address of the packet, determines, in the stored equal-cost path weight table, the weight value of at least one equal-cost path associated with the destination Leaf switch, and schedules the packet onto equal-cost path B for transmission according to the weight value of the at least one equal-cost path.
  • At this point, the information in the Flowlet entry other than the valid bit (whose value is 0) belongs to a Flowlet that previously timed out.
  • The source switch also needs to save the quintuple information of the packet, and the current time as the most recent active time, to the corresponding Flowlet entry in the Flowlet table, and to update the valid bit information from the first value to the second value, that is, from 0 to 1.
  • The source Leaf switch calculates the equal-cost path onto which the packet is scheduled according to the equal-cost path weight table sent by the controller.
  • The specific process of determining equal-cost path B is as follows:
  • The source Leaf switch obtains the weight value of each equal-cost path between itself and the destination Leaf switch, calculates the sum of these weight values, and generates a random number whose value lies between zero and the sum of the weight values. According to the value of the random number, it determines, among all the equal-cost paths to the destination Leaf switch, the equal-cost path B that matches the random number, and schedules the packet onto equal-cost path B for transmission.
  • For example, the weights of the three equal-cost paths from Leaf#1 to Leaf#3 are 0, 1, and 4 respectively, so the sum of the weight values is 5.
  • A random number in the range 1 to 5 is generated. If the random number is 1, the packet is scheduled onto the equal-cost path Leaf#1 → Spine#2 → Leaf#3 (that is, the second egress port of Leaf#1 in Figure 8); if the random number is 2 to 5, the packet is scheduled onto the equal-cost path Leaf#1 → Spine#3 → Leaf#3 (that is, the third egress port of Leaf#1 in Figure 8). This ensures that the ratio of the numbers of Flowlets on the three equal-cost paths from Leaf#1 to Leaf#3 is 0:1:4, thereby achieving load balancing.
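  • The weighted random selection in the worked example can be sketched as follows. The function name `pick_path` and the cumulative-sum matching rule are assumptions; the text only says that the path matching the random number is chosen. The random number is drawn from 1 to the sum of the weights, as in the example.

```python
import random
from collections import Counter


def pick_path(weights, rng=random):
    """Return the index of an equal-cost path, chosen with probability
    proportional to its weight. Paths with weight 0 are never chosen."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("no usable equal-cost path")
    r = rng.randint(1, total)  # inclusive on both ends
    cumulative = 0
    for index, weight in enumerate(weights):
        cumulative += weight
        if r <= cumulative:
            return index


# Weights of the three Leaf#1 -> Leaf#3 paths from the example above.
weights = [0, 1, 4]
counts = Counter(pick_path(weights) for _ in range(10000))
print(sorted(counts.items()))  # index 0 never appears; index 2 is about 4x index 1
```

  • With weights 0, 1, and 4, a random number of 1 selects index 1 (the second egress port) and 2 to 5 select index 2 (the third egress port), matching the example.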
  • The path identifier of equal-cost path B is saved as the egress port information in the corresponding Flowlet entry of the Flowlet table. In this way, for subsequent packets of the Flowlet (that is, non-first packets), the egress port information in the Flowlet entry can be read directly and the packets sent accordingly.
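  • Steps 703 and 704 can be combined into one per-packet routine, sketched below. A plain dict keyed by the quintuple's hash stands in for the Flowlet table, and `choose_path` is a hypothetical callback that returns an egress port for a destination address (for instance, a weighted random choice over the weight table); neither is specified by the text.

```python
def handle_packet(table, five_tuple, now, choose_path):
    """Return the egress port for a packet and update its Flowlet entry."""
    entry = table.setdefault(hash(five_tuple), {"valid": 0})
    if entry["valid"] == 1:
        # Non-first packet (step 703): reuse the stored egress port.
        entry["last_active"] = now
        return entry["outport"]
    # First packet of a Flowlet (step 704): pick a path and record it.
    dst_ip = five_tuple[1]
    entry.update(
        five_tuple=five_tuple,
        last_active=now,
        outport=choose_path(dst_ip),
        valid=1,  # valid bit goes from 0 to 1
    )
    return entry["outport"]


# Hypothetical usage: the callback hands out ports 2, then 3.
ports = iter([2, 3])
flowlets = {}
flow = ("10.0.0.1", "10.0.1.1", 40000, 80, "TCP")
print(handle_packet(flowlets, flow, 0.0, lambda dst: next(ports)))  # 2
print(handle_packet(flowlets, flow, 0.1, lambda dst: next(ports)))  # 2 (same Flowlet)
```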
  • When a port goes down or comes up, the associated Leaf switch reports the port state change to the controller. According to the reported port state, the controller recalculates the weight value of each equal-cost path between every two Leaf switches using the method in step 701, and sends the parts whose weight values have changed to the Leaf switches involved. For example, if the Spine#1 → Leaf#3 link in FIG. 8 is restored, the equal-cost path weight table of each Leaf switch becomes as shown in FIG. 9. Comparing with FIG.
  • The Leaf switch periodically scans all the Flowlet entries in the Flowlet table and sets every entry whose Flowlet has timed out to an invalid state.
  • A timed-out entry is one for which the current time minus the Last Active Time in the entry is greater than a preset time threshold. That is, for each Flowlet entry in the Flowlet table, the Leaf switch periodically calculates the difference between the current time and the most recent active time recorded in the entry; if the difference is greater than the preset time threshold, the Flowlet entry is set to an invalid state.
  • In addition, the load balancing method of the embodiments of this application can also be applied to a three-stage Clos Fat-tree data center network.
  • A three-stage Clos Fat-tree architecture is in effect a superposition of multiple two-stage Clos networks.
  • In the three-stage Clos Fat-tree architecture shown in FIG. 10, each of the two PODs (Pool Of Devices, POD) corresponds to one two-stage Clos: top-of-rack switches (Top Of Rack, TOR) TOR#1 and TOR#2 together with aggregation switches (Aggregation, AGG) AGG#1 and AGG#2 form one two-stage Clos, and TOR#3, TOR#4, AGG#3, and AGG#4 form another.
  • At the same time, the second-layer switches AGG#1, AGG#2, AGG#3, and AGG#4 together with Spine#1 and Spine#2 form a further two-stage Clos. Running the scheme described in this application in each of these two-stage Clos networks achieves a very good load balancing effect.
  • In the method provided in this embodiment, the controller sets a weight value for each equal-cost path between every pair of switches according to the link information of the entire network, and delivers the corresponding equal-cost path weight table to each switch.
  • After receiving a packet sent by a server, the switch performs Flowlet detection, so it can perform Flowlet-based load balancing directly from the locally stored equal-cost path weight table and Flowlet table. The switch does not need to obtain the load of each path in real time; the equal-cost path weights only need to be recalculated once when a link state changes. This greatly reduces the complexity of the load balancing algorithm and gives good results.
  • FIG. 11 is a flowchart of a load balancing method according to an embodiment of this application. Described from the perspective of the switch that performs it, and referring to FIG. 11, the method includes:
  • 1101: After a packet sent by a server is received, determine whether the packet is the first packet of a Flowlet; if it is, perform step 1102.
  • 1102: If the packet is the first packet of a Flowlet, determine the destination switch according to the destination address of the packet.
  • 1103: In the stored equal-cost path weight table, determine the weight value of at least one equal-cost path associated with the destination switch, where the table stores the correspondence between the at least one equal-cost path and the weight values.
  • 1104: Schedule the packet onto the corresponding equal-cost path for transmission according to the weight value of the at least one equal-cost path.
  • In the method provided in this embodiment, the switch performs Flowlet detection after receiving a packet sent by a server, and then performs Flowlet-based load balancing directly from the locally stored equal-cost path weight table, which stores the correspondence between at least one equal-cost path and the weight values. For example, once the currently received packet is detected to be the first packet of a Flowlet, the switch determines the destination switch directly from the destination address of the packet, determines in the stored table the weight value of at least one equal-cost path associated with that destination switch, and schedules the packet onto the corresponding equal-cost path according to those weight values. Load balancing is thus achieved without the switch obtaining the load of each path in real time, which greatly reduces the complexity of the load balancing algorithm and gives good results.
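  • In another embodiment, scheduling the packet onto the corresponding equal-cost path for transmission according to the weight value of the at least one equal-cost path includes:
  • obtaining the weight value of each of the at least one equal-cost path;
  • computing the sum of these weight values, and generating a random number in the range from zero to that sum;
  • determining, according to the value of the random number, a first designated equal-cost path that matches the random number among the at least one equal-cost path;
  • scheduling the packet onto the first designated equal-cost path for transmission.
  • In another embodiment, determining whether the packet is the first packet of a Flowlet includes:
  • performing a hash calculation on the five-tuple information of the packet to obtain a hash value;
  • determining, in the stored Flowlet table, the Flowlet entry that matches the hash value;
  • if the valid bit information in the Flowlet entry is a first value, determining that the packet is the first packet of the Flowlet, writing the five-tuple information, and the current time as the most recent active time, into the Flowlet entry, and updating the valid bit information from the first value to a second value;
  • where a Flowlet entry includes at least the five-tuple information, most recent active time, egress port information, and valid bit information of a Flowlet.
  • In another embodiment, the method further includes:
  • if the valid bit information in the Flowlet entry is the second value, determining that the packet is a non-first packet of the Flowlet, and determining a second designated equal-cost path according to the egress port information in the Flowlet entry;
  • scheduling the packet onto the second designated equal-cost path for transmission, and updating the most recent active time in the Flowlet entry to the current time.
  • In another embodiment, the method further includes:
  • after the first designated equal-cost path that matches the random number is determined, saving the path identifier of the first designated equal-cost path as the egress port information in the corresponding Flowlet entry of the Flowlet table.
  • In another embodiment, the method further includes:
  • if the link state of the at least one equal-cost path changes, sending a link state change message to the controller, where the message indicates a third designated equal-cost path whose link state has changed, so that the controller recalculates the weight value of the at least one equal-cost path according to the message and obtains a new equal-cost path weight table;
  • receiving the new equal-cost path weight table sent by the controller, and storing it.
  • In another embodiment, the method further includes:
  • for each Flowlet entry in the Flowlet table, periodically calculating the difference between the current time and the most recent active time;
  • if the difference is greater than a preset time threshold, setting the Flowlet entry to the invalid state.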
  • FIG. 12 is a schematic structural diagram of a load balancing apparatus according to an embodiment of this application.
  • Referring to FIG. 12, the apparatus includes a judging module 1201, a determining module 1202, and a scheduling module 1203.
  • The judging module 1201 is configured to determine, after a packet sent by a server is received, whether the packet is the first packet of a Flowlet.
  • The determining module 1202 is configured to determine the destination switch according to the destination address of the packet if the packet is the first packet of the Flowlet.
  • The determining module 1202 is further configured to determine, in the stored equal-cost path weight table, the weight value of at least one equal-cost path associated with the destination switch, where the table stores the correspondence between the at least one equal-cost path and the weight values.
  • The scheduling module 1203 is configured to schedule the packet onto the corresponding equal-cost path for transmission according to the weight value of the at least one equal-cost path.
  • In another embodiment, the scheduling module 1203 is configured to: obtain the weight value of each of the at least one equal-cost path; compute the sum of these weight values and generate a random number in the range from zero to that sum; determine, according to the value of the random number, a first designated equal-cost path that matches the random number among the at least one equal-cost path; and schedule the packet onto the first designated equal-cost path for transmission.
  • In another embodiment, the apparatus further includes an update module 1204:
  • The judging module 1201 is configured to perform a hash calculation on the five-tuple information of the packet to obtain a hash value; determine, in the stored Flowlet table, the Flowlet entry that matches the hash value; and, if the valid bit information in the Flowlet entry is a first value, determine that the packet is the first packet of the Flowlet.
  • The update module 1204 is configured to, when the packet is the first packet of the Flowlet, write the five-tuple information, and the current time as the most recent active time, into the Flowlet entry, and update the valid bit information from the first value to a second value. A Flowlet entry includes at least the five-tuple information, most recent active time, egress port information, and valid bit information of a Flowlet.
  • In another embodiment, the judging module 1201 is further configured to determine that the packet is a non-first packet of the Flowlet if the valid bit information in the Flowlet entry is the second value.
  • The determining module 1202 is further configured to determine, when the packet is a non-first packet of the Flowlet, a second designated equal-cost path according to the egress port information in the Flowlet entry.
  • The scheduling module 1203 is further configured to schedule the packet onto the second designated equal-cost path for transmission when the packet is a non-first packet of the Flowlet; and the update module 1204 is further configured to update the most recent active time in the Flowlet entry to the current time when the packet is a non-first packet of the Flowlet.
  • In another embodiment, the apparatus further includes:
  • a saving module 1205, configured to save, after the first designated equal-cost path that matches the random number is determined, the path identifier of the first designated equal-cost path as the egress port information in the corresponding Flowlet entry of the Flowlet table.
  • In another embodiment, the apparatus further includes:
  • a sending module 1205, configured to send a link state change message to the controller if the link state of the at least one equal-cost path changes, where the message indicates a third designated equal-cost path whose link state has changed, so that the controller recalculates the weight value of the at least one equal-cost path according to the message and obtains a new equal-cost path weight table;
  • a receiving module 1206, configured to receive the new equal-cost path weight table sent by the controller and store it.
  • In another embodiment, the apparatus further includes:
  • a calculating module 1207, configured to periodically calculate, for each Flowlet entry in the Flowlet table, the difference between the current time and the most recent active time;
  • a setting module 1208, configured to set the Flowlet entry to the invalid state if the difference is greater than a preset time threshold.
  • After receiving and storing the equal-cost path weight table delivered by the controller, the apparatus provided in this embodiment, on receiving a packet sent by a server, performs Flowlet detection and performs Flowlet-based load balancing directly from the locally stored equal-cost path weight table and Flowlet table to select the equal-cost path for the currently received packet. The switch does not need to obtain the load of each path in real time; the equal-cost path weights only need to be recalculated once when a link state changes. This greatly reduces the complexity of the load balancing algorithm and gives good results.
  • FIG. 13 is a schematic structural diagram of a load balancing apparatus according to an embodiment of this application.
  • Referring to FIG. 13, the apparatus includes a calculating module 1301 and a sending module 1302.
  • The calculating module 1301 is configured to calculate, for a source switch, the weight values of the multiple equal-cost paths between the source switch and each destination switch, to obtain the equal-cost path weight table of the source switch.
  • The sending module 1302 is configured to deliver the equal-cost path weight table to the source switch, so that the source switch, after receiving a packet sent by a server and determining that the packet is the first packet of a Flowlet, determines the destination switch according to the destination address of the packet, determines in the equal-cost path weight table the weight value of at least one equal-cost path associated with the destination switch, and schedules the packet onto the corresponding equal-cost path for transmission according to those weight values.
  • In another embodiment, the calculating module 1301 is configured to: for a destination switch, determine each equal-cost path between the source switch and the destination switch, where an equal-cost path includes a first link path from the source switch to a transit switch and a second link path from the transit switch to the destination switch; and, for an equal-cost path, calculate its weight value according to the link states of the first link path and the second link path.
  • In another embodiment, the apparatus further includes:
  • a receiving module 1303, configured to receive a link state change message sent by the source switch, where the message indicates the equal-cost path whose link state has changed.
  • The calculating module 1301 is further configured to recalculate, according to the link state change message, the weight values of the multiple equal-cost paths between each source switch and each destination switch, to obtain new equal-cost path weight tables.
  • The sending module 1302 is further configured to send the matching new equal-cost path weight table to each source switch.
  • The apparatus provided in this embodiment sets a weight value for each equal-cost path between every pair of switches according to the link information of the entire network, and delivers the corresponding equal-cost path weight table to each switch. When a switch then receives a packet sent by a server, it performs Flowlet detection and performs Flowlet-based load balancing directly from the locally stored equal-cost path weight table and Flowlet table to select the equal-cost path for the currently received packet. The switch does not need to obtain the load of each path in real time; the equal-cost path weights only need to be recalculated once when a link state changes. This greatly reduces the complexity of the load balancing algorithm and gives good results.
  • Note that the load balancing apparatus of the foregoing embodiments is described using the above division into functional modules merely as an example. In practice, the functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or some of the functions described above. In addition, the load balancing apparatus and the load balancing method embodiments are based on the same concept; for the specific implementation, refer to the method embodiments. Details are not repeated here.
  • A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

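This application discloses a load balancing method, apparatus, and device, and belongs to the field of network technologies. The method includes: after a packet sent by a server is received, determining whether the packet is the first packet of a Flowlet; if the packet is the first packet of the Flowlet, determining a destination switch according to the destination address of the packet; determining, in a stored equal-cost path weight table, the weight value of at least one equal-cost path associated with the destination switch, where the equal-cost path weight table stores the correspondence between the at least one equal-cost path and the weight values; and scheduling the packet onto the corresponding equal-cost path for transmission according to the weight value of the at least one equal-cost path. In this application, the switch performs Flowlet detection after receiving a packet sent by a server and performs Flowlet-based load balancing directly from the locally stored equal-cost path weight table, without obtaining the load of each path in real time, which greatly reduces the complexity of the load balancing algorithm and gives good results.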

Description

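Load Balancing Method, Apparatus, and Device
This application claims priority to Chinese Patent Application No. 201610570733.8, filed with the Chinese Patent Office on July 19, 2016 and entitled "Load Balancing Method, Apparatus, and Device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of network technologies, and in particular to a load balancing method, apparatus, and device.
Background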
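With the continuous development of network technologies, data center networks are increasingly being built by cloud service providers and enterprises. When traffic is transmitted over a data center network, the following two kinds of congestion often occur. In FIG. 1A, because a conventional load balancing method such as equal-cost multipath (Equal-Cost MultiPath, ECMP) can only balance the number of flows, several elephant flows may happen to be mapped onto the same link, so that the aggregated traffic exceeds the port capacity and causes congestion; this kind of congestion is called a local collision (Local Collision). In FIG. 1B, because each Leaf switch performs load balancing on its own without global information, several elephant flows sent from multiple Leaf switches to the same Leaf switch may all be sent to the same transit switch (for example, a Spine switch), as shown in FIG. 1B, which congests the downstream traffic; this kind of congestion is called a downstream collision (Downstream Collision). How to load-balance network traffic so as to prevent these two kinds of congestion has therefore become a focus for those skilled in the art.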
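Referring to FIG. 2, the main logic of load balancing in the prior art is as follows. The source Leaf switch detects sub-flows (Flowlets), where the packets within one sliding window are sent together as one Flowlet. For each new Flowlet, the source Leaf switch always selects the least-loaded path. The load of a path is obtained through the following steps. (1) The source Leaf switch encapsulates a congestion information field (including LBTag and CE) into the overlay header of each packet. LBTag indicates the port number of the source Leaf switch, and CE indicates the path congestion metric; CE is updated when the packet passes through a Spine switch and is temporarily stored in the Congestion-From-Leaf table after the packet reaches the destination Leaf switch. (2) After receiving the packet, the destination Leaf switch sends a reverse packet to the source Leaf switch; specifically, the destination Leaf switch encapsulates a congestion information field (including FB_LBTag and FB_Metric) into the overlay header of the reverse packet. FB_LBTag indicates the port number of the destination Leaf switch, and FB_Metric indicates the congestion metric. After receiving the reverse packet, the source Leaf switch stores it in the Congestion-To-Leaf table. The source Leaf switch can then, based on the congestion of the local uplinks and of the downlinks that was fed back, traverse the paths to find the egress port of the least-loaded path and so decide the best path for the Flowlet.
In the course of implementing this application, the inventors found that the prior art has at least the following problem:
Because the destination Leaf switch must return all downlink load information to the source Leaf switch, and the source must traverse the paths to find the egress port of the least-loaded path, this load balancing method is overly complex and its load balancing effect is poor.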
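Summary
To solve the problem in the prior art, the embodiments of this application provide a load balancing method, apparatus, and device. The technical solutions are as follows:
On the controller side, the controller first calculates the weight of each equal-cost path according to information such as the network topology, port states, and link bandwidth. Specifically, for a source switch, the controller calculates the weight values of the multiple equal-cost paths between the source switch and each destination switch to obtain the equal-cost path weight table of the source switch, where the table stores the correspondence between the multiple equal-cost paths and the weight values; the controller then delivers the table to the source switch. In this way, every source switch stores an equal-cost path weight table that matches itself.
When calculating the weight values of the multiple equal-cost paths between the source switch and each destination switch, the controller usually proceeds as follows: for a destination switch, determine each equal-cost path between the source switch and the destination switch, where an equal-cost path includes a first link path from the source switch to a transit switch and a second link path from the transit switch to the destination switch; for an equal-cost path, calculate its weight value according to the link states of the first link path and the second link path.
In another embodiment, after receiving a packet sent by a server, a source switch first performs Flowlet detection to determine whether the packet is the first packet of a Flowlet. This determination may be made as follows:
Perform a hash calculation on the five-tuple information of the packet to obtain a hash value; determine, in the stored Flowlet table, the Flowlet entry that matches the hash value; if the valid bit information in the Flowlet entry is a first value, determine that the packet is the first packet of the Flowlet; if the valid bit information in the Flowlet entry is a second value, determine that the packet is a non-first packet of the Flowlet. A Flowlet entry includes at least the five-tuple information, most recent active time, egress port information, and valid bit information of a Flowlet.
When the packet is the first packet of the Flowlet, the source switch determines the destination switch according to the destination address of the packet; determines, in the stored equal-cost path weight table, the weight value of at least one equal-cost path associated with the destination switch; and schedules the packet onto the corresponding equal-cost path for transmission according to the weight value of the at least one equal-cost path.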
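In the embodiments of this application, scheduling the packet onto the corresponding equal-cost path for transmission according to the weight value of the at least one equal-cost path includes:
Obtaining the weight value of each of the at least one equal-cost path; computing the sum of these weight values and generating a random number in the range from zero to that sum; determining, according to the value of the random number, a first designated equal-cost path that matches the random number among the at least one equal-cost path; and scheduling the packet onto the first designated equal-cost path for transmission.
Note that after determining the first designated equal-cost path that matches the random number, the source switch also saves the path identifier of that path as the egress port information in the corresponding Flowlet entry of the Flowlet table. In addition, the source switch writes the five-tuple information, and the current time as the most recent active time, into the Flowlet entry, and updates the valid bit information from the first value to the second value.
When the packet is a non-first packet of the Flowlet, the source switch determines a second designated equal-cost path according to the egress port information in the Flowlet entry, schedules the packet onto the second designated equal-cost path for transmission, and updates the most recent active time in the Flowlet entry to the current time.
In another embodiment, if the link state of at least one equal-cost path associated with the destination switch changes, the source switch sends a link state change message to the controller. The message indicates a third designated equal-cost path whose link state has changed, so that the controller recalculates, according to the message, the weight values of the multiple equal-cost paths between each source switch and each destination switch, obtains new equal-cost path weight tables, and sends the matching new table to each source switch. After receiving the new equal-cost path weight table sent by the controller, each source switch stores it in place of the previously stored table.
In another embodiment, for each Flowlet entry in its stored Flowlet table, the source switch periodically calculates the difference between the current time and the recorded most recent active time; if the difference is greater than a preset time threshold, the source switch sets the Flowlet entry to the invalid state to facilitate Flowlet detection.
The technical solutions provided in the embodiments of this application have the following beneficial effects:
The switch performs Flowlet detection after receiving a packet sent by a server, and then performs Flowlet-based load balancing directly from the locally stored equal-cost path weight table, which stores the correspondence between at least one equal-cost path and the weight values. For example, once the currently received packet is detected to be the first packet of a Flowlet, the switch determines the destination switch directly from the destination address of the packet, determines in the stored table the weight value of at least one equal-cost path associated with that destination switch, and schedules the packet onto the corresponding equal-cost path according to those weight values. Load balancing is thus achieved without the switch obtaining the load of each path in real time, which greatly reduces the complexity of the load balancing algorithm and gives good results.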
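Brief Description of the Drawings
FIG. 1A is a schematic logic diagram of a load balancing method according to the background of this application;
FIG. 1B is a schematic logic diagram of a load balancing method according to the background of this application;
FIG. 2 is a schematic logic diagram of a load balancing method according to the background of this application;
FIG. 3 is a schematic logic diagram of a flow transmission process according to an embodiment of this application;
FIG. 4 is a schematic logic diagram of Flowlet transmission in a Leaf-Spine data center network according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of a switch according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a controller according to an embodiment of this application;
FIG. 7 is a flowchart of a load balancing method according to an embodiment of this application;
FIG. 8 is a schematic system diagram of a load balancing method according to an embodiment of this application;
FIG. 9 is a schematic diagram of an equal-cost path weight table according to an embodiment of this application;
FIG. 10 is a schematic diagram of a three-stage Clos data center network according to an embodiment of this application;
FIG. 11 is a flowchart of a load balancing method according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of a load balancing apparatus according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of a load balancing apparatus according to an embodiment of this application.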
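Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
Before the embodiments of this application are explained in detail, the principle of the Flowlet is explained first.
As shown in FIG. 3, during Transmission Control Protocol (TCP) flow transmission, if the transmission interval between two consecutive packets is guaranteed to be greater than the difference between the delays of two paths, the two packets can be sent over different paths without risk of reordering. Specifically, after packet 1 is sent over the upper path at the diverging point (Diverging Point), packet 2 is sent over the lower path only after a time equal to the delay difference of the two paths has elapsed. This guarantees that packet 2 reaches the converging point (Converging Point) after packet 1 does, so the packets are not reordered. A TCP flow is naturally bursty (Burst): it always sends the packets within one sliding window together as one burst and then waits for the acknowledgement (Acknowledgement, ACK). Only after receiving the ACK does it send the packets of the next window, so a relatively large time interval naturally exists between two bursts. This interval ensures that the two bursts can travel over different paths without being out of order when they reach the converging point. Such a burst is called a sub-flow (Flowlet).
Note that a Flowlet does not necessarily consist of the packets of exactly one sliding window. It may consist of packets from several sliding windows, and the packets of one sliding window may even be split into several Flowlets. This depends on the transmission interval used to identify Flowlets: the shorter the interval, the more Flowlets an original flow is split into; the longer the interval, the fewer. Neither a longer nor a shorter interval is always better. There is a critical value, and using it to delimit Flowlets must both yield enough Flowlets and ensure that Flowlets sent over different paths are still in order when they reach the converging point.
FIG. 4 illustrates the basic principle of Flowlets in a Leaf-Spine data center network. Let Leaf1 denote the first Leaf switch and Leaf4 the fourth Leaf switch. In FIG. 4, packet 1 and packet 2 are sent from Leaf1 to Leaf4 over the two paths drawn as thick lines. Let d1 and d2 denote the delays of the two paths. The sending interval between packet 1 and packet 2 must then satisfy Gap ≥ |d1 − d2|, so that packet 1 reaches Leaf4 before packet 2 and the two packets are not reordered.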
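The ordering condition above can be checked numerically. The following is a minimal sketch, assuming the gap and both delays are integers in the same unit (microseconds here); the function names are illustrative and do not appear in the application.

```python
def gap_is_safe(gap_us: int, d1_us: int, d2_us: int) -> bool:
    """Return True when Gap >= |d1 - d2|, the condition stated for FIG. 4."""
    return gap_us >= abs(d1_us - d2_us)


def arrives_in_order(gap_us: int, d1_us: int, d2_us: int) -> bool:
    """Packet 1 leaves at t = 0 on the path with delay d1;
    packet 2 leaves at t = gap on the path with delay d2.
    Order is preserved when packet 1 does not arrive after packet 2."""
    return d1_us <= gap_us + d2_us
```

With d1 = 1000 and d2 = 600, a gap of 500 satisfies the condition and keeps the order, whereas a gap of 300 does not.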
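FIG. 5 is a schematic structural diagram of a switch according to an embodiment of this application. Referring to FIG. 5, the switch includes a transmitter 501, a receiver 502, a memory 503, and a processor 504. The memory 503, the transmitter 501, and the receiver 502 are each connected to the processor 504. The memory 503 stores program code, and the processor 504 is configured to invoke the program code to perform the following operations:
After a packet sent by a server is received through the receiver 502, determine whether the packet is the first packet of a Flowlet; if the packet is the first packet of the Flowlet, determine the destination switch according to the destination address of the packet; determine, in the stored equal-cost path weight table, the weight value of at least one equal-cost path associated with the destination switch, where the table stores the correspondence between the at least one equal-cost path and the weight values; and schedule the packet onto the corresponding equal-cost path for transmission according to the weight value of the at least one equal-cost path.
In another embodiment, the processor 504 is configured to invoke the program code to perform the following operations:
Obtain the weight value of each of the at least one equal-cost path; compute the sum of these weight values and generate a random number in the range from zero to that sum; determine, according to the value of the random number, a first designated equal-cost path that matches the random number among the at least one equal-cost path; and schedule the packet onto the first designated equal-cost path for transmission.
In another embodiment, the processor 504 is configured to invoke the program code to perform the following operations:
Perform a hash calculation on the five-tuple information of the packet to obtain a hash value; determine, in the stored Flowlet table, the Flowlet entry that matches the hash value; if the valid bit information in the Flowlet entry is a first value, determine that the packet is the first packet of the Flowlet, write the five-tuple information, and the current time as the most recent active time, into the Flowlet entry, and update the valid bit information from the first value to a second value. A Flowlet entry includes at least the five-tuple information, most recent active time, egress port information, and valid bit information of a Flowlet.
In another embodiment, the processor 504 is configured to invoke the program code to perform the following operations:
If the valid bit information in the Flowlet entry is the second value, determine that the packet is a non-first packet of the Flowlet, and determine a second designated equal-cost path according to the egress port information in the Flowlet entry; schedule the packet onto the second designated equal-cost path for transmission, and update the most recent active time in the Flowlet entry to the current time.
In another embodiment, the processor 504 is configured to invoke the program code to perform the following operations:
After the first designated equal-cost path that matches the random number is determined, save the path identifier of the first designated equal-cost path as the egress port information in the corresponding Flowlet entry of the Flowlet table.
In another embodiment, the processor 504 is configured to invoke the program code to perform the following operations:
If the link state of the at least one equal-cost path changes, send a link state change message to the controller through the transmitter 501, where the message indicates a third designated equal-cost path whose link state has changed, so that the controller recalculates the weight value of the at least one equal-cost path according to the message and obtains a new equal-cost path weight table; receive, through the receiver 502, the new equal-cost path weight table sent by the controller, and store it.
In another embodiment, the processor 504 is configured to invoke the program code to perform the following operations:
For each Flowlet entry in the Flowlet table, periodically calculate the difference between the current time and the most recent active time; if the difference is greater than a preset time threshold, set the Flowlet entry to the invalid state.
After receiving and storing the equal-cost path weight table delivered by the controller, the switch provided in this embodiment, on receiving a packet sent by a server, performs Flowlet detection and performs Flowlet-based load balancing directly from the locally stored equal-cost path weight table and Flowlet table to select the equal-cost path for the currently received packet. The switch does not need to obtain the load of each path in real time; the equal-cost path weights only need to be recalculated once when a link state changes. This greatly reduces the complexity of the load balancing algorithm and gives good results.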
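FIG. 6 is a schematic structural diagram of a controller according to an embodiment of this application. Referring to FIG. 6, the controller includes a transmitter 601, a receiver 602, a memory 603, and a processor 604. The memory 603, the transmitter 601, and the receiver 602 are each connected to the processor 604. The memory 603 stores program code, and the processor 604 is configured to invoke the program code to perform the following operations:
For a source switch, calculate the weight values of the multiple equal-cost paths between the source switch and each destination switch to obtain the equal-cost path weight table of the source switch; deliver the table to the source switch through the transmitter 601, so that the source switch, after receiving a packet sent by a server and determining that the packet is the first packet of a Flowlet, determines the destination switch according to the destination address of the packet, determines in the equal-cost path weight table the weight value of at least one equal-cost path associated with the destination switch, and schedules the packet onto the corresponding equal-cost path for transmission according to those weight values.
In another embodiment, the processor 604 is configured to invoke the program code to perform the following operations:
For a destination switch, determine each equal-cost path between the source switch and the destination switch, where an equal-cost path includes a first link path from the source switch to a transit switch and a second link path from the transit switch to the destination switch; for an equal-cost path, calculate its weight value according to the link states of the first link path and the second link path.
In another embodiment, the processor 604 is configured to invoke the program code to perform the following operations:
Receive, through the receiver 602, a link state change message sent by the source switch, where the message indicates the equal-cost path whose link state has changed; recalculate, according to the message, the weight values of the multiple equal-cost paths between each source switch and each destination switch to obtain new equal-cost path weight tables; and send the matching new table to each source switch through the transmitter 601.
The controller provided in this embodiment sets a weight value for each equal-cost path between every pair of switches according to the link information of the entire network, and delivers the corresponding equal-cost path weight table to each switch. When a switch then receives a packet sent by a server, it performs Flowlet detection and performs Flowlet-based load balancing directly from the locally stored equal-cost path weight table and Flowlet table to select the equal-cost path for the currently received packet. The switch does not need to obtain the load of each path in real time; the equal-cost path weights only need to be recalculated once when a link state changes. This greatly reduces the complexity of the load balancing algorithm and gives good results.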
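FIG. 7 is a flowchart of a load balancing method according to an embodiment of this application. The embodiments rest on two main ideas: one is to use Flowlets to break up elephant flows; the other is to perform load balancing on top of Flowlets by means of weighted cost multipath (Weighted Cost MultiPath, WCMP). Taking a two-stage Clos Leaf-Spine data center network as an example, and referring to FIG. 7, the method provided in this embodiment includes:
701. The controller calculates the weight of each equal-cost path between every pair of Leaf switches in the Leaf-Spine architecture, generates the equal-cost path weight tables, and delivers them to the respective Leaf switches.
As shown in FIG. 8, the controller calculates the weight of each equal-cost path between every pair of Leaf switches according to link states such as the network topology, port states, and link bandwidth. Note that "each equal-cost path between every pair of Leaf switches" means that, for a source Leaf switch, all equal-cost paths between that source Leaf switch and each destination Leaf switch are calculated. In FIG. 8, these comprise the 3 equal-cost paths between Leaf#1 and Leaf#2 and the 3 between Leaf#1 and Leaf#3, the 3 between Leaf#2 and Leaf#1 and the 3 between Leaf#2 and Leaf#3, and the 3 between Leaf#3 and Leaf#1 and the 3 between Leaf#3 and Leaf#2.
Take Leaf#1 as the source Leaf switch and Leaf#2 as the destination Leaf switch. Referring to FIG. 8, there are three equal-cost paths between them: Leaf#1→Spine#1→Leaf#2, Leaf#1→Spine#2→Leaf#2, and Leaf#1→Spine#3→Leaf#2. For the first path, Leaf#1→Spine#1→Leaf#2, the links Leaf#1→Spine#1 and Spine#1→Leaf#2 both have a bandwidth of 40G, so the weight of this path is set to 4. For the second path, Leaf#1→Spine#2→Leaf#2, the links Leaf#1→Spine#2 and Spine#2→Leaf#2 both have a bandwidth of 40G, so its weight is also set to 4. For the third path, Leaf#1→Spine#3→Leaf#2, the links Leaf#1→Spine#3 and Spine#3→Leaf#2 both have a bandwidth of 40G, so its weight is also set to 4.
Take Leaf#1 as the source Leaf switch and Leaf#3 as the destination Leaf switch. Referring to FIG. 8, there are three equal-cost paths between them: Leaf#1→Spine#1→Leaf#3, Leaf#1→Spine#2→Leaf#3, and Leaf#1→Spine#3→Leaf#3. For the first path, Leaf#1→Spine#1→Leaf#3, the link Spine#1→Leaf#3 has failed and is down, so the weight of this path is set to 0. For the second path, Leaf#1→Spine#2→Leaf#3, the link Spine#2→Leaf#3 has a bandwidth of 10G, so the weight of this path is set to 1. For the third path, Leaf#1→Spine#3→Leaf#3, the links Leaf#1→Spine#3 and Spine#3→Leaf#3 both have a bandwidth of 40G, so the weight of this path is set to 4.
For Leaf#1, once the weight values of the 3 equal-cost paths to Leaf#2 and of the 3 equal-cost paths to Leaf#3 have been calculated, the equal-cost path weight table associated with Leaf#1 shown in FIG. 8 is obtained, in which the horizontal axis is the path number and the vertical axis is the destination Leaf switch number. The weight values for Leaf#2 and Leaf#3 can be calculated in the same way, giving the equal-cost path weight tables associated with Leaf#2 and Leaf#3 shown in FIG. 8. After obtaining these tables, the controller sends the table associated with Leaf#1 to Leaf switch Leaf#1, the table associated with Leaf#2 to Leaf switch Leaf#2, and the table associated with Leaf#3 to Leaf switch Leaf#3.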
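The weight calculation of step 701 can be sketched as follows. The application gives only example values (40G links give weight 4, a 10G hop gives weight 1, a down link gives weight 0) and no formula, so the rule used here, bottleneck bandwidth of the two hops in units of 10G, is an assumption chosen because it reproduces those values. The function names and the dictionary layout are likewise illustrative.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (from_switch, to_switch)


def path_weight(bandwidth_gbps: Dict[Link, int], src: str, spine: str, dst: str,
                unit_gbps: int = 10) -> int:
    """Assumed rule: weight = min(bandwidth of both hops) / 10G; a down link counts as 0."""
    first_hop = bandwidth_gbps.get((src, spine), 0)
    second_hop = bandwidth_gbps.get((spine, dst), 0)
    return min(first_hop, second_hop) // unit_gbps


def build_weight_table(bandwidth_gbps: Dict[Link, int], src: str,
                       spines: List[str], leaves: List[str]) -> Dict[str, List[int]]:
    """One row per destination Leaf, one column per equal-cost path (per Spine)."""
    return {dst: [path_weight(bandwidth_gbps, src, s, dst) for s in spines]
            for dst in leaves if dst != src}


# Link states taken from the FIG. 8 description for source Leaf#1.
spines = ["Spine#1", "Spine#2", "Spine#3"]
leaves = ["Leaf#1", "Leaf#2", "Leaf#3"]
links: Dict[Link, int] = {("Leaf#1", s): 40 for s in spines}
links.update({(s, "Leaf#2"): 40 for s in spines})
links.update({("Spine#1", "Leaf#3"): 0,    # failed link
              ("Spine#2", "Leaf#3"): 10,
              ("Spine#3", "Leaf#3"): 40})

table_leaf1 = build_weight_table(links, "Leaf#1", spines, leaves)
```

Under this assumed rule, `table_leaf1` holds the row `[4, 4, 4]` for Leaf#2 and `[0, 1, 4]` for Leaf#3, matching the weights described for FIG. 8.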
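702. A source Leaf switch receives packets sent by a server and detects Flowlets. Each time a packet is received, the switch determines whether the packet is the first packet of a Flowlet; if the packet is a non-first packet of the Flowlet, step 703 below is performed; if the packet is the first packet of the Flowlet, step 704 below is performed.
In the embodiments of this application, Flowlets are used to break up elephant flows. A source switch can detect Flowlets in different ways. One way is to maintain a flow table entry for each flow: after a packet is received, the difference between the arrival time of the current packet and that of the previous packet is calculated, and if the difference exceeds the preset time threshold for identifying a Flowlet, the current packet starts a new Flowlet. The other way is to maintain a Flowlet table with an entry timeout mechanism: if a Flowlet entry has been inactive for longer than the preset time threshold, then when the next packet arrives the valid bit of the corresponding Flowlet is found to be 0, which indicates that the received packet is the first packet of a new Flowlet; see the description below for details. The embodiments of this application use the second way, whose advantage is that far fewer entries need to be maintained than in the first way.
Each time the source Leaf switch receives a packet, it performs a hash calculation on the five-tuple information of the packet and determines, in the stored Flowlet table, the Flowlet entry that matches the resulting hash value. If the hash value computed from the packet's five-tuple information equals the hash value computed from the five-tuple information of a Flowlet entry in the Flowlet table, that entry is determined to be the Flowlet entry matching the hash value.
If the valid bit information (Valid Bit) in the Flowlet entry is the first value, the packet is determined to be the first packet of the Flowlet; if the valid bit information is the second value, the packet is determined to be a non-first packet of the Flowlet. The first value is usually 0 and the second value is usually 1. The five-tuple information usually includes the source IP address (Source Internet Protocol, Src IP), the destination IP address (Destination IP, Dst IP), the source port (Src Port), the destination port (Dst Port), and the protocol (Protocol). The content of the Flowlet table is shown in Table 1 below: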
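Table 1: Flowlet table (rendered as images PCTCN2017076987-appb-000001 and PCTCN2017076987-appb-000002 in the original; per the description that follows, each row holds the five-tuple columns Src IP, Dst IP, Src Port, Dst Port, and Protocol, followed by Last Active Time, Outport, and Valid Bit)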
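In Table 1, one Flowlet corresponds to one entry. For a Flowlet entry, the first five columns are the five-tuple information; the most recent active time (Last Active Time) records the last time the Flowlet entry was used; and the egress port information (Outport) indicates the equal-cost path used to transmit the packets. Valid Bit takes the value 0 or 1 and indicates whether the Flowlet recorded in the entry is currently valid. If the valid bit information is 0, the Flowlet recorded in the entry is invalid, that is, the Flowlet has timed out.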
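The entry layout of Table 1 and the valid-bit test of step 702 can be sketched as below. This is a minimal illustration, not the switch implementation: the class and method names, the fixed table size, the use of Python's built-in `hash`, and the `choose_outport` callback (standing in for the weighted path selection of step 704) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Src IP, Dst IP, Src Port, Dst Port, Protocol
FiveTuple = Tuple[str, str, int, int, int]


@dataclass
class FlowletEntry:
    five_tuple: Optional[FiveTuple] = None
    last_active_time: float = 0.0
    outport: Optional[int] = None
    valid_bit: int = 0  # 0 (first value): invalid or timed out; 1 (second value): valid


class FlowletTable:
    def __init__(self, size: int = 1024) -> None:
        self.entries: List[FlowletEntry] = [FlowletEntry() for _ in range(size)]

    def lookup(self, five_tuple: FiveTuple) -> FlowletEntry:
        # Entries are matched by hash value only, so colliding flows share an entry.
        return self.entries[hash(five_tuple) % len(self.entries)]

    def on_packet(self, five_tuple: FiveTuple, now: float,
                  choose_outport: Callable[[], int]) -> Tuple[Optional[int], bool]:
        """Return (outport, is_first_packet) and update the entry accordingly."""
        entry = self.lookup(five_tuple)
        if entry.valid_bit == 0:
            # First packet of a new Flowlet: overwrite the stale entry.
            entry.five_tuple = five_tuple
            entry.last_active_time = now
            entry.valid_bit = 1
            entry.outport = choose_outport()
            return entry.outport, True
        # Non-first packet: reuse the stored egress port and refresh the timestamp.
        entry.last_active_time = now
        return entry.outport, False
```

A second packet of the same five-tuple that arrives while the entry is still valid reuses the stored egress port, which is what keeps all packets of one Flowlet on one path.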
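703. If the packet is a non-first packet of the Flowlet, the source Leaf switch determines equal-cost path A according to the egress port information in the associated Flowlet entry of the stored Flowlet table, schedules the packet onto equal-cost path A for transmission, and updates the most recent active time in that Flowlet entry to the current time.
In this case, the source Leaf switch already created the associated Flowlet entry in the Flowlet table when it received the first packet of the Flowlet. When a non-first packet of the Flowlet is received, the entry therefore already contains at least the five-tuple information, most recent active time, egress port information, and valid bit information, and the valid bit is 1. Because the entry contains the egress port information, the equal-cost path A indicated by that information is directly taken as the path for the packet, the packet is scheduled onto equal-cost path A for transmission, and the Last Active Time in the entry is updated to the current time.
704. If the packet is the first packet of the Flowlet, the source Leaf switch determines the destination Leaf switch according to the destination address of the packet, determines in the stored equal-cost path weight table at least one equal-cost path associated with the destination Leaf switch, and schedules the packet onto equal-cost path B for transmission according to the weight value of the at least one equal-cost path.
Because the packet is the first packet of the Flowlet, this is a new Flowlet, and everything in the Flowlet entry other than the valid bit information (whose value is 0) belongs to an earlier Flowlet that has timed out. The source Leaf switch therefore saves the packet's five-tuple information, and the current time as the most recent active time, into the corresponding Flowlet entry of the Flowlet table, and updates the valid bit information from the first value to the second value, that is, from 0 to 1. The source Leaf switch then uses the equal-cost path weight table delivered by the controller to decide which equal-cost path the packet is scheduled onto. Equal-cost path B is determined as follows: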
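The source Leaf switch obtains the weight value of each equal-cost path to the destination Leaf switch, computes the sum of these weight values, and generates a random number in the range from zero to that sum. Based on the value of the random number, it determines, among all the equal-cost paths to the destination Leaf switch, the equal-cost path B that matches the random number, and schedules the packet onto equal-cost path B for transmission.
Take a packet sent from Leaf#1 to Leaf#3 as an example. As FIG. 8 shows, the weight values of the three equal-cost paths from Leaf#1 to Leaf#3 are 0, 1, and 4, respectively, so their sum is 5. For each new Flowlet (that is, for the first packet of a Flowlet), a random number in the range 1 to 5 is generated. If the random number is 1, the packet is scheduled onto the equal-cost path Leaf#1→Spine#2→Leaf#3 (the second egress port of Leaf#1 in FIG. 8); if the random number is 2 to 5, the packet is scheduled onto the equal-cost path Leaf#1→Spine#3→Leaf#3 (the third egress port of Leaf#1 in FIG. 8). This keeps the ratio of the number of Flowlets on the three equal-cost paths from Leaf#1 to Leaf#3 at 0:1:4, which achieves load balancing.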
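The selection in this example can be written as a cumulative-weight lookup. The sketch below assumes, as in the worked example, that the random number is an integer drawn from 1 to the sum of the weights; the function names and the zero-based path index are illustrative choices, not part of the application.

```python
import random
from typing import List, Optional


def pick_path(weights: List[int], r: int) -> int:
    """Map a random number r in 1..sum(weights) to a zero-based path index.

    Path i is chosen when r falls within its slice of the cumulative weights,
    so a path with weight 0 is never selected.
    """
    if r < 1:
        raise ValueError("r must lie in 1..sum(weights)")
    cumulative = 0
    for index, weight in enumerate(weights):
        cumulative += weight
        if r <= cumulative:
            return index
    raise ValueError("r must lie in 1..sum(weights)")


def select_path(weights: List[int], rng: Optional[random.Random] = None) -> int:
    """Draw the random number and return the matching path index."""
    total = sum(weights)
    if total == 0:
        raise RuntimeError("no usable equal-cost path")
    generator = rng if rng is not None else random.Random()
    return pick_path(weights, generator.randint(1, total))
```

For the weights 0, 1, 4 of the example, `pick_path` maps 1 to index 1 (the second egress port) and 2 through 5 to index 2 (the third egress port), so Flowlets are spread in the ratio 0:1:4.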
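Note that after equal-cost path B is determined, its path identifier is saved as the egress port information in the corresponding Flowlet entry of the Flowlet table. Subsequent packets of the Flowlet (that is, its non-first packets) can then be sent by directly reading the egress port information in that Flowlet entry.
In addition, when the link state of an equal-cost path between two Leaf switches changes, for example when a port goes down or comes up, the associated Leaf switch reports the port down or up event to the controller. Based on that event, the controller recalculates the weight value of each equal-cost path between every pair of Leaf switches in a manner similar to step 701, and delivers the changed weight values to the Leaf switches involved. Suppose the Spine#1→Leaf#3 link in FIG. 8 returns to normal; the equal-cost path weight table of each Leaf switch then becomes as shown in FIG. 9. Comparing FIG. 9 with FIG. 8 shows that, after the Spine#1→Leaf#3 link is restored, the equal-cost path weight tables associated with Leaf#1, Leaf#2, and Leaf#3 have all changed, so a new equal-cost path weight table is delivered to each of the three Leaf switches.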
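Delivering only the part of the weights that changed can be sketched as a comparison of the old and new tables. The function name and the nested-dictionary layout (source switch, then destination switch, then list of path weights) are assumptions, and the post-recovery weight used in the example is illustrative only; the actual values are those of FIG. 9, which is not reproduced here.

```python
from typing import Dict, List

WeightTables = Dict[str, Dict[str, List[int]]]  # source -> destination -> path weights


def changed_rows(old: WeightTables, new: WeightTables) -> WeightTables:
    """Return, per source switch, only the destination rows whose weights differ."""
    delta: WeightTables = {}
    for src, table in new.items():
        diff = {dst: weights for dst, weights in table.items()
                if old.get(src, {}).get(dst) != weights}
        if diff:
            delta[src] = diff
    return delta


# Before recovery: Leaf#1 rows as described for FIG. 8.
old_tables: WeightTables = {"Leaf#1": {"Leaf#2": [4, 4, 4], "Leaf#3": [0, 1, 4]}}
# After recovery: hypothetical weight 4 for the restored path (not taken from FIG. 9).
new_tables: WeightTables = {"Leaf#1": {"Leaf#2": [4, 4, 4], "Leaf#3": [4, 1, 4]}}

update_for_switches = changed_rows(old_tables, new_tables)
```

Here only the Leaf#3 row of Leaf#1 would be sent, since the Leaf#2 row is unchanged.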
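Note that the Leaf switch periodically scans all Flowlet entries in the Flowlet table and sets every entry whose Flowlet has timed out to the invalid state. A timed-out entry is one for which the current time minus the entry's Last Active Time is greater than a preset time threshold. That is, for each Flowlet entry in the Flowlet table, the Leaf switch periodically calculates the difference between the current time and the most recent active time recorded in the entry; if the difference is greater than the preset time threshold, the Flowlet entry is set to the invalid state.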
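One pass of this periodic scan might look like the sketch below. The `Entry` class is a cut-down stand-in for a Flowlet entry that keeps only the two fields the scan needs, and the names, the threshold value, and the way the scan is triggered are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Entry:
    last_active_time: float
    valid_bit: int = 1  # 1: valid, 0: invalid


def expire_flowlets(entries: Iterable[Entry], now: float, threshold: float) -> int:
    """Invalidate entries idle for longer than `threshold`; return how many expired."""
    expired = 0
    for entry in entries:
        if entry.valid_bit == 1 and now - entry.last_active_time > threshold:
            entry.valid_bit = 0
            expired += 1
    return expired
```

After an entry is invalidated, the next packet that hashes to it sees valid bit 0 and is therefore treated as the first packet of a new Flowlet.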
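In addition, the load balancing method of the embodiments of this application can also be applied to a three-stage Clos Fat-tree data center network. A three-stage Clos Fat-tree architecture is in effect a superposition of multiple two-stage Clos networks. In the three-stage Clos Fat-tree architecture shown in FIG. 10, each of the two PODs (Pool Of Device, POD) corresponds to one two-stage Clos: top-of-rack switches (Top Of Rack, TOR) TOR#1 and TOR#2 together with aggregation switches (Aggregation, AGG) AGG#1 and AGG#2 form one two-stage Clos, and TOR#3, TOR#4, AGG#3, and AGG#4 form another. At the same time, the second-layer switches AGG#1, AGG#2, AGG#3, and AGG#4 together with Spine#1 and Spine#2 form a further two-stage Clos. Running the scheme described in this application in each of these two-stage Clos networks achieves a very good load balancing effect.
In the method provided in this embodiment, the controller sets a weight value for each equal-cost path between every pair of switches according to the link information of the entire network and delivers the corresponding equal-cost path weight table to each switch, and the switch performs Flowlet detection after receiving a packet sent by a server. The switch can therefore perform Flowlet-based load balancing directly from the locally stored equal-cost path weight table and Flowlet table. The switch does not need to obtain the load of each path in real time; the equal-cost path weights only need to be recalculated once when a link state changes. This greatly reduces the complexity of the load balancing algorithm and gives good results.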
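FIG. 11 is a flowchart of a load balancing method according to an embodiment of this application. Described from the perspective of the switch that performs it, and referring to FIG. 11, the method includes:
1101. After a packet sent by a server is received, determine whether the packet is the first packet of a Flowlet; if it is, perform step 1102 below.
1102. If the packet is the first packet of a Flowlet, determine the destination switch according to the destination address of the packet.
1103. In the stored equal-cost path weight table, determine the weight value of at least one equal-cost path associated with the destination switch, where the table stores the correspondence between the at least one equal-cost path and the weight values.
1104. Schedule the packet onto the corresponding equal-cost path for transmission according to the weight value of the at least one equal-cost path.
In the method provided in this embodiment, the switch performs Flowlet detection after receiving a packet sent by a server, and then performs Flowlet-based load balancing directly from the locally stored equal-cost path weight table, which stores the correspondence between at least one equal-cost path and the weight values. For example, once the currently received packet is detected to be the first packet of a Flowlet, the switch determines the destination switch directly from the destination address of the packet, determines in the stored table the weight value of at least one equal-cost path associated with that destination switch, and schedules the packet onto the corresponding equal-cost path according to those weight values. Load balancing is thus achieved without the switch obtaining the load of each path in real time, which greatly reduces the complexity of the load balancing algorithm and gives good results.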
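In another embodiment, scheduling the packet onto the corresponding equal-cost path for transmission according to the weight value of the at least one equal-cost path includes:
obtaining the weight value of each of the at least one equal-cost path;
computing the sum of these weight values, and generating a random number in the range from zero to that sum;
determining, according to the value of the random number, a first designated equal-cost path that matches the random number among the at least one equal-cost path;
scheduling the packet onto the first designated equal-cost path for transmission.
In another embodiment, determining whether the packet is the first packet of a Flowlet includes:
performing a hash calculation on the five-tuple information of the packet to obtain a hash value;
determining, in the stored Flowlet table, the Flowlet entry that matches the hash value;
if the valid bit information in the Flowlet entry is a first value, determining that the packet is the first packet of the Flowlet, writing the five-tuple information, and the current time as the most recent active time, into the Flowlet entry, and updating the valid bit information from the first value to a second value;
where a Flowlet entry includes at least the five-tuple information, most recent active time, egress port information, and valid bit information of a Flowlet.
In another embodiment, the method further includes:
if the valid bit information in the Flowlet entry is the second value, determining that the packet is a non-first packet of the Flowlet, and determining a second designated equal-cost path according to the egress port information in the Flowlet entry;
scheduling the packet onto the second designated equal-cost path for transmission, and updating the most recent active time in the Flowlet entry to the current time.
In another embodiment, the method further includes:
after the first designated equal-cost path that matches the random number is determined, saving the path identifier of the first designated equal-cost path as the egress port information in the corresponding Flowlet entry of the Flowlet table.
In another embodiment, the method further includes:
if the link state of the at least one equal-cost path changes, sending a link state change message to the controller, where the message indicates a third designated equal-cost path whose link state has changed, so that the controller recalculates the weight value of the at least one equal-cost path according to the message and obtains a new equal-cost path weight table;
receiving the new equal-cost path weight table sent by the controller, and storing it.
In another embodiment, the method further includes:
for each Flowlet entry in the Flowlet table, periodically calculating the difference between the current time and the most recent active time;
if the difference is greater than a preset time threshold, setting the Flowlet entry to the invalid state.
All of the optional technical solutions above may be combined in any manner to form optional embodiments of this application; they are not described one by one here.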
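FIG. 12 is a schematic structural diagram of a load balancing apparatus according to an embodiment of this application. Referring to FIG. 12, the apparatus includes a judging module 1201, a determining module 1202, and a scheduling module 1203.
The judging module 1201 is configured to determine, after a packet sent by a server is received, whether the packet is the first packet of a Flowlet;
the determining module 1202 is configured to determine the destination switch according to the destination address of the packet if the packet is the first packet of the Flowlet;
the determining module 1202 is further configured to determine, in the stored equal-cost path weight table, the weight value of at least one equal-cost path associated with the destination switch, where the table stores the correspondence between the at least one equal-cost path and the weight values;
the scheduling module 1203 is configured to schedule the packet onto the corresponding equal-cost path for transmission according to the weight value of the at least one equal-cost path.
In another embodiment, the scheduling module 1203 is configured to: obtain the weight value of each of the at least one equal-cost path; compute the sum of these weight values and generate a random number in the range from zero to that sum; determine, according to the value of the random number, a first designated equal-cost path that matches the random number among the at least one equal-cost path; and schedule the packet onto the first designated equal-cost path for transmission.
In another embodiment, the apparatus further includes:
the judging module 1201, configured to perform a hash calculation on the five-tuple information of the packet to obtain a hash value; determine, in the stored Flowlet table, the Flowlet entry that matches the hash value; and, if the valid bit information in the Flowlet entry is a first value, determine that the packet is the first packet of the Flowlet;
an update module 1204, configured to, when the packet is the first packet of the Flowlet, write the five-tuple information, and the current time as the most recent active time, into the Flowlet entry, and update the valid bit information from the first value to a second value; where a Flowlet entry includes at least the five-tuple information, most recent active time, egress port information, and valid bit information of a Flowlet.
In another embodiment, the judging module 1201 is further configured to determine that the packet is a non-first packet of the Flowlet if the valid bit information in the Flowlet entry is the second value;
the determining module 1202 is further configured to determine, when the packet is a non-first packet of the Flowlet, a second designated equal-cost path according to the egress port information in the Flowlet entry;
the scheduling module 1203 is further configured to schedule the packet onto the second designated equal-cost path for transmission when the packet is a non-first packet of the Flowlet; and the update module 1204 is further configured to update the most recent active time in the Flowlet entry to the current time when the packet is a non-first packet of the Flowlet.
In another embodiment, the apparatus further includes:
a saving module 1205, configured to save, after the first designated equal-cost path that matches the random number is determined, the path identifier of the first designated equal-cost path as the egress port information in the corresponding Flowlet entry of the Flowlet table.
In another embodiment, the apparatus further includes:
a sending module 1205, configured to send a link state change message to the controller if the link state of the at least one equal-cost path changes, where the message indicates a third designated equal-cost path whose link state has changed, so that the controller recalculates the weight value of the at least one equal-cost path according to the message and obtains a new equal-cost path weight table;
a receiving module 1206, configured to receive the new equal-cost path weight table sent by the controller and store it.
In another embodiment, the apparatus further includes:
a calculating module 1207, configured to periodically calculate, for each Flowlet entry in the Flowlet table, the difference between the current time and the most recent active time;
a setting module 1208, configured to set the Flowlet entry to the invalid state if the difference is greater than a preset time threshold.
After receiving and storing the equal-cost path weight table delivered by the controller, the apparatus provided in this embodiment, on receiving a packet sent by a server, performs Flowlet detection and performs Flowlet-based load balancing directly from the locally stored equal-cost path weight table and Flowlet table to select the equal-cost path for the currently received packet. The switch does not need to obtain the load of each path in real time; the equal-cost path weights only need to be recalculated once when a link state changes. This greatly reduces the complexity of the load balancing algorithm and gives good results.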
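FIG. 13 is a schematic structural diagram of a load balancing apparatus according to an embodiment of this application. Referring to FIG. 13, the apparatus includes a calculating module 1301 and a sending module 1302.
The calculating module 1301 is configured to calculate, for a source switch, the weight values of the multiple equal-cost paths between the source switch and each destination switch, to obtain the equal-cost path weight table of the source switch;
the sending module 1302 is configured to deliver the equal-cost path weight table to the source switch, so that the source switch, after receiving a packet sent by a server and determining that the packet is the first packet of a Flowlet, determines the destination switch according to the destination address of the packet, determines in the equal-cost path weight table the weight value of at least one equal-cost path associated with the destination switch, and schedules the packet onto the corresponding equal-cost path for transmission according to those weight values.
In another embodiment, the calculating module 1301 is configured to: for a destination switch, determine each equal-cost path between the source switch and the destination switch, where an equal-cost path includes a first link path from the source switch to a transit switch and a second link path from the transit switch to the destination switch; and, for an equal-cost path, calculate its weight value according to the link states of the first link path and the second link path.
In another embodiment, the apparatus further includes:
a receiving module 1303, configured to receive a link state change message sent by the source switch, where the message indicates the equal-cost path whose link state has changed;
the calculating module 1301 is further configured to recalculate, according to the link state change message, the weight values of the multiple equal-cost paths between each source switch and each destination switch, to obtain new equal-cost path weight tables;
the sending module 1302 is further configured to send the matching new equal-cost path weight table to each source switch.
The apparatus provided in this embodiment sets a weight value for each equal-cost path between every pair of switches according to the link information of the entire network, and delivers the corresponding equal-cost path weight table to each switch. When a switch then receives a packet sent by a server, it performs Flowlet detection and performs Flowlet-based load balancing directly from the locally stored equal-cost path weight table and Flowlet table to select the equal-cost path for the currently received packet. The switch does not need to obtain the load of each path in real time; the equal-cost path weights only need to be recalculated once when a link state changes. This greatly reduces the complexity of the load balancing algorithm and gives good results.
Note that the load balancing apparatus of the foregoing embodiments is described using the above division into functional modules merely as an example. In practice, the functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or some of the functions described above. In addition, the load balancing apparatus and the load balancing method embodiments are based on the same concept; for the specific implementation, refer to the method embodiments. Details are not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes merely optional embodiments of this application and is not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.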

Claims (22)

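  1. A load balancing method, wherein the method comprises:
    after receiving a packet sent by a server, determining whether the packet is the first packet of a sub-flow (Flowlet);
    if the packet is the first packet of the Flowlet, determining a destination switch according to the destination address of the packet;
    determining, in a stored equal-cost path weight table, the weight values of at least one equal-cost path associated with the destination switch, wherein the equal-cost path weight table stores the correspondence between the at least one equal-cost path and the weight values; and
    scheduling the packet onto a corresponding equal-cost path for transmission according to the weight values of the at least one equal-cost path.
  2. The method according to claim 1, wherein the scheduling the packet onto a corresponding equal-cost path for transmission according to the weight values of the at least one equal-cost path comprises:
    obtaining the weight value of each of the at least one equal-cost path;
    calculating the sum of the weight values of the equal-cost paths, and generating a random number whose value lies between zero and the sum of the weight values;
    determining, among the at least one equal-cost path and according to the value of the random number, a first specified equal-cost path that matches the random number; and
    scheduling the packet onto the first specified equal-cost path for transmission.
  3. The method according to claim 1, wherein the determining whether the packet is the first packet of a Flowlet comprises:
    performing a hash calculation on the 5-tuple information of the packet to obtain a hash value;
    determining, in a stored Flowlet table, the Flowlet entry that matches the hash value; and
    if the valid-bit information in the Flowlet entry is a first value, determining that the packet is the first packet of the Flowlet, updating the Flowlet entry with the 5-tuple information and with the current time as the last active time, and updating the valid-bit information from the first value to a second value;
    wherein a Flowlet entry includes at least the 5-tuple information, last-active information, egress port information, and valid-bit information of a Flowlet.
  4. The method according to claim 3, wherein the method further comprises:
    if the valid-bit information in the Flowlet entry is the second value, determining that the packet is a non-first packet of the Flowlet, and determining a second specified equal-cost path according to the egress port information in the Flowlet entry; and
    scheduling the packet onto the second specified equal-cost path for transmission, and updating the last active time in the Flowlet entry to the current time.
  5. The method according to claim 2, wherein the method further comprises:
    after determining the first specified equal-cost path that matches the random number, saving the path identifier of the first specified equal-cost path as egress port information in the corresponding Flowlet entry of the Flowlet table.
  6. The method according to any one of claims 1 to 5, wherein the method further comprises:
    if the link state of the at least one equal-cost path changes, sending a link state change message to a controller, wherein the link state change message indicates a third specified equal-cost path whose link state has changed, so that the controller recalculates the weight values of the at least one equal-cost path according to the link state change message and obtains a new equal-cost path weight table; and
    receiving the new equal-cost path weight table sent by the controller, and storing the new equal-cost path weight table.
  7. The method according to any one of claims 3 to 5, wherein the method further comprises:
    for each Flowlet entry in the Flowlet table, periodically calculating the difference between the current time and the last active time; and
    if the difference is greater than a preset time threshold, setting the Flowlet entry to an invalid state.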
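  8. A load balancing method, wherein the method comprises:
    for a source switch, calculating the weight values of the multiple equal-cost paths between the source switch and each destination switch, to obtain the equal-cost path weight table of the source switch; and
    delivering the equal-cost path weight table to the source switch, so that the source switch, after receiving a packet sent by a server and determining that the packet is the first packet of a Flowlet, determines a destination switch according to the destination address of the packet, determines in the equal-cost path weight table the weight values of at least one equal-cost path associated with the destination switch, and schedules the packet onto a corresponding equal-cost path for transmission according to the weight values of the at least one equal-cost path.
  9. The method according to claim 8, wherein the calculating the weight values of the multiple equal-cost paths between the source switch and each destination switch comprises:
    for a destination switch, determining each equal-cost path between the source switch and the destination switch, wherein an equal-cost path includes a first link path from the source switch to a transit switch and a second link path from the transit switch to the destination switch; and
    for an equal-cost path, calculating the weight value of the equal-cost path according to the link states of the first link path and the second link path.
  10. The method according to claim 8, wherein the method further comprises:
    receiving a link state change message sent by the source switch, wherein the link state change message indicates the equal-cost path whose link state has changed;
    recalculating, according to the link state change message, the weight values of the multiple equal-cost paths between each source switch and each destination switch, to obtain new equal-cost path weight tables; and
    sending the matching new equal-cost path weight table to each source switch.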
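  11. A load balancing apparatus, wherein the apparatus comprises:
    a judging module, configured to determine, after a packet sent by a server is received, whether the packet is the first packet of a sub-flow (Flowlet);
    a determining module, configured to determine a destination switch according to the destination address of the packet if the packet is the first packet of the Flowlet;
    the determining module being further configured to determine, in a stored equal-cost path weight table, the weight values of at least one equal-cost path associated with the destination switch, wherein the equal-cost path weight table stores the correspondence between the at least one equal-cost path and the weight values; and
    a scheduling module, configured to schedule the packet onto a corresponding equal-cost path for transmission according to the weight values of the at least one equal-cost path.
  12. The apparatus according to claim 11, wherein the scheduling module is configured to: obtain the weight value of each of the at least one equal-cost path; calculate the sum of the weight values of the equal-cost paths, and generate a random number whose value lies between zero and the sum of the weight values; determine, among the at least one equal-cost path and according to the value of the random number, a first specified equal-cost path that matches the random number; and schedule the packet onto the first specified equal-cost path for transmission.
  13. The apparatus according to claim 11, wherein the apparatus further comprises:
    the judging module, configured to perform a hash calculation on the 5-tuple information of the packet to obtain a hash value; determine, in a stored Flowlet table, the Flowlet entry that matches the hash value; and, if the valid-bit information in the Flowlet entry is a first value, determine that the packet is the first packet of the Flowlet; and
    an updating module, configured to, when the packet is the first packet of the Flowlet, update the Flowlet entry with the 5-tuple information and with the current time as the last active time, and update the valid-bit information from the first value to a second value; wherein a Flowlet entry includes at least the 5-tuple information, last-active information, egress port information, and valid-bit information of a Flowlet.
  14. The apparatus according to claim 13, wherein the judging module is further configured to determine that the packet is a non-first packet of the Flowlet if the valid-bit information in the Flowlet entry is the second value;
    the determining module is further configured to, when the packet is a non-first packet of the Flowlet, determine a second specified equal-cost path according to the egress port information in the Flowlet entry;
    the scheduling module is further configured to, when the packet is a non-first packet of the Flowlet, schedule the packet onto the second specified equal-cost path for transmission; and
    the updating module is further configured to, when the packet is a non-first packet of the Flowlet, update the last active time in the Flowlet entry to the current time.
  15. The apparatus according to claim 12, wherein the apparatus further comprises:
    a saving module, configured to, after the first specified equal-cost path that matches the random number is determined, save the path identifier of the first specified equal-cost path as egress port information in the corresponding Flowlet entry of the Flowlet table.
  16. The apparatus according to any one of claims 11 to 15, wherein the apparatus further comprises:
    a sending module, configured to send a link state change message to a controller if the link state of the at least one equal-cost path changes, wherein the link state change message indicates a third specified equal-cost path whose link state has changed, so that the controller recalculates the weight values of the at least one equal-cost path according to the link state change message and obtains a new equal-cost path weight table; and
    a receiving module, configured to receive the new equal-cost path weight table sent by the controller and to store the new equal-cost path weight table.
  17. The apparatus according to any one of claims 13 to 15, wherein the apparatus further comprises:
    a calculating module, configured to periodically calculate, for each Flowlet entry in the Flowlet table, the difference between the current time and the last active time; and
    a setting module, configured to set the Flowlet entry to an invalid state if the difference is greater than a preset time threshold.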
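  18. A load balancing apparatus, wherein the apparatus comprises:
    a calculating module, configured to calculate, for a source switch, the weight values of the multiple equal-cost paths between the source switch and each destination switch, to obtain the equal-cost path weight table of the source switch; and
    a sending module, configured to deliver the equal-cost path weight table to the source switch, so that the source switch, after receiving a packet sent by a server and determining that the packet is the first packet of a Flowlet, determines a destination switch according to the destination address of the packet, determines in the equal-cost path weight table the weight values of at least one equal-cost path associated with the destination switch, and schedules the packet onto a corresponding equal-cost path for transmission according to the weight values of the at least one equal-cost path.
  19. The apparatus according to claim 18, wherein the calculating module is configured to: determine, for a destination switch, each equal-cost path between the source switch and the destination switch, wherein an equal-cost path includes a first link path from the source switch to a transit switch and a second link path from the transit switch to the destination switch; and calculate, for an equal-cost path, the weight value of the equal-cost path according to the link states of the first link path and the second link path.
  20. The apparatus according to claim 18, wherein the apparatus further comprises:
    a receiving module, configured to receive a link state change message sent by the source switch, wherein the link state change message indicates the equal-cost path whose link state has changed;
    the calculating module being further configured to recalculate, according to the link state change message, the weight values of the multiple equal-cost paths between each source switch and each destination switch, to obtain new equal-cost path weight tables; and
    the sending module being further configured to send the matching new equal-cost path weight table to each source switch.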
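  21. A switch, comprising a transmitter, a receiver, a memory, and a processor, wherein the memory, the transmitter, and the receiver are each connected to the processor, the memory stores program code, and the processor is configured to invoke the program code to perform the following operations:
    after a packet sent by a server is received through the receiver, determining whether the packet is the first packet of a sub-flow (Flowlet); if the packet is the first packet of the Flowlet, determining a destination switch according to the destination address of the packet; determining, in a stored equal-cost path weight table, the weight values of at least one equal-cost path associated with the destination switch, wherein the equal-cost path weight table stores the correspondence between the at least one equal-cost path and the weight values; and scheduling the packet onto a corresponding equal-cost path for transmission according to the weight values of the at least one equal-cost path.
  22. A controller, comprising a transmitter, a receiver, a memory, and a processor, wherein the memory, the transmitter, and the receiver are each connected to the processor, the memory stores program code, and the processor is configured to invoke the program code to perform the following operations:
    for a source switch, calculating the weight values of the multiple equal-cost paths between the source switch and each destination switch, to obtain the equal-cost path weight table of the source switch; and delivering the equal-cost path weight table to the source switch through the transmitter, so that the source switch, after receiving a packet sent by a server and determining that the packet is the first packet of a Flowlet, determines a destination switch according to the destination address of the packet, determines in the equal-cost path weight table the weight values of at least one equal-cost path associated with the destination switch, and schedules the packet onto a corresponding equal-cost path for transmission according to the weight values of the at least one equal-cost path.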
PCT/CN2017/076987 2016-07-19 2017-03-16 Load balancing method, apparatus, and device WO2018014569A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17830212.1A EP3468119B1 (en) 2016-07-19 2017-03-16 Method, apparatus and device for balancing load
US16/239,353 US11134014B2 (en) 2016-07-19 2019-01-03 Load balancing method, apparatus, and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610570733.8A CN107634912B (zh) 2016-07-19 2016-07-19 Load balancing method, apparatus, and device
CN201610570733.8 2016-07-19

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/239,353 Continuation US11134014B2 (en) 2016-07-19 2019-01-03 Load balancing method, apparatus, and device

Publications (1)

Publication Number Publication Date
WO2018014569A1 (zh) 2018-01-25

Family

ID=60992833

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/076987 WO2018014569A1 (zh) 2016-07-19 2017-03-16 Load balancing method, apparatus, and device

Country Status (4)

Country Link
US (1) US11134014B2 (zh)
EP (1) EP3468119B1 (zh)
CN (1) CN107634912B (zh)
WO (1) WO2018014569A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112423345A (zh) * 2019-08-22 2021-02-26 大唐移动通信设备有限公司 Cell reselection method, network-side device, and UE

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106998302B (zh) * 2016-01-26 2020-04-14 华为技术有限公司 Service traffic allocation method and apparatus
US10848432B2 (en) * 2016-12-18 2020-11-24 Cisco Technology, Inc. Switch fabric based load balancing
US10924385B2 (en) * 2017-11-07 2021-02-16 Nicira, Inc. Weighted multipath routing configuration in software-defined network (SDN) environments
CN113039760A (zh) * 2018-10-26 2021-06-25 慧与发展有限责任合伙企业 Determination of inflection points in congestion of network paths
CN112398749A (zh) * 2019-08-12 2021-02-23 中国电信股份有限公司 Load balancing method, apparatus and system, network, and storage medium
EP4113917A4 (en) * 2020-03-10 2023-01-25 Mitsubishi Electric Corporation CONTROLLER, NETWORK SYSTEM, AND FLOW MANAGEMENT METHOD
CN111526089B (zh) * 2020-04-14 2021-08-17 北京交通大学 Apparatus for data fusion transmission and scheduling based on variable-length granularity
US11411869B1 (en) * 2020-05-11 2022-08-09 Cisco Technology, Inc. Designated forwarder selection for multihomed hosts in an ethernet virtual private network
CN113810284A (zh) * 2020-06-16 2021-12-17 华为技术有限公司 Method and apparatus for determining a packet transmission path
WO2022067791A1 (zh) * 2020-09-30 2022-04-07 华为技术有限公司 Data processing and transmission method and related device
CN112787925B (zh) * 2020-10-12 2022-07-19 中兴通讯股份有限公司 Congestion information collection method, optimal path determination method, and network switch
US11425044B2 (en) * 2020-10-15 2022-08-23 Cisco Technology, Inc. DHCP layer 2 relay in VXLAN overlay fabric
CN112910795B (zh) * 2021-01-19 2023-01-06 南京大学 Crowd-sourced edge load balancing method and system
CN115442287B (zh) * 2022-08-10 2024-04-05 北京金山云网络技术有限公司 Weight-based dedicated line gateway method and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102136986A (zh) * 2010-01-22 2011-07-27 杭州华三通信技术有限公司 Load sharing method and switching device
CN102710489A (zh) * 2011-03-28 2012-10-03 日电(中国)有限公司 Dynamic flow-splitting scheduling system and method
US20140108489A1 (en) * 2012-10-15 2014-04-17 Et International, Inc. Flowlet-based processing
CN104580002A (zh) * 2015-01-14 2015-04-29 盛科网络(苏州)有限公司 Large-flow load balancing forwarding method and apparatus
CN105591974A (zh) * 2014-10-20 2016-05-18 华为技术有限公司 Packet processing method, apparatus, and system

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8897130B2 (en) * 2009-09-16 2014-11-25 Broadcom Corporation Network traffic management
US9071541B2 (en) * 2012-04-25 2015-06-30 Juniper Networks, Inc. Path weighted equal-cost multipath
US9036476B2 (en) * 2012-09-28 2015-05-19 Juniper Networks, Inc. Maintaining load balancing after service application with a network device
US9270601B2 (en) * 2013-04-01 2016-02-23 Broadcom Corporation Path resolution for hierarchical load distribution
US9502111B2 (en) * 2013-11-05 2016-11-22 Cisco Technology, Inc. Weighted equal cost multipath routing
US10778584B2 (en) * 2013-11-05 2020-09-15 Cisco Technology, Inc. System and method for multi-path load balancing in network fabrics
US9565114B1 (en) * 2014-03-08 2017-02-07 Google Inc. Weighted load balancing using scaled parallel hashing
US9367366B2 (en) * 2014-03-27 2016-06-14 Nec Corporation System and methods for collaborative query processing for large scale data processing with software defined networking
US9397926B2 (en) * 2014-08-05 2016-07-19 Dell Products L.P. Peer-influenced aggregate member selection
US10320681B2 (en) * 2016-04-12 2019-06-11 Nicira, Inc. Virtual tunnel endpoints for congestion-aware load balancing
US10015096B1 (en) * 2016-06-20 2018-07-03 Amazon Technologies, Inc. Congestion avoidance in multipath routed flows

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3468119A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112423345A (zh) * 2019-08-22 2021-02-26 大唐移动通信设备有限公司 Cell reselection method, network-side device, and UE
CN112423345B (zh) * 2019-08-22 2022-08-02 大唐移动通信设备有限公司 Cell reselection method, network-side device, and UE

Also Published As

Publication number Publication date
EP3468119A1 (en) 2019-04-10
CN107634912B (zh) 2020-04-28
US20190140956A1 (en) 2019-05-09
US11134014B2 (en) 2021-09-28
EP3468119A4 (en) 2019-04-24
CN107634912A (zh) 2018-01-26
EP3468119B1 (en) 2023-07-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17830212

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017830212

Country of ref document: EP

Effective date: 20190107

NENP Non-entry into the national phase

Ref country code: DE