CN106911584B - Traffic load sharing method, device and system based on leaf-spine topology - Google Patents


Info

Publication number
CN106911584B
Authority
CN
China
Prior art keywords
congestion
path
leaf
value
leaf device
Prior art date
Legal status
Active
Application number
CN201510981555.3A
Other languages
Chinese (zh)
Other versions
CN106911584A (en)
Inventor
姚学军
王建兵
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201510981555.3A priority Critical patent/CN106911584B/en
Publication of CN106911584A publication Critical patent/CN106911584A/en
Application granted granted Critical
Publication of CN106911584B publication Critical patent/CN106911584B/en

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 - Traffic control in data switching networks
    • H04L47/10 - Flow control; Congestion control
    • H04L47/12 - Avoiding congestion; Recovering from congestion
    • H04L47/125 - Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 - Management of faults, events, alarms or notifications
    • H04L41/0654 - Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 - Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 - Network traffic management; Network resource management
    • H04W28/02 - Traffic management, e.g. flow control or congestion control
    • H04W28/0289 - Congestion control
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 - Network traffic management; Network resource management
    • H04W28/02 - Traffic management, e.g. flow control or congestion control
    • H04W28/08 - Load balancing or load distribution

Abstract

The present application relates to the field of mobile communications, and in particular to a traffic load sharing method, device, and system based on a leaf-spine topology. In the method, a first leaf device sends a plurality of congestion detection messages to a second leaf device through a plurality of paths, each congestion detection message carrying a congestion value that indicates the congestion degree of the path the message traverses. According to the congestion value of each path, the first leaf device selects at least one path over which to transmit data to the second leaf device. The scheme provided by the application achieves load sharing across the whole network.

Description

Traffic load sharing method, device and system based on leaf-spine topology
Technical Field
The invention relates to the field of mobile communication, and in particular to traffic load sharing in Ethernet networks.
Background
In current Ethernet networks, multiple links are usually configured to back each other up in order to reduce the impact of a link failure on services: when the active link fails, traffic is switched to the standby link to maintain service quality and reliability. However, this approach cannot forward traffic on the active and standby links simultaneously, so the link bandwidth is not used effectively.
To support traffic load sharing, multiple physical links of a device are typically bundled into one logical link, as shown in fig. 1. Fig. 1 is a schematic diagram of a logical link composed of multiple physical links in the prior art. In fig. 1, the server is connected to device 1 by two physical links that make up a logical link LAG 1. Device 1 is in turn connected to device 2 and device 3 by two physical links that make up a logical link LAG 2. When a message is sent over a logical link, one of the physical links is selected by the corresponding load balancing algorithm to carry the message. When a physical link fails, it is withdrawn from the logical link, so that messages are no longer load-shared onto the failed link.
Data center networks are now commonly built using a two-tier leaf-spine topology. Fig. 2 is a schematic diagram of a two-layer leaf-spine topology in the prior art. In fig. 2, leaf devices are connected to different spine devices through multiple physical links, and these physical links constitute one logical link. East-west traffic between leaf devices is therefore shared among the spine devices over multiple physical links. How effectively each service flow is load-shared across the physical links of the logical link affects the quality and performance of service traffic forwarding in the data center network.
To share service flows across the physical links of a logical link as evenly as possible while keeping messages in order, the industry uses a static hash algorithm. The algorithm derives a physical link index under the logical link from characteristic fields of the message, so the same characteristic field values always produce the same hash result. In the static hash mode, multiple flows with the same characteristic fields are hashed onto the same link, so hash imbalance can occur: some physical links become overloaded, or even congested to the point of dropping packets, while other physical links are lightly loaded with low bandwidth utilization, as shown in fig. 3. Fig. 3 is a schematic diagram of a prior-art leaf-spine network topology in which the static hash algorithm causes hash imbalance.
In fig. 3, leaf devices are connected to different spine devices via physical links that make up a logical link. Traffic from a leaf device to the spine devices is hash load-shared over the logical link. If the hash is uneven, traffic toward some spine devices may be congested or even dropped, reducing the service quality of the data center and the user experience.
To overcome the link load imbalance caused by uneven hashing in the static hash algorithm, new commercial forwarding chips support a Dynamic Load Balancing (DLB) algorithm: while keeping service messages in order, the chip selects, according to the load of each physical link under the logical link, the most lightly loaded physical link to send each message, as shown in fig. 4.
Fig. 4 is a schematic diagram of a prior-art leaf-spine topology using the DLB algorithm for local load sharing. In fig. 4, when server 1 wants to send data to server 3, link 1 from leaf device 1 to spine device 1 is already heavily loaded, while link 2 from leaf device 1 to spine device 2 is lightly loaded. When leaf device 1 load-shares the new service flow with the DLB algorithm, the flow is placed on the lightly loaded link 2; with the static hash algorithm, the new flow could still be hashed onto the overloaded link 1. The net effect of the DLB algorithm is a more balanced load across the local physical links of the logical link.
When the DLB algorithm performs local load sharing on a device, it balances the local physical links well and avoids the static-hash failure mode in which flows pile up on one physical link, congesting it and dropping messages. However, even so, local load sharing alone cannot meet the traffic sharing requirements of the leaf-spine topology of a data center network.
In a leaf-spine topology, east-west traffic from one leaf device to another is load-shared across multiple spine devices over multiple physical links. Fig. 5 is a schematic diagram of a prior-art leaf-spine topology in which the DLB algorithm leads to network-wide load imbalance.
In fig. 5, when server 1 wants to send data to server 3, traffic from leaf device 1 to leaf device 3 has two paths, consisting of physical links {1,3} and {2,4} respectively. Link 3 toward leaf device 3 is lightly loaded, but link 4 is already congested and dropping packets. When executing the DLB algorithm, leaf device 1 can only balance the load between its local physical links 1 and 2. If traffic is hashed onto physical link 2, it reaches leaf device 3 via link 4 and is dropped because of congestion. In such a topology there are multiple paths to a packet's destination device, and the link bandwidth and device load differ from path to path. Even if the DLB algorithm balances the local physical links of each device, it cannot guarantee that whole paths through the topology are load-balanced and free of congestion drops.
Therefore, neither the static hash algorithm nor the DLB algorithm can achieve traffic load sharing over the whole path under a leaf-spine network topology.
Disclosure of Invention
The invention provides a traffic load sharing method, device, and system based on a leaf-spine topology, so as to achieve traffic load balance across the whole network.
In one aspect, an embodiment of the present application provides a traffic load sharing method based on a leaf-spine topology, in which a first leaf device communicates with a second leaf device through a spine device. In the method, the first leaf device sends a plurality of congestion detection messages to the second leaf device over a plurality of paths, each congestion detection message including a congestion value field. The congestion value field is used for writing a congestion value determined by the device through which the congestion detection message currently passes, and the congestion value indicates the congestion degree of the path. The first leaf device receives a plurality of response messages sent by the second leaf device, each response message including the congestion value of the path traversed by the congestion detection message corresponding to that response message. The first leaf device determines the congestion value of each path from the plurality of response messages and selects at least one of the paths as a path for transmitting data to the second leaf device.
The invention achieves traffic load sharing across the whole leaf-spine network, ensures the transmission performance and throughput of service flows in the data center network, raises network bandwidth utilization, and improves the user experience.
In one possible design, the congestion value of each path is determined from the congestion values of the physical links the path contains. Further, the congestion value of a physical link is determined from a quantized value of the queue length of the egress port corresponding to the link and/or a quantized value of the bandwidth utilization of that egress port. In one possible design, the congestion detection message is a protocol packet or a data packet and further includes an identifier of the path; the congestion value and the path identifier are carried in a frame header, a reserved bit, or a newly added field of the congestion detection message.
In one possible design, the method further includes the first leaf device receiving a reverse congestion detection message sent by the second leaf device, and obtaining, from the congestion value field of the reverse congestion detection message, the reverse congestion value of the path traversed by that message.
In another aspect, an embodiment of the invention provides a leaf device for traffic load sharing based on a leaf-spine topology. The leaf device communicates with another leaf device, and includes a transmitter, a receiver, and a processor.
The transmitter sends a plurality of congestion detection messages to the other leaf device through a plurality of paths, each congestion detection message including a congestion value field for writing a congestion value determined by the device through which the message currently passes, the congestion value indicating the congestion degree of a path. The receiver is configured to receive a plurality of response messages sent by the other leaf device, each response message including the congestion value of the path traversed by the corresponding congestion detection message. The processor is configured to determine the congestion value of each path from the plurality of response messages and to select at least one of the paths as a path for transmitting data to the other leaf device.
In another aspect, an embodiment of the present invention provides a spine device for traffic load sharing based on a leaf-spine topology, in which a first leaf device communicates with a second leaf device through the spine device. The spine device includes a receiver and a processor. The receiver is configured to receive a congestion detection message sent by the first leaf device, whose congestion value field carries the congestion value of the first physical link from the first leaf device to the spine device. The processor is configured to determine the congestion value of the path from the first leaf device to the second leaf device via the spine device by comparing the congestion value of the first physical link with the congestion value of the link from the spine device to the second leaf device, and to process the congestion detection message according to the determined congestion value.
In one possible design, when processing the congestion detection message according to the determined congestion value, the processor is specifically configured to: when the congestion value of the first physical link is the congestion value of the path, send the congestion detection message to the second leaf device unchanged; and when the congestion value of the second physical link is the congestion value of the path, update the congestion value field of the congestion detection message from the first physical link's congestion value to the second physical link's congestion value and send the updated message to the second leaf device.
In another aspect, an embodiment of the present invention provides a system for traffic load sharing based on a leaf-spine topology, in which a first leaf device communicates with a second leaf device via a spine device. The system includes the first leaf device, the spine device, and the second leaf device.
The first leaf device is configured to send a congestion detection message to the spine device, the message including a congestion value field. The congestion value field is used for writing a congestion value determined by the first leaf device; this congestion value indicates the congestion degree of the first physical link from the first leaf device to the spine device.
The spine device is configured to compare the congestion value of the first physical link with the congestion value of a second physical link from the spine device to the second leaf device, so as to determine the congestion value of the path from the first leaf device to the second leaf device via the spine device.
The second leaf device is configured to receive the congestion value of the path from the spine device and feed it back to the first leaf device, so that the first leaf device can select the appropriate path over which to send data.
Compared with the prior art, the method and device optimize the multi-path load sharing effect under a leaf-spine network topology, keep service flows away from congested forwarding paths, improve forwarding quality and performance, and raise network bandwidth utilization. When a physical link fails, the scheme automatically senses the failure and adjusts the sharing of service flows, improving the robustness of the data center network against physical link failures.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic diagram of a logical link formed by multiple physical links in the prior art;
Fig. 2 is a schematic diagram of a two-layer leaf-spine topology in the prior art;
Fig. 3 is a schematic diagram of a prior-art leaf-spine network topology in which the static hash algorithm causes hash imbalance;
Fig. 4 is a schematic diagram of a prior-art leaf-spine topology using the DLB algorithm for local load sharing;
Fig. 5 is a schematic diagram of a prior-art leaf-spine topology in which the DLB algorithm causes network load imbalance;
Fig. 6 is a schematic diagram of the leaf-spine system architecture underlying the traffic load sharing method according to an embodiment of the present invention;
Fig. 7 is a flowchart of a traffic load sharing method based on a leaf-spine topology according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of a leaf device for traffic load sharing based on a leaf-spine topology according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of a spine device for traffic load sharing based on a leaf-spine topology according to an embodiment of the present invention;
Fig. 10 is a schematic diagram of a traffic load sharing system based on a leaf-spine topology according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.
Fig. 6 is a schematic diagram of the leaf-spine system architecture underlying the traffic load sharing method based on a leaf-spine network topology according to an embodiment of the present invention. The system comprises a plurality of leaf devices, a plurality of spine devices, and a plurality of servers. The leaf devices and the spine devices are fully connected, which ensures the reliability of the network. Spine devices are not connected to one another, and neither are leaf devices.
Spine device 1 and spine device 2 are backbone nodes of the bearer network and sit at the aggregation layer. Leaf devices 1 to 4 are leaf nodes of the bearer network and sit at the access layer.
In one example, both the spine devices and the leaf devices are Ethernet switches. Typically, the switching capacity of a spine device is higher than that of a leaf device; for example, the spine device's switch supports 100Gb/s ports while the leaf device's switch supports 1Gb/s or 10Gb/s ports.
Those skilled in the art will appreciate that the leaf-spine network architecture of fig. 6 is not limiting; the numbers of leaf devices and spine devices may be greater or smaller than shown.
In fig. 6, when server 1 sends data to server 3, two paths are available. The first path is {1,3}, i.e., it traverses physical link 1 and then physical link 3; the second path is {2,4}, i.e., physical link 2 and then physical link 4. Physical link 1 runs from leaf device 1 to spine device 1, and physical link 3 from spine device 1 to leaf device 3; physical link 2 runs from leaf device 1 to spine device 2, and physical link 4 from spine device 2 to leaf device 3. In order to select the path with the lowest congestion degree, the embodiment of the present invention provides a method in which each leaf device performs local load sharing according to the path congestion degrees of the whole leaf-spine network, thereby achieving global load balancing (GLB). How the embodiments implement GLB is set forth below.
In fig. 6, before server 1 sends data to server 3, leaf device 1 (the device to which server 1 is attached) sends congestion detection messages along paths {1,3} and {2,4} to probe their congestion degrees. Each congestion detection message includes a congestion value field for writing a quantized congestion degree determined by the device through which the message currently passes. This quantized congestion degree is called the congestion value and indicates the congestion degree of the path. The congestion value of a path is determined from the congestion values of the physical links the path contains, and the congestion value of each physical link is computed by the end device of that link, i.e., by the corresponding leaf device and/or spine device. Based on the determined congestion values of path {1,3} and path {2,4}, leaf device 1 selects the path with the lower congestion value to carry data from server 1, thereby achieving global load sharing across the whole leaf-spine network.
Fig. 7 is a flowchart of a global load sharing method based on a leaf-spine network topology according to an embodiment of the present invention.
In step 710, each device in the leaf-spine network determines the congestion value of each physical link terminating at it.
Each device in the leaf-spine topology (leaf devices and spine devices alike) quantizes the congestion degree of each physical link for which it is an end device, yielding the congestion value of every physical link of every leaf device and spine device.
For example, in fig. 6, leaf device 1 quantizes the congestion degrees of physical links 1 and 2, and leaf device 3 quantizes those of physical links 3' and 4'. Spine device 1 quantizes the congestion degrees of physical links 1' and 3, and spine device 2 those of physical links 2' and 4. Here physical link 1' is the link from spine device 1 to leaf device 1 and physical link 2' the link from spine device 2 to leaf device 1, while physical link 3' is the link from leaf device 3 to spine device 1 and physical link 4' the link from leaf device 3 to spine device 2 (matching the reverse paths described under step 750 below). In other words, each device quantizes the links on whose egress ports it sends.
The following explains in detail how a leaf or spine device quantizes the congestion degree of a physical link terminating at it to obtain the link's congestion value.
In one example, a leaf or spine device quantizes the congestion degree of each of its physical links to obtain the link's congestion value, which is determined from a quantized value of the queue length of the egress port corresponding to the physical link and/or a quantized value of the bandwidth utilization of that egress port.
The egress port queue length is the length of the egress port buffer queue, read periodically by the leaf or spine device, measured as the number of bytes of messages held in the buffer queue.
The bandwidth utilization of a physical link is the ratio, read periodically by the leaf or spine device, of the number of bytes of messages (i.e., service traffic) sent by the egress port corresponding to the link to the bandwidth of that egress port (for example, a 10G port), expressed as a percentage.
In one example, a leaf or spine device quantizes the range of physical link queue lengths and/or bandwidth utilizations in segments, each segment corresponding to one quantized value, and stores the mapping from segments to quantized values in table form in its buffer, as shown in tables 1 and 2 below. For example, leaf device 1 stores in its buffer a table of quantized values for ranges of its physical link queue lengths (see table 1) and a table of quantized values for ranges of its physical link bandwidth utilizations (see table 2).
The egress port congestion value may be determined from the quantized value of the egress port queue length and/or the quantized value of the egress port bandwidth utilization. Table 1 below illustrates how the egress port queue length is quantized, and table 2 below how the egress port bandwidth utilization is quantized.
Table 1 below is an example of quantizing the egress port queue length into a 3-bit value.
[Table 1 appears as an image in the original publication: segmented egress port queue length ranges mapped to 3-bit quantized values.]
TABLE 1
In table 1 above, for example, if the egress port queue length of a leaf or spine device is 20000 bytes, the device finds by looking up table 1 that this queue length quantizes to 6, i.e., to the 3-bit value 110.
It should be noted that the specific values of the queue length quantization segmentation points, quantization ranges, and quantization results in table 1 are user-configurable; they are not limited to those shown.
Table 2 below is an example of quantizing the egress port bandwidth utilization into a 3-bit value.
[Table 2 appears as an image in the original publication: segmented egress port bandwidth utilization ranges mapped to 3-bit quantized values.]
TABLE 2
In table 2 above, for example, if the egress port bandwidth utilization of a leaf or spine device is 86%, the device finds by looking up table 2 that this utilization quantizes to 5, i.e., to the 3-bit value 101.
It should be noted that the specific values of the bandwidth utilization quantization segmentation points, quantization ranges, and quantization results in table 2 are likewise user-configurable; they are not limited to those shown.
In addition, to improve quantization precision, the bit width of the quantization results in table 1 and/or table 2 may be increased, with the quantization segmentation points increased accordingly. A larger quantization bit width gives finer quantization precision and reflects the congestion degree of the physical link more faithfully.
In one example, in a leaf-spine network a leaf or spine device obtains the congestion value of each of its physical links by looking up the egress port queue length quantization table (table 1) and/or the egress port bandwidth utilization quantization table (table 2). The larger the congestion value, the more congested the physical link.
For example, a leaf or spine device may obtain a physical link's congestion value by assigning weights to the quantized queue length value and the quantized bandwidth utilization value. Alternatively, it may add or multiply the two quantized values; or it may use the quantized egress port queue length alone, or the quantized egress port bandwidth utilization alone, as the link's congestion value. A minimal sketch of such a combination follows.
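As an illustration only, and not part of the original disclosure, the following Python sketch shows one such combination. The segmentation points are hypothetical values chosen so that the worked examples above still hold (20000 bytes quantizes to 6 and 86% to 5); both the points and the equal weighting are assumptions, since the patent leaves them user-configurable.

    import bisect

    # Hypothetical segmentation points; the patent leaves these configurable.
    QUEUE_LEN_POINTS = [500, 1000, 2000, 4000, 8000, 16000, 32000]  # bytes
    BW_UTIL_POINTS = [10, 25, 40, 55, 70, 90, 97]                   # percent

    def quantize(value, points):
        # Map a raw measurement to its 3-bit quantized value (0..7).
        return bisect.bisect_right(points, value)

    def link_congestion_value(queue_len_bytes, bw_util_percent,
                              w_queue=0.5, w_bw=0.5):
        # Weighted combination; addition, multiplication, or a single
        # quantized value are equally valid per the text above.
        q = quantize(queue_len_bytes, QUEUE_LEN_POINTS)
        b = quantize(bw_util_percent, BW_UTIL_POINTS)
        return round(w_queue * q + w_bw * b)

    print(quantize(20000, QUEUE_LEN_POINTS))  # 6, i.e. 3-bit value 110
    print(quantize(86, BW_UTIL_POINTS))       # 5, i.e. 3-bit value 101
    print(link_congestion_value(20000, 86))   # 6 with equal weights

A wider quantization bit width is obtained simply by adding segmentation points, matching the note above.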
The following proceeds to the description of other steps in fig. 7.
Steps 720 to 730 are illustrated with the scenario in which server 1 is about to send data to server 3. First, leaf device 1 (the first leaf device, corresponding to server 1) sends congestion detection message 1 to leaf device 3 over path {1,3}, and at the same time sends congestion detection message 2 to leaf device 3 over path {2,4}. Congestion detection message 1 includes congestion value field 1, used for writing the congestion value determined by spine device 1 and indicating the congestion degree of path {1,3}; congestion detection message 2 includes congestion value field 2, used for writing the congestion value determined by spine device 2 and indicating the congestion degree of path {2,4}.
In step 720, the first leaf device sends a plurality of congestion detection messages over its physical links, each message including a congestion value field. For example, leaf device 1 sends congestion detection message 1 to spine device 1 over physical link 1; its congestion value field 1 carries the congestion value of physical link 1 determined by leaf device 1, indicating the congestion degree of physical link 1. At the same time, leaf device 1 sends congestion detection message 2 to spine device 2 over physical link 2; its congestion value field 2 carries the congestion value of physical link 2 determined by leaf device 1, indicating the congestion degree of physical link 2.
In one example, leaf device 1 obtains the congestion values of physical links 1 and 2 by looking up its egress port queue length quantization table (table 1) and/or its egress port bandwidth utilization quantization table (table 2); the specific method is described under step 710. In one example, the congestion detection message is a protocol packet or a data packet and further includes an identifier of the path; the congestion value and the path identifier are carried in a frame header, a reserved bit, or a newly added field of the message. The following discussion uses the case where the frame header of the congestion detection message carries the congestion value and the path identifier.
The congestion degree of each path is determined from the congestion degree metric information of the physical links the path contains. The congestion degree metric information of a physical link comprises the link's congestion value and the link's identifier; see table 3 below.
Path-Id (4 bits)    Path-Congestion-Quantized-Value (6 bits)
TABLE 3
In table 3, the physical link congestion degree metric information carried in the frame header of the congestion detection message comprises the physical link identifier Path-Id and the link's congestion value Path-Congestion-Quantized-Value. Path-Id may be represented with 4 bits, and Path-Congestion-Quantized-Value with 6 bits.
Those skilled in the art will appreciate that the Path-Id bit width is not limited to 4 bits, nor the Path-Congestion-Quantized-Value bit width to 6 bits. A 4-bit Path-Id means a leaf device can have at most 16 physical links toward spine devices; if a leaf-spine topology has more than 16 spine devices, the Path-Id bit width must exceed 4 bits. The actual Path-Id bit width is determined by the number of physical links from a leaf device to the spine devices.
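For illustration, a Python sketch of packing and parsing the table 3 metric, assuming the 4-bit Path-Id and 6-bit Path-Congestion-Quantized-Value are carried back to back in two bytes; how the metric is actually aligned inside the frame header is not specified by the text, so this layout is an assumption.

    def pack_metric(path_id: int, congestion_value: int) -> bytes:
        assert 0 <= path_id < 16           # 4-bit Path-Id: at most 16 links
        assert 0 <= congestion_value < 64  # 6-bit congestion value
        word = (path_id << 6) | congestion_value
        return word.to_bytes(2, "big")     # 10 bits carried in 2 bytes here

    def unpack_metric(data: bytes):
        word = int.from_bytes(data[:2], "big")
        return (word >> 6) & 0xF, word & 0x3F  # (Path-Id, congestion value)

    payload = pack_metric(path_id=1, congestion_value=6)
    print(unpack_metric(payload))  # (1, 6)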
In step 730, the spine device determines the maximum congestion value among the physical links contained in the path of a received congestion detection message and processes the message accordingly. For example, spine device 1 receives congestion detection message 1 from leaf device 1 and recognizes that it is addressed to leaf device 3, so spine device 1 determines that the message's path also includes physical link 3. Spine device 1 then compares the congestion value carried in the frame header of congestion detection message 1 with the congestion value of physical link 3. If the congestion value of physical link 3 is greater, the value in the frame header is updated to physical link 3's value; otherwise the message is sent on to leaf device 3 (the second leaf device) unchanged. In other words, spine device 1 takes the maximum of the congestion values of physical links 1 and 3 as the congestion value of path {1,3} and carries it in the frame header of congestion detection message 1, then sends the message, now holding the congestion value of path {1,3}, to leaf device 3.
In the same manner, spine device 2 sends congestion detection message 2, holding the congestion value of path {2,4}, to leaf device 3.
It should be noted that step 730 takes the maximum congestion value over the physical links of a path as the path's congestion value in order to measure the path's congestion degree. The scheme is not limited to this: the path congestion value may also be obtained by adding or multiplying the quantized values of the physical links in the path. A sketch of the max-based update appears below.
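A minimal Python sketch of the step 730 update under the max metric; the dictionary standing in for the frame header and the name local_link_congestion (the congestion value the spine computed for its own egress link toward the destination leaf, e.g. physical link 3) are illustrative assumptions.

    def spine_process_probe(probe: dict, local_link_congestion: int) -> dict:
        # Carry the path congestion value = max over the path's links.
        if local_link_congestion > probe["congestion_value"]:
            # The second hop is the bottleneck: overwrite the carried value.
            probe["congestion_value"] = local_link_congestion
        # Otherwise the first hop's value already equals the path value,
        # and the probe is forwarded unchanged.
        return probe

    probe = {"path_id": 1, "congestion_value": 2}          # from leaf 1
    print(spine_process_probe(probe, local_link_congestion=5))
    # {'path_id': 1, 'congestion_value': 5} -> congestion of path {1,3}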
In step 740, the second leaf device obtains the congestion value from each congestion detection message and stores it. For example, leaf device 3 receives congestion detection messages 1 and 2 from leaf device 1, extracts the values of the Path-Id and Path-Congestion-Quantized-Value fields from their frame headers (i.e., obtains each path's identifier and congestion value), and stores them in its "From Leaf" congestion table. The "From Leaf" congestion table stores, for congestion detection messages that other leaf devices sent with leaf device 3 as destination, the path identifier and the path's congestion value.
It should be noted that the corresponding path identifier can be derived from the physical link identifier Path-Id. For example, leaf device 3 reads physical link identifier 1 from the Path-Id field in a congestion detection message's frame header, learns that among the spine devices between leaf device 1 and leaf device 3 only spine device 1 is reached through physical link 1, and from spine device 1, the source device leaf device 1, and the destination device leaf device 3 deduces that the corresponding path identifier is {1,3}. A sketch of this bookkeeping follows.
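A sketch, with assumed data structures, of step 740 at the destination leaf: deriving the path from the Path-Id (here via a hypothetical link-to-spine map) and recording the result in the "From Leaf" congestion table.

    from_leaf_table = {}  # (source leaf, path id) -> path congestion value

    SPINE_FOR_LINK = {1: 1, 2: 2}  # hypothetical: link id -> spine device

    def path_of(source_leaf: int, dest_leaf: int, path_id: int):
        # Source leaf + spine reached via this link + destination leaf
        # together identify the path, as described above.
        return (source_leaf, SPINE_FOR_LINK[path_id], dest_leaf)

    def on_probe_received(source_leaf, dest_leaf, path_id, congestion):
        from_leaf_table[(source_leaf, path_id)] = congestion
        return path_of(source_leaf, dest_leaf, path_id)

    print(on_probe_received(1, 3, 1, 5))  # (1, 1, 3): via spine 1, path {1,3}
    print(on_probe_received(1, 3, 2, 3))  # (1, 2, 3): via spine 2, path {2,4}
    print(from_leaf_table)                # {(1, 1): 5, (1, 2): 3}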
In step 750, the second leaf device feeds back the congestion value of each congestion detection message's path to the first leaf device through a response message; the response message may itself be a reverse congestion detection message. For example, after receiving congestion detection messages 1 and 2, leaf device 3 knows the congestion values of paths {1,3} and {2,4}. At this point, however, only the destination device, leaf device 3, knows these values; the source device, leaf device 1, does not. Leaf device 3 therefore feeds the congestion value of path {1,3} back to leaf device 1 through reverse congestion detection message 3 over path {3',1'}, and at the same time feeds the congestion value of path {2,4} back through reverse congestion detection message 4 over path {4',2'}. Here path {3',1'} runs from physical link 3' to physical link 1', where physical link 3' is the link from leaf device 3 to spine device 1 and physical link 1' the link from spine device 1 to leaf device 1; path {4',2'} runs from physical link 4' to physical link 2', where physical link 4' is the link from leaf device 3 to spine device 2 and physical link 2' the link from spine device 2 to leaf device 1.
In one example, to save resources and reduce the number of congestion detection messages, leaf device 3 measures the congestion value of path {3',1'} while feeding back the congestion value of path {1,3} over that path, and likewise measures the congestion value of path {4',2'} while feeding back the congestion value of path {2,4}. In other words, while measuring the congestion values of the paths {3',1'} and {4',2'} from leaf device 3 to leaf device 1, leaf device 3 carries the congestion values of paths {1,3} and {2,4} in the frame header (or a reserved bit or newly added field) of the respective congestion detection message.
To this end, in addition to the Path-Id (physical link identifier) and Path-Congestion-Quantized-Value fields that store the congestion quantization result determined during the current measurement, the frame header of the congestion detection message adds a Path-Id-Metric (returned physical link identifier) field and a Path-Congestion-Quantized-Value-Metric (returned congestion value) field to carry the congestion value of the path being returned to the source leaf device. See table 4 below.
[Table 4 appears as an image in the original publication: header layout with the fields Path-Id (4 bits), Path-Congestion-Quantized-Value (6 bits), Path-Id-Metric (4 bits), and Path-Congestion-Quantized-Value-Metric (6 bits).]
TABLE 4
Those skilled in the art will appreciate that the Path-Id and Path-Id-Metric bit widths in table 4 are not limited to 4 bits, nor are the Path-Congestion-Quantized-Value and Path-Congestion-Quantized-Value-Metric bit widths limited to 6 bits.
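For illustration, a sketch of a reverse congestion detection header carrying both table 4 field pairs: the measurement of the reverse path the message travels (Path-Id, Path-Congestion-Quantized-Value) and the returned forward-path result (Path-Id-Metric, Path-Congestion-Quantized-Value-Metric). Packing the four fields into 20 bits across three bytes is an illustrative assumption.

    def pack_reverse_header(rev_path_id, rev_congestion,
                            ret_path_id, ret_congestion) -> bytes:
        word = ((rev_path_id << 16) | (rev_congestion << 10)
                | (ret_path_id << 6) | ret_congestion)
        return word.to_bytes(3, "big")

    def unpack_reverse_header(data: bytes):
        word = int.from_bytes(data[:3], "big")
        return ((word >> 16) & 0xF, (word >> 10) & 0x3F,   # reverse pair
                (word >> 6) & 0xF, word & 0x3F)            # returned pair

    # Leaf device 3 returns the congestion value of path {1,3} while
    # measuring the reverse path {3',1'}:
    hdr = pack_reverse_header(rev_path_id=3, rev_congestion=1,
                              ret_path_id=1, ret_congestion=5)
    print(unpack_reverse_header(hdr))  # (3, 1, 1, 5)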
In step 760, the first leaf device receives the plurality of response messages, obtains the congestion value of the path contained in each, and selects at least one of the paths as the path for sending data to the second leaf device. The response message may be a reverse congestion detection message. For example, leaf device 1 receives the response messages sent by leaf device 3, each containing the congestion value of the path traversed by the corresponding congestion detection message; leaf device 1 determines the congestion value of each path from these response messages and selects at least one path for sending data to leaf device 3.
Specifically, leaf device 1 receives reverse congestion detection messages 3 and 4 from leaf device 3, obtains the congestion value of path {1,3} from the frame header of message 3 and the congestion value of path {2,4} from the frame header of message 4. Leaf device 1 compares the two congestion values, selects the path with the smaller value, and sends data from server 1 to server 3 over it.
In one example, when leaf device 1 receives the reverse congestion detection messages sent by leaf device 3 (reverse congestion detection messages 3 and 4), it extracts and stores two kinds of information from each frame header: the returned congestion degree metric information, carried in the Path-Id-Metric (returned physical link identifier) and Path-Congestion-Quantized-Value-Metric (returned congestion value) fields, and the congestion degree metric information measured from leaf device 3 to leaf device 1, carried in the Path-Id (physical link identifier) and Path-Congestion-Quantized-Value fields.
The returned congestion degree metric information extracted by leaf device 1 comprises two sets: the identifier of path {1,3} with its congestion value, and the identifier of path {2,4} with its congestion value. Leaf device 1 stores these two sets in its "To Leaf" congestion table, which holds, for paths over which leaf device 1 as source device sends congestion detection messages to other leaf devices, the path and its congestion value. The congestion degree metric information from leaf device 3 to leaf device 1 extracted by leaf device 1 likewise comprises two sets: the identifier of path {3',1'} with its congestion value, and the identifier of path {4',2'} with its congestion value. Leaf device 1 stores these two sets in its "From Leaf" congestion table, which holds, for congestion detection messages that other leaf devices sent with leaf device 1 as destination, the path identifier and the path's congestion value.
In the same way, every leaf and spine device in the leaf-spine network measures the congestion degree of each path, and each device ends up with its own congestion tables. When load-sharing a service flow, a leaf device consults the congestion values of the paths toward the target leaf device in its stored "To Leaf" congestion table and selects the path with the smallest congestion value to send the data, as sketched below.
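A sketch of this selection at the source leaf, with assumed table structures: responses update the "To Leaf" congestion table, and new traffic toward a destination leaf takes the path with the smallest stored congestion value.

    to_leaf_table = {}  # destination leaf -> {path id: congestion value}

    def on_response_received(dest_leaf, ret_path_id, ret_congestion):
        # Store the returned (forward-path) congestion value.
        to_leaf_table.setdefault(dest_leaf, {})[ret_path_id] = ret_congestion

    def select_path(dest_leaf):
        # GLB selection: least-congested known path toward dest_leaf.
        paths = to_leaf_table[dest_leaf]
        return min(paths, key=paths.get)

    on_response_received(dest_leaf=3, ret_path_id=1, ret_congestion=5)  # {1,3}
    on_response_received(dest_leaf=3, ret_path_id=2, ret_congestion=3)  # {2,4}
    print(select_path(3))  # 2 -> new traffic to leaf 3 takes path {2,4}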
It should be noted that the reverse congestion detection message header described above carries only one returned congestion value in addition to the congestion value of the path the message itself traverses. For example, the header of reverse congestion detection message 3 carries only the congestion value of path {1,3} besides the congestion value of path {3',1'}. In practice, one reverse congestion detection message header may carry multiple returned congestion values; for example, reverse congestion detection message 3 could carry both the congestion value of path {1,3} and that of path {2,4}.
Furthermore, the above description carries the congestion value in the frame header of the congestion detection message (or reverse congestion detection message). In practice, the congestion value may be carried in other ways. For example, it may be carried by a service packet: an extra field is appended to the service packet, which both the leaf devices and the spine devices must support; the leaf device carries the congestion value in this extra field, and the receiving leaf device strips the field from the service packet after reading the congestion value. As another example, the congestion value may be conveyed in a reserved bit (Res) of a service packet field.
In summary, with GLB load sharing according to the embodiment of the present invention, a service message sent by leaf device 1 and destined for leaf device 3 can bypass a path whose remote physical link is congested, ensuring the transmission performance and throughput of the service flow, improving the user experience, and raising network bandwidth utilization.
Fig. 8 is a schematic diagram of a leaf device for traffic load sharing based on a leaf-spine topology according to an embodiment of the present invention. The leaf device includes a transmitter 810, a receiver 820, and a processor 830.
The transmitter 810 is configured to send a plurality of congestion detection messages to another leaf device through a plurality of paths, each congestion detection message including a congestion value field, the congestion value indicating the congestion degree of the path.
In one example, the path congestion value is determined from the congestion values of the physical links contained in the path. Further, the path congestion value is the maximum of those link congestion values; for example, in fig. 6, the congestion value of path {1,3} is the maximum of the congestion values of physical links 1 and 3.
In one example, the physical link congestion value is determined according to a quantized value of a queue length of an egress port corresponding to the physical link and/or a quantized value of a bandwidth utilization rate of an egress port corresponding to the physical link. The detailed method is described in the above step 710 and its corresponding content.
The receiver 820 is configured to receive a plurality of response messages sent by another leaf device, where each response message includes the congestion value of the path traversed by the congestion detection message corresponding to that response message.
In one example, the response message received by the receiver 820 is the reverse congestion detection message described above, i.e., a congestion detection message from the destination leaf device that includes the congestion value of the path from this leaf device to the other leaf device. For example, in fig. 6, reverse congestion detection message 3 includes the congestion value of path {1,3}.
Further, the reverse congestion detection message received by the leaf device may also include the congestion value of the path from the other leaf device to this leaf device; for example, in fig. 6, reverse congestion detection message 3 also includes the congestion value of path {3',1'}. The processor 830 is configured to determine the congestion value of each path from the plurality of response messages received by the receiver 820, and to select at least one of the paths as the path for sending data to the other leaf device.
In one example, the congestion detection message is a protocol packet or a data packet, and the congestion detection message further includes an identification of the path. The congestion value and the identification of the path are included in a header or reserved bit or a new field of the congestion detection message.
In one example, the leaf device selects a path with the smallest congestion value from the plurality of paths, and sends data to the other leaf device.
Fig. 9 is a schematic diagram of a spine device for traffic load sharing based on a leaf-spine topology according to an embodiment of the present invention. In FIG. 9, a first leaf device communicates with a second leaf device via the spine device.
The spine device includes a receiver 910 and a processor 920.
The receiver 910 is configured to receive a congestion detection message sent by the first leaf device, and a congestion value field of the congestion detection message includes a congestion value of a first physical link from the first leaf device to the spine device.
In one example, the congestion detection message is a protocol packet or a data packet, and the congestion detection message further includes an identification of the path. The congestion value and the identification of the path are included in a header or reserved bit or a new field of the congestion detection message.
In one example, the physical link congestion value is determined according to a quantized value of a queue length of an egress port corresponding to the physical link and/or a quantized value of a bandwidth utilization rate of an egress port corresponding to the physical link. The detailed method is described in the above step 710 and its corresponding content.
The processor 920 is configured to determine a congestion value of a path from the first leaf device to the second leaf device via the spine device by comparing the first physical link congestion value with a second physical link congestion value of the spine device to the second leaf device, and process the congestion detection message according to the determined congestion value.
In one example, the processor 920 takes the maximum of the first physical link congestion value and the second physical link congestion value as the path congestion value of the first leaf device to the second leaf device via the spine device.
In one example, when processing the congestion detection message according to the determined congestion value, the processor 920 is specifically configured to: when the congestion value of the first physical link is the congestion value of the path, send the congestion detection message to the second leaf device unchanged; when the congestion value of the second physical link is the congestion value of the path, update the congestion value field of the congestion detection message from the first physical link's congestion value to the second physical link's congestion value and send the updated message to the second leaf device.
Fig. 10 shows a system for traffic load sharing based on a leaf-spine topology according to an embodiment of the present invention. The system comprises a first leaf device 101, a spine device 102, and a second leaf device 103; the first leaf device 101 communicates with the second leaf device 103 via the spine device 102.
The first leaf device 101 is configured to send a congestion detection message to the spine device 102, the message including a congestion value field. The congestion value field is used for writing a congestion value determined by the first leaf device; this congestion value indicates the congestion degree of the first physical link from the first leaf device 101 to the spine device 102.
The spine device 102 is configured to compare the congestion value of the first physical link with the congestion value of a second physical link from the spine device 102 to the second leaf device 103, and to determine the congestion value of the path from the first leaf device 101 via the spine device 102 to the second leaf device 103.
The second leaf device 103 is configured to receive the congestion value of the path from the spine device 102, and feed the congestion value of the path back to the first leaf device 101, so that the first leaf device 101 selects a corresponding path to transmit data.
In one example, the physical link congestion value is determined according to a quantized value of a queue length of a corresponding egress port of the physical link and/or a quantized value of a bandwidth utilization rate of a corresponding egress port.
In one example, the congestion detection message is a protocol packet or a data packet and further includes an identifier of the path; the congestion value and the path identifier are carried in a frame header, a reserved bit, or a newly added field of the congestion detection message.
In one example, the path congestion value is a maximum of the first physical link congestion value and the second physical link congestion value.
The above embodiments further describe the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements, and the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (9)

1. A traffic load sharing method based on a leaf-spine topology, wherein a first leaf device communicates with a second leaf device through a spine device, the method comprising:
the first leaf device sends a plurality of congestion detection messages to the second leaf device through a plurality of paths, wherein each congestion detection message comprises a congestion value field, the congestion value field is used for writing a congestion value determined by a device through which the congestion detection message passes currently, and the congestion value is used for indicating the congestion degree of the path; the first leaf device receives a plurality of response messages sent by the second leaf device, wherein each response message comprises a congestion value of a path through which a congestion detection message corresponding to the response message passes;
the first leaf device determines the congestion value of each path according to the plurality of response messages, and selects at least one path from the plurality of paths as a path for sending data to the second leaf device;
and determining the congestion value of each path according to the congestion values of the physical links contained in the path.
2. The method of claim 1, wherein the congestion value of a physical link is determined from a quantized value of the queue length of the corresponding egress port of the physical link and/or a quantized value of the bandwidth utilization of that egress port.
3. The method of claim 1 or 2, wherein the congestion detection message is a protocol packet or a data packet, and the congestion detection message further comprises an identification of the path; the congestion value and the identification of the path are carried in a header, a reserved bit, or a newly added field of the congestion detection message.
4. The method of claim 1 or 2, further comprising:
the first leaf device receives a reverse congestion detection message sent by the second leaf device; and
the first leaf device acquires, from the congestion value field of the reverse congestion detection message, a reverse congestion value of the path through which the reverse congestion detection message passes.
5. The method of claim 1 or 2, wherein each response message received by the first leaf device further carries a congestion value of the path from the second leaf device to the first leaf device.
6. A leaf device for traffic load sharing based on a leaf-spine topology, wherein the leaf device communicates with another leaf device, the leaf device comprising:
a transmitter, configured to send a plurality of congestion detection messages to the other leaf device through a plurality of paths, wherein each congestion detection message comprises a congestion value field, the congestion value field is used to carry a congestion value determined by the device through which the congestion detection message currently passes, and the congestion value is used to indicate the degree of congestion of the path;
a receiver, configured to receive a plurality of response messages sent by the other leaf device, wherein each response message comprises the congestion value of the path through which the congestion detection message corresponding to the response message passes; and
a processor, configured to determine the congestion value of each path according to the plurality of response messages and to select at least one path from the plurality of paths as a path for sending data to the other leaf device, wherein the congestion value of each path is determined according to the congestion values of the physical links included in the path.
7. The leaf device of claim 6, wherein the congestion value of a physical link is determined from a quantized value of the queue length of the corresponding egress port of the physical link and/or a quantized value of the bandwidth utilization of that egress port.
8. The leaf device of claim 6 or 7, wherein the congestion detection message is a protocol packet or a data packet, and the congestion detection message further comprises an identification of the path; the congestion value and the identification of the path are carried in a header, a reserved bit, or a newly added field of the congestion detection message.
9. The leaf device of claim 6 or 7, wherein the leaf device selects, from the plurality of paths, the path with the minimum congestion value and sends the data to the other leaf device.
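Claims 4 and 5 add a reverse-direction measurement: the second leaf both echoes the forward path's congestion value in its response and probes the path in the opposite direction, so the first leaf learns the congestion of both directions of each path. The sketch below illustrates that bookkeeping; the Response structure and field names are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Response:
    path_id: int
    forward_congestion: int   # probe's accumulated value, echoed back (claim 1)
    reverse_congestion: int   # measured second-to-first direction (claims 4-5)

def merge_direction_views(responses: list[Response]) -> dict[int, tuple[int, int]]:
    """Index both directions' congestion by path for later selection."""
    return {r.path_id: (r.forward_congestion, r.reverse_congestion)
            for r in responses}

views = merge_direction_views([
    Response(path_id=1, forward_congestion=6, reverse_congestion=2),
    Response(path_id=2, forward_congestion=3, reverse_congestion=5),
])
print(views)  # {1: (6, 2), 2: (3, 5)}
```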
CN201510981555.3A 2015-12-23 2015-12-23 Flow load sharing method, device and system based on leaf-ridge topological structure Active CN106911584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510981555.3A CN106911584B (en) 2015-12-23 2015-12-23 Flow load sharing method, device and system based on leaf-ridge topological structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510981555.3A CN106911584B (en) 2015-12-23 2015-12-23 Flow load sharing method, device and system based on leaf-ridge topological structure

Publications (2)

Publication Number Publication Date
CN106911584A (en) 2017-06-30
CN106911584B (en) 2020-04-14

Family

ID=59200421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510981555.3A Active CN106911584B (en) 2015-12-23 2015-12-23 Flow load sharing method, device and system based on leaf-ridge topological structure

Country Status (1)

Country Link
CN (1) CN106911584B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109861925B (en) * 2017-11-30 2021-12-21 华为技术有限公司 Data transmission method, related device and network
CN108270643B (en) * 2017-12-14 2021-07-02 中国银联股份有限公司 Method and equipment for detecting link between Leaf-Spine switches
CN109728947A (en) * 2018-12-26 2019-05-07 成都科来软件有限公司 A kind of network performance analysis method based on cloud computing in conjunction with network topological diagram
CN109802879B (en) * 2019-01-31 2021-05-28 新华三技术有限公司 Data stream routing method and device
CN112511325B (en) * 2019-09-16 2022-03-11 华为技术有限公司 Network congestion control method, node, system and storage medium
US11575594B2 (en) * 2020-09-10 2023-02-07 Mellanox Technologies, Ltd. Deadlock-free rerouting for resolving local link failures using detour paths
CN112787925B (en) * 2020-10-12 2022-07-19 中兴通讯股份有限公司 Congestion information collection method, optimal path determination method and network switch
CN112910795B (en) * 2021-01-19 2023-01-06 南京大学 Edge load balancing method and system based on many sources
CN114221907B (en) * 2021-12-06 2023-09-01 北京百度网讯科技有限公司 Network hash configuration method, device, electronic equipment and storage medium
CN116192636B (en) * 2023-04-27 2023-08-15 苏州浪潮智能科技有限公司 Network device hash group configuration method and device, electronic device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104092628A (en) * 2014-07-23 2014-10-08 杭州华三通信技术有限公司 Flow distribution method and network devices
CN104813620A (en) * 2012-11-20 2015-07-29 思科技术公司 Fabric load balancing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104813620A (en) * 2012-11-20 2015-07-29 思科技术公司 Fabric load balancing
CN104092628A (en) * 2014-07-23 2014-10-08 杭州华三通信技术有限公司 Flow distribution method and network devices

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CONGA: Distributed Congestion-Aware Load Balancing for Datacenters; Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, et al.; SIGCOMM 2014, ACM; 2014-08-22; pp. 503-514 *

Also Published As

Publication number Publication date
CN106911584A (en) 2017-06-30

Similar Documents

Publication Publication Date Title
CN106911584B (en) Flow load sharing method, device and system based on leaf-ridge topological structure
US9065795B2 (en) Apparatus and method for providing a congestion measurement in a network
US11736388B1 (en) Load balancing path assignments techniques
US10498612B2 (en) Multi-stage selective mirroring
US8804509B2 (en) System and method of communicating a media stream
US9264341B2 (en) Method and system for dynamic routing and/or switching in a network
US9185047B2 (en) Hierarchical profiled scheduling and shaping
US8427958B2 (en) Dynamic latency-based rerouting
CN112313910A (en) Multi-path selection system and method for data center centric metropolitan area networks
US10057174B2 (en) Dynamic group multipathing
US20130003549A1 (en) Resilient Hashing for Load Balancing of Traffic Flows
US10492084B2 (en) Collaborative communications
US20140211621A1 (en) System and method for link aggregation group hashing using flow control information
WO2018036100A1 (en) Data message forwarding method and apparatus
US11516695B2 (en) Link aggregation with data segment fragmentation
US8948011B2 (en) Pseudo-relative mode WRED/tail drop mechanism
US20180097731A1 (en) Communication apparatus and method of communication
US20110122883A1 (en) Setting and changing queue sizes in line cards
CN111224888A (en) Method for sending message and message forwarding equipment
CN112825512A (en) Load balancing method and device
US8619627B2 (en) Automatic determination of groupings of communications interfaces
CN112910795B (en) Edge load balancing method and system based on many sources
CN115190537A (en) Wireless link dynamic selection method and system
CN107592269B (en) Method and network node for transmitting load information of path
US10652159B2 (en) Mobile packet data rate control based on radio load and other measures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant