CN111711565B - Multi-path routing method oriented to high-speed interconnected dragonfly+ network
- Publication number: CN111711565B
- Application number: CN202010616646.8A
- Authority: CN (China)
- Prior art keywords: switch node, data, dragonfly, network, group
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H: ELECTRICITY
  - H04: ELECTRIC COMMUNICATION TECHNIQUE
    - H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
      - H04L45/00: Routing or path finding of packets in data switching networks
        - H04L45/24: Multipath
        - H04L45/38: Flow based routing
      - H04L47/00: Traffic control in data switching networks
        - H04L47/10: Flow control; Congestion control
          - H04L47/12: Avoiding congestion; Recovering from congestion
            - H04L47/125: Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
Abstract
The invention discloses a multi-path routing method for high-speed interconnected dragonfly+ networks. Based on the information available in the dragonfly+ network, a switch node first uses the shortest path weight to decide whether data should travel through an intermediate group, then selects the intermediate group, and finally selects an egress port toward that group, thereby realizing multi-path routing of data at QP granularity in a high-speed interconnected dragonfly+ network.
Description
Technical Field
The invention belongs to the technical field of communication, and further relates to a multi-path routing method for high-speed interconnected dragonfly+ networks in the technical field of network communication. The invention selects a suitable path for data and is used to balance data traffic in a high-speed interconnected dragonfly+ network.
Background
The dragonfly+ network, also written Dragonfly+, is a recently proposed network architecture that is popular with researchers because of its low network diameter, rich path diversity, high scalability, and large bisection bandwidth. In terms of interconnect technology, IB (InfiniBand) is currently the most widely used in high-speed interconnection networks. As of November 2019, 44% of the 50 fastest supercomputers in the world used IB as their interconnect technology. Routing underpins the operational stability and utilization of the whole network. The existing multi-path routing mechanism in IB, which is based on LMC (LID Mask Control), cannot be applied directly to a dragonfly+ network. How to perform multi-path routing in a high-speed interconnected dragonfly+ network, and how to make full use of the network's path diversity, has therefore become an important technical problem.
A source-routing-based inter-subnet routing method, ISSR (Inter-Subnet Source Routing), is disclosed in the patent document "System and method for routing traffic between distinct InfiniBand subnets based on source routing" (application No. US201313889088, publication No. US9231888) filed by Oracle International Corporation. The method establishes a routing model and constructs a hash function. On the router, it selects among different paths in round-robin fashion according to the global and local addresses of data travelling between the same pair of subnets; after the data reaches the destination subnet, it queries switch information to select the best path for the data. The method has two disadvantages. First, the router relies entirely on the address information of the data when selecting a path, so traffic between the same node pair cannot be balanced and congestion easily arises on certain critical links. Second, the network may deadlock during operation, which makes the method unsuitable for deployment in a real high-speed interconnection network.
An RDMA-based multi-path routing method, MP-RDMA (Multi-Path Transport for RDMA), is proposed in the paper "Multi-Path Transport for RDMA in Datacenters" (15th USENIX Symposium on Networked Systems Design and Implementation (NSDI '18), April 9-11, 2018, Renton, WA, USA) by Yuanwei Lu et al. The method first sends congestion-sensing packets over multiple paths, carries the congestion information back through the ACK mechanism, and selects different paths for data at message granularity. In addition, an out-of-order-aware path selection algorithm actively prunes slow paths so that traffic concentrates on fast paths. The disadvantages of the method are that its path-selection granularity is too fine, which easily causes a certain degree of data reordering, and that pruning slow paths can reduce the quality of service of a real network.
Disclosure of Invention
The present invention aims to overcome the above deficiencies of the prior art by providing a multi-path routing method for high-speed interconnected dragonfly+ networks, which selects a suitable path for data in such a network and balances data traffic.
The specific idea for achieving this purpose is as follows: the invention uses the packet header information, the switch node information, and the dragonfly+ network structure information, combined with a user-defined shortest path weight, to first determine for each piece of data the group sequence number of the next-hop switch node and then select a specific egress port.
The steps of the invention comprise:
(1) numbering all groups in the dragonfly+ network:
the subnet manager of the dragonfly+ network in which paths are to be selected arbitrarily chooses one group, sets its sequence number to 0, and numbers the remaining groups in order of increasing distance from the chosen group;
(2) collecting structure information and switch node information:
each switch node queries the subnet manager for the dragonfly+ network structure information and the switch node information;
(3) collecting data packet header information:
the switch node that receives the data collects the packet header information from the header fields of the received packet;
(4) when the source group sequence number of the data is not equal to the destination group sequence number of the data, judging whether the source group sequence number of the data is equal to the group sequence number of the switch node; if so, executing step (5), otherwise executing step (11);
(5) calculating the data identifier of each piece of data received by the switch node according to the following formula:
$$X_i = DestQP_i[23:8] \oplus SLID_i[15:0] \oplus DestQP_i[15:0] \oplus DLID_i[15:0]$$
wherein $X_i$ represents the data identifier of the i-th data received by the switch node, $DestQP_i[23:8]$ represents the first 16 bits of the 24-bit DestQP in the packet header field of the i-th data received by the switch node, $\oplus$ represents the exclusive-OR operation, $SLID_i[15:0]$ represents the 16-bit source address SLID in the packet header field of the i-th data received by the switch node, $DestQP_i[15:0]$ represents the last 16 bits of the 24-bit DestQP in the packet header field of the i-th data received by the switch node, and $DLID_i[15:0]$ represents the 16-bit destination address DLID in the packet header field of the i-th data received by the switch node;
(6) judging whether the following formula holds; if so, executing step (7), otherwise executing step (9):
$$X_i \% (K/2 + W * SL_i) < K/2$$
wherein % represents the modulo operation, K represents the number of ports of the switch node receiving the data, W represents the shortest path weight, * represents the multiplication operation, and $SL_i$ represents the Service Level value in the packet header field of the i-th data received by the switch node;
(7) calculating the group sequence number of the next-hop switch node of the i-th data received by the switch node according to the following formula:
$$G_n = (G_s + VL + G_d) \% G_a$$
wherein $G_n$ represents the group sequence number of the next-hop switch node of the i-th data received by the switch node, $G_s$ represents the source group sequence number of the i-th data received by the switch node, VL represents the virtual channel of the i-th data received by the switch node, $G_d$ represents the destination group sequence number of the i-th data received by the switch node, and $G_a$ represents the total number of groups in the dragonfly+ network;
(8) judging whether the group sequence number of the next-hop switch node is equal to the source group sequence number of the data; if so, executing step (9), otherwise executing step (10);
(9) updating $G_n$ with $G_d$, then executing step (10);
(10) calculating the egress port of the i-th data received by the switch node according to the following formula, then executing step (12):
$$P_o = X_i \% P_u$$
wherein $P_o$ represents the egress port of the i-th data received by the switch node, and $P_u$ represents the set of all feasible ports in the switch node that connect to $G_n$;
(11) the switch node queries the linear forwarding table LFT to obtain the egress port of the data;
(12) the switch node forwards the data out of the egress port.
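The control flow of steps (4) to (12) can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the `Packet` and `SwitchState` types and the function names are hypothetical, K/2 is assumed to be integer division, the data identifier is assumed to be the XOR of the four 16-bit fields named in step (5), the modulus in step (10) is assumed to be taken by the number of feasible ports, and traffic whose source and destination groups coincide is assumed to fall back to the LFT (the steps do not state this case).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    slid: int     # 16-bit source LID
    dlid: int     # 16-bit destination LID
    dest_qp: int  # 24-bit destination QP number
    sl: int       # Service Level


@dataclass(frozen=True)
class SwitchState:
    group: int            # group sequence number of this switch
    num_ports: int        # K
    weight: int           # W, shortest path weight
    total_groups: int     # Ga
    server_group: dict    # LID -> group sequence number (from the subnet manager)
    sl_to_vl: dict        # SL -> VL table
    ports_to_group: dict  # group sequence number -> ordered list of feasible ports
    lft: dict             # DLID -> shortest-path egress port


def data_identifier(p: Packet) -> int:
    # Step (5), assumed form: XOR of DestQP[23:8], SLID, DestQP[15:0], DLID.
    return ((p.dest_qp >> 8) & 0xFFFF) ^ p.slid ^ (p.dest_qp & 0xFFFF) ^ p.dlid


def select_egress_port(sw: SwitchState, p: Packet) -> int:
    gs = sw.server_group[p.slid]
    gd = sw.server_group[p.dlid]
    # Step (4): only a switch in the source group makes the multi-path decision.
    # Assumption: intra-group traffic (gs == gd) also uses the LFT.
    if gs == gd or gs != sw.group:
        return sw.lft[p.dlid]                      # step (11)
    x = data_identifier(p)                         # step (5)
    half = sw.num_ports // 2
    if x % (half + sw.weight * p.sl) < half:       # step (6)
        gn = (gs + sw.sl_to_vl[p.sl] + gd) % sw.total_groups  # step (7)
        if gn == gs:                               # step (8)
            gn = gd                                # step (9)
    else:
        gn = gd                                    # step (9)
    ports = sw.ports_to_group[gn]
    return ports[x % len(ports)]                   # step (10)


# Hypothetical example: a source-group switch with 8 ports in a 4-group network.
sw = SwitchState(group=0, num_ports=8, weight=2, total_groups=4,
                 server_group={1: 0, 5: 2}, sl_to_vl={0: 0, 1: 1},
                 ports_to_group={1: [4], 2: [5, 6], 3: [7]}, lft={5: 5})
port = select_egress_port(sw, Packet(slid=1, dlid=5, dest_qp=0x1C, sl=1))
```

In this example the packet is steered through group 3 as an intermediate group, because its identifier passes the test in step (6) and its virtual channel shifts the next-hop group away from the destination group.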
Compared with the prior art, the invention has the following advantages:
First, because the invention computes a data identifier for each piece of data received by the switch node, data belonging to the same QP necessarily have the same identifier. This overcomes the prior-art problem that an overly fine path-selection granularity easily causes a certain degree of data reordering: the invention always selects the same path for data of the same QP, which avoids reordering and improves user satisfaction.
Second, the virtual channel of the data is taken into account when calculating the group sequence number of the next-hop switch node, so that data on different virtual channels are distributed over different paths as far as possible. This overcomes the prior-art problem that the network may deadlock during operation, so the invention has the advantage of preventing deadlock and is suitable for deployment in a real high-speed interconnection network.
Third, because the invention calculates the group sequence number of the next-hop switch node and distributes data over different groups at QP granularity, it overcomes the prior-art problems that traffic between the same node pair cannot be balanced and that congestion easily arises on certain critical links. The invention thus makes full use of the rich link resources of the dragonfly+ network and has the advantages of balancing traffic and alleviating congestion.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of the network topology after the invention is applied.
Detailed Description
The invention is described in further detail below with reference to the figures and examples.
The specific steps of the present invention are described in further detail with reference to fig. 1.
Step 1, numbering all groups in the dragonfly+ network.
The dragonfly+ network consists of a certain number of groups, and each group contains a certain number of switch nodes and server nodes. The subnet manager in the dragonfly+ network arbitrarily chooses one group and sets its sequence number to 0, then numbers the remaining groups in order of increasing distance from the chosen group, so that the sequence numbers of all groups in the dragonfly+ network are consecutive and the smallest sequence number is 0.
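One way to obtain such a numbering is a breadth-first traversal of the group-level graph, since breadth-first search visits groups in order of non-decreasing hop distance from the starting group. The sketch below is only an illustration: the function name, the adjacency-dictionary input, and the alphabetical tie-breaking among groups at the same distance are assumptions, not details given in the patent.

```python
from collections import deque


def number_groups(adjacency, start):
    """Assign consecutive sequence numbers to groups by BFS distance from start.

    adjacency maps each group name to the names of directly connected groups.
    Ties between groups at the same distance are broken by sorted name
    (an arbitrary choice made for this sketch).
    """
    numbering = {start: 0}
    queue = deque([start])
    while queue:
        group = queue.popleft()
        for neighbor in sorted(adjacency[group]):
            if neighbor not in numbering:
                numbering[neighbor] = len(numbering)
                queue.append(neighbor)
    return numbering


# Hypothetical group-level topology with four groups.
groups = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
numbering = number_groups(groups, "A")
```

The resulting numbers are consecutive and start at 0, matching the property required by Step 1.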
Step 2, collecting structure information and switch node information.
Each switch node queries the subnet manager for the dragonfly+ network structure information and the switch node information. The dragonfly+ network structure information comprises a table of the sequence number of the group in which each switch node in the dragonfly+ network is located, a table of the sequence number of the group in which each server node in the dragonfly+ network is located, and the SL-to-VL table of the dragonfly+ network. This information is static: it needs to be collected only once as long as the network does not change, and it must be collected again when the network is restarted or a fault occurs.
Step 3, collecting data packet header information.
The switch node that receives the data collects the packet header information from the header fields of the received packet. The packet header information comprises the SL value of the data, the source address SLID of the data, the destination address DLID of the data, and the DestQP of the data. The switch node collects this information anew each time it receives data.
Step 4, when the source group sequence number of the data is not equal to the destination group sequence number of the data, judging whether the source group sequence number of the data is equal to the group sequence number of the switch node; if so, executing step 5, otherwise executing step 11.
The source group sequence number and the destination group sequence number of the data are obtained by the switch node looking up the data source address SLID and the data destination address DLID in the table of the sequence number of the group in which each server node in the dragonfly+ network is located. The group sequence number of the switch node is obtained by the switch node looking up its own address in the table of the sequence number of the group in which each switch node in the dragonfly+ network is located.
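The two table lookups and the test in Step 4 can be illustrated with plain dictionaries. The table contents, the function names, and the LID values below are hypothetical; the sketch also assumes that data whose source and destination groups coincide does not enter the multi-path branch, a case the description leaves open.

```python
def lookup_groups(slid, dlid, switch_lid, server_group, switch_group):
    """Return (source group, destination group, switch group) sequence numbers."""
    return server_group[slid], server_group[dlid], switch_group[switch_lid]


def takes_multipath_branch(gs, gd, g_switch):
    # Step 4: continue with step 5 only at a switch inside the source group,
    # and only for traffic that leaves that group.
    return gs != gd and gs == g_switch


# Hypothetical tables as they might be obtained from the subnet manager.
server_group = {1: 0, 2: 0, 5: 2}   # server LID -> group sequence number
switch_group = {100: 0, 200: 1}     # switch LID -> group sequence number

gs, gd, g_sw = lookup_groups(1, 5, 100, server_group, switch_group)
decision = takes_multipath_branch(gs, gd, g_sw)
```

A switch outside the source group, such as the one with LID 200 here, would return `False` and forward by its linear forwarding table instead.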
Step 5, calculating the data identifier of each piece of data received by the switch node according to the following formula:
$$X_i = DestQP_i[23:8] \oplus SLID_i[15:0] \oplus DestQP_i[15:0] \oplus DLID_i[15:0]$$
wherein $X_i$ represents the data identifier of the i-th data received by the switch node, $DestQP_i[23:8]$ represents the first 16 bits of the 24-bit DestQP in the packet header field of the i-th data received by the switch node, $\oplus$ represents the exclusive-OR operation, $SLID_i[15:0]$ represents the 16-bit source address SLID in the packet header field of the i-th data received by the switch node, $DestQP_i[15:0]$ represents the last 16 bits of the 24-bit DestQP in the packet header field of the i-th data received by the switch node, and $DLID_i[15:0]$ represents the 16-bit destination address DLID in the packet header field of the i-th data received by the switch node.
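A short Python sketch may help make the bit slicing concrete. It assumes that the identifier is the exclusive-OR of the four 16-bit fields listed in the definition, since exclusive-OR is the only operator the definition names; the function name and the sample values are hypothetical.

```python
def data_identifier(dest_qp: int, slid: int, dlid: int) -> int:
    """Assumed form of the Step 5 identifier: XOR of four 16-bit fields."""
    qp_high = (dest_qp >> 8) & 0xFFFF  # DestQP[23:8], the first 16 of 24 bits
    qp_low = dest_qp & 0xFFFF          # DestQP[15:0], the last 16 of 24 bits
    return qp_high ^ (slid & 0xFFFF) ^ qp_low ^ (dlid & 0xFFFF)


# Two packets of the same QP between the same endpoints share one identifier,
# while a different destination QP generally yields a different one.
first = data_identifier(dest_qp=0xABCDEF, slid=0x0001, dlid=0x0005)
second = data_identifier(dest_qp=0xABCDEF, slid=0x0001, dlid=0x0005)
other_qp = data_identifier(dest_qp=0xABCDEE, slid=0x0001, dlid=0x0005)
```

Because the identifier depends only on DestQP, SLID, and DLID, every packet of one QP makes the same routing choices downstream, which is what keeps a QP on a single path.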
Step 6, judging whether the following formula holds; if so, executing step 7, otherwise executing step 9:
$$X_i \% (K/2 + W * SL_i) < K/2$$
wherein % represents the modulo operation, K represents the number of ports of the switch node receiving the data, W represents the shortest path weight, * represents the multiplication operation, and $SL_i$ represents the Service Level value in the packet header field of the i-th data received by the switch node. The shortest path weight is a value set by the subnet manager for all switch nodes in the dragonfly+ network. The higher the shortest path weight, the higher the probability that a switch node selects for the data the shortest path, which reaches the destination group directly from the source group without passing through other groups.
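The effect of the weight can be checked numerically. The sketch below evaluates the Step 6 condition for every 16-bit identifier and reports the fraction that would be sent toward an intermediate group; it assumes integer division for K/2 and uniformly distributed identifiers, and the function names are hypothetical.

```python
def uses_intermediate_group(x: int, k: int, w: int, sl: int) -> bool:
    """Step 6 condition: True means step 7 (intermediate group) is executed."""
    half = k // 2
    return x % (half + w * sl) < half


def intermediate_fraction(k: int, w: int, sl: int, samples: int = 1 << 16) -> float:
    """Fraction of identifiers 0..samples-1 that satisfy the Step 6 condition."""
    hits = sum(uses_intermediate_group(x, k, w, sl) for x in range(samples))
    return hits / samples


# With K = 8, raising W shrinks the share of traffic sent via an intermediate group.
fractions = [intermediate_fraction(k=8, w=w, sl=1) for w in (0, 4, 12)]
```

Under these assumptions the fraction is roughly (K/2) / (K/2 + W * SL), so a larger W leaves more traffic on the shortest path. The same expression suggests that data with SL equal to 0 always satisfies the condition, whatever the weight.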
Step 7, calculating the group sequence number of the next-hop switch node of the i-th data received by the switch node according to the following formula:
$$G_n = (G_s + VL + G_d) \% G_a$$
wherein $G_n$ represents the group sequence number of the next-hop switch node of the i-th data received by the switch node, $G_s$ represents the source group sequence number of the i-th data received by the switch node, VL represents the virtual channel of the i-th data received by the switch node, $G_d$ represents the destination group sequence number of the i-th data received by the switch node, and $G_a$ represents the total number of groups in the dragonfly+ network.
Step 8, judging whether the group sequence number of the next-hop switch node is equal to the source group sequence number of the data; if so, executing step 9, otherwise executing step 10.
Step 9, updating $G_n$ with $G_d$, then executing step 10.
Step 10, calculating the egress port of the i-th data received by the switch node according to the following formula, then executing step 12:
$$P_o = X_i \% P_u$$
wherein $P_o$ represents the egress port of the i-th data received by the switch node, and $P_u$ represents the set of all feasible ports in the switch node that connect to $G_n$. The modulo operation maps its result one-to-one onto the feasible ports to obtain the specific port.
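In code, the one-to-one mapping can be read as indexing an ordered list of feasible ports with the identifier taken modulo the list length. That reading, the function name, and the port numbers below are assumptions made for illustration.

```python
from collections import Counter


def egress_port(x: int, feasible_ports: list) -> int:
    """Step 10, assumed reading: index the feasible ports by x modulo their count."""
    return feasible_ports[x % len(feasible_ports)]


# Hypothetical switch with four ports leading to the chosen group.
ports_to_group = [8, 9, 10, 11]
usage = Counter(egress_port(x, ports_to_group) for x in range(100))
```

With evenly spread identifiers, each feasible port receives the same share of flows, and a given identifier always maps to the same port.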
Step 11, the switch node queries the linear forwarding table LFT to obtain the egress port of the data.
The linear forwarding table LFT is stored in the switch node and is set up by the subnet manager when the network starts operating. The switch node directly queries the LFT to obtain the shortest-path egress port leading from this node to the destination node.
Step 12, the switch node forwards the data out of the egress port.
The invention will be further described with reference to the network topology after the implementation of the invention shown in fig. 2 of the drawings.
In FIG. 2, the dragonfly+ network consists of switch nodes and host nodes. A line between two nodes represents the link connecting them, and an arrow on a link represents the direction of data transfer. The node labeled host 1 is the source node and the node labeled host 5 is the destination node. Host 1 sends a total of 6 data streams to host 5, namely stream 1, stream 2, stream 3, stream 4, stream 5, and stream 6. At the leaf switch 1 node, stream 1 and stream 2 select one path, and stream 3, stream 4, stream 5, and stream 6 select another path. At the spine switch 1 node, stream 3 selects one path, and stream 4, stream 5, and stream 6 select another path.
The foregoing description is only an example of the present invention and does not limit it. It will be apparent to those skilled in the art that various changes and modifications in form and detail may be made without departing from the spirit and scope of the invention.
Claims (6)
1. A multi-path routing method for a high-speed interconnected dragonfly+ network, characterized in that packet header information, switch node information, and dragonfly+ network structure information are combined with a user-defined shortest path weight, a group sequence number of a next-hop switch node is determined for each piece of data, and an egress port is then selected; the method comprises the following specific steps:
(1) numbering all groups in the dragonfly+ network:
the subnet manager of the dragonfly+ network in which paths are to be selected arbitrarily chooses one group, sets its sequence number to 0, and numbers the remaining groups in order of increasing distance from the chosen group;
(2) collecting structure information and switch node information:
each switch node queries the subnet manager for the dragonfly+ network structure information and the switch node information;
(3) collecting data packet header information:
the switch node that receives the data collects the packet header information from the header fields of the received packet;
(4) when the source group sequence number of the data is not equal to the destination group sequence number of the data, judging whether the source group sequence number of the data is equal to the group sequence number of the switch node; if so, executing step (5), otherwise executing step (11);
(5) calculating the data identifier of each piece of data received by the switch node according to the following formula:
$$X_i = DestQP_i[23:8] \oplus SLID_i[15:0] \oplus DestQP_i[15:0] \oplus DLID_i[15:0]$$
wherein $X_i$ represents the data identifier of the i-th data received by the switch node, $DestQP_i[23:8]$ represents the first 16 bits of the 24-bit DestQP in the packet header field of the i-th data received by the switch node, $\oplus$ represents the exclusive-OR operation, $SLID_i[15:0]$ represents the 16-bit source address SLID in the packet header field of the i-th data received by the switch node, $DestQP_i[15:0]$ represents the last 16 bits of the 24-bit DestQP in the packet header field of the i-th data received by the switch node, and $DLID_i[15:0]$ represents the 16-bit destination address DLID in the packet header field of the i-th data received by the switch node;
(6) judging whether the following formula holds; if so, executing step (7), otherwise executing step (9):
$$X_i \% (K/2 + W * SL_i) < K/2$$
wherein % represents the modulo operation, K represents the number of ports of the switch node receiving the data, W represents the shortest path weight, * represents the multiplication operation, and $SL_i$ represents the SL value in the packet header field of the i-th data received by the switch node;
(7) calculating the group sequence number of the next-hop switch node of the i-th data received by the switch node according to the following formula:
$$G_n = (G_s + VL + G_d) \% G_a$$
wherein $G_n$ represents the group sequence number of the next-hop switch node of the i-th data received by the switch node, $G_s$ represents the source group sequence number of the i-th data received by the switch node, VL represents the virtual channel of the i-th data received by the switch node, $G_d$ represents the destination group sequence number of the i-th data received by the switch node, and $G_a$ represents the total number of groups in the dragonfly+ network;
(8) judging whether the group sequence number of the next-hop switch node is equal to the source group sequence number of the data; if so, executing step (9), otherwise executing step (10);
(9) updating $G_n$ with $G_d$, then executing step (10);
(10) calculating the egress port of the i-th data received by the switch node according to the following formula, then executing step (12):
$$P_o = X_i \% P_u$$
wherein $P_o$ represents the egress port of the i-th data received by the switch node, and $P_u$ represents the set of all feasible ports in the switch node that connect to $G_n$;
(11) the switch node queries the linear forwarding table LFT to obtain the egress port of the data;
(12) the switch node forwards the data out of the egress port.
2. The multi-path routing method for a high-speed interconnected dragonfly+ network as claimed in claim 1, wherein the dragonfly+ network structure information in step (2) includes a table of the sequence number of the group in which each switch node in the dragonfly+ network is located, a table of the sequence number of the group in which each server node in the dragonfly+ network is located, and an SL-to-VL table of the dragonfly+ network.
3. The multi-path routing method for a high-speed interconnected dragonfly+ network as claimed in claim 1, wherein the switch node information in step (2) includes the total number of ports of the switch node, the address of the switch node itself, and a linear forwarding table LFT.
4. The multi-path routing method for a high-speed interconnected dragonfly+ network as claimed in claim 1, wherein the packet header information in step (3) includes the SL value of the data, the source address SLID of the data, the destination address DLID of the data, and the DestQP of the data.
5. The multi-path routing method for a high-speed interconnected dragonfly+ network as claimed in claim 1, wherein in step (4) the source group sequence number of the data and the destination group sequence number of the data are obtained by the switch node looking up the data source address SLID and the data destination address DLID in the table of the sequence number of the group in which each server node in the dragonfly+ network is located, and the group sequence number of the switch node is obtained by the switch node looking up its own address in the table of the sequence number of the group in which each switch node in the dragonfly+ network is located.
6. The multi-path routing method for a high-speed interconnected dragonfly+ network as claimed in claim 1, wherein the shortest path weight in step (6) is a value set by the subnet manager for all switch nodes in the dragonfly+ network, and the higher the value of the shortest path weight, the higher the probability that a switch node selects for the data the shortest path, which reaches the destination group directly from the source group without passing through other groups.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010616646.8A CN111711565B (en) | 2020-07-01 | 2020-07-01 | Multi-path routing method oriented to high-speed interconnected dragonfly + network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111711565A (en) | 2020-09-25 |
CN111711565B (en) | 2021-05-04 |
Family
ID=72543958
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010616646.8A (Active), published as CN111711565B (en) | 2020-07-01 | 2020-07-01 | Multi-path routing method oriented to high-speed interconnected dragonfly + network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111711565B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115914078A (en) * | 2021-09-28 | 2023-04-04 | 华为技术有限公司 | Message forwarding method and device and dragonfly network |
CN116980341A (en) * | 2022-04-22 | 2023-10-31 | 华为技术有限公司 | Message sending method, network equipment and communication system |
CN117081984B (en) * | 2023-09-27 | 2024-03-26 | 新华三技术有限公司 | Route adjustment method and device and electronic equipment |
CN117155846B (en) * | 2023-10-31 | 2024-02-06 | 苏州元脑智能科技有限公司 | Routing method, device, computer equipment and storage medium of interconnection network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104079490A (en) * | 2014-06-27 | 2014-10-01 | 清华大学 | Multi-layer dragonfly interconnecting network and self-adaptive routing method |
CN110324243A (en) * | 2018-03-28 | 2019-10-11 | 清华大学 | The dragonfly network architecture and its broadcast routing method |
CN110324249A (en) * | 2018-03-28 | 2019-10-11 | 清华大学 | A kind of dragonfly network architecture and its multicast route method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9519605B2 (en) * | 2014-07-08 | 2016-12-13 | International Business Machines Corporation | Interconnection network topology for large scale high performance computing (HPC) systems |
CN108234310B (en) * | 2016-12-12 | 2021-06-15 | 清华大学 | Multilevel interconnection network, self-adaptive routing method and routing equipment |
Non-Patent Citations (3)
Title |
---|
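"Shortest paths in Dragonfly systems"; Ryland Curtsinger; 2019 International Workshop of High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPNEB); 20191231; full text * |
"Analysis of Dragonfly Network Topology and Routing Algorithms" (《蜻蜓网络拓扑与路由算法分析》); Shi Dejun (施得君); Proceedings of the 18th Annual Conference on Computer Engineering and Technology and the 4th Microprocessor Technology Forum (《第十八届计算机工程与工艺年会暨第四届微处理器技术论坛论文集》); 20141231; full text * |
"Analysis of the Implementation of Adaptive Routing Algorithms for Dragonfly Networks" (《蜻蜓网络自适应路由算法实现解析》); Gao Jiangang (高剑刚); High Performance Computing Technology (《高性能计算技术》); 20150228; full text * |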
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||