WO2012162988A1 - Efficient adaptive deadlock-free routing algorithms for torus networks - Google Patents

Publication number
WO2012162988A1
Authority
WO
WIPO (PCT)
Prior art keywords
packet
algorithm
routing
wraparound
dimension
Prior art date
Application number
PCT/CN2011/080312
Other languages
French (fr)
Inventor
Dong Xiang
Wei Luo
Original Assignee
Dong Xiang
Wei Luo
Priority date
Filing date
Publication date
Application filed by Dong Xiang, Wei Luo

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/18 Loop-free operations

Abstract

A deadlock-free minimal routing algorithm called clue is first disclosed for VCT-switched tori. Only two virtual channels are required. One channel is used by a deadlock-free routing algorithm for the mesh sub-network based on a known base routing scheme, such as negative-first or dimension-order routing. The other channel is similar to an adaptive channel. This combination yields a novel fully adaptive routing scheme, because the first channel does not supply routing paths for every source-destination pair. Based on clue, two other algorithms, named flow controlled clue and wormhole clue, are proposed. Flow controlled clue is also proposed for VCT-switched tori. It is fully adaptive and deadlock-free. Each input port requires at least two buffers, each of which can hold a packet. A simple flow control function is used in flow controlled clue to avoid deadlocks. Wormhole clue is proposed for wormhole-switched tori. It is partially adaptive because constraints are added to the adaptive channel for deadlock avoidance.

Description

Efficient Adaptive Deadlock-free Routing Algorithms for Torus Networks
Field of the Invention
The present invention relates to parallel interconnection networks, more particularly to deadlock-free routing algorithms in torus networks.
Background of the Invention
Meshes and tori have been popular interconnection network topologies for constructing massively parallel multiprocessors for decades. They are widely used in recent commercial and experimental multiprocessors, such as the IBM Blue Gene series and the Cray XT series.
A torus is a topology with an n-dimensional grid structure and k nodes in each dimension. All nodes in a torus have the same number of neighbors. In wormhole switching, every packet is divided into several flow control digits, or flits. The first flit of a packet (called the header flit) contains the routing information. The packet is pipelined through the network at the flit level. When a header flit is blocked during transmission, all flits wait at their local nodes. In virtual cut-through (VCT) switched networks, packets can cut through to the next router before the complete packet has been received. The main difference between a VCT-switched and a wormhole-switched network is that a VCT router needs to buffer a whole packet when blocking occurs. Therefore, a VCT-switched router needs more buffers than a wormhole-switched router.
A routing algorithm determines the sequence of channels for a packet to traverse from the source to the destination. The routing scheme is important to performance of an interconnection network. It is essential to present a more effective routing algorithm for torus networks.
Several routing algorithms have been proposed for meshes and tori. X-Y routing in a 2D mesh delivers a packet first along dimension X and then along dimension Y, after all hops along dimension X have been completed. The turn model was proposed for designing partially adaptive deadlock-free algorithms in a mesh. It prohibits the minimum number of turns needed to avoid cyclic channel dependencies.
Networks-on-chip (NoCs) are an emerging technology for high-performance computer systems and CPUs. Meshes and tori are still the most popular topologies for NoCs. The region-based routing algorithm was proposed to eliminate redundant information in the routing tables, so that a network can be partitioned into a small number of regions. Therefore, the amount of routing information can be reduced significantly.
Several routing algorithms are proposed in this invention. They are novel because deadlock is analyzed from an original perspective. When deadlock occurs in a minimal routing algorithm, there must be some packets in the network that cannot advance even one hop because of cyclic waiting. These packets can never advance unless the deadlock is broken manually. So, the deadlock-free algorithms are designed on the idea that no packet in the network can be blocked forever when the algorithm is applied.
Summary of the Invention
In an n-dimensional torus, all links are classified into mesh sub-network links and wraparound links. The present invention provides several deadlock-free minimal routing algorithms for torus networks. All algorithms are minimal routing algorithms, so all the channels discussed are on minimal paths.
(1) A fully adaptive routing algorithm in VCT switching torus without flow control.
This algorithm is named clue. Two virtual channels, R1 and R2, are enough to provide deadlock-free fully adaptive routing in a VCT-switched n-dimensional torus. Three rules define the algorithm:
Rule 1: Virtual channel R1 is fully adaptive; a packet can request R1 channels at any time.
Rule 2: If a packet need not traverse any wraparound link from the current node to the destination, it is delivered within the mesh sub-network and can request an R2 channel of the mesh sub-network links. However, it must follow a deadlock-free routing algorithm for meshes, such as negative-first or dimension-order routing, when being delivered via R2.
Rule 3: If the next hop of a packet traverses a wraparound link of dimension d, and d is the lowest of the dimensions in which the packet need traverse wraparound links from the current node to the destination, the packet can request the R2 channel of that wraparound link.
(2) A fully adaptive routing algorithm in VCT switching torus with flow control.
This algorithm is named flow controlled clue. Combining clue with a flow control scheme gives a more efficient algorithm.
In this algorithm, two kinds of packets, safe packets and unsafe packets are defined. Based on a routing algorithm F for the mesh sub-network of an n-dimensional torus, a packet is safe to the downstream node in either one of the following conditions:
1) The next hop of the packet is to traverse a wraparound link along dimension d, and d is the lowest of the dimensions along which the packet need traverse wraparound links.
2) A packet does not need to traverse any wraparound link from the current node to the destination. The next hop is to reserve a link in the mesh sub-network according to the routing function F.
If a packet is not safe to a node, we say it is unsafe to that node. In the following, unless otherwise specified, saying that a packet is safe or unsafe means that it is safe or unsafe to its current node.
This algorithm avoids filling any input port with only unsafe packets. When sending a packet, it checks the number of free buffers f and the number of safe packets c in the next input buffer:
1) f > 1: the packet can be delivered because there is more than one free buffer in the next node.
2) f = 1 and c > 0: the packet can be delivered because there is at least one safe packet in the next node.
3) f = 1 and c = 0: the packet can be delivered if it is a safe packet to the next node; otherwise, the packet is kept in the waiting list.
4) f = 0: the packet is kept in the waiting list.
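The four cases above reduce to a small decision function. The following Python sketch is only an illustration of that rule; the function name can_send and its parameter names are hypothetical and do not appear in the application.

```python
def can_send(free_buffers: int, safe_packets: int, packet_is_safe: bool) -> bool:
    """Decide whether a packet may be sent to the next input buffer.

    free_buffers   -- f, the number of free buffers in the next input buffer
    safe_packets   -- c, the number of safe packets already held there
    packet_is_safe -- whether the packet is safe to the next node
    (All names are illustrative assumptions.)
    """
    if free_buffers > 1:
        return True  # case 1: more than one free buffer
    if free_buffers == 1:
        # case 2: a safe packet is already present, or
        # case 3: the packet being sent is itself safe
        return safe_packets > 0 or packet_is_safe
    return False  # case 4: no free buffer, packet stays in the waiting list


# The last free buffer is never given to an unsafe packet
# when no safe packet is present.
print(can_send(free_buffers=1, safe_packets=0, packet_is_safe=False))
```

With this rule, an input port that still holds no safe packet always keeps one buffer in reserve for a safe packet.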
(3) A partially adaptive routing algorithm in wormhole switching torus.
This algorithm is named wormhole clue. All nodes in an n-dimensional torus can be classified as follows: nodes that are directly connected to n wraparound links fall into set S0. The set S_i contains the nodes such that a packet from a node in S_i to any node in the set S0 takes at least i hops. A minimal deadlock-free algorithm for wormhole-switched tori is proposed as follows:
1) A packet need traverse one or more wraparound links.
a) R1 channels: It can request R1 channels only from nodes of class S_i to nodes of class S_j, where j < i.
b) R2 channels: If the next hop of a packet traverses a wraparound link of dimension d, and d is the lowest of the dimensions in which the packet need traverse wraparound links from the current node to the destination, the packet can request the R2 channel of that wraparound link.
2) A packet need not traverse wraparound links.
a) R1 channels: It can request any of the available R1 channels.
b) R2 channels: It can request an R2 channel of the mesh sub-network links. However, it must follow a deadlock-free routing algorithm for meshes, such as negative-first or dimension-order routing, when being delivered via R2.
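As an illustration of the R1 restriction in cases 1a and 2a, the Python sketch below filters the minimal-path R1 channels by node class. It is a sketch under stated assumptions, not the applicants' implementation: the names node_class and wormhole_r1 are hypothetical, dimensions are numbered from 0, a wraparound link is taken to be needed when |dest - current| > k/2, and the tie |dest - current| = k/2 is treated as needing no wraparound link.

```python
def node_class(coords, k):
    """Minimum number of hops from the node to any node of set S0."""
    return sum(min(x, k - 1 - x) for x in coords)


def wormhole_r1(current, dest, k):
    """Return the R1 channels (dimension, direction) a packet may request."""
    candidates = []
    needs_wrap = False
    for i, (c, d) in enumerate(zip(current, dest)):
        b = d - c  # offset along dimension i
        if b == 0:
            continue
        wraps = abs(b) > k / 2  # assumption: ties count as non-wraparound
        needs_wrap = needs_wrap or wraps
        # A wraparound route moves against the sign of the offset.
        step = 1 if (b > 0) != wraps else -1
        nxt = list(current)
        nxt[i] = (c + step) % k
        candidates.append((i, '+' if step == 1 else '-', tuple(nxt)))
    if not needs_wrap:
        # Case 2a: every minimal-path R1 channel may be requested.
        return [(i, s) for i, s, _ in candidates]
    # Case 1a: only hops from class S_i to class S_j with j < i.
    here = node_class(current, k)
    return [(i, s) for i, s, nxt in candidates if node_class(nxt, k) < here]


# 8-ary 2-cube: the packet must wrap in dimension 0, so the hop in
# dimension 1, which moves away from S0, is not offered on R1.
print(wormhole_r1((2, 1), (7, 3), 8))
```

In this sketch the hop across the wraparound link itself is never offered on R1, because both of its endpoints lie in the same class; that hop is left to the R2 channel of case 1b.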
Brief Description of Figures
Fig. 1 - Fig. 4 illustrate the algorithm of clue in k-ary n-cubes of VCT switching;
Fig. 5 illustrates an example of channel selection in a VCT-switched 2D torus;
Fig. 6 illustrates the algorithm of flow controlled clue in k-ary n-cubes of VCT switching;
Fig. 7 illustrates the classification of all nodes in a 2D torus;
Fig. 8 illustrates a design of the routing logic part in a 2D torus wormhole router based on wormhole clue.
Fig. 9 illustrates the architecture of a VCT router based on flow controlled clue.
Detailed Description of Preferred Embodiments
Fig. 1 presents how clue selects an output channel in a VCT-switched torus. It first provides all possible R1 channels, and then provides all possible R2 channels. Any of the provided R1 and R2 channels can be selected as the next channel for the packet. Procedure add-R1(S, offset) adds all possible R1 channels to the set S, while procedure add-R2(S, offset, current) adds to the set S all possible R2 channels given the offset of the current node to the destination. Dimension-order routing and negative-first routing are applied on the R2 channels of the mesh sub-network links as shown in Fig. 3 and Fig. 4, respectively. The function select(S) is the selection function of the routing algorithm: given a set S of possible channels, it selects one of the channels in S as the next hop for the packet. One way to implement the selection function is to select a channel from S at random. In general, different resource allocation policies provide different selection functions.
As shown in Fig. 2, k is the number of nodes along each dimension, and b_i is the offset along dimension i. In clue, a packet can choose any R1 channel that leads to a node closer to its destination. For each dimension i, the procedure checks whether R1i+ or R1i- can be added to the set. Here R1i+ denotes the R1 channel of the current node in the i'th dimension along the positive direction, and R1i- denotes the R1 channel of the current node in the i'th dimension along the negative direction. The packet need traverse a wraparound link in the i'th dimension when b_i > k/2. The next hop is then along the negative direction in dimension i, so the R1i- channel is added. The packet also need traverse a wraparound link in the i'th dimension when b_i < -k/2. The next hop is then in the positive direction along dimension i, so the R1i+ channel is provided. The packet need not traverse a wraparound link along the i'th dimension when 0 < b_i < k/2. The next hop is then in the positive direction along dimension i, and R1i+ is provided. Likewise, -k/2 < b_i < 0 means that the packet need not traverse a wraparound link in the i'th dimension, and R1i- is provided.
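The four offset cases can be sketched in Python as follows. This is a hedged illustration rather than the procedure of Fig. 2 itself: the function name add_r1, the 0-based dimension numbering, and the (dimension, direction) tuple representation are assumptions. The text does not say how an offset of exactly ±k/2 is handled; the sketch treats it as needing no wraparound link.

```python
def add_r1(offsets, k):
    """Return the R1 channels on minimal paths as (dimension, direction).

    offsets -- b_i for each dimension (destination minus current coordinate)
    k       -- number of nodes along each dimension
    """
    channels = []
    for i, b in enumerate(offsets):
        if b == 0:
            continue  # nothing left to route in this dimension
        wraps = abs(b) > k / 2  # assumption: ties count as non-wraparound
        # Without a wraparound link the hop follows the sign of b;
        # with a wraparound link it goes the opposite way.
        goes_positive = (b > 0) != wraps
        channels.append((i, '+' if goes_positive else '-'))
    return channels


# 8 nodes per dimension: offset 6 wraps (go negative), -7 wraps (go positive).
print(add_r1([6, -7], 8))
print(add_r1([3, -1, 0], 8))
```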
As shown in Fig. 3, the function FirstWrap() finds the first dimension i along which the packet need traverse a wraparound link. The packet need not traverse a wraparound link in any dimension when i = 0. In that case, the procedure finds the first b_j such that b_j ≠ 0 via the function firstOne(). The R2 channel along the j'th dimension can be added because dimension-order routing is applied on R2. When i ≠ 0, the i'th dimension is the lowest dimension along which the packet need traverse a wraparound link. The R2 channel of the next hop along dimension i can be added only when the current node is directly connected to the wraparound link of dimension i.
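A minimal Python sketch of this R2 procedure with dimension-order routing follows. Several details are assumptions made for illustration: the names first_wrap and add_r2_dimension_order are hypothetical, dimensions are numbered from 0 so that "no wraparound needed" is signalled by None rather than by 0, a wraparound link is taken to be needed when |b_i| > k/2, and "directly connected to the wraparound link" is read as the current coordinate being 0 (for a negative hop) or k - 1 (for a positive hop).

```python
def first_wrap(offsets, k):
    """Lowest dimension that needs a wraparound link, or None."""
    for i, b in enumerate(offsets):
        if abs(b) > k / 2:  # assumption: ties count as non-wraparound
            return i
    return None


def add_r2_dimension_order(offsets, current, k):
    """Return the R2 channels (dimension, direction) the packet may request."""
    i = first_wrap(offsets, k)
    if i is None:
        # Mesh sub-network: dimension-order routing allows only the
        # lowest dimension whose offset is still nonzero.
        for j, b in enumerate(offsets):
            if b != 0:
                return [(j, '+' if b > 0 else '-')]
        return []
    b = offsets[i]
    # The R2 channel is offered only for the wraparound hop itself.
    if b > 0 and current[i] == 0:
        return [(i, '-')]
    if b < 0 and current[i] == k - 1:
        return [(i, '+')]
    return []


print(add_r2_dimension_order([0, 3], (2, 1), 8))   # mesh hop in dimension 1
print(add_r2_dimension_order([6, -7], (0, 7), 8))  # wraparound hop in dimension 0
print(add_r2_dimension_order([5, 0], (1, 0), 8))   # not yet at the wraparound link
```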
Fig. 4 presents how negative-first routing is used on the mesh sub-network. The function FirstWrap() is the same as stated above. When i = 0, the procedure checks, for each dimension, whether the packet would route along the positive or the negative direction, and puts the corresponding R2 channels into T1 and T2, respectively. If the set T2 is not empty, the channels in T2 are added to S. Otherwise, the channels in T1 are added to S. The reason is that, when negative-first routing is applied on the mesh sub-network, packets must traverse R2 channels along negative directions first, and then R2 channels along positive directions. When i ≠ 0, the analysis is the same as stated for Fig. 3.
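The mesh case of the negative-first variant (no wraparound link needed) can be sketched as below. The function name r2_negative_first and the tuple representation of channels are illustrative assumptions; the sketch covers only the case in which the packet stays inside the mesh sub-network.

```python
def r2_negative_first(offsets):
    """Return the R2 channels allowed by negative-first routing in the mesh.

    offsets -- per-dimension offsets of a packet that needs no wraparound link
    """
    t1 = [(i, '+') for i, b in enumerate(offsets) if b > 0]  # positive hops
    t2 = [(i, '-') for i, b in enumerate(offsets) if b < 0]  # negative hops
    # All negative hops must be completed before any positive hop is taken.
    return t2 if t2 else t1


print(r2_negative_first([2, -1, 0]))  # only the negative hop is offered
print(r2_negative_first([2, 0, 1]))   # no negative hops remain
```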
As shown in Fig. 5, assume that the dimension-order routing algorithm is applied on the R2 channels of the mesh sub-network links and that a packet travels from the source A to the destination H. The routing algorithm in a 2D torus proceeds as follows. Because this algorithm is fully adaptive, packets can be routed along any minimal routing path. We arbitrarily specify one routing path and state which channel or channels can be requested at each hop of this path.
1) A → B: the packet need traverse two wraparound links from the source A to the destination H, where dimension x is the lowest dimension along which the packet need traverse a wraparound link. From Rule 1 and Rule 3, either R1 or R2 can be selected;
2) B → C: the packet need traverse a wraparound link from B to H, and dimension y is the lowest dimension along which the packet need traverse a wraparound link. From Rule 1 and Rule 3, either R1 or R2 can be selected;
3) C → E, F → G, G → H: the packet need traverse no wraparound link from the current node to the destination H. These hops follow the dimension-order algorithm. As stated in Rule 2, R2 channels can be selected in these cases. R1 can also be selected. Therefore, either R1 or R2 can be selected;
4) E → F: the packet need traverse no wraparound link from E to H. This hop does not follow the dimension-order algorithm. From Rule 1 and Rule 2, the R2 channel cannot be selected in this hop; only the R1 channel can be selected.
Let a packet be delivered from I to L as shown in Fig. 5. I → J: only the R1 channel can be selected because this hop does not follow the dimension-order algorithm. J → K, K → L: either R1 or R2 can be selected, as presented in Fig. 5.
Flow controlled clue is described in Fig. 6. f_i+ and s_i+ represent the number of free buffers and the number of safe packets, respectively, in the input buffer that neighbors the current node C along dimension i in the positive direction. The inputs of this algorithm are the coordinates of the current node, the coordinates of the destination node, and the free buffer counts and safe packet counts of all neighboring input buffers. The available channel set S and the selected output channel are initialized as the empty set and null, respectively. If the current node is the destination, the internal channel is selected. Otherwise, for each dimension, the function flow-control() checks whether the packet can be forwarded. The channel ch_i+ or ch_i- is added to S if the packet can advance along dimension i. The function issafe(i) returns 1 when the packet is safe to the next node along dimension i. Otherwise, it returns 0. The function flow-control() avoids filling any input port with only unsafe packets. It checks the number of free buffers (f) and the number of safe packets (s) in the next input buffer:
1) f > 1: the packet can be delivered because there is more than one free buffer in the next node.
2) f = 1 and s > 0: the packet can be delivered because there is at least one safe packet in the next node.
3) f = 1 and s = 0: the packet can be delivered if it is a safe packet to the next node; otherwise, the packet is kept in the waiting list.
4) f = 0: the packet is kept in the waiting list.
Finally, an output channel is selected from S if S is not empty. Otherwise, the packet is blocked. The key point of flow controlled clue is the flow-control() function. It avoids filling any input port with only unsafe packets. The input buffers are organized as dynamically allocated multi-queues. There are two queues at each input port: a safe packet queue and an unsafe packet queue. All packets in an input port that are safe to the node are linked together as a safe packet queue. The rest are linked as an unsafe packet queue. So the safe packets are not blocked by unsafe packets.
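The two-queue organization of an input port can be illustrated with a small Python model. This is a sketch under assumptions, not the router of the application: the class name InputPort, the method names, and the use of collections.deque to stand in for linked queues over a shared buffer pool are all hypothetical.

```python
from collections import deque


class InputPort:
    """One input port: a shared buffer pool split into two logical queues."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # total number of packet buffers
        self.safe = deque()       # packets that are safe to this node
        self.unsafe = deque()     # all other packets

    def free(self) -> int:
        """Number of free buffers (f)."""
        return self.capacity - len(self.safe) - len(self.unsafe)

    def offer(self, packet, is_safe: bool) -> bool:
        """Accept the packet only if the flow-control rule allows it."""
        f, s = self.free(), len(self.safe)
        allowed = f > 1 or (f == 1 and (s > 0 or is_safe))
        if allowed:
            (self.safe if is_safe else self.unsafe).append(packet)
        return allowed


port = InputPort(capacity=2)
results = [
    port.offer("u1", is_safe=False),  # two free buffers: accepted
    port.offer("u2", is_safe=False),  # last buffer is reserved for a safe packet
    port.offer("s1", is_safe=True),   # safe packet takes the last buffer
]
print(results, list(port.safe), list(port.unsafe))
```

Because the two queues advance independently, the safe packet s1 in this example can leave the port even while u1 is still waiting.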
All nodes in an n-dimensional torus can be classified as follows: nodes that are directly connected to n wraparound links fall into set S0. The set S_i contains the nodes such that a packet from a node in S_i to any node in the set S0 takes at least i hops. In a k-ary n-cube, a node (x_0, x_1, ..., x_{n-1}) is in the set with subscript:
$$\sum_{i=0}^{n-1} \min(x_i,\ k - 1 - x_i)$$
As shown in Fig. 7, S0 contains the four vertices, which are directly connected to two wraparound links; S1 contains the eight nodes neighboring the four vertices in S0. A packet takes at least two steps from a node in the set S2 to any node in the set S0.
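The class of a node is the sum, over all dimensions, of its distance to the nearer edge coordinate (0 or k - 1). The Python sketch below computes it and counts the nodes of each class in a 2D torus; the function name node_class and the choice k = 8 are assumptions made only for this illustration.

```python
from itertools import product


def node_class(coords, k):
    """Subscript of the set S_i that contains the node with these coordinates."""
    return sum(min(x, k - 1 - x) for x in coords)


k = 8  # illustrative size: an 8-ary 2-cube
counts = {}
for node in product(range(k), repeat=2):
    c = node_class(node, k)
    counts[c] = counts.get(c, 0) + 1

# Four corner nodes in S0 and eight neighbors of the corners in S1.
print(counts[0], counts[1])
```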
Fig. 8 shows an algorithmic routing logic design for wormhole clue in a 2D torus. To simplify the routing logic, a small change is made to the algorithm: when packets need cross wraparound links, they can request R1 channels only in the dimensions along which they need cross wraparound links.
A routing header includes a sign bit (sx and sy) and a wraparound link bit (wx and wy) for each dimension. The relative addresses (x and y) specify the number of hops to the destination in each dimension. The direction is specified by the sign bit. The wraparound link bit indicates whether the packet need traverse a wraparound link along that dimension. The relative addresses (called offsets in many cases) are input to zero checkers that generate the signals xdone and ydone. If the relative address in a particular dimension is zero, the packet is done routing along that dimension. The sign bits and the done signals are then input to an array of five AND gates that determine which directions, with or without traversing a wraparound link, are productive in the sense that they will move the packet closer to the destination. For example, the second gate from the left determines that if xdone = 0, indicating that routing in the x dimension is not done, and sx = 0, indicating that the packet is delivered in the +x direction, then the +x direction is productive. The 8-bit productive vector is input to a routing function block that decides which channels are available. The routing function block also needs one bit for each dimension to specify whether the current node is directly connected to a wraparound link in that dimension (wrapx and wrapy). Queue lengths specify the state of each virtual channel.
The routing function block is not complex. Assume dimension-order routing is applied on R2. We take R1x+, R2x+ and R2y+ as examples to explain when these channels can be added to the available channel set. In wormhole clue, under the constraint added above, a packet that needs to cross a wraparound link in the y dimension but not in the x dimension cannot request R1 channels of the x dimension. Otherwise, R1 channels of the x dimension can be added to the channel set if necessary. The free buffer bit frR1x+ indicates that there is at least one free buffer in the corresponding R1 channel. Then,
$$\mathrm{R1x{+}} = \mathrm{frR1x{+}} \cdot (+x) \cdot \overline{wy \cdot \overline{wx}}$$
In clue it would be:
$$\mathrm{R1x{+}} = \mathrm{frR1x{+}} \cdot (+x)$$
As discussed above, R1 is fully adaptive in clue, so packets can request R1 at any time. When a packet needs to travel along the x dimension in the positive direction (the +x bit) and there is at least one free buffer, R1x+ can be added to the available channel set.
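The two availability conditions for the R1 channel in the +x direction can be written as Boolean functions. This is a sketch based on the prose description of the constraint (a packet that must cross a wraparound link in y but not in x may not request R1 channels of x in wormhole clue); the function and parameter names are hypothetical.

```python
def r1_xplus_wormhole_clue(fr_r1_xplus, plus_x, wx, wy):
    """Availability of the R1 +x channel under wormhole clue.

    fr_r1_xplus: at least one free buffer in the R1 +x channel
    plus_x:      the +x direction is productive
    wx, wy:      the packet must cross a wraparound link in x / y
    """
    # Blocked when a wraparound is needed in y but not in x.
    blocked = bool(wy) and not wx
    return bool(fr_r1_xplus and plus_x and not blocked)


def r1_xplus_clue(fr_r1_xplus, plus_x):
    """Availability of the R1 +x channel under clue, where R1 is fully adaptive."""
    return bool(fr_r1_xplus and plus_x)


print(r1_xplus_wormhole_clue(True, True, wx=False, wy=True))  # prints "False"
print(r1_xplus_clue(True, True))                              # prints "True"
```

The only difference between the two functions is the extra wraparound term, which is the simplification made for the wormhole routing logic.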
In the selection function, R2 channels are preferred when both R1 and R2 channels are available. When several R1 channels or several R2 channels are available, one channel is selected at random, so the selection function is also very simple. Finally, a one-hot selected vector indicates the virtual channel that the packet selects.
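A minimal Python sketch of such a selection function follows. The channel names, the helper `one_hot`, and the use of Python's `random` module as the source of randomness are assumptions for illustration; a hardware router would use its own arbitration logic.

```python
import random


def select_channel(available_r1, available_r2, rng=random):
    """Pick one virtual channel, preferring R2 over R1.

    Returns None when no channel is available.
    """
    if available_r2:
        return rng.choice(list(available_r2))
    if available_r1:
        return rng.choice(list(available_r1))
    return None


def one_hot(selected, all_channels):
    """Encode the selected channel as a one-hot vector over all channels."""
    return [1 if channel == selected else 0 for channel in all_channels]


channels = ["R1x+", "R2x+", "R2y+"]  # hypothetical channel ordering
chosen = select_channel(["R1x+"], ["R2y+"])
print(chosen, one_hot(chosen, channels))  # prints "R2y+ [0, 0, 1]"
```

Because only one R2 channel is available in the example, the result is deterministic even though the choice within a class is random.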
The input buffers of flow controlled clue are organized as dynamically allocated multi-queues. The router design is shown in Fig. 8. Packets in each queue advance in the traditional FIFO manner. Each output channel maintains three counts, G, C, and S, as shown in Table 1. The credit count C decreases by 1 and S increases by 1 when an output channel sends a safe packet to a downstream node. Once the downstream router forwards a packet and releases the associated buffer, it sends a credit to the upstream router, which increments a buffer count. The credit also carries a bit indicating whether the forwarded packet is a safe packet; this bit can be used to update C.
[Image imgf000008_0001 of the original publication, not reproduced]
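The delivery decision that these counts support is stated as four cases in claim (2), in terms of the number of free buffers f and the number of safe packets c in the next input buffer. A minimal Python sketch of that decision follows; the function and parameter names are hypothetical, and no mapping between f, c and the counts G, C, and S is assumed here.

```python
def can_forward(free_buffers, safe_packets, packet_is_safe):
    """Decide whether a packet may be sent to the next input buffer.

    free_buffers:   f, free buffers in the next input buffer
    safe_packets:   c, safe packets already held in that buffer
    packet_is_safe: whether the packet is safe to the next node
    """
    if free_buffers > 1:
        # Case 1: more than one free buffer, so the packet can always be sent.
        return True
    if free_buffers == 1:
        # Cases 2 and 3: the last free buffer may be taken only if the port
        # already holds a safe packet or the packet itself is safe.
        return safe_packets > 0 or packet_is_safe
    # Case 4: no free buffer, so the packet stays in the waiting list.
    return False


print(can_forward(1, 0, False))  # prints "False"
print(can_forward(1, 0, True))   # prints "True"
```

The check on the last free buffer is what keeps an input port from filling with only unsafe packets.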

Claims

What is claimed is:
(1) A fully adaptive routing algorithm in VCT switching torus without flow control.
This algorithm is named clue. Two virtual channels, R1 and R2, are enough to provide deadlock-free fully adaptive routing in a VCT-switched n-dimensional torus. Three rules are used to illustrate the algorithm:
Rule 1: Virtual channel R1 is fully adaptive; a packet can request R1 channels at any time.
Rule 2: If a packet need not traverse any wraparound link from the current node to the destination, it is delivered in a mesh sub-network and can request an R2 channel of the mesh sub-network links. However, it must follow a deadlock-free routing algorithm for meshes, such as negative-first or dimension-order routing, when being delivered via R2.
Rule 3: If the next hop of a packet is to traverse a wraparound link of dimension d, and d is the lowest of the dimensions in which the packet needs to traverse wraparound links from the current node to the destination, the packet can request the R2 channel of that wraparound link.
(2) A fully adaptive routing algorithm in VCT switching torus with flow control.
This algorithm is named flow controlled clue. When combined with a flow control scheme, a more efficient algorithm can be proposed.
In this algorithm, two kinds of packets, safe packets and unsafe packets, are defined. Based on a routing algorithm F for the mesh sub-network of an n-dimensional torus, a packet is safe to the downstream node under either of the following conditions:
1) The next hop of the packet is to traverse a wraparound link along dimension d, and d is the lowest of the dimensions along which the packet needs to traverse wraparound links.
2) The packet does not need to traverse any wraparound link from the current node to the destination, and the next hop is to reserve a link in the mesh sub-network according to the routing function F.
If a packet is not safe to a node, we say it is unsafe to that node. In the following, unless otherwise specified, saying that a packet is safe or unsafe means that it is safe or unsafe to its current node.
This algorithm avoids filling any input port with only unsafe packets. When sending a packet, it checks the number of free buffers f and the number of safe packets c in the next input buffer:
1) f > 1: the packet can be delivered because there is more than one free buffer in the next node.
2) f = 1 and c > 0: the packet can be delivered because there is at least one safe packet in the next node.
3) f = 1 and c = 0: the packet can be delivered if it is a safe packet to the next node; otherwise, keep the packet in the waiting list.
4) f = 0: keep the packet in the waiting list.
(3) A partially adaptive routing algorithm in wormhole switching torus.
This algorithm is named wormhole clue. All nodes in an n-dimensional torus can be classified as follows: nodes that are directly connected to n wraparound links fall into the set S0. The set Si contains the nodes such that a packet from a node in Si to any node in the set S0 takes at least i hops. A minimal deadlock-free algorithm for wormhole switching torus is proposed as follows:
1) A packet needs to traverse one or more wraparound links.
a) R1 channels: It can request only R1 channels from nodes of class Si to nodes of class Sj, where j < i.
b) R2 channels: If the next hop of the packet is to traverse a wraparound link of dimension d, and d is the lowest of the dimensions in which the packet needs to traverse wraparound links from the current node to the destination, the packet can request the R2 channel of that wraparound link.
2) A packet need not traverse wraparound links.
a) R1 channels: It can request any of the available R1 channels.
b) R2 channels: It can request an R2 channel of the mesh sub-network links. However, it must follow a deadlock-free routing algorithm for meshes, such as negative-first or dimension-order routing, when being delivered via R2.
PCT/CN2011/080312 2011-05-31 2011-09-28 Efficient adaptive deadlock-free routing algorithms for torus networks WO2012162988A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2011101448825A CN102170402B (en) 2011-05-31 2011-05-31 A deadlock-free adaptive routing algorithm in a Torus network
CN201110144882.5 2011-05-31

Publications (1)

Publication Number Publication Date
WO2012162988A1 true WO2012162988A1 (en) 2012-12-06

Family

ID=44491385

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/080312 WO2012162988A1 (en) 2011-05-31 2011-09-28 Efficient adaptive deadlock-free routing algorithms for torus networks

Country Status (2)

Country Link
CN (1) CN102170402B (en)
WO (1) WO2012162988A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11425027B2 (en) 2020-11-01 2022-08-23 Mellanox Technologies, Ltd. Turn-based deadlock-free routing in a Cartesian topology
US11770326B2 (en) 2019-08-08 2023-09-26 Mellanox Technologies, Ltd. Producing deadlock-free routes in lossless cartesian topologies with minimal number of virtual lanes

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102170402B (en) * 2011-05-31 2013-07-10 清华大学 A deadlock-free adaptive routing algorithm in a Torus network
CN103686467B (en) * 2012-09-13 2017-06-30 日电(中国)有限公司 Optical data central site network, optical module, Wavelength allocation method
CN102904806B (en) * 2012-09-28 2015-04-15 清华大学 Deadlock free fault-tolerant self-adaptation routing method of computer system
CN103095588B (en) * 2013-01-17 2015-09-30 清华大学 Based on the adaptive routing method without dead of multiple spanning tree
CN103491023B (en) * 2013-09-13 2016-08-17 中国人民解放军国防科学技术大学 Method for routing for three-dimensional torus photoelectricity hybrid network
WO2015176243A1 (en) * 2014-05-21 2015-11-26 华为技术有限公司 Improved ring topology structure and application method thereof
CN104065575B (en) * 2014-07-16 2017-08-04 曙光信息产业(北京)有限公司 It is a kind of to indicate route and the method and device of routing iinformation based on nodes
CN104539536B (en) * 2014-12-01 2017-10-17 清华大学 The stream control of dynamical state driving and Torus network self-adapting routing methods
CN105530206B (en) * 2015-12-22 2019-01-29 合肥工业大学 A kind of Torus network system and its working method with double access infrastructures
CN110048947B (en) * 2019-03-06 2020-06-16 清华大学 Self-adaptive routing method of data packet in two-dimensional Mesh network and electronic equipment
CN110198268A (en) * 2019-05-15 2019-09-03 清华大学 The high-dimensional Torus network architecture and adaptive routing method
CN112039678B (en) * 2019-06-04 2021-11-19 清华大学 Torus network-based multicast method
CN114760255B (en) * 2022-03-31 2024-03-08 中国电子科技集团公司第五十八研究所 On-chip and inter-chip integrated network deadlock-free architecture for multi-die interconnection

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101242372A (en) * 2008-03-10 2008-08-13 清华大学 Non lock routing method for k-element N-dimension mesh
US20090046727A1 (en) * 2007-08-16 2009-02-19 D. E. Shaw Research, Llc Routing with virtual channels
CN102170402A (en) * 2011-05-31 2011-08-31 清华大学 A deadlock-free adaptive routing algorithm in a Torus network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100399771C (en) * 2004-12-09 2008-07-02 电子科技大学 Method for non deadlock self adaptive routing in multi-dimensional exchanging structure
CN101335704B (en) * 2008-04-18 2011-05-11 清华大学 Adaptive routing method without dead lock in three-dimensional torus network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS, RESEARCH DISCLOSURE, vol. 338, no. 081, 10 June 1992 (1992-06-10) *

Also Published As

Publication number Publication date
CN102170402B (en) 2013-07-10
CN102170402A (en) 2011-08-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11866921

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11866921

Country of ref document: EP

Kind code of ref document: A1