WO2012162988A1

WO2012162988A1 - Efficient adaptive deadlock-free routing algorithms for torus networks

Info

Publication number: WO2012162988A1
Application number: PCT/CN2011/080312
Authority: WO
Inventors: Dong Xiang; Wei Luo
Original assignee: Dong Xiang; Wei Luo
Priority date: 2011-05-31
Filing date: 2011-09-28
Publication date: 2012-12-06
Also published as: CN102170402B; CN102170402A

Abstract

A deadlock-free minimal routing algorithm called clue is first disclosed for VCT-switched tori. Only two virtual channels are required. One channel is used by a deadlock-free routing algorithm for the mesh sub-network based on a known base routing scheme, such as negative-first or dimension-order routing. The other channel is similar to an adaptive channel. This combination yields a novel fully adaptive routing scheme because the first channel does not supply routing paths for every source-destination pair. Based on clue, two further algorithms are proposed: flow controlled clue and wormhole clue. Flow controlled clue is also proposed for VCT-switched tori. It is fully adaptive and deadlock-free. Each input port requires at least two buffers, each of which can hold a packet. A simple flow control function is used in flow controlled clue to avoid deadlocks. Wormhole clue is proposed for wormhole-switched tori. It is partially adaptive because constraints are added to the adaptive channel for deadlock avoidance.

Description

Efficient Adaptive Deadlock-free Routing Algorithms for Torus Networks

Field of the Invention

The present invention relates to parallel interconnection networks, more particularly to deadlock-free routing algorithms in torus networks.

Background of the Invention

Meshes and tori have been popular interconnection network topologies for constructing massively parallel multiprocessors for decades. They are widely used in recent commercial and experimental multiprocessors, such as the IBM Blue Gene series and the Cray XT series.

A torus is a topology with an n-dimensional grid structure and k nodes in each dimension. All nodes in a torus have the same number of neighbors. In wormhole switching, every packet is divided into several flow control digits, or flits. The first flit of a packet (called the header flit) contains the routing information. The packet is pipelined through the network at the flit level. When a header flit is blocked during transmission, all flits wait at their local nodes. In virtual cut-through (VCT) switched networks, packets can cut through to the next router before the complete packet has been received. The main difference between a VCT-switched and a wormhole-switched network is that a VCT router must buffer a whole packet when blocking occurs. Therefore, a VCT-switched router needs more buffers than a wormhole-switched router.

A routing algorithm determines the sequence of channels for a packet to traverse from the source to the destination. The routing scheme is important to performance of an interconnection network. It is essential to present a more effective routing algorithm for torus networks.

Several routing algorithms have been proposed for meshes and tori. X-Y routing in a 2D mesh delivers a packet first along dimension X and then along dimension Y after all hops along dimension X have been completed. The turn model was proposed for designing partially adaptive deadlock-free algorithms in a mesh. It prohibits the minimum number of turns needed to avoid cyclic channel dependencies.

Networks-on-chip (NoCs) are an emerging technology for high-performance computer systems and CPUs. Meshes and tori are still the most popular topologies for NoCs. The region-based routing algorithm eliminates redundant information in the routing tables by partitioning the network into a small number of regions. Therefore, the amount of routing information can be reduced significantly.

Several routing algorithms are proposed in this invention. They are very novel because deadlock is analyzed from an original aspect. When deadlock occurs in a minimal routing algorithm, there must be some packets in the network which cannot advance even one hop because of cyclic waiting. Also, these packets are unable to advance forever unless we break the deadlock manually. So, deadlock-free algorithms are designed based on the idea that each packet in the network cannot be blocked forever when this algorithm is applied.

Summary of the Invention

In an n-dimensional torus, all links are classified into mesh sub-network links and wraparound links. The present invention provides several deadlock-free minimal routing algorithms for torus networks. All algorithms are minimal routing algorithms, so all the channels discussed lie on minimal paths.

(1) A fully adaptive routing algorithm in VCT switching torus without flow control.

This algorithm is named clue. Two virtual channels, R₁ and R₂, are enough to provide deadlock-free fully adaptive routing in a VCT-switched n-dimensional torus. Three rules describe the algorithm:

Rule 1: Virtual channel R₁ is fully adaptive; a packet can request R₁ channels at any time.

Rule 2: If a packet need not traverse any wraparound link from the current node to the destination, that is, it is delivered within the mesh sub-network, it may request an R₂ channel of a mesh sub-network link. However, it must follow a deadlock-free routing algorithm for meshes, such as negative-first or dimension-order routing, when being delivered via R₂.

Rule 3: If the next hop of a packet traverses a wraparound link of dimension d, and d is the lowest of the dimensions in which the packet needs to traverse wraparound links from the current node to the destination, the packet can request the R₂ channel of that wraparound link.

(2) A fully adaptive routing algorithm in VCT switching torus with flow control.

This algorithm is named flow controlled clue. When combined with a flow control scheme, a more efficient algorithm can be obtained.

In this algorithm, two kinds of packets are defined: safe packets and unsafe packets. Based on a routing algorithm F for the mesh sub-network of an n-dimensional torus, a packet is safe to the downstream node if either one of the following conditions holds:

1) The next hop of the packet traverses a wraparound link along dimension d, and d is the lowest of the dimensions along which the packet needs to traverse wraparound links.

2) The packet does not need to traverse any wraparound link from the current node to the destination, and the next hop reserves a link in the mesh sub-network according to the routing function F.

If a packet is not safe to a node, it is said to be unsafe to that node. In the following, unless otherwise specified, saying that a packet is safe or unsafe means that it is safe or unsafe to its current node.

This algorithm avoids filling any input port with only unsafe packets. When sending a packet, it checks the number of free buffers f and the number of safe packets c in the next input buffer:

1) f > 1: the packet can be delivered because there is more than one free buffer in the next node.

2) f = 1 and c > 0: the packet can be delivered because there is at least one safe packet in the next node.

3) f = 1 and c = 0: the packet can be delivered if it is a safe packet to the next node; otherwise, the packet is kept in the waiting list.

4) f = 0: the packet is kept in the waiting list.
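The four cases above can be written as a single predicate. The following is a minimal Python sketch rather than the disclosed implementation; the function name `flow_control` and the boolean parameter `is_safe` (whether the packet is safe to the next node) are illustrative assumptions.

```python
def flow_control(f: int, c: int, is_safe: bool) -> bool:
    """Return True if the packet may be sent to the next input port.

    f: number of free buffers in the next input port
    c: number of safe packets already stored there
    is_safe: whether this packet is safe to the next node (assumed input)
    """
    if f > 1:                      # case 1: more than one free buffer
        return True
    if f == 1:                     # cases 2 and 3: exactly one free buffer
        return c > 0 or is_safe
    return False                   # case 4: no free buffer, packet waits
```

With one free buffer and no safe packet downstream, `flow_control(1, 0, False)` rejects an unsafe packet, which is what keeps a port from filling up with unsafe packets only.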

(3) A partially adaptive routing algorithm in wormhole switching torus.

This algorithm is named wormhole clue. All nodes in an n-dimensional torus can be classified as follows: nodes that are directly connected to n wraparound links fall into set S₀. The set S_i contains the nodes from which a packet takes at least i hops to reach any node in S₀. A minimal deadlock-free algorithm for wormhole-switched tori is proposed as follows:

1) A packet needs to traverse one or more wraparound links.

a) R₁ channels: It can request only R₁ channels that lead from a node of class S_i to a node of class S_j, where j < i.

b) R₂ channels: If the next hop of the packet traverses a wraparound link of dimension d, and d is the lowest of the dimensions in which the packet needs to traverse wraparound links from the current node to the destination, the packet can request the R₂ channel of that wraparound link.

2) A packet need not traverse wraparound links.

a) R₁ channels: It can request any of the R₁ channels available.

b) R₂ channels: It can request an R₂ channel of the mesh sub-network links. However, it must follow a deadlock-free routing algorithm for meshes, such as negative-first or dimension-order routing, when being delivered via R₂.
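The R₁ restriction in case 1a can be illustrated with a short Python sketch. It is an interpretation under stated assumptions, not the disclosed routing logic: nodes are coordinate tuples in a k-ary n-cube, `offset` is destination minus current coordinate per dimension, a dimension needs a wraparound link when the absolute offset exceeds k/2, and the helper names `node_class` and `wormhole_r1` are hypothetical.

```python
def node_class(node, k):
    """Class index i of S_i: hop distance to the nearest node of S_0."""
    return sum(min(x, k - 1 - x) for x in node)

def wormhole_r1(current, offset, k):
    """R1 channels a packet may request under wormhole clue (sketch)."""
    needs_wrap = any(abs(b) > k / 2 for b in offset)
    allowed = set()
    for i, b in enumerate(offset):
        if b == 0:
            continue
        wraps = abs(b) > k / 2
        # Minimal direction: reversed when the wraparound link is shorter.
        step = (1 if b > 0 else -1) * (-1 if wraps else 1)
        nxt = list(current)
        nxt[i] = (current[i] + step) % k
        # Case 1a: only hops from S_i to S_j with j < i may use R1.
        if needs_wrap and node_class(nxt, k) >= node_class(current, k):
            continue
        allowed.add(("R1", i, "+" if step > 0 else "-"))
    return allowed
```

In this sketch a hop across the wraparound link itself never lowers the class, so it is left to the R₂ channel of case 1b.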

Brief Description of Figures

Fig. 1 - Fig. 4 illustrate the algorithm of clue in k-ary n-cubes of VCT switching;

Fig. 5 illustrates an example of channel selection in a VCT-switched 2D torus;

Fig. 6 illustrates the algorithm of flow controlled clue in k-ary n-cubes of VCT switching;

Fig. 7 illustrates the classification of all nodes in a 2D torus;

Fig. 8 illustrates a design of the routing logic part in a 2D torus wormhole router based on wormhole clue.

Fig. 9 illustrates the architecture of a VCT router based on flow controlled clue.

Detailed Description of Preferred Embodiments

Fig. 1 presents how clue selects an output channel in a VCT-switched torus. It first provides all possible R₁ channels and then all possible R₂ channels. Any of the provided R₁ and R₂ channels can be selected as the next channel for the packet. Procedure add-R₁(S, offset) adds all possible R₁ channels to the set S, while procedure add-R₂(S, offset, current) adds to S all possible R₂ channels given the offset from the current node to the destination. Dimension-order routing and negative-first routing are applied on the R₂ channels of the mesh sub-network links as shown in Fig. 3 and Fig. 4, respectively. The function select(S) is the selection function of the routing algorithm: given a set S of possible channels, it selects one of them as the next hop for the packet. One way to implement the selection function is to select a channel from S at random. In general, different resource allocation policies lead to different selection functions.

As shown in Fig. 2, k is the number of nodes along each dimension, and b_i is the offset along dimension i. In clue, a packet can choose any R₁ channel that leads to a node closer to its destination. For each dimension i, the procedure checks whether R_1i+ or R_1i- can be added to the set. Here R_1i+ denotes the R₁ channel of the current node in the i-th dimension along the positive direction, and R_1i- denotes the R₁ channel of the current node in the i-th dimension along the negative direction. When b_i > k/2, the packet needs to traverse a wraparound link in the i-th dimension and the next hop is along the negative direction, so R_1i- is added. When b_i < -k/2, the packet needs to traverse a wraparound link in the i-th dimension and the next hop is along the positive direction, so R_1i+ is added. When 0 < b_i < k/2, the packet need not traverse a wraparound link in the i-th dimension and the next hop is along the positive direction, so R_1i+ is added. When -k/2 < b_i < 0, the packet need not traverse a wraparound link in the i-th dimension, and R_1i- is added.
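The four offset cases can be sketched in Python as follows. This is an illustrative reading of the add-R₁ procedure, not the code of Fig. 2: channels are represented as hypothetical tuples `("R1", dimension, direction)`, dimensions are numbered from 0, and the boundary case |b_i| = k/2, which the text does not cover, is assumed to be routed without a wraparound link.

```python
def r1_direction(b_i, k):
    """Direction of the minimal next hop in one dimension, or None if done."""
    if b_i == 0:
        return None
    if b_i > k / 2:        # wraparound link needed, go negative
        return "-"
    if b_i < -k / 2:       # wraparound link needed, go positive
        return "+"
    # No wraparound link: follow the sign of the offset
    # (|b_i| == k/2 lands here by assumption).
    return "+" if b_i > 0 else "-"

def add_r1(offset, k):
    """Set of all R1 channels that bring the packet closer to its destination."""
    channels = set()
    for i, b_i in enumerate(offset):
        direction = r1_direction(b_i, k)
        if direction is not None:
            channels.add(("R1", i, direction))
    return channels
```

For example, in an 8-ary torus an offset of (5, -2, 0) yields the negative R₁ channels of dimensions 0 and 1 and nothing for dimension 2.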

As shown in Fig. 3, the function FirstWrap() finds the first dimension i along which the packet needs to traverse a wraparound link. It returns i = 0 when the packet needs to traverse a wraparound link in no dimension. In that case, the procedure finds the first b_j such that b_j ≠ 0 via the function firstOne(). The R₂ channel along the j-th dimension can be added because dimension-order routing is applied on R₂. When i ≠ 0, the i-th dimension is the lowest dimension along which the packet needs to traverse a wraparound link. The R₂ channel of the next hop along dimension i can be added only when the current node is directly connected to the wraparound link of dimension i.
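A hedged Python sketch of this procedure is given below. It departs from the text's conventions in two labeled ways: dimensions are numbered from 0, so `first_wrap` returns `None` (instead of 0) when no wraparound link is needed, and "directly connected to the wraparound link" is interpreted as the current coordinate being 0 for a negative hop or k - 1 for a positive hop. All names are illustrative.

```python
def first_wrap(offset, k):
    """Lowest dimension that needs a wraparound link, or None."""
    for i, b in enumerate(offset):
        if abs(b) > k / 2:
            return i
    return None

def add_r2_dimension_order(current, offset, k):
    """R2 channels available when dimension-order routing is used on R2."""
    i = first_wrap(offset, k)
    if i is None:
        # Mesh case: only the first dimension with a non-zero offset.
        for j, b in enumerate(offset):
            if b != 0:
                return {("R2", j, "+" if b > 0 else "-")}
        return set()
    # Wraparound case: the minimal direction is opposite to the offset sign.
    direction = "-" if offset[i] > 0 else "+"
    at_edge = current[i] == 0 if direction == "-" else current[i] == k - 1
    return {("R2", i, direction)} if at_edge else set()
```

A packet at (1, 3) with offset (5, 0) in an 8-ary torus gets no R₂ channel here, since its next hop does not yet cross the wraparound link; it must advance on R₁ until it reaches coordinate 0.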

Fig. 4 presents how negative-first routing is used on the mesh sub-network. The function FirstWrap() is the same as described above. When i = 0, the procedure checks, for each dimension, whether the packet would be routed along the positive or the negative direction, and puts the corresponding R₂ channels into T₁ and T₂, respectively. If T₂ is not empty, the channels in T₂ are added to S; otherwise, the channels in T₁ are added to S. The reason is that, when negative-first routing is applied on the mesh sub-network, packets must first traverse R₂ channels along negative directions and only then R₂ channels along positive directions. When i ≠ 0, the analysis is the same as for Fig. 3.
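The mesh case of negative-first routing can be sketched as below. This is a minimal illustration that assumes the caller has already established that no wraparound link is needed; the function name and the tuple encoding of channels are hypothetical.

```python
def add_r2_negative_first(offset):
    """R2 channels for a packet that stays inside the mesh sub-network."""
    # T1 collects positive-direction channels, T2 negative-direction ones.
    t1 = {("R2", i, "+") for i, b in enumerate(offset) if b > 0}
    t2 = {("R2", i, "-") for i, b in enumerate(offset) if b < 0}
    # Negative hops must all be taken before any positive hop.
    return t2 if t2 else t1
```

With offset (2, -1, 0) only the negative channel of dimension 1 is offered; once that offset reaches zero, the positive channels become available.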

As shown in Fig. 5, assume that dimension-order routing is applied on the R₂ channels of the mesh sub-network links and that a packet travels from the source A to the destination H in a 2D torus. Because the algorithm is fully adaptive, packets can be routed along any minimal path. We arbitrarily specify one routing path and state which channel or channels can be requested at each hop:

1) A→B: the packet needs to traverse two wraparound links from the source A to the destination H, and dimension x is the lowest dimension along which it needs to traverse a wraparound link. From Rule 1 and Rule 3, either R₁ or R₂ can be selected;

2) B→C: the packet needs to traverse a wraparound link from B to H, and dimension y is the lowest dimension along which it needs to traverse a wraparound link. From Rule 1 and Rule 3, either R₁ or R₂ can be selected;

3) C→E, F→G, G→H: the packet needs to traverse no wraparound link from the current node to the destination H. These hops follow the dimension-order algorithm. As stated in Rule 2, R₂ channels can be selected in these cases, and by Rule 1, R₁ can also be selected. Therefore, either R₁ or R₂ can be selected;

4) E→F: the packet needs to traverse no wraparound link from E to H, and this hop does not follow the dimension-order algorithm. By Rule 2, the R₂ channel cannot be selected for this hop; by Rule 1, only the R₁ channel can be selected.

Now let a packet be delivered from I to L as shown in Fig. 5. I→J: only the R₁ channel can be selected because this hop does not follow the dimension-order algorithm. J→K, K→L: either R₁ or R₂ can be selected.

Flow controlled clue is described in Fig. 6. f_i+ and s_i+ represent the number of free buffers and the number of safe packets, respectively, in the input buffer of the node neighboring the current node C along dimension i in the positive direction. The inputs of this algorithm are the coordinates of the current node, the coordinates of the destination node, and the free buffer counts and safe packet counts of all neighboring input buffers. The available channel set S and the selected output channel are initialized to the empty set and null, respectively. If the current node is the destination, the internal channel is selected. Otherwise, for each dimension, the function flow-control() checks whether the packet can be forwarded. The channel ch_i+ or ch_i- is added to S if the packet can advance along dimension i. The function issafe(i) returns 1 when the packet is safe to the next node along dimension i; otherwise, it returns 0. The function flow-control() avoids filling any input port with only unsafe packets. It checks the number of free buffers (f) and the number of safe packets (s) in the next input buffer:

1) f > 1: the packet can be delivered because there is more than one free buffer in the next node.

2) f = 1 and s > 0: the packet can be delivered because there is at least one safe packet in the next node.

3) f = 1 and s = 0: the packet can be delivered if it is a safe packet to the next node; otherwise, the packet is kept in the waiting list.

4) f = 0: the packet is kept in the waiting list.

Finally, an output channel is selected from S if S is not empty. Otherwise, the packet is blocked. The key point of flow controlled clue is the flow-control() function, which avoids filling any input port with only unsafe packets. The input buffers are organized as dynamically allocated multi-queues. There are two queues at each input port: a safe packet queue and an unsafe packet queue. All packets in an input port that are safe to the node are linked together as the safe packet queue. The rest are linked as the unsafe packet queue. Thus safe packets are never blocked behind unsafe packets.
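The two-queue organization can be illustrated with a small Python model. It is a sketch under assumptions, not the router design: the class name `InputPort`, the shared `capacity`, and the use of `collections.deque` for the linked queues are all illustrative choices.

```python
from collections import deque

class InputPort:
    """Input port whose buffers are shared by a safe and an unsafe queue."""

    def __init__(self, capacity):
        self.capacity = capacity      # total packet buffers in the port
        self.safe = deque()           # packets safe to this node
        self.unsafe = deque()         # all other packets

    @property
    def free(self):
        return self.capacity - len(self.safe) - len(self.unsafe)

    def accept(self, packet, is_safe):
        """Apply the flow-control check, then enqueue the packet if allowed."""
        f, s = self.free, len(self.safe)
        allowed = f > 1 or (f == 1 and (s > 0 or is_safe))
        if allowed:
            (self.safe if is_safe else self.unsafe).append(packet)
        return allowed
```

With a two-buffer port, a second unsafe packet is refused while a safe one is accepted, so the port never ends up holding only unsafe packets.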

All nodes in an n-dimensional torus can be classified as follows: nodes that are directly connected to n wraparound links fall into set S₀. The set S_i contains the nodes from which a packet takes at least i hops to reach any node in S₀. In a k-ary n-cube, a node (x₀, x₁, ..., x_n-1) is in the set with subscript:

$$\sum_{i=0}^{n-1} \min(x_i,\ k - 1 - x_i)$$

As shown in Fig. 7, S₀ contains the four vertices, which are directly connected to two wraparound links; S₁ contains the eight nodes neighboring the four vertices in S₀. A packet takes at least two steps from a node in the set S₂ to any node in the set S₀.
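The classification can be checked numerically. The sketch below assumes coordinates in the range 0 to k - 1 and uses a 4-ary 2-cube purely as an example size (the dimensions of the torus in Fig. 7 are not restated here); the function names are hypothetical.

```python
from collections import Counter
from itertools import product

def node_class(node, k):
    """Subscript of the set S_i that contains the node."""
    return sum(min(x, k - 1 - x) for x in node)

def class_sizes(k, n):
    """Number of nodes in each class of a k-ary n-cube."""
    return Counter(node_class(node, k) for node in product(range(k), repeat=n))
```

For `class_sizes(4, 2)` this gives four nodes in S₀ and eight in S₁, matching the counts described for the 2D case.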

Fig. 8 shows an algorithmic routing logic design for wormhole clue in a 2D torus. To simplify the routing logic, a small change is made to the algorithm: when packets need to cross wraparound links, they can only request R₁ channels of the dimensions in which they need to cross wraparound links.

A routing header includes a sign bit (sx and sy) and a wraparound link bit (wx and wy) for each dimension. The relative addresses (x and y) specify the number of hops to the destination in each dimension. The direction is specified by the sign bit. The wraparound link bit indicates whether the packet needs to traverse a wraparound link along that dimension. The relative addresses (called offsets in many cases) are input to zero checkers that generate the signals xdone and ydone. If the relative address in a particular dimension is zero, the packet is done routing along that dimension. The sign bits and the done signals are then input to an array of five AND gates that determine which directions are productive, and whether a wraparound link needs to be traversed, in the sense that they move the packet closer to the destination. For example, the second gate from the left determines that if xdone = 0, indicating that routing in the x dimension is not done, and sx = 0, indicating that the packet is delivered in the +x direction, then the +x direction is productive. The 8-bit productive vector is input to a routing function block that decides which channels are available. The routing function block also needs one bit for each dimension to specify whether the current node is directly connected to a wraparound link in that dimension (wrapx and wrapy). Queue lengths specify the state of each virtual channel.

The routing function block is not complex. Assume dimension-order routing is applied on R₂. We take R_1x+, R_2x+ and R_2y+ as examples to explain when these channels can be added to the available channel set. In wormhole clue, under the added constraint, a packet that needs to cross a wraparound link in the y dimension but not in the x dimension cannot request R₁ channels of the x dimension. Otherwise, R₁ channels of the x dimension can be added to the channel set if necessary. The free buffer bit fr_R1_x+ indicates that there is at least one free buffer in the corresponding R₁ channel. Then,

$$R_{1x+} = fr_{R_{1x+}} \cdot (+x) \cdot \overline{wy \cdot \overline{wx}}$$

In clue it would be:

$$R_{1x+} = fr_{R_{1x+}} \cdot (+x)$$

As discussed above, R₁ is fully adaptive in clue, so packets can request R₁ at any time. When a packet needs to travel along the x dimension in the positive direction (the +x bit) and there is at least one free buffer, R_1x+ can be added to the available channel set.
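The availability conditions for the R₁ channel in the +x direction can be sketched as Boolean functions. The Python below encodes the constraint stated in the text (a packet that needs a wraparound link in y but not in x may not request R₁ channels of x); the function and parameter names are illustrative assumptions, not signal names from Fig. 8.

```python
def r1x_plus_wormhole(fr: bool, plus_x: bool, wx: bool, wy: bool) -> bool:
    """R1 x+ availability in wormhole clue (sketch).

    fr: at least one free buffer in the R1 x+ channel
    plus_x: the +x direction is productive
    wx, wy: wraparound link bits of the x and y dimensions
    """
    return fr and plus_x and not (wy and not wx)

def r1x_plus_clue(fr: bool, plus_x: bool) -> bool:
    """R1 x+ availability in clue, where R1 is fully adaptive."""
    return fr and plus_x
```

The only input combination on which the two functions differ, given a free buffer and a productive +x direction, is wy = 1 with wx = 0.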

In the selection function, R₂ channels are preferred when both R₁ and R₂ channels are available. When several R₁ channels or several R₂ channels are available, one of them is selected at random. The selection function is therefore also very simple. Finally, a one-hot selected vector indicates the virtual channel that the packet selects.
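A selection function with this preference can be sketched in a few lines of Python. The tuple encoding `(channel class, dimension, direction)` and the name `select` with an injectable random source are assumptions made for illustration; the hardware produces a one-hot vector rather than a tuple.

```python
import random

def select(channels, rng=random):
    """Pick one channel, preferring R2 over R1, at random within a class."""
    r2 = [c for c in channels if c[0] == "R2"]
    pool = r2 if r2 else list(channels)
    if not pool:
        return None                    # no channel available: packet blocks
    # Sort first so the choice depends only on the random source,
    # not on set iteration order.
    return rng.choice(sorted(pool))
```

Passing a seeded `random.Random` instance as `rng` makes the random choice reproducible in simulation.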

The input buffers of flow controlled clue are organized as dynamically allocated multi-queues. The router design is shown in Fig. 9. Packets in each queue advance in the traditional FIFO manner. Each output channel maintains three counts: G, C, and S, as shown in Table 1. The credit count C decreases by 1 and S increases by 1 when an output channel sends a safe packet to a downstream node. Once the downstream router forwards a packet and releases the associated buffer, it sends a credit to the upstream router, which increments a buffer count. The credit also contains a bit indicating whether the forwarded packet is a safe packet. This bit can be used to update C.

Claims

What is claimed is:

4) f = 0, keep the packet in the waiting list.

(3) A partially adaptive routing algorithm in wormhole switching torus.

This algorithm is named wormhole clue. All nodes in an n-dimensional torus can be classified as follows: nodes that are directly connected to n wraparound links fall into set S₀. The set S_i contains the nodes from which a packet takes at least i hops to reach any node in S₀. A minimal deadlock-free algorithm for wormhole-switched tori is proposed as follows:

1) A packet needs to traverse one or more wraparound links.

a) R₁ channels: It can request only R₁ channels that lead from a node of class S_i to a node of class S_j, where j < i.

2) A packet need not traverse wraparound links.

a) R₁ channels: It can request any of the R₁ channels available.

b) R₂ channels: It can request an R₂ channel of the mesh sub-network links. However, it must follow a deadlock-free routing algorithm for meshes, such as negative-first or dimension-order routing, when being delivered via R₂.