WO2006106445A1

WO2006106445A1 - Electronic device and method for input queuing

Info

Publication number: WO2006106445A1
Application number: PCT/IB2006/050903
Authority: WO
Inventors: Theodorus J. J. Denteneer; Ronald Rietman; Santiago Gonzalez Pestana; Nick Boot; Ivo J-B. F. Adan
Original assignee: Nxp B.V.
Priority date: 2005-04-05
Filing date: 2006-03-23
Publication date: 2006-10-12

Abstract

An electronic device is provided comprising a network interconnect means (NOC) for coupling processing units (IP). Said network interconnect means (NOC) comprise a plurality of interconnect means (NI, R), each having a number (N) of inputs (I1-I4) and an input queuing means (IQM) for queuing inputs of the interconnect means (NI, R). The input queuing means (IQM) comprises a plurality of queuing units (QU1-QU4) and is adapted to associate one of the queuing units (QU1-QU4) to each of the number (N) of inputs (I1-I4; P1-P4), wherein each queuing unit (QU1-QU4) associated to one of the inputs (I1-I4; P1-P4) comprises an individual number of input queues (IQ), independently of the other queuing units (QU1-QU4).

Description

Electronic device and method for input queuing

The present invention relates to an electronic device with a network interconnect which comprises a plurality of interconnect means. The invention also relates to a method for input queuing within an electronic device having a network interconnect means with a plurality of interconnect means. Networks on chip (NOC) have proved to be scalable interconnect structures and are candidate solutions for future on-chip interconnection between so-called IP blocks, i.e. intellectual property blocks. IP blocks are usually on-chip modules with a specific function, such as CPUs, memories, digital signal processors or the like. The IP blocks communicate with each other via the network on chip. The network on chip is typically composed of network interfaces and routers. The network interfaces provide an interface between the IP block and the network on chip, i.e. they translate the information from the IP block into information which the network on chip can understand and vice versa. The routers transport data from one network interface to another.

The communication between two IP blocks via the network on chip can be based on guaranteed throughput traffic, where circuit switching is used, or on best effort traffic, where packet switching is used. If best effort routing is used, some buffering is needed in the routers, so the routers typically comprise queuing units. With input queuing, a FIFO is placed at every input of the router; the simplest arrangement is one queue per input. Input queuing is less expensive than other designs, but for uniform traffic its throughput is only about 0.59, because a flit that is blocked at the head of a FIFO also blocks the flits queued behind it (head-of-line blocking). Buffering within the router can also be performed by output queuing, i.e. the queues are situated at the outputs of the router. In that case each output needs a queue for each input, i.e. N² queues are required, N being the number of inputs. A router based on multiple input queuing typically comprises an N-by-N switch which connects N inputs to N outputs and has m input queues for each input or input connection, with 1 < m < N.
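The 0.59 figure is the classic saturation throughput of FIFO input queuing under uniform traffic; analytically it tends to 2 - √2 ≈ 0.586 as N grows. The Python sketch below estimates it by simulation. Its modeling choices (permanently backlogged inputs, uniformly random destinations, random arbitration at each output) are assumptions of the sketch, not details taken from this document.

```python
import random
from collections import defaultdict


def fifo_input_queuing_throughput(n_ports: int, slots: int, seed: int = 0) -> float:
    """Estimate the saturation throughput of an N-by-N switch with one FIFO per input.

    Assumptions (illustrative): every input is always backlogged, each flit that
    reaches the head of its FIFO picks an output uniformly at random, and each
    output serves one of its contending head-of-line flits per slot at random.
    """
    rng = random.Random(seed)
    # Output requested by the head-of-line (HOL) flit of each input FIFO.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        # Group the inputs by the output their HOL flit requests.
        contenders = defaultdict(list)
        for inp, out in enumerate(hol):
            contenders[out].append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)  # one grant per output; losers stay blocked
            delivered += 1
            hol[winner] = rng.randrange(n_ports)  # the next flit moves to the head
    return delivered / (n_ports * slots)


if __name__ == "__main__":
    for n in (2, 4, 32):
        print(n, round(fifo_input_queuing_throughput(n, 10_000), 3))
```

For N = 2 the estimate should lie close to 0.75, and for large N it should settle near 0.59, because the losers of each arbitration keep blocking the flits behind them.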

Buffering may also be performed by virtual output queuing, where each input comprises N queues, one for the traffic to each of the outputs, such that N² queues are required at the input side of the router. A further method for buffering within the router is multiple input queuing, where multiple queues are arranged at each input of the router. The number of queues per input can be smaller than the number of outputs, while the number of inputs is almost always equal to the number of outputs. It is therefore an object of the invention to provide an electronic device with a network interconnect with a plurality of interconnect means, wherein the input queuing of the interconnect means is improved.
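The queue counts quoted for the different buffering schemes can be tabulated with a small helper. The scheme labels are hypothetical names chosen for this sketch; only the counts (N, N², N², Nm) come from the text.

```python
def queue_count(scheme: str, n: int, m: int = 1) -> int:
    """Number of FIFO queues in an N-by-N router for each buffering scheme
    described above (queue counts only; queue depths are ignored)."""
    if scheme == "input":            # one FIFO per input
        return n
    if scheme == "output":           # one queue per input at every output
        return n * n
    if scheme == "virtual_output":   # N queues at every input, one per output
        return n * n
    if scheme == "multiple_input":   # m queues at every input, 1 < m < N
        if not 1 < m < n:
            raise ValueError("multiple input queuing assumes 1 < m < N")
        return n * m
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for s in ("input", "output", "virtual_output", "multiple_input"):
        print(s, queue_count(s, 4, 2))
```

For N = 4 and m = 2 this gives 4, 16, 16 and 8 queues respectively, which is why multiple input queuing sits between plain input queuing and the N² schemes in cost.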

This object is achieved by an electronic device according to claim 1 as well as by a method for input queuing according to claim 5. Therefore, an electronic device is provided comprising network interconnect means for coupling processing units. The network interconnect means comprise a plurality of interconnect means, each having a number of inputs as well as an input queuing means for queuing inputs of the interconnect means. The input queuing means comprise a plurality of queuing units. The input queuing means is further adapted to associate one of the queuing units to each of the inputs. Each queuing unit associated to one of the inputs comprises an individual number of input queues, independently of the other queuing units. This implies that at least one queuing unit associated to one of the inputs comprises a number of input queues different from that of another queuing unit.

Accordingly, the throughput of the interconnect means can be improved by configuring the input queuing flexibly and independently for each of the inputs.

According to a further aspect of the invention, the input queuing means is adapted to associate an individual number of queuing units to each of the inputs according to a traffic distribution in the network interconnect means. Accordingly, more queuing units or input queues can be associated to those inputs of the interconnect means which carry a larger share of the traffic than other inputs.

The invention further relates to a method for input queuing within an electronic device having a network interconnect means for coupling processing units. Said network interconnect means comprise a plurality of interconnect means, each having a number of inputs and an input queuing means with queuing units for queuing inputs of the interconnect means. A number of input queues is individually associated to each of the queuing units, each queuing unit being associated to one of the inputs.

The invention is based on the idea of associating an individual number of input queues to each of the inputs, such that the different traffic streams at every input port are distributed over different queues to maximize the throughput. In other words, a flexible distribution of input queues is provided, and the amount of input queuing for each input can be adapted to the actual traffic distribution.

These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter and with respect to the following figures.

Fig. 1 shows the basic architecture of a network on chip according to a first embodiment;

Fig. 2 shows a block diagram of a router according to a second embodiment;

Fig. 3 shows a block diagram of a network interface according to a third embodiment;

Fig. 4 shows a block diagram of a router according to a fourth embodiment; and

Fig. 5 shows a graph illustrating the pre-routing scheme according to a fifth embodiment.

Fig. 1 shows a basic structure of a system on chip with a network on chip interconnect according to a first embodiment. A plurality of IP blocks IP are coupled to each other via a network on chip NOC. The network NOC comprises network interfaces NI for providing an interface between the IP block IP and the network on chip NOC. The network on chip NOC furthermore comprises a plurality of routers R. The network interface NI serves to translate the information from the IP block to a protocol which can be handled by the network on chip NOC and vice versa. The routers R serve to transport the data from one network interface NI to another. The communication between the network interfaces NI will not only depend on the number of routers R in between them, but also on the topology of the routers R. The routers R may be fully connected, connected in a 2D mesh, connected in a linear array, connected in a torus, connected in a folded torus, connected in a binary tree or in a fat-tree fashion. The IP block IP can be implemented as modules on chip with a specific or dedicated function such as CPU, memory, digital signal processors or the like.

The information from the IP block IP which is transferred via the network on chip is translated at the network interface NI into packets of variable length. The information from the IP block IP typically comprises a command followed by an address and the actual data to be transported over the network. The network interface NI divides the information from the IP block IP into pieces called packets and adds a packet header to each packet. The packet header comprises extra information that allows the transmission of the data over the network (e.g. destination address or routing path, and flow control information). Each packet is in turn divided into flits (flow control digits), which travel through the network on chip. The flit can be seen as the smallest granularity at which flow control takes place.

The communication is typically synchronized by a periodic clock signal supplied to all of the interconnect means, i.e. the network interfaces and the routers. For example, if the router takes two clock periods to produce a decision, the flit size is two words, where a word corresponds to the width of the data path.
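A minimal sketch of this flit arithmetic, assuming the data path carries one word per clock cycle (a hypothetical `words_per_cycle` parameter), so that a router needing k cycles per decision implies k-word flits. Zero-padding the last flit is likewise an illustrative choice, not something specified here.

```python
from typing import List


def flit_size_words(decision_cycles: int, words_per_cycle: int = 1) -> int:
    """Flit size in words: the words the data path moves while the router
    makes one decision (words_per_cycle is an assumed parameter)."""
    return decision_cycles * words_per_cycle


def split_into_flits(packet: List[int], flit_words: int) -> List[List[int]]:
    """Divide a packet (header words followed by payload words) into flits.
    The last flit is padded with zeros, an illustrative choice."""
    flits = []
    for start in range(0, len(packet), flit_words):
        flit = packet[start:start + flit_words]
        flit += [0] * (flit_words - len(flit))
        flits.append(flit)
    return flits


if __name__ == "__main__":
    header = [0xD3]            # hypothetical one-word header
    payload = [1, 2, 3, 4]
    print(split_into_flits(header + payload, flit_size_words(2)))
```

With a two-cycle decision, the five-word packet above becomes three two-word flits: `[[211, 1], [2, 3], [4, 0]]` (211 being 0xD3).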

Fig. 2 shows a block diagram of a router R according to the second embodiment. The router R comprises N inputs I1-I4 and N outputs O1-O4; in Fig. 2, N is chosen to be 4 for illustration purposes only. The router R further comprises an input queuing means IQM and a switch fabric unit SFU. As mentioned above, the packets are divided into flits to be transmitted over the network on chip. The flits arrive at the inputs of the router R and have to be forwarded by the router R to the correct outputs according to the information contained in the packet header. The switch fabric unit SFU switches the information from the inputs to the respective outputs, i.e. the switch fabric unit SFU is an N-by-N switch. Especially in a best effort routing scheme, the routers need to buffer some of the arriving flits. According to the second embodiment, the buffering of the flits is performed by the input queuing means IQM. The input queuing means IQM comprises a plurality of queuing units QU1-QU4, each of which can be implemented as a number of FIFO queues. A router based on multiple input queuing comprises equally many input queues for each of its inputs. The router according to the second embodiment removes this restriction and is instead based on a flexible input queuing scheme: the number of input queues (per queuing unit QU1-QU4) associated to any one of the inputs I1-I4 is flexible and can be adjusted independently of the other inputs.
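To make the difference from multiple input queuing concrete, the input side of such a router can be modeled as a list of queuing units, one per input, each holding its own number of FIFOs. This is a hypothetical Python model (the class and method names are invented for illustration); queue depths and the switch fabric are deliberately left out.

```python
from collections import deque
from typing import Deque, List


class FlexibleInputQueuing:
    """Input side of an N-by-N router where input i owns m_i FIFO queues.

    Unlike multiple input queuing, m_i may differ from input to input.
    Queues are unbounded deques here, a simplifying assumption.
    """

    def __init__(self, queues_per_input: List[int]):
        if any(m < 1 for m in queues_per_input):
            raise ValueError("every input needs at least one queue")
        # queuing_units[i][q] is input queue q of the queuing unit at input i.
        self.queuing_units: List[List[Deque[object]]] = [
            [deque() for _ in range(m)] for m in queues_per_input
        ]

    def enqueue(self, inp: int, queue: int, flit: object) -> None:
        self.queuing_units[inp][queue].append(flit)

    def heads(self, inp: int) -> List[object]:
        """Head-of-line flits of the non-empty queues at one input: the
        candidates the switch fabric can choose from in a slot."""
        return [q[0] for q in self.queuing_units[inp] if q]

    def total_queues(self) -> int:
        """The queue-count part of the router cost."""
        return sum(len(unit) for unit in self.queuing_units)
```

For example, `FlexibleInputQueuing([3, 1, 1, 3]).total_queues()` is 8, the same total as multiple input queuing with N = 4 and m = 2, but the queues are concentrated at two of the inputs.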

For the N inputs I1-I4, each input port i has $m_i$ input queues, i.e. each input port is associated to a queuing unit QU1-QU4, and each queuing unit QU1-QU4 can comprise a different number of input queues. Part of the router cost is determined by the number of queues. Hence, this cost in a router according to the second embodiment equals

$$\sum_{i=1}^{N} m_i .$$

In a design based on multiple input queuing, this cost is $Nm$. The increased flexibility of the router according to the second embodiment can be used to adapt the input queuing to the actual traffic distribution within the network on chip, such that a higher throughput can be achieved with the same number of input queues and thus the same cost.

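As an example, let N = 4 and let the traffic distribution matrix, whose entry $p_{ij}$ is the probability that traffic arriving at input $I_i$ is destined for output $j$, be

$$P = \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1/3 & 0 & 1/3 & 1/3 \end{pmatrix}.$$

Accordingly, at input port I1 the traffic is destined for output ports 2, 3 or 4 with equal probability. At input port I2, the traffic is destined for output port 1. At input port I3, the traffic is destined for output port 2. At input port I4, the traffic is destined for output ports 1, 3 or 4 with equal probability.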

Consider a router based on multiple input queuing with $m = 2$ and a router R according to the second embodiment with $m_1 = 3$, $m_2 = 1$, $m_3 = 1$ and $m_4 = 3$. The first queuing unit QU1 comprises three input queues associated to the first input I1. The second queuing unit QU2 comprises one input queue associated to the second input I2. The third queuing unit QU3 comprises one input queue associated to the third input I3. The fourth queuing unit QU4 comprises three input queues associated to the fourth input I4. The router based on multiple input queuing with the standard iSLIP algorithm (using two iterations) achieves a throughput of 0.773 with $Nm = 8$ queues. The router according to the second embodiment achieves a throughput of 0.833 with a total of $\sum_{i=1}^{N} m_i = 3 + 1 + 1 + 3 = 8$ queues.
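The queue totals of this example can be checked by encoding the routing probabilities described for inputs I1-I4. In this example each m_i equals the number of distinct destinations at input I_i, so with suitable pre-routing no two destination streams at an input need to share a FIFO. This is an observation about the example, offered as a plausible reason for the higher throughput, not a rule stated in the text.

```python
from fractions import Fraction as F
from typing import List

# Routing probabilities of the example: row i is input I(i+1), column j is output j+1.
TRAFFIC = [
    [F(0), F(1, 3), F(1, 3), F(1, 3)],  # I1 -> outputs 2, 3, 4
    [F(1), F(0), F(0), F(0)],           # I2 -> output 1
    [F(0), F(1), F(0), F(0)],           # I3 -> output 2
    [F(1, 3), F(0), F(1, 3), F(1, 3)],  # I4 -> outputs 1, 3, 4
]


def destinations_per_input(matrix: List[List[F]]) -> List[int]:
    """Number of outputs that receive a non-zero share of each input's traffic."""
    return [sum(1 for p in row if p > 0) for row in matrix]


flexible = [3, 1, 1, 3]          # m_1..m_4 of the second embodiment
multiple = [2] * len(TRAFFIC)    # m = 2 at every input (multiple input queuing)

assert all(sum(row) == 1 for row in TRAFFIC)  # each row is a probability distribution
print(destinations_per_input(TRAFFIC))  # [3, 1, 1, 3]
print(sum(flexible), sum(multiple))     # 8 8
```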

The input queuing scheme according to the second embodiment is based on the idea of associating an arbitrary integer number of input queues with each input port. In other words, the number of input queues associated to any one of the inputs can vary from one input to another. Preferably, the number of input queues within the queuing unit QU1-QU4 associated to an input corresponds to the traffic distribution for that input: the more traffic an input carries, the more input queues are associated to that input or input connection.
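One way to turn "more traffic, more queues" into numbers is to split a fixed queue budget over the inputs in proportion to a per-input load figure. The sketch below is a hypothetical heuristic (greedy largest-deficit allocation with at least one and at most `max_per_input` queues per input); the text does not prescribe any particular allocation rule, and what counts as "load" is left to the caller.

```python
from typing import List, Sequence


def allocate_queues(load: Sequence[float], budget: int, max_per_input: int) -> List[int]:
    """Split a total queue budget over the inputs in proportion to their load.

    Hypothetical heuristic: every input keeps at least one FIFO, no input gets
    more than max_per_input (e.g. N, the virtual-output-queuing limit), and
    each spare queue goes to the input furthest below its fair share.
    """
    n = len(load)
    total = sum(load)
    if total <= 0 or not n <= budget <= n * max_per_input:
        raise ValueError("need positive load and N <= budget <= N * max_per_input")
    ideal = [budget * x / total for x in load]  # fractional fair share per input
    alloc = [1] * n                              # every input needs one FIFO
    for _ in range(budget - n):
        open_inputs = [i for i in range(n) if alloc[i] < max_per_input]
        # Ties go to the lowest-numbered input (max returns the first maximum).
        best = max(open_inputs, key=lambda i: ideal[i] - alloc[i])
        alloc[best] += 1
    return alloc
```

With loads proportional to the number of destinations in the four-port example, `allocate_queues([3, 1, 1, 3], 8, 4)` returns `[3, 1, 1, 3]`, while equal loads give `[2, 2, 2, 2]`, i.e. plain multiple input queuing with m = 2.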

Fig. 3 shows a block diagram of a network interface NI according to the third embodiment. The network interface NI comprises an input queuing means IQM with a plurality of queuing units QU1-QU4, each of which can be implemented as a number of FIFO queues. The input queuing means IQM according to the third embodiment thus corresponds to the input queuing means IQM according to the second embodiment, i.e. it implements the flexible input queuing scheme described for the second embodiment. The number of FIFO input queues in each of the queuing units QU1-QU4 is flexible and preferably depends on the actual traffic distribution within the network on chip NOC.

Fig. 4 shows a block diagram of a router according to a fourth embodiment. The router has N input ports P1-P4 and N output ports, i.e. N = 4, and comprises a switch fabric unit SFU as well as an input queuing means IQM for managing the input queuing. Every input port P1-P4 is connected or associated to a queuing unit QU1-QU4, and the queuing units QU1-QU4 can be part of the input queuing means IQM. Each queuing unit QU1-QU4 is associated with one port P1-P4, is inherently flexible and can comprise several input queues IQ. For instance, port P1, which carries an unspecified number of streams, has queuing unit QU1 associated with it, and QU1 behaves as, or is programmed as, four input queues IQ. Port P3, on the other hand, is associated with queuing unit QU3, which is internally configured as a single larger queue.
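The programmable queuing unit can be pictured as a fixed buffer that is partitioned into k FIFOs at configuration time. In the hypothetical model below, `QueuingUnit(16, 4)` plays the role of QU1 (four queues of depth 4) and `QueuingUnit(16, 1)` that of QU3 (one queue of depth 16); the 16-flit buffer and the equal split are assumptions of the sketch.

```python
from collections import deque
from typing import Deque, List


class QueuingUnit:
    """A queuing unit with a fixed buffer of `capacity` flit slots that can be
    programmed as k equally sized FIFOs (hypothetical model)."""

    def __init__(self, capacity: int, num_queues: int = 1):
        self.capacity = capacity
        self.configure(num_queues)

    def configure(self, num_queues: int) -> None:
        """(Re)partition the buffer; any buffered flits are discarded."""
        if not 1 <= num_queues <= self.capacity:
            raise ValueError("need between 1 and `capacity` queues")
        self.depth = self.capacity // num_queues  # slots per FIFO; remainder unused
        self.queues: List[Deque[object]] = [deque() for _ in range(num_queues)]

    def push(self, queue: int, flit: object) -> bool:
        """Store a flit; return False (back-pressure) if that FIFO is full."""
        fifo = self.queues[queue]
        if len(fifo) >= self.depth:
            return False
        fifo.append(flit)
        return True

    def pop(self, queue: int) -> object:
        """Remove and return the head-of-line flit of one FIFO."""
        return self.queues[queue].popleft()


if __name__ == "__main__":
    qu1 = QueuingUnit(16, 4)   # like QU1: four input queues
    qu3 = QueuingUnit(16, 1)   # like QU3: one larger queue
    print(len(qu1.queues), qu1.depth, len(qu3.queues), qu3.depth)
```

Discarding buffered flits on reconfiguration is acceptable in this sketch because the configuration is meant to be fixed at design time rather than changed during operation.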

According to a fifth embodiment of the invention, a network interface NI is provided with an implementation of the input queuing according to the fourth embodiment. A router according to a sixth embodiment is preferably based on the router according to the second embodiment, and its input queuing means IQM is further able to perform pre-routing. If one of the queuing units QU1-QU4 comprises more than one FIFO queue associated to one input, the input queuing means IQM distributes the incoming traffic over these input queues; this has to be done for each input that has more than one FIFO queue. One scheme for distributing the incoming traffic is the odd-even rule. In a router based on the multiple input queuing scheme with two input queues per input, the two queues at each input connection can be identified as an odd-numbered input queue and an even-numbered input queue. Traffic destined for an odd-numbered output connection is routed to the odd-numbered input queue, and traffic destined for an even-numbered output connection is routed to the even-numbered input queue.
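The odd-even rule amounts to a parity test on the destination output. A minimal sketch, assuming outputs are numbered from 1 as in the text and the two queues at an input are labeled "odd" and "even" (the function name is hypothetical):

```python
def odd_even_queue(output: int) -> str:
    """Odd-even pre-routing for an input with two queues (outputs numbered from 1):
    traffic for odd-numbered outputs goes to the odd queue, the rest to the even queue."""
    if output < 1:
        raise ValueError("outputs are numbered from 1")
    return "odd" if output % 2 == 1 else "even"


if __name__ == "__main__":
    print([odd_even_queue(o) for o in range(1, 5)])
```

The split is the same at every input, which is what makes it a fixed pre-routing: for N = 4, outputs 1 and 3 always share one queue and outputs 2 and 4 the other.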

Fig. 5 shows a graph illustrating different pre-routing techniques. A router based on the multiple input queuing scheme may comprise N inputs and N outputs with m = 2, i.e. two input queues are associated to each input. A uniform traffic pattern is considered, and the standard iSLIP algorithm (with two iterations) is applied to switch the inputs of the router to its outputs. The performance of such a router using the odd-even pre-routing scheme OE is depicted in Fig. 5.

In contrast, a router according to the sixth embodiment uses the pre-routing scheme H2S, also depicted in Fig. 5. At input i, the traffic destined for the outputs $i, i+1 \bmod N, \ldots, i + N/2 - 1 \bmod N$ is pre-routed to the odd-numbered queue, while the remaining traffic is pre-routed to the even-numbered queue.
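The H2S rule is easiest to write with 0-based indices, where the modulo arithmetic wraps naturally: an output belongs to the odd queue of input i when it lies fewer than N/2 steps above i (mod N). The sketch assumes N is even, as the N/2 in the formula suggests; the function names are hypothetical.

```python
from typing import Dict, List


def h2s_queue(i: int, output: int, n: int) -> str:
    """H2S pre-routing at input i (0-based indices, N even): outputs
    i, i+1, ..., i+N/2-1 (mod N) go to the odd queue, the rest to the even queue."""
    if n % 2:
        raise ValueError("H2S as described assumes an even number of outputs N")
    offset = (output - i) % n  # distance from input i, walking upward mod N
    return "odd" if offset < n // 2 else "even"


def queue_contents(i: int, n: int) -> Dict[str, List[int]]:
    """Which outputs each of the two queues at input i serves."""
    split: Dict[str, List[int]] = {"odd": [], "even": []}
    for out in range(n):
        split[h2s_queue(i, out, n)].append(out)
    return split


if __name__ == "__main__":
    for i in range(4):
        print(i, queue_contents(i, 4))
```

For N = 4, input 0 sends outputs 0 and 1 to its odd queue, while input 3 sends outputs 3 and 0. Unlike the odd-even rule, the split rotates with the input index, so the pair of outputs that shares a queue differs from input to input.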

Additionally, Fig. 5 shows the router with the standard input queuing design IQ, whose performance is clearly worse than that of the other two schemes. Accordingly, even for a uniform traffic distribution, the flexible pre-routing scheme described with regard to the sixth embodiment performs better than a fixed pre-routing. The advantage may be even larger for non-uniform traffic distributions.

The pre-routing scheme described above according to the sixth embodiment can be applied to the routers according to the second and fourth embodiments as well as to the network interfaces according to the third and fifth embodiments. It can even be applied to a router based on the multiple input queuing scheme, i.e. with the same number of input queues associated to each input or input connection. In other words, although the best results may be achieved for routers and network interfaces that combine flexible input queuing with the flexible pre-routing according to the sixth embodiment, the pre-routing scheme according to the sixth embodiment may also be advantageously implemented in a router based on multiple input queuing.

A flexible pre-routing scheme may also be applied to a network interface based on the multiple input queuing scheme.

It should be noted that the input queuing schemes and flexible pre-routing schemes described above can be applied to routers as well as to network interfaces within a network on chip according to Fig. 1. Therefore, the network on chip NOC according to Fig. 1 may comprise a plurality of routers according to the second embodiment of Fig. 2, a plurality of routers according to the fourth embodiment of Fig. 4, a plurality of network interfaces according to the third embodiment of Fig. 3, a plurality of network interfaces according to the fifth embodiment, a plurality of routers based on the multiple input queuing scheme with the pre-routing scheme according to the sixth embodiment, a plurality of network interfaces based on the multiple input queuing scheme with the flexible pre-routing scheme according to the sixth embodiment, a plurality of routers according to the second embodiment with the flexible pre-routing according to the sixth embodiment, as well as a plurality of network interfaces according to the third embodiment with the flexible pre-routing according to the sixth embodiment.

Preferably, the number of input queues associated to an input connection or input port is determined or defined at design time. According to a seventh embodiment, a given traffic distribution matrix with the traffic intensities of origin-destination pairs can be considered for a given network structure and routing algorithm. A simulation of the resulting behavior, including the traffic pattern at each router, can be performed, and to improve the performance of the overall system, input queues can be moved from one input or input connection of a router or network interface to another input of the same or of a different router or network interface. Accordingly, merely by using a standard stochastic optimization algorithm such as simulated annealing, the design of the network on chip, in particular the placement of the queues, can be optimized.
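This design-time optimization can be sketched as simulated annealing over queue placements with a fixed total. The Python below is a hypothetical illustration: the move (take one queue from one input, give it to another), the geometric cooling schedule and, above all, the surrogate cost function are assumptions standing in for the full network simulation described above.

```python
import math
import random
from typing import Callable, List


def streams_sharing(placement: List[int], destinations: List[int]) -> int:
    """Hypothetical surrogate cost: destination streams that must share a FIFO."""
    return sum(max(0, d - m) for d, m in zip(destinations, placement))


def anneal_queue_placement(
    cost: Callable[[List[int]], float],
    start: List[int],
    max_per_input: int,
    steps: int = 2000,
    t0: float = 1.0,
    t_end: float = 0.01,
    seed: int = 0,
) -> List[int]:
    """Simulated annealing over queue placements with a fixed total queue count.

    A move takes one queue from an input that has more than one and gives it to
    another input below max_per_input; lower cost is better.
    """
    rng = random.Random(seed)
    current, current_cost = list(start), cost(start)
    best, best_cost = list(current), current_cost
    n = len(current)
    for k in range(steps):
        temp = t0 * (t_end / t0) ** (k / max(1, steps - 1))  # geometric cooling
        donors = [i for i in range(n) if current[i] > 1]
        takers = [j for j in range(n) if current[j] < max_per_input]
        if not donors or not takers:
            break  # no legal move exists
        src, dst = rng.choice(donors), rng.choice(takers)
        if src == dst:
            continue
        candidate = list(current)
        candidate[src] -= 1
        candidate[dst] += 1
        delta = cost(candidate) - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best


if __name__ == "__main__":
    dests = [3, 1, 1, 3]  # distinct destinations per input in the four-port example
    print(anneal_queue_placement(lambda p: streams_sharing(p, dests), [2, 2, 2, 2], 4))
```

Starting from the uniform placement m = 2, the search should move queues from I2 and I3 to I1 and I4 and end at `[3, 1, 1, 3]`, the only placement with eight queues in which no two destination streams share a FIFO. A real flow would replace `streams_sharing` with a throughput estimate from network simulation.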

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. In the device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Furthermore, any reference signs in the claims shall not be construed as limiting the scope of the claims.

Claims


1. Electronic device, comprising: a network interconnect means (NOC) for coupling processing units (IP); said network interconnect means (NOC) comprise a plurality of interconnect means (NI, R) each having a number (N) of inputs (I1-I4; P1-P4) and an input queuing means (IQM) for queuing inputs of the interconnect means (NI, R); wherein the input queuing means (IQM) comprises a plurality of queuing units (QU1-QU4), and is adapted to associate one of the queuing units (QU1-QU4) to each of the number (N) of inputs (I1-I4; P1-P4), wherein at least one queuing unit (QU1) associated to one of the inputs comprises a number of input queues different from that of another queuing unit (QU2).

2. Electronic device according to claim 1, wherein the input queuing means (IQM) is adapted to associate an individual number of input queues (IQ) to each queuing unit (QU1-QU4) associated to one of the inputs (I1-I4) according to a traffic distribution in the network interconnect means (NOC).

3. Electronic device according to claim 2, wherein each of said plurality of interconnect means (NI, R) comprises a network interface (NI) or a router (R).

4. Electronic device according to claim 1 or 2, wherein the input queuing means (IQM) is adapted to pre-route the inputs (I1-I4) to the odd-numbered or even-numbered queuing units (QU1-QU4) according to a function based on the destined output and a modulo function on the number (N) of inputs (I1-I4; P1-P4).

5. Method for input queuing within an electronic device having a network interconnect means (NOC) for coupling processing units (IP); wherein said network interconnect means (NOC) comprise a plurality of interconnect means (NI, R) each having a number (N) of inputs (I1-I4) and an input queuing means (IQM) with queuing units (QU1-QU4) for queuing inputs of the interconnect means (NI, R); comprising the step of: individually associating a number of input queues (IQ) to each of the queuing units (QU1-QU4) associated to one of the inputs (I1-I4; P1-P4).