WO2006106445A1 - Electronic device and method for input queuing - Google Patents

Electronic device and method for input queuing

Info

Publication number
WO2006106445A1
Authority
WO
WIPO (PCT)
Prior art keywords
queuing
input
inputs
network
router
Application number
PCT/IB2006/050903
Other languages
French (fr)
Inventor
Theodorus J. J. Denteneer
Ronald Rietman
Santiago Gonzalez Pestana
Nick Boot
Ivo J-B. F. Adan
Original Assignee
Nxp B.V.
Application filed by Nxp B.V.
Publication of WO2006106445A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00: Packet switching elements
    • H04L49/30: Peripheral units, e.g. input or output ports
    • H04L49/3045: Virtual queuing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00: Packet switching elements
    • H04L49/10: Packet switching elements characterised by the switching fabric construction
    • H04L49/111: Switch interfaces, e.g. port details
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00: Packet switching elements
    • H04L49/30: Peripheral units, e.g. input or output ports
    • H04L49/3018: Input queuing

Definitions

  • the present invention relates to an electronic device with a network interconnect which comprises a plurality of interconnect means.
  • the invention is also related to a method for input queuing within an electronic device having a network interconnect means with a plurality of interconnect means.
  • Networks on chip NOC proved to be scalable interconnect structures which could become possible solutions for future on chip interconnections between so-called IP blocks, i.e. intellectual property blocks.
  • IP blocks are usually modules on chip with a specific function like CPUs, memories, digital signal processors or the like.
  • the IP blocks communicate with each other via the network on chip.
  • the network on chip is typically composed of network interfaces and routers.
  • the network interfaces serve to provide an interface between the IP block and the network on chip, i.e. they translate the information from the IP block to information which the network on chip can understand and vice versa.
  • the routers serve to transport data from one network interface to another.
  • the communication between two IP blocks via the network on chip can be performed based on a guaranteed throughput traffic where circuit switching is used or based on a best effort traffic where packet switching is used. If the best effort routing is used, some kind of buffering will be needed in the routers. Therefore, typically the routers comprise some queuing units.
  • the input queue can be implemented by a FIFO at every input of the router. The simplest situation would be one queue for each input of the router. Input queuing is less expensive than other designs, but the throughput is only 0.59. Buffering within the router can also be performed by output queuing, i.e. the queues are situated at the output of the router. Therefore, each output of the router will need a queue for each input, i.e. N² queues are required with N being the number of inputs.
  • a router based on multiple input queuing typically comprises an N-by-N switch which connects N inputs to N outputs. The router will have 1 < m < N input queues for each input or input connection.
  • the buffering may also be performed by virtual output queuing where each input comprises N queues, one for the traffic to each of the outputs, such that N² queues are required at the input side of the router.
  • a further method for buffering within the router is the multiple input queuing where multiple queues are arranged at the input of the router. The number of queues at the input side for each input can be smaller than the number of outputs, while the number of inputs is almost always equal to the number of outputs. It is therefore an object of the invention to provide an electronic device with a network interconnect with a plurality of interconnect means, wherein the input queuing of the interconnect means is improved.
  • an electronic device comprising network interconnect means for coupling processing units.
  • the network interconnect means comprise a plurality of interconnect means, each having a number of inputs as well as an input queuing means for queuing inputs of the interconnect means.
  • the input queuing means comprise a plurality of queuing units.
  • the input queuing means is further adapted to associate one of the queuing units to each of the inputs.
  • Each queuing unit associated to one of the inputs comprises an individual number of input queues independently of the other queuing units. This implies that at least one queuing unit associated to one of the inputs comprises a number of input queues different from that of another queuing unit.
  • the throughput of the interconnect means can be improved by providing a flexible input queuing independently for each of the inputs.
  • the input queuing means is adapted to associate an individual number of queuing units to each of the inputs according to a traffic distribution in the network interconnect means. Accordingly, more queuing units or input queues can be associated to those inputs of the interconnect means which are associated to a larger traffic distribution than other inputs.
  • the invention further relates to a method for input queuing within an electronic device having a network interconnect means for coupling processing units.
  • Said network interconnect means comprise a plurality of interconnect means each having a number of inputs and an input queuing means with queuing units for queuing inputs of the interconnect means.
  • a number of input queues is individually associated to each of the queuing units associated to one of the inputs.
  • the invention is based on the idea to associate an individual number of input queues to each of the inputs such that different traffic streams on every input port are distributed into different queues to maximize the throughput.
  • a flexible distribution of input queues is provided.
  • the amount of input queuing for each input can be adapted to the actual traffic distribution.
  • Fig. 1 shows the basic architecture of a network on chip according to a first embodiment
  • Fig. 2 shows a block diagram of a router according to the second embodiment
  • Fig. 3 shows a block diagram of a network interface according to the third embodiment
  • Fig. 4 shows a block diagram of a router according to a fourth embodiment
  • Fig. 5 shows a graph for illustrating the pre-routing scheme according to a fifth embodiment.
  • Fig. 1 shows a basic structure of a system on chip with a network on chip interconnect according to a first embodiment.
  • a plurality of IP blocks IP are coupled to each other via a network on chip NOC.
  • the network NOC comprises network interfaces NI for providing an interface between the IP block IP and the network on chip NOC.
  • the network on chip NOC furthermore comprises a plurality of routers R.
  • the network interface NI serves to translate the information from the IP block to a protocol which can be handled by the network on chip NOC and vice versa.
  • the routers R serve to transport the data from one network interface NI to another.
  • the communication between the network interfaces NI will not only depend on the number of routers R in between them, but also on the topology of the routers R.
  • the routers R may be fully connected, connected in a 2D mesh, connected in a linear array, connected in a torus, connected in a folded torus, connected in a binary tree or in a fat-tree fashion.
  • the IP block IP can be implemented as modules on chip with a specific or dedicated function such as CPU, memory, digital signal processors or the like.
  • the information from the IP block IP which is transferred via the network on chip will be translated at the network interface NI into packets with variable length.
  • the information from the IP block IP will typically comprise a command followed by an address and an actual data to be transported over the network.
  • the network interface NI will divide the information from the IP block IP into pieces called packets and will add a packet header to each of the packets.
  • Such a packet header comprises extra information that allows the transmission of the data over the network (e.g. destination address or routing path, and flow control information). Accordingly, each packet is divided into flits (flow control digits), which can travel through the network on chip. The flit can be seen as the smallest granularity at which control takes place.
  • the communication is typically synchronized by a periodic clock signal supplied to all of the interconnect means, i.e. the network interfaces and the routers. For example, if the router takes two clock periods to produce a decision, then the flit size is two words. In this case, a 'word' corresponds to the length of the data path.
  • Fig. 2 shows a block diagram of a router R according to the second embodiment.
  • the router R comprises N inputs I1 - I4 and N outputs O1 - O4.
  • N is chosen to be 4 for illustration purpose only.
  • the router R will further comprise an input queuing means IQM and a switch fabric unit SFU.
  • the packets are divided into flits to be transmitted over the network on chip.
  • the flits will arrive at the inputs of the router R and need to be distributed by the router R to the correct outputs according to the information as contained in the packet header.
  • the switch fabric unit SFU is then used to switch the information from the input to the respective outputs, i.e. the switch fabric unit SFU is a N-by-N switch.
  • the routers will need to buffer some of the arriving flits.
  • the buffering of the flits is performed by the input queuing means IQM.
  • Input queuing means IQM comprises a plurality of queuing units QU1 - QU4 which can each be implemented as a number of FIFO queues.
  • the router according to the second embodiment is based on a flexible input queuing scheme, i.e. the number of input queues (per queuing unit QU1-QU4) associated to any one of the inputs I1 - I4 is flexible and can be adjusted independently of the other inputs.
  • the router based on the multiple input queuing comprises equally many input queues for each of the inputs. Such a restriction is removed by the router according to the second embodiment and a flexible number of input queues is now associated with each input or input connection.
  • each input port i has m_i input queues, i.e. each input port is associated to a queuing unit QU1-QU4.
  • Each queuing unit QU1-QU4 can comprise a different number of input queues. Part of the router cost will therefore be determined by the number of queues. Hence, this cost in a router according to the second embodiment equals $\sum_{i=1}^{N} m_i$.
  • This cost in a design based on multiple input queuing is obviously equal to Nm.
  • the increased flexibility of the router according to the second embodiment can be used to adapt the input queuing to the actual traffic distribution within the network on chip such that a higher throughput can be achieved with the same number of input queues and thus the same costs.
  • N is chosen to equal 4 and the traffic distribution matrix is [matrix shown as an image in the original publication]
  • for input port I1, the traffic is destined for output ports 2, 3, or 4 with equal probability.
  • at input port I2, the traffic is destined for output port 1.
  • at input port I3, the traffic is destined for output port 2.
  • at input port I4, the traffic is destined for output ports 1, 3, or 4 with equal probability.
  • the first input queuing unit QU1 will comprise three input queues.
  • the second queuing unit QU2 will comprise one input queue associated to the second input I2.
  • the third input queuing unit QU3 will comprise one input queue associated to the third input I3.
  • the fourth queuing unit QU4 comprises three input queues associated to the fourth input I4.
  • the router according to the second embodiment achieves a throughput of 0.833 with a total of $\sum_{i=1}^{N} m_i = 8$ queues.
  • the input queuing scheme according to the second embodiment is based on the idea to associate a number of input queues with the input port based on an arbitrary integer.
  • the number of input queues associated to any one of the inputs can vary from one input to another.
  • the number of input queues within a queuing unit QU1-QU4 associated to an input corresponds to the traffic distribution for that input. If the traffic distribution is high, more input queues will be associated to the respective input or input connection.
  • Fig. 3 shows a block diagram of a network interface according to the third embodiment.
  • the network interface NI comprises an input queuing means IQM with a plurality of queuing units QU1 - QU4 which can each be implemented as a number of FIFO queues.
  • the input queuing means IQM according to the third embodiment corresponds to the input queuing means IQM according to the second embodiment.
  • the input queuing means IQM according to the third embodiment implements the flexible input queuing scheme as described according to the second embodiment.
  • the number of FIFO input queues associated to each of the queuing units QU1 - QU4 will be flexible and preferably depends on the actual traffic distribution within the network on chip NOC.
  • Fig. 4 shows a block diagram of a router according to a fourth embodiment.
  • the queuing units QU1-QU4 can be part of a queuing means IQM. Each queuing unit QU1-QU4 is associated with one of the ports P1-P4, is flexible in nature and can comprise several input queues IQ.
  • port 1, having a number of streams not specified, will have a queuing unit QU1 associated with it, and this queuing unit QU1 behaves or is programmed as 4 input queues IQ.
  • port 3 will be associated with the queuing unit QU3 and internally will be configured as a larger single queue.
  • a network interface NI is provided with an implementation of the input queuing according to the fifth embodiment.
  • a router according to a sixth embodiment is based preferably on the router according to the second embodiment.
  • the input queuing means IQM is further able to perform pre-routing. If one of the queuing units QU1-QU4 comprises more than one FIFO queue associated to one input, the input queuing means IQM distributes the incoming traffic over the input queues. This has to be performed for each of the inputs to which more than one FIFO queue is associated.
  • One scheme to distribute the incoming traffic is the odd-even rule.
  • the input connection can identify the queues as odd numbered input queue and even numbered input queue.
  • the traffic destined for an odd number connection is routed to the odd numbered input queue and the traffic destined for the even numbered output connection is routed to an even numbered input queue.
  • Fig. 5 shows a graph in order to illustrate different pre-routing techniques.
  • a uniform traffic pattern is considered and the standard iSLIP algorithm (with two iterations) is applied to switch the inputs of the routers to its outputs.
  • Such a router based on the odd-even pre-routing scheme OE is depicted in Fig. 5.
  • a router according to the sixth embodiment is based on the following pre-routing scheme H2S as depicted in Fig. 5.
  • the traffic destined for the outputs i, i + 1 mod N, ..., i + N/2 - 1 mod N is pre-routed to the odd numbered queue while the remaining traffic is pre-routed to the even numbered queue.
  • In Fig. 5, the router with the standard input queuing design IQ is also shown. Its performance is clearly worse than that of the other two schemes. Accordingly, even for a uniform traffic distribution, the flexible pre-routing scheme as described with regard to the fourth embodiment is better than a fixed pre-routing. The difference can be even larger for non-uniform traffic distributions.
  • the above-described pre-routing scheme according to the sixth embodiment can be applied to the router according to the second and fourth embodiment as well as to the network interface according to the third and fifth embodiment.
  • the pre-routing scheme can even be applied to the router based on the multiple input queuing scheme, i.e. with the same number of input queues associated to each of the inputs or input connection.
  • the pre-routing scheme according to the sixth embodiment may also be advantageously implemented for a router based on the multiple input queuing.
  • a flexible pre-routing scheme may also be applied to a network interface based on the multiple input queuing scheme.
  • the network on chip NOC according to Fig. 1 may comprise a plurality of routers according to the second embodiment of Fig. 2, a plurality of routers according to the fourth embodiment of Fig. 4, a plurality of network interfaces according to the third embodiment of Fig. 3, a plurality of network interfaces according to the fifth embodiment, a plurality of routers based on the multiple input queuing scheme with the pre-routing scheme according to the sixth embodiment, or a plurality of network interfaces based on the multiple input queuing scheme and the flexible pre-routing scheme according to the sixth embodiment.
  • the number of input queues associated to an input connection or input port is determined or defined at design time.
  • a given traffic distribution matrix with traffic intensities of origin-destination pairs can be considered for a given network structure and routing algorithm.
  • a simulation of the behavior thereof, including the traffic pattern for each router, can be performed. In order to improve the performance of the overall system, input queues can be moved from one input or input connection of a router or network interface to another input of the same or another router or network interface. Accordingly, merely by using some standard stochastic optimization algorithm, such as simulated annealing, the design of a network on chip, in particular the design of the queue placement, can be optimized.
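
The H2S pre-routing rule listed above (outputs i, i + 1 mod N, ..., i + N/2 - 1 mod N go to the odd numbered queue) can be sketched as follows. This is only an illustration under stated assumptions: ports are numbered from 0, N is even, i is taken to be the index of the input, and the function name `h2s_queue` is hypothetical.

```python
def h2s_queue(input_port: int, output_port: int, n: int) -> str:
    """Select the input queue for a flit under the H2S-style rule.

    Assumptions (not stated in the source): ports are numbered 0..n-1,
    n is even, and each input has exactly two queues, "odd" and "even".
    """
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of ports")
    # Distance from the input index to the output index, going around the ring.
    offset = (output_port - input_port) % n
    # The first n/2 outputs starting at the input's own index use the odd queue.
    return "odd" if offset < n // 2 else "even"


if __name__ == "__main__":
    for out in range(4):
        print(f"input 3 -> output {out}: {h2s_queue(3, out, 4)} queue")
```

Unlike the odd-even rule, the selected queue depends on the input as well as the output, so each input splits the outputs into a different pair of halves.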

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

An electronic device is provided comprising a network interconnect means (NOC) for coupling processing units (IP). Said network interconnect means (NOC) comprise a plurality of interconnect means (NI, R) each having a number (N) of inputs (I1-I4) and an input queuing means (IQM) for queuing inputs of the interconnect means (NI, R). The input queuing means (IQM) comprises a plurality of queuing units (QU1-QU4), and is adapted to associate one of the queuing units (QU1-QU4) to each of the number (N) of inputs (I1-I4; P1-P4), wherein each queuing unit (QU1-QU4) associated to one of the inputs (I1-I4; P1-P4) comprises an individual number of input queues (IQ) independently of the other queuing units (QU1-QU4).

Description

Electronic device and method for input queuing
The present invention relates to an electronic device with a network interconnect which comprises a plurality of interconnect means. The invention also relates to a method for input queuing within an electronic device having a network interconnect means with a plurality of interconnect means. Networks on chip (NOC) have proved to be scalable interconnect structures which could become possible solutions for future on-chip interconnections between so-called IP blocks, i.e. intellectual property blocks. IP blocks are usually modules on chip with a specific function like CPUs, memories, digital signal processors or the like. The IP blocks communicate with each other via the network on chip. The network on chip is typically composed of network interfaces and routers. The network interfaces serve to provide an interface between the IP block and the network on chip, i.e. they translate the information from the IP block to information which the network on chip can understand and vice versa. The routers serve to transport data from one network interface to another.
The communication between two IP blocks via the network on chip can be based on guaranteed throughput traffic, where circuit switching is used, or on best effort traffic, where packet switching is used. If best effort routing is used, some kind of buffering will be needed in the routers. Therefore, the routers typically comprise some queuing units. The input queue can be implemented by a FIFO at every input of the router. The simplest situation would be one queue for each input of the router. Input queuing is less expensive than other designs, but the throughput is only 0.59. Buffering within the router can also be performed by output queuing, i.e. the queues are situated at the output of the router. Therefore, each output of the router will need a queue for each input, i.e. N² queues are required with N being the number of inputs. A router based on multiple input queuing typically comprises an N-by-N switch which connects N inputs to N outputs. The router will have 1 < m < N input queues for each input or input connection.
The buffering may also be performed by virtual output queuing where each input comprises N queues, one for the traffic to each of the outputs, such that N² queues are required at the input side of the router. A further method for buffering within the router is the multiple input queuing where multiple queues are arranged at the input of the router. The number of queues at the input side for each input can be smaller than the number of outputs, while the number of inputs is almost always equal to the number of outputs. It is therefore an object of the invention to provide an electronic device with a network interconnect with a plurality of interconnect means, wherein the input queuing of the interconnect means is improved.
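
The queue counts of the buffering schemes discussed above can be compared with a short sketch. The function name `queue_counts` is hypothetical. The constant 2 - sqrt(2) is the well-known saturation throughput of single-FIFO input queuing under uniform traffic for large N; it is included here on the assumption that it is the origin of the 0.59 figure quoted in the text.

```python
import math


def queue_counts(n: int, m: int) -> dict:
    """Total number of queues in an n-by-n router for each buffering scheme.

    n: number of inputs (assumed equal to the number of outputs)
    m: queues per input for multiple input queuing
    """
    return {
        "input queuing": n,                # one FIFO per input
        "output queuing": n * n,           # each output holds a queue per input
        "virtual output queuing": n * n,   # each input holds a queue per output
        "multiple input queuing": n * m,   # m queues at every input
    }


# Asymptotic throughput limit of single-FIFO input queuing (head-of-line blocking).
HOL_LIMIT = 2 - math.sqrt(2)

if __name__ == "__main__":
    for scheme, count in queue_counts(4, 2).items():
        print(f"{scheme}: {count} queues")
    print(f"input queuing throughput limit: {HOL_LIMIT:.3f}")
```

For N = 4 and m = 2, multiple input queuing needs 8 queues, compared with 16 for output queuing or virtual output queuing.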
This object is solved by an electronic device according to claim 1 as well as a method for input queuing according to claim 5. Therefore, an electronic device is provided comprising network interconnect means for coupling processing units. The network interconnect means comprise a plurality of interconnect means, each having a number of inputs as well as an input queuing means for queuing inputs of the interconnect means. The input queuing means comprise a plurality of queuing units. The input queuing means is further adapted to associate one of the queuing units to each of the inputs. Each queuing unit associated to one of the inputs comprises an individual number of input queues independently of the other queuing units. This implies that at least one queuing unit associated to one of the inputs comprises a number of input queues different from that of another queuing unit.
Accordingly, the throughput of the interconnect means can be improved by providing a flexible input queuing independently for each of the inputs.
According to a further aspect of the invention, the input queuing means is adapted to associate an individual number of queuing units to each of the inputs according to a traffic distribution in the network interconnect means. Accordingly, more queuing units or input queues can be associated to those inputs of the interconnect means which are associated to a larger traffic distribution than other inputs.
The invention further relates to a method for input queuing within an electronic device having a network interconnect means for coupling processing units. Said network interconnect means comprise a plurality of interconnect means each having a number of inputs and an input queuing means with queuing units for queuing inputs of the interconnect means. A number of input queues is individually associated to each of the queuing units associated to one of the inputs.
The invention is based on the idea to associate an individual number of input queues to each of the inputs such that different traffic streams on every input port are distributed into different queues to maximize the throughput. In other words, a flexible distribution of input queues is provided. The amount of input queuing for each input can be adapted to the actual traffic distribution.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter and with respect to the following figures.
Fig. 1 shows the basic architecture of a network on chip according to a first embodiment,
Fig. 2 shows a block diagram of a router according to the second embodiment,
Fig. 3 shows a block diagram of a network interface according to the third embodiment,
Fig. 4 shows a block diagram of a router according to a fourth embodiment, and
Fig. 5 shows a graph for illustrating the pre-routing scheme according to a fifth embodiment.
Fig. 1 shows a basic structure of a system on chip with a network on chip interconnect according to a first embodiment. A plurality of IP blocks IP are coupled to each other via a network on chip NOC. The network NOC comprises network interfaces NI for providing an interface between the IP block IP and the network on chip NOC. The network on chip NOC furthermore comprises a plurality of routers R. The network interface NI serves to translate the information from the IP block to a protocol which can be handled by the network on chip NOC and vice versa. The routers R serve to transport the data from one network interface NI to another. The communication between the network interfaces NI will not only depend on the number of routers R in between them, but also on the topology of the routers R. The routers R may be fully connected, connected in a 2D mesh, connected in a linear array, connected in a torus, connected in a folded torus, connected in a binary tree or in a fat-tree fashion. The IP block IP can be implemented as modules on chip with a specific or dedicated function such as CPU, memory, digital signal processors or the like.
The information from the IP block IP which is transferred via the network on chip will be translated at the network interface NI into packets with variable length. The information from the IP block IP will typically comprise a command followed by an address and the actual data to be transported over the network. The network interface NI will divide the information from the IP block IP into pieces called packets and will add a packet header to each of the packets. Such a packet header comprises extra information that allows the transmission of the data over the network (e.g. destination address or routing path, and flow control information). Accordingly, each packet is divided into flits (flow control digits), which can travel through the network on chip. The flit can be seen as the smallest granularity at which control takes place.
The communication is typically synchronized by a periodic clock signal supplied to all of the interconnect means, i.e. the network interfaces and the routers. For example, if the router takes two clock periods to produce a decision, then the flit size is two words. In this case, a 'word' corresponds to the length of the data path.
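
The packetization step described above can be illustrated with a minimal sketch. All names and sizes are hypothetical: the source does not specify a header layout, so the header is assumed here to be a single byte holding the destination address, and payload and flit sizes are free parameters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    destination: int   # header field (assumed: one byte, destination address)
    payload: bytes     # piece of the information coming from the IP block


def packetize(message: bytes, destination: int, max_payload: int) -> List[Packet]:
    """Split a message into packets, each carrying its own header."""
    return [
        Packet(destination, message[i:i + max_payload])
        for i in range(0, len(message), max_payload)
    ]


def to_flits(packet: Packet, flit_bytes: int) -> List[bytes]:
    """Serialize header plus payload and cut the result into flits."""
    raw = bytes([packet.destination]) + packet.payload
    return [raw[i:i + flit_bytes] for i in range(0, len(raw), flit_bytes)]


if __name__ == "__main__":
    for packet in packetize(b"abcdefghij", destination=3, max_payload=4):
        print(packet, to_flits(packet, flit_bytes=2))
```

The last packet and the last flit may be shorter than the others, which reflects the variable packet length mentioned in the text.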
Fig. 2 shows a block diagram of a router R according to the second embodiment. The router R comprises N inputs I1 - I4 and N outputs O1 - O4. In Fig. 2, N is chosen to be 4 for illustration purposes only. The router R will further comprise an input queuing means IQM and a switch fabric unit SFU. As mentioned above, the packets are divided into flits to be transmitted over the network on chip. The flits will arrive at the inputs of the router R and need to be distributed by the router R to the correct outputs according to the information contained in the packet header. The switch fabric unit SFU is then used to switch the information from the inputs to the respective outputs, i.e. the switch fabric unit SFU is an N-by-N switch. Especially for a best effort routing scheme, the routers will need to buffer some of the arriving flits. According to the second embodiment, the buffering of the flits is performed by the input queuing means IQM. The input queuing means IQM comprises a plurality of queuing units QU1 - QU4 which can each be implemented as a number of FIFO queues. In contrast to a router based on multiple input queuing, the router according to the second embodiment is based on a flexible input queuing scheme, i.e. the number of input queues (per queuing unit QU1-QU4) associated to any one of the inputs I1 - I4 is flexible and can be adjusted independently of the other inputs. In contrast to that, the router based on the multiple input queuing comprises equally many input queues for each of the inputs. Such a restriction is removed by the router according to the second embodiment and a flexible number of input queues is now associated with each input or input connection.
For the N inputs I1 - I4, each input port i has m_i input queues, i.e. each input port is associated to a queuing unit QU1-QU4. Each queuing unit QU1-QU4 can comprise a different number of input queues. Part of the router cost will therefore be determined by the number of queues. Hence, this cost in a router according to the second embodiment equals
$\sum_{i=1}^{N} m_i$.
This cost in a design based on multiple input queuing is obviously equal to Nm. The increased flexibility of the router according to the second embodiment can be used to adapt the input queuing to the actual traffic distribution within the network on chip such that a higher throughput can be achieved with the same number of input queues and thus the same costs.
As an example, N is chosen to equal 4 and the traffic distribution matrix is
[Figure imgf000006_0002: traffic distribution matrix, shown as an image in the original publication]
Accordingly, for input port I1, the traffic is destined for output ports 2, 3, or 4 with equal probability. At input port I2, the traffic is destined for output port 1. At input port I3, the traffic is destined for output port 2. At input port I4, the traffic is destined for output ports 1, 3, or 4 with equal probability.
Consider a router based on multiple input queuing with m = 2 and a router R according to the second embodiment with m_1 = 3, m_2 = 1, m_3 = 1, and m_4 = 3. The first input queuing unit QU1 will comprise three input queues. The second queuing unit QU2 will comprise one input queue associated to the second input I2. The third input queuing unit QU3 will comprise one input queue associated to the third input I3. The fourth queuing unit QU4 comprises three input queues associated to the fourth input I4. The router based on multiple input queuing with the standard iSLIP algorithm (using two iterations) achieves a throughput of 0.773 with Nm = 8 queues. The router according to the second embodiment achieves a throughput of 0.833 with a total of $\sum_{i=1}^{N} m_i = 8$ queues.
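
The cost comparison of this example can be checked with a few lines. This is a sketch with hypothetical function names; it only counts queues and does not model throughput.

```python
from typing import Sequence


def flexible_cost(queues_per_input: Sequence[int]) -> int:
    """Queue cost of the flexible scheme: the sum of m_i over all inputs."""
    return sum(queues_per_input)


def multiple_input_queuing_cost(n: int, m: int) -> int:
    """Queue cost of multiple input queuing: m queues at each of n inputs."""
    return n * m


if __name__ == "__main__":
    flexible = flexible_cost([3, 1, 1, 3])          # m_1..m_4 from the example
    uniform = multiple_input_queuing_cost(4, 2)     # N = 4, m = 2
    print(f"flexible: {flexible} queues, multiple input queuing: {uniform} queues")
```

Both designs use the same number of queues, so the throughput difference reported in the text (0.833 versus 0.773) comes from where the queues are placed rather than from how many there are.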
The input queuing scheme according to the second embodiment is based on the idea to associate a number of input queues with the input port based on an arbitrary integer. In other words, the number of input queues associated to any one of the inputs can vary from one input to another. Preferably, the number of input queues within a queuing unit QU1-QU4 associated to an input corresponds to the traffic distribution for that input. If the traffic distribution is high, more input queues will be associated to the respective input or input connection.
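
One simple way to derive queue counts from a traffic distribution is to give each input one queue per distinct destination it sends to. This rule is an assumption made for illustration; the source does not prescribe how the counts are chosen, although the rule happens to reproduce the counts 3, 1, 1, 3 of the example above. The function name is hypothetical.

```python
from typing import Dict, Set


def queues_per_input(destinations: Dict[int, Set[int]]) -> Dict[int, int]:
    """Assign each input one queue per distinct output it sends traffic to.

    destinations maps an input port number to the set of output ports
    that receive traffic from it. Every input gets at least one queue.
    """
    return {port: max(1, len(outs)) for port, outs in destinations.items()}


# Destination sets as described in the text for the N = 4 example.
EXAMPLE = {
    1: {2, 3, 4},
    2: {1},
    3: {2},
    4: {1, 3, 4},
}

if __name__ == "__main__":
    print(queues_per_input(EXAMPLE))
```

A real design would likely also weigh traffic intensity and a total queue budget, which this rule ignores.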
Fig. 3 shows a block diagram of a network on chip according to the third embodiment. The network interface NI comprises an input queuing means IQM with a plurality of queuing units QU1-QU4, each of which can be implemented as a number of FIFO queues. Accordingly, the input queuing means IQM according to the third embodiment corresponds to the input queuing means IQM according to the second embodiment. In other words, the input queuing means IQM according to the third embodiment implements the flexible input queuing scheme as described according to the second embodiment. The number of FIFO input queues associated to each of the queuing units QU1-QU4 is flexible and preferably depends on the actual traffic distribution within the network on chip NOC.
Fig. 4 shows a block diagram of a router according to a fourth embodiment. The router has N input ports P1-P4 and N output ports, i.e. N = 4, and comprises a switch fabric unit SFU as well as an input queuing means IQM for managing the input queuing. Every input port P1-P4 is connected or associated to a queuing unit QU1-QU4. The queuing units QU1-QU4 can be part of the input queuing means IQM. Each queuing unit QU1-QU4 is associated with a respective one of the ports P1-P4, is flexible in nature, and can comprise several input queues IQ. For instance, port 1, having an unspecified number of streams, will have the queuing unit QU1 associated with it, and this queuing unit QU1 behaves or is programmed as 4 input queues IQ. On the other hand, port 3 will be associated with the queuing unit QU3, which internally will be configured as a single larger queue.
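A configurable queuing unit of this kind can be modelled in a few lines. The following is a minimal sketch, not the implementation of the embodiment: the class name `QueuingUnit`, its methods, and the queue counts chosen for ports 2 and 4 are assumptions made for illustration. Only the counts for port 1 (four queues) and port 3 (one queue) follow the text.

```python
from collections import deque


class QueuingUnit:
    """Hypothetical model of a queuing unit configured as several FIFO queues."""

    def __init__(self, num_queues):
        if num_queues < 1:
            raise ValueError("a queuing unit needs at least one queue")
        self.queues = [deque() for _ in range(num_queues)]

    def enqueue(self, packet, queue_index=0):
        # Append at the tail of the selected FIFO queue.
        self.queues[queue_index].append(packet)

    def dequeue(self, queue_index=0):
        # Remove from the head of the selected FIFO queue.
        return self.queues[queue_index].popleft()


# Port 1 behaves as 4 input queues and port 3 as a single queue, as in the text.
# The values for ports 2 and 4 are arbitrary placeholders.
config = {1: 4, 2: 2, 3: 1, 4: 2}
units = {port: QueuingUnit(n) for port, n in config.items()}

units[1].enqueue("a", queue_index=2)
units[1].enqueue("b", queue_index=2)
first_out = units[1].dequeue(queue_index=2)
```

Keeping the number of queues a constructor argument mirrors the idea that the partitioning of a queuing unit is a configuration choice rather than a fixed property of the port.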
According to a fifth embodiment of the invention, a network interface NI is provided with an implementation of the input queuing according to the fourth embodiment. A router according to a sixth embodiment is preferably based on the router according to the second embodiment. The input queuing means IQM is further able to perform pre-routing. If one of the queuing units QU1-QU4 comprises more than one FIFO queue associated to one input, the input queuing means IQM distributes the incoming traffic over the input queues. This has to be performed for each input that has more than one FIFO queue associated to it. One scheme to distribute the incoming traffic is the odd-even rule. For a router based on the multiple input queuing scheme with two input queues associated to each of its inputs, the input connection can identify the queues as an odd-numbered input queue and an even-numbered input queue. The traffic destined for an odd-numbered output connection is routed to the odd-numbered input queue, and the traffic destined for an even-numbered output connection is routed to the even-numbered input queue.
Fig. 5 shows a graph illustrating different pre-routing techniques. A router based on the multiple input queuing scheme may comprise N inputs and outputs with m = 2, i.e. two input queues are associated to each of the inputs. A uniform traffic pattern is considered and the standard iSLIP algorithm (with two iterations) is applied to switch the inputs of the router to its outputs. The performance of such a router based on the odd-even pre-routing scheme OE is depicted in Fig. 5.
However, a router according to the sixth embodiment is based on the following pre-routing scheme H2S as depicted in Fig. 5. At the input i, the traffic destined for the outputs i, (i + 1) mod N, ..., (i + N/2 - 1) mod N is pre-routed to the odd-numbered queue, while the remaining traffic is pre-routed to the even-numbered queue.
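The two pre-routing rules can be compared with a short sketch. The function names are hypothetical, and the second function uses 0-based port indices so that the modulo arithmetic is unambiguous; that indexing is an assumption, since the text does not state how ports are numbered in the formula.

```python
def odd_even_queue(output):
    """Odd-even rule: odd-numbered outputs use the odd queue (1-based numbering)."""
    return "odd" if output % 2 == 1 else "even"


def window_queue(i, output, n):
    """Rule of the sixth embodiment, with 0-based ports (an assumption).

    Outputs i, i+1, ..., i+n/2-1 (all mod n) go to the odd-numbered queue,
    the remaining outputs go to the even-numbered queue.
    """
    return "odd" if (output - i) % n < n // 2 else "even"


N = 4
# For each input i, the queue chosen for every output 0..N-1.
table = {i: [window_queue(i, j, N) for j in range(N)] for i in range(N)}
for i, row in table.items():
    print(i, row)
```

With the odd-even rule every input splits the outputs in the same way, whereas the window rule shifts the split with the input index, so different inputs group different outputs into the same queue.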
Additionally, Fig. 5 shows the router with the standard input queuing design IQ. Its performance is clearly worse than that of the other two schemes. Accordingly, even for a uniform traffic distribution, the flexible pre-routing scheme as described with regard to the fourth embodiment is better than a fixed pre-routing. The improvement can be even larger for a non-uniform traffic distribution.
The above-described pre-routing scheme according to the sixth embodiment can be applied to the router according to the second and fourth embodiment as well as to the network interface according to the third and fifth embodiment. The pre-routing scheme can even be applied to a router based on the multiple input queuing scheme, i.e. with the same number of input queues associated to each of the inputs or input connections. In other words, although the best results may be achieved for routers and network interfaces with flexible input queuing as well as with flexible pre-routing according to the sixth embodiment, the pre-routing scheme according to the sixth embodiment may also be advantageously implemented for a router based on multiple input queuing.
A flexible pre-routing scheme may also be applied to a network interface based on the multiple input queuing scheme.
It should be noted that the above-described input queuing schemes and flexible pre-routing schemes can be applied to routers as well as network interfaces within a network on chip according to Fig. 1. Therefore, the network on chip NOC according to Fig. 1 may comprise a plurality of routers according to the second embodiment of Fig. 2, a plurality of routers according to the fourth embodiment of Fig. 4, a plurality of network interfaces according to the third embodiment of Fig. 3, a plurality of network interfaces according to the fifth embodiment, a plurality of routers based on the multiple input queuing scheme with the pre-routing scheme according to the sixth embodiment, a plurality of network interfaces based on the multiple input queuing scheme and the flexible pre-routing scheme according to the sixth embodiment, a plurality of routers according to the second embodiment with the flexible pre-routing according to the sixth embodiment, as well as a plurality of network interfaces according to the third embodiment with the flexible pre-routing according to the sixth embodiment.
Preferably, the number of input queues associated to an input connection or input port is determined or defined at design time. According to a seventh embodiment, a given traffic distribution matrix with traffic intensities of origin-destination pairs can be considered for a given network structure and routing algorithm. A simulation of the behavior thereof, including the traffic pattern for each router, can be performed. In order to improve the performance of the overall system, input queues can be moved from one input or input connection of a router or network interface to another input of the same or another router or network interface. Accordingly, merely by using a standard stochastic optimization algorithm, such as simulated annealing, the design of a network on chip, in particular the queue placement, can be optimized.
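The design-time optimization can be illustrated with a toy simulated annealing loop. This is a sketch under strong assumptions: the cost function `blocking_cost` is a made-up proxy (it counts destinations that have to share a queue), whereas a real design flow would obtain the cost from a simulation of the network; the function names, the linear cooling schedule, and the starting placement are likewise illustrative only.

```python
import math
import random


def blocking_cost(queues, destinations):
    """Toy proxy cost: number of destinations left without a queue of their own."""
    return sum(max(0, d - q) for q, d in zip(queues, destinations))


def anneal_queue_placement(queues, destinations, steps=2000, t0=1.0, seed=0):
    """Move single queues between inputs; the total stays fixed, minimum 1 each."""
    rng = random.Random(seed)
    current = list(queues)
    cost = blocking_cost(current, destinations)
    best, best_cost = list(current), cost
    for step in range(steps):
        # Linear cooling; the small offset avoids division by zero.
        temperature = t0 * (1 - step / steps) + 1e-9
        src, dst = rng.sample(range(len(current)), 2)
        if current[src] <= 1:
            continue  # every input keeps at least one queue
        candidate = list(current)
        candidate[src] -= 1
        candidate[dst] += 1
        new_cost = blocking_cost(candidate, destinations)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
    return best, best_cost


start = [2, 2, 2, 2]                # uniform design with m = 2
active_destinations = [3, 1, 1, 3]  # outputs addressed by each input (assumed)
best, best_cost = anneal_queue_placement(start, active_destinations)
print(best, best_cost)
```

The move operator only shifts one queue from one input to another, so the total number of queues, and with it the cost of the design, stays constant throughout the search.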
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. In the device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Furthermore, any reference signs in the claims shall not be construed as limiting the scope of the claims.

Claims

1. Electronic device, comprising: a network interconnect means (NOC) for coupling processing units (IP); said network interconnect means (NOC) comprise a plurality of interconnect means (NI, R) each having a number (N) of inputs (I1-I4; P1-P4) and an input queuing means (IQM) for queuing inputs of the interconnect means (NI, R); wherein the input queuing means (IQM) comprises a plurality of queuing units (QU1-QU4), and is adapted to associate one of the queuing units (QU1-QU4) to each of the number (N) of inputs (I1-I4; P1-P4), wherein at least one queuing unit (QU1) associated to one of the inputs comprises a number of input queues different from that of another queuing unit (QU2).
2. Electronic device according to claim 1, wherein the input queuing means (IQM) is adapted to associate an individual number of input queues (IQ) to each queuing unit (QU1-QU4) associated to one of the inputs (I1-I4) according to a traffic distribution in the network interconnect means (NOC).
3. Electronic device according to claim 2, wherein each of said plurality of interconnect means (NI, R) comprises a network interface (NI) or a router (R).
4. Electronic device according to claim 1 or 2, wherein the input queuing means (IQM) is adapted to pre-route the inputs (I1-I4) to the odd-numbered or even-numbered queuing units (QU1-QU4) according to a function based on the destined output and a modulo function on the number (N) of inputs (I1-I4; P1-P4).
5. Method for input queuing within an electronic device having a network interconnect means (NOC) for coupling processing units (IP); wherein said network interconnect means (NOC) comprise a plurality of interconnect means (NI, R) each having a number (N) of inputs (I1-I4) and an input queuing means (IQM) with queuing units (QU1-QU4) for queuing inputs of the interconnect means (NI, R); comprising the step of: individually associating a number of input queues (IQ) to each of the queuing units (QU1-QU4) associated to one of the inputs (I1-I4; P1-P4).
PCT/IB2006/050903 2005-04-05 2006-03-23 Electronic device and method for input queuing WO2006106445A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP05102680 2005-04-05
EP05102680.5 2005-04-05

Publications (1)

Publication Number Publication Date
WO2006106445A1 true WO2006106445A1 (en) 2006-10-12

Family

ID=36607449

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/050903 WO2006106445A1 (en) 2005-04-05 2006-03-23 Electronic device and method for input queuing

Country Status (1)

Country Link
WO (1) WO2006106445A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6594234B1 (en) * 2001-05-31 2003-07-15 Fujitsu Network Communications, Inc. System and method for scheduling traffic for different classes of service
EP1482688A2 (en) * 1997-09-05 2004-12-01 Nec Corporation Large capacity, multiclass core ATM switch architecture
US20050047338A1 (en) * 2003-08-25 2005-03-03 Andiamo Systems, Inc., A Delaware Corporation Scalable approach to large scale queuing through dynamic resource allocation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

NENP Non-entry into the national phase

Ref country code: RU

WWW Wipo information: withdrawn in national office

Country of ref document: RU

122 Ep: pct application non-entry in european phase

Ref document number: 06727726

Country of ref document: EP

Kind code of ref document: A1

WWW Wipo information: withdrawn in national office

Ref document number: 6727726

Country of ref document: EP