WO2000042745A1 - Switching arrangement - Google Patents

Switching arrangement

Info

Publication number
WO2000042745A1
Authority
WO
WIPO (PCT)
Prior art keywords
output
input
packet
switching device
packets
Prior art date
Application number
PCT/IB1999/001970
Other languages
French (fr)
Inventor
Marco C. Heddes
Original Assignee
International Business Machines Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corporation filed Critical International Business Machines Corporation
Priority to AU14038/00A priority Critical patent/AU1403800A/en
Publication of WO2000042745A1 publication Critical patent/WO2000042745A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/50 Overload detection or protection within a single switching element
    • H04L 49/505 Corrective measures
    • H04L 49/506 Backpressure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/10 Packet switching elements characterised by the switching fabric construction
    • H04L 49/101 Packet switching elements characterised by the switching fabric construction using crossbar or matrix
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/10 Packet switching elements characterised by the switching fabric construction
    • H04L 49/103 Packet switching elements characterised by the switching fabric construction using a shared central buffer; using a shared memory
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/20 Support for services
    • H04L 49/205 Quality of Service based
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/25 Routing or path finding in a switch fabric
    • H04L 49/253 Routing or path finding in a switch fabric using establishment or release of connections between ports
    • H04L 49/254 Centralised controller, i.e. arbitration or scheduling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/30 Peripheral units, e.g. input or output ports
    • H04L 49/3027 Output queuing

Definitions

  • the present invention relates to a switching arrangement for packets of data, particularly for fixed-size packets like ATM packets. More particularly, it is related to a switching arrangement with several input ports and several output ports, which is designed to transport incoming packets, according to their headers, to one or more designated output ports and from there to a subsequent device.
  • in EP 312628 a switching apparatus is described for interconnecting a plurality of incoming and outgoing transmission links of a communication network, or for exchanging data between incoming and outgoing computer and workstation connection links. Furthermore, known packet formats are described.
  • the PRIZMA chip has 16 input ports and 16 output ports which provide a port speed of 300-400 Mbit/s.
  • the switch's principle is first to route incoming packets through a fully parallel I/O routing tree and then to queue the routed packets in an output buffer.
  • the chip uses a separation between data (payload) and control (header) flow. Only the payloads are stored in a dynamically shared output buffering storage. With this architecture head-of-the-line-queueing is avoided.
  • the PRIZMA chip has a scalable architecture and hence offers multiple expansion capabilities with which the port speed, the number of ports and the data throughput can be increased. These expansions can be realized based on a modular use of the PRIZMA. Also single-stage or multi-stage switch fabrics can be constructed in a modular way.
  • the PRIZMA chip is especially suited for broadband telecommunications, based on ATM, i.e. the Asynchronous Transfer Mode.
  • ATM is based on short, fixed-length packets, often called cells and is supposed to be applied as the integrated switching and transmission standard for the future public Broadband Integrated Services Digital Network (BISDN).
  • PRIZMA's topology and queuing arrangement for contention resolution employs a high degree of parallelism.
  • the routing function is performed in a distributed way at the hardware level, referred to as self-routing.
  • ATM packets are classified into several packet types, particularly packet types with different payload sizes, and the PRIZMA chip is dedicated to handling packets with a payload of up to 64 bytes. However, packet payloads of 12, 16, 32 or 48 bytes also frequently need to be transported.
  • the performance of the PRIZMA chip can be increased in various ways.
  • the chip can be arranged in a multistage or in a single-stage arrangement.
  • in a multistage arrangement the number of needed switches grows more slowly than in a comparable single-stage arrangement, i.e. with a growing number of ports a multistage arrangement needs fewer switches than a single-stage arrangement.
  • the performance of a multistage arrangement is lower because of increased latency and because of the possibility of backpressure: total use of an output queue by one connection prevents processing of cells with other destinations, and total use of the packet memory blocks all switch inputs and propagates towards the preceding stage.
  • This lower performance can to a certain extent be compensated by a speedup factor.
  • This means that the switch is running at a higher speed than its environment. Then, an output buffer is needed behind the switch to queue the faster incoming packets which are sent out from the last stage and are to be passed over to the following hardware environment at a lower speed.
  • Another possibility is to increase the switch-internal memory, such that total use is less likely. Such bigger memory is, however, extremely expensive and to some extent also physically limited. Increasing switch memory by the memory expansion mode avoids the physical limit but is nevertheless expensive.
  • a backpressure signal is generated for all inputs, and this backpressure signal is in turn transferred to all preceding chips.
  • the backpressure can selectively block only cells heading for the full output queue.
  • alternatively, all inputs are blocked. The backpressure signal then blocks the preceding switch in that it can no longer send cells.
  • Backpressure is a mechanism which occurs between two serially connected stages, be it two switches, e.g. in a multistage environment, or an adapter and a switch.
  • Backpressure signaling is the mechanism that signals to the preceding device that the subsequent device is busy and unable to handle further packets.
  • in a switch there are two sources of backpressure: either the memory of the switch is full or an output queue is full. In the first case the switch is totally busy; in the second case the switch can no longer handle packets destined for the respective output. In the memory-full case all inputs of this switch hence have to be blocked irrespective of the destinations of the packets. This is also called master backpressure. In the output-queue-full case, in principle not all inputs need be blocked.
  • Two mechanisms for the output-queue-full case exist, a first being the blocking of the input means in total when an output queue is blocked, which leads to a rapid increase in blocking upstream.
  • in the second mechanism the backpressure is selective and blocks only an input which wants to send a packet to the busy output. However, this input is then blocked even if it receives further packets which are heading for another output. This is called head-of-the-line blocking.
  • When the input is blocked, the corresponding output of the preceding switch is blocked via its output controller. If that output in the preceding switch is blocked, its corresponding output queue is also blocked and no longer processed. This means that the blocked output queue is not emptied but meanwhile still filled from several inputs. This again leads to backpressure, namely when the blocked output queue is also full.
  • Backpressure from an output queue worsens performance the longer the output queue is full, because the probability that an input of that switch delivers a packet heading for that output rises with time. This means that the longer an output queue is blocked, the more inputs will be blocked irrespective of arriving packets with other destinations, hence suffering from the head-of-the-line effect. Such backpressure may then spread backwards again and block other output queues in other switches.
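The two backpressure cases above can be summarized in a short sketch. This is an illustrative model, not part of the patent; the function name and data structures are invented.

```python
# Illustrative sketch (not from the patent) of the two backpressure cases
# described above: a full packet memory blocks all inputs ("master
# backpressure"), while a full output queue need only block those inputs
# whose pending packets are destined for that output.

def blocked_inputs(memory_full, full_outputs, pending_destinations):
    """pending_destinations maps input port -> set of designated output ports."""
    if memory_full:
        # Master backpressure: every input of the switch is blocked.
        return set(pending_destinations)
    # Selective backpressure: block only inputs heading for a full output,
    # which is exactly where head-of-the-line blocking arises.
    return {inp for inp, dests in pending_destinations.items()
            if dests & full_outputs}

# Output queue 3 is full; input 0 targets it, input 1 does not.
pending = {0: {3}, 1: {5}}
print(blocked_inputs(False, {3}, pending))   # {0}
print(blocked_inputs(True, set(), pending))  # {0, 1}
```

In the memory-full case every input appears in the result; in the output-queue-full case only the inputs whose pending packets target a full output are blocked.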
  • an intermediate buffer is arranged preferably after each output port of the switching device. Since it relieves backpressure from the switch-internal structure, particularly the output queues, the detrimental multiplicative backward spreading of backpressure is reduced.
  • such a buffering means is also called an intermediate buffer.
  • the intermediate buffer can be dedicated to one output of the preceding switch. It hence prevents blocking of the corresponding output queue in the preceding switch, which output queue is hence still receptive for cells.
  • the buffering means comprises several queues for buffering incoming packets with different priority, and/or order means for reordering buffered packets according to their priority, and/or priority means for transmitting a packet with a higher priority before a packet with a lower priority.
  • the upstream-blocking effect of backpressure is reduced, particularly the head-of-the-line blocking effect is reduced, because packets that are routable through the switch to non-blocked outputs can pass by blocked packets.
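A minimal sketch of such a buffering means, assuming per-priority FIFO queues and an externally supplied set of blocked outputs; class and field names are invented for illustration and do not come from the patent.

```python
from collections import deque

# Hypothetical model of the buffering means described above: packets wait in
# per-priority queues, and on each transmit opportunity the highest-priority
# packet whose destination output is not blocked is forwarded, so packets
# heading for free outputs can pass by blocked ones (reducing
# head-of-the-line blocking).

class IntermediateBuffer:
    def __init__(self, priorities=2):
        self.queues = [deque() for _ in range(priorities)]  # index 0 = highest

    def enqueue(self, packet, priority):
        self.queues[priority].append(packet)

    def dequeue(self, blocked_outputs):
        for queue in self.queues:              # highest priority first
            for packet in list(queue):
                if packet["dest"] not in blocked_outputs:
                    queue.remove(packet)
                    return packet
        return None                            # all buffered packets blocked

buf = IntermediateBuffer()
buf.enqueue({"dest": 1, "data": "a"}, priority=1)
buf.enqueue({"dest": 2, "data": "b"}, priority=1)
print(buf.dequeue(blocked_outputs={1})["data"])  # "b" passes the blocked "a"
```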
  • the number of buffering means can be reduced in that buffering means are shared between output ports.
  • a link-paralleled arrangement has the advantage of bigger packet processing speed, because parts of the packets can be processed in parallel.
  • a threshold means for signaling to the switching device that a predetermined number of packets is buffered in the buffering means serves as a backpressure signalizer for the switching device, which can hence react accordingly by reducing outgoing traffic in order to avoid packet loss.
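The threshold means can be pictured as in the following sketch; the capacity and threshold values are invented for illustration and are not from the patent.

```python
# Illustrative sketch of a threshold means: once a predetermined number of
# packets is buffered, a backpressure signal is raised towards the switching
# device so it can reduce outgoing traffic before the buffer overruns and
# packets are lost.

class ThresholdBuffer:
    def __init__(self, capacity, threshold):
        self.capacity = capacity     # total buffer places
        self.threshold = threshold   # fill level at which backpressure is signaled
        self.fill = 0

    def push(self):
        if self.fill >= self.capacity:
            raise OverflowError("packet loss: buffer overrun")
        self.fill += 1

    def pop(self):
        self.fill = max(0, self.fill - 1)

    def backpressure(self):
        # Signal before the buffer is actually full, leaving headroom
        # for packets already in flight.
        return self.fill >= self.threshold

buf = ThresholdBuffer(capacity=16, threshold=12)
for _ in range(12):
    buf.push()
print(buf.backpressure())  # True: the switching device should stop sending
```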
  • Fig. 1 a double-stage arrangement with four switching devices
  • Fig. 2 a single-stage arrangement with four switching devices
  • Fig. 3 a first embodiment of a buffering means
  • Fig. 5 a third embodiment of a buffering means in a link-paralleling arrangement.
  • in figure 1 a double-stage arrangement is depicted, with a first stage comprising a first switching device 11 and a third switching device 12, and with a second stage comprising a second switching device 13 and a fourth switching device 14.
  • the first switching device 11 has eight input ports 21, 22 of which for sake of clarity only two are depicted, which all lead to an input means 86 which is connected to an output queue access manager 18, a storage input controller 1500 and a storing means 87.
  • the output queue access manager 18 is connected to eight output queues 261-268, of which for sake of clarity only two are depicted.
  • the storing means 87 is controlled by a storage output controller 1600 and delivers output to eight output routers 171, 178, of which again for sake of clarity only two are depicted and which all are connected to an address manager 71 which is itself connected to the input means 86.
  • the output routers 171-178 each have an output which leads to a respective output port 35, 36, of which only two are shown.
  • All other switching devices 12, 13, 14 are built the same way, whereby the second switching device 13 has eight input ports 25, 26 and eight output ports 31, 32, the third switching device 12 has eight input ports 23, 24 and eight output ports 37, 38 and the fourth switching device 14 has eight input ports 27, 28 and eight output ports 33, 34.
  • a first intermediate buffer 81 is arranged between the output port 35 of the first switching device 11 and the input port 25 of the second switching device 13.
  • a second intermediate buffer 82 is arranged between the output port 37 of the third switching device 12 and the input port 26 of the second switching device 13.
  • the output port 36 of the first switching device 11 is connected to the input port 27 of the fourth switching device 14 via a third intermediate buffer 83.
  • the output port 38 of the third switching device 12 and the input port 28 of the fourth switching device 14 are connected via a fourth intermediate buffer 84.
  • the input means 86 here comprises a number of input controllers which incorporate the function of a translation means. Each one of said input controllers is connected to a corresponding one of a number of ensuing input routers. Every input router is connected via one router connection line to every one of a number of input selectors. The input routers, the router connection lines and the input selectors together form the input means 86.
  • Each of the input selectors is connected to one corresponding storage group in the storing means 87.
  • Each of the storage groups has as switch control means its assigned group input controller and as group output controlling means its assigned group output controller. Every group input controller is connected via a bidirectional line to each input controller.
  • Each storage group comprises a group of e.g. four storage cells and has four corresponding storage outputs. Hence there are 128 storage cells, divided up in groups of four.
  • the storage input controller 1500 comprises the group input controllers.
  • the storage output controller 1600 comprises the group output controllers.
  • the storage groups together form the storing means 87. Every output of every storage group is connected to every one of the eight output routers 171-178 which together are defined as output means. Each of the eight output routers 171-178 is connected to one of eight output controllers.
  • the address manager 71 serves as bookkeeping means and comprises an address lookup table which contains a storage matrix with 32 rows and 3 columns.
  • the first column contains comparators which are dedicated to the storage group addresses of the storage groups.
  • the second column contains a first counter and the third column a second counter.
  • the address manager 71 further comprises an incrementing section and a decrementing section.
  • the address lookup table, the incrementing section and the decrementing section together form a counting means.
  • the incrementing section is connected to all eight input controllers 111-118 and to all eight input routers.
  • the decrementing section also has eight inputs, which are connected via eight decrementer input lines to the eight output controllers and to the eight output routers 171-178.
  • the eight decrementer input lines between the decrementing section and the output routers 171-178, respectively the output controllers are also connected to each of the 32 group output controllers.
  • the decrementing section as well as the incrementing section are connected to the address lookup table which itself is connected to an address queue with 32 queuing places.
  • the address queue has an output which leads to every input controller.
  • the output queue access manager 18 serves as queue controlling means and has eight inputs which are connected respectively to the eight input controllers. The eight inputs are separated into two input groups of four inputs each. Every input group is connected to a group selection switch which has four output lines grouped together as one output group.
  • the output queue access manager 18 further comprises a switch controller which controls the group selection switch and has eight inputs deriving from the input controllers.
  • An incoming fixed-size packet, e.g. an ATM packet, arrives for example at one input port and enters the corresponding input controller.
  • the packet contains information comprising a header, also referred to as data destination part, and a payload, also called data content part.
  • the header contains target information about to which of the output ports 35-36 this packet is to be sent. This target information is encoded in the packet header as a number.
  • the corresponding input controller acts as translation means and for this purpose contains a list of numbers and a list of corresponding data patterns, e.g. bitmaps. The number of the incoming target information is compared with the stored list of numbers until a matching number has been found. The corresponding bitmap is read out from the list and assigned to the received packet.
  • the target information is by this procedure hence translated into the dedicated bitmap.
  • This bitmap contains eight bits, one for each output port 35-36. The contained bits indicate in binary form if the respective output port shall receive this packet or not. Every logical 1 in this bitmap means that the respective output port shall receive a copy of the packet. By this bitmap hence a selection of output ports is designated. As will be explained below, this is a sophisticated way to handle multicast packets.
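The translation step above can be sketched as a table lookup. The table entries below are invented for the example; the text only specifies that a stored number is matched and the corresponding bitmap is read out.

```python
# Sketch of the input controller's translation means: the numeric target
# information from the packet header is matched against a stored list of
# numbers, and the corresponding eight-bit bitmap (one bit per output port,
# a logical 1 meaning "deliver a copy here") is assigned to the packet.
# Multicast is simply a bitmap with several 1s. Table contents are
# illustrative.

TRANSLATION_TABLE = {
    7:  0b00000001,   # unicast: output port 0 only
    42: 0b10000101,   # multicast: output ports 0, 2 and 7
}

def translate(target_number):
    return TRANSLATION_TABLE[target_number]

def designated_outputs(bitmap, n_ports=8):
    return [port for port in range(n_ports) if bitmap >> port & 1]

print(designated_outputs(translate(42)))  # [0, 2, 7]
```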
  • the assigned bitmap is sent to the switch controller of the output queue access manager 18. The payload of the incoming packet is delivered to the corresponding input router.
  • the address manager's address queue offers to each input controller the number of a free storage group address, i.e. an address of a storage group which is not occupied by undelivered packet payloads.
  • One free storage group address is delivered to each input controller by the address queue wherefrom it is at the same time removed. For a high performance, every input controller has already received a storage group address before a packet has arrived.
  • the receiving input controller further sends the assigned bitmap to the incrementing section.
  • in the incrementing section the sum of logical 1s in the received bitmap is calculated and sent to the address lookup table. There the first counter of the respective storage group address is set to the received value.
  • the receiving input router has already set up a connection to the corresponding input selector of the storage group whose storage group address it has received.
  • the input selector has automatically switched the incoming connection to the corresponding storage group.
  • the connections to the input selectors are for a high performance all already set up when a packet arrives.
  • the output queue access manager 18 is connecting the input groups one after the other to the output group and hence to all output queues 261-268. This is done by the switch controller which controls the switching process of the group selector switch.
  • the payload address, consisting of the received storage cell number and the received storage group address, is then written to the output queues 261-268 in accordance with the bitmap received by the switch controller. Only the designated ones of the output queues 261-268, i.e. the ones which have a logical 1 in the bitmap, receive the payload address.
  • the payload address is stored in the respective output queue 261-268 of every output port which has to receive a copy of the incoming packet.
  • the payload of a packet is stored in the storage cell and its destination is determined in that the output queues 261-268 assigned to the designated output ports contain the entries of the payload address in the corresponding output queues 261-268.
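The enqueueing step above can be sketched as follows: the payload is stored once, and only the queues selected by the bitmap receive the payload address. Structure names are illustrative, not from the patent.

```python
from collections import deque

# Sketch of writing the payload address, i.e. the pair (storage group
# address, storage cell number), into exactly those output queues whose bit
# in the bitmap is a logical 1.

N_PORTS = 8
output_queues = [deque() for _ in range(N_PORTS)]   # models queues 261-268

def enqueue_payload_address(group_addr, cell_no, bitmap):
    payload_address = (group_addr, cell_no)
    for port in range(N_PORTS):
        if bitmap >> port & 1:                      # designated output port
            output_queues[port].append(payload_address)

enqueue_payload_address(group_addr=5, cell_no=2, bitmap=0b00000110)
print([len(q) for q in output_queues])  # only queues 1 and 2 hold the address
```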
  • the addressed storage group remains active until all four storage cells have been used. Thereby the stored payloads need not be heading for the same output ports. Then a new storage group is selected from the address queue for the storage of the next four packet payloads. The storing process within one storage group is always performed sequentially.
  • every output controller receives from its correspondent output queue 261-268 the next payload address which contains the address of the corresponding storage group and the number of the corresponding storage cell where the next payload for this output port is stored.
  • the receiving output controller signals to the group output controller of the storage group which contains the storage cell with the received storage group address that it has to prepare to transmit the stored packet payload.
  • the output controller receives the payload address from the output queue 268 in form of the storage group and the storage cell inside of the storage group.
  • the corresponding output router 178 receives also the payload address from the output controller and sets up a connection between the storage cell with the respective storage cell number and the output controller. Then the group output controller also provides a read pointer and resets this read pointer to the first byte and transmits simultaneously all packets in the storage group to its storage outputs.
  • the comparator ensures that the storage group address of the corresponding storage group is entered into the address queue again.
  • All storage groups are independent and can receive packet payloads independently, hence asynchronously. However, only one payload can be written at once into a storage group.
  • the described arrangement can perform within each storage group a synchronous readout of the contained storage cells. This means that a second output controller willing to read a packet payload from the same storage group must wait until the packet payload being currently read is read out entirely, i.e. the read pointer has reached the first byte of the storage cells again. This fact may serve as a criterion for the dimensioning of the size of the storage groups.
  • a small number of storage cells grouped in one storage group means a low probability that several output ports simultaneously request to receive packet payloads from the same storage group.
  • a high number of storage cells in one storage group means less expenditure on hardware, namely on the input routers and the input selectors, because one storage group has only one input.
  • the storage cells within one storage group should all be filled up to an equal extent. Then, during the readout procedure the read pointer from the group output controller is reset to the first byte of the storage cells immediately after having reached the last occupied byte in all storage cells, which is faster than waiting for the read pointer to reach the last byte of the storage cells.
  • the readout process and the storing process are generally independent from each other. Hence, a readout process for a certain packet payload can already be started when its first byte has been stored.
  • the readout of payloads from the various storage groups can be performed asynchronously as well as synchronously.
  • all storage groups are synchronized for reading. This means that there is an additional latency in synchronous transmission mode due to waiting for the synchronization point, which here is defined as the point of time when the read pointer points to the first byte of all corresponding storage cells in the corresponding storage group.
  • the output routers 171-178 can be realized as blocking output routers. This means that the output routers 171-178, while performing readout of a payload from a selected storage group 1401-1432, prevent all accesses of other output routers 171-178 to the same storage group 1401-1432. The output routers 171-178 can then be decreased in size by arranging behind every storage group a multiplexing device with one output and four inputs, the output being connected to every output router 171-178, the inputs being connected to the outputs of the respective storage group. This multiplexing device then acts like an inverted version of the packet selector switch and allows access to only one of the storage cells of a storage group at one moment in time. Longer access times for blocked output ports hence become more likely. However, as already explained, the probability of coincidental simultaneous access of several output ports to the same storage group is low, which keeps this delay acceptably low.
  • bitmaps can be processed in parallel, and the corresponding received payload addresses can likewise be processed in parallel.
  • the following numeric example illustrates the background.
  • the storing procedure of a packet payload with a size of 12 bytes, while assuming a clock cycle of 8 ns and a processing speed of one byte per clock cycle takes a total storing time of 96 nanoseconds. This is the maximal acceptable time during which the storage group address queuing procedure for all input ports 101-108 has to be completed.
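The figure above, reproduced as arithmetic (the variable names are illustrative):

```python
# 12 payload bytes at one byte per 8 ns clock cycle give 96 ns of total
# storing time, which is the deadline within which the storage group address
# queuing for all input ports has to be completed.

payload_bytes = 12
clock_cycle_ns = 8
bytes_per_cycle = 1

storing_time_ns = payload_bytes / bytes_per_cycle * clock_cycle_ns
print(storing_time_ns)  # 96.0
```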
  • this backpressure blocks, via the output queue controller of the same switching device 13, its input means, in that the input ports 25, 26 which deliver packets heading for the first output port of the second switching device 13 are blocked.
  • the backpressure of one output queue already spreads upstream and influences a plurality of input ports 25, 26. Without the first intermediate buffer 81, the backpressure on the input port 25 would directly influence the first switching device 11 in that the corresponding output router 171 is not allowed to send any more packets to that input port 25.
  • the output router 171 is hence not allowed to process the corresponding output queue 261. This can lead to a second head-of-the-line problem in the case of multicast packets. Since at least one destination for a multicast packet is blocked, the whole delivery cannot be performed and the packet blocks a place in the storing means 87 as well as one in the address manager 71. This blocking also occurs in the other output queues where this packet has an entry for its destination.
  • the blocked output queue 531 again can transfer the backpressure upstream because it is prevented from being emptied but still receives entries from the output queue access manager 18.
  • the same effect as in the second switching device occurs, namely that the input means 86 blocks input ports which deliver packets heading for the first output queue 261. Again, with time more and more input ports 21, 22 will be blocked.
  • Multicast packets are realized by storing only once their payload and entering the respective payload address into several output queues 261-268.
  • the choice of the output queues 261-268 is determined by the bitmap which gives e.g. a 1 for each output that has to receive a copy of the packet.
  • the counting means provides control that the corresponding storage cells and the corresponding storage groups are not used again until the last copy of the multicast packet payload stored therein has been read.
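The counting means for multicast can be sketched as a reference counter per storage address. The class below is an invented illustration, assuming the counter is initialized to the number of 1s in the bitmap and decremented on each readout.

```python
# Sketch of multicast bookkeeping as described above: a payload is stored
# once; its counter starts at the number of designated outputs (logical 1s
# in the bitmap) and is decremented per delivered copy. Only at zero may the
# storage address be returned to the free-address queue for reuse.

class MulticastCounter:
    def __init__(self):
        self.counters = {}                 # storage address -> copies pending

    def on_store(self, address, bitmap):
        self.counters[address] = bin(bitmap).count("1")   # popcount

    def on_readout(self, address):
        self.counters[address] -= 1
        if self.counters[address] == 0:
            del self.counters[address]
            return True                    # address may be reused
        return False

mc = MulticastCounter()
mc.on_store(address=5, bitmap=0b00000101)  # two copies to deliver
print(mc.on_readout(5))  # False: one copy still pending
print(mc.on_readout(5))  # True: address 5 returns to the free queue
```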
  • the intermediate buffers 81, 82, 83, 84 prevent backpressure from rapidly extending upstream. Since all effects described before have some multiplicative factor in their effect, the buffer yields not only a linear performance increase but a much higher one. The basic reason for this is that the buffer is designed to prevent backpressure from only one source from reaching elements which, when receiving backpressure, have a negative impact on many switching device elements connected therewith, e.g. the input means 86.
  • the backpressure is led to a queue which is only dedicated to one output of the preceding switch. Since this queue serves as a decoupling element in that it buffers the backpressure to a certain extent, the probability that backpressure is extending to the preceding stage is reduced. Hence not only the backspreading backpressure effect but also the forwarding backpressure effect is reduced.
  • the buffering means acts as if it were not there when normal traffic is processed.
  • the switching device is well suited to being scaled up to enhance its performance. It is therefore usable as a scalable module for a scaled system.
  • Size expansion, i.e. an increase of the number of ports
  • memory expansion, for achieving a higher data throughput
  • speed expansion, for achieving a higher port speed.
  • the first switching device 11 has two input ports 21, 22 and two output ports 35, 36.
  • the second switching device 12 has two input ports 23, 24 and two output ports 37, 38.
  • the third switching device 13 has two input ports 25, 26 and two output ports 31, 32.
  • the fourth switching device 14 has two input ports 27, 28 and two output ports 33, 34.
  • a first system input 51 is connected to a first selector means 41 which has two outputs, one connected to input port 21 and one to input port 23.
  • a second system input 52 is connected to a second selector means 42 which has two outputs, one connected to input port 22 and one to input port 24.
  • a third system input 53 is connected to a third selector means 43 which has two outputs, one connected to input port
  • a fourth system input 54 is connected to a fourth selector means
  • a first arbitrator 45 has output port 35 as its first input and output port 37 as its second input. Its output is defined as a first system output 55.
  • a second arbitrator 46 has output port 36 as its first input and output port 38 as its second input. Its output is defined as a second system output 56.
  • a third arbitrator 47 has output port 31 as its first input and output port 33 as its second input. Its output is defined as a third system output 57.
  • a fourth arbitrator 48 has output port 32 as its first input and output port 34 as its second input. Its output is defined as a fourth system output 58.
  • the whole arrangement now has four input ports, namely the system inputs 51-54 instead of two and also four output ports, namely the system output ports 55-58, but provides full connectability between all input ports 21-28 and output ports 31-38.
  • the selectors 41, 42, 43, 44 serve as address filters.
  • the purpose of such an address filter is to choose to which of the switching devices 11, 12, 13, 14 an incoming packet has to be sent. This is done by using the header of the incoming packet. For instance, the whole packet can be duplicated and sent to a filter unit for each switching device 11, 12, 13, 14, whereby only one of the filters is permeable for the packet.
  • a filter unit can also be located in the switching devices 11-14.
  • the arbitrators 45, 46, 47, 48 choose which of the switching devices 11, 12, 13, 14 has the right to pass its packet from its output port 31-38 to one of the system outputs 55, 56, 57, 58. Destination control with this arrangement is mainly restricted to an automatism using directly the destination information from the packet header, i.e. the data destination part.
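The address filtering can be pictured as below. The mapping from output ports to switching devices is invented for the example, loosely following the reference numbers used in the text.

```python
# Illustrative sketch of the address filter formed by the selector means:
# the incoming packet's destination is offered to one filter per switching
# device, and only the filter of the device that owns the designated output
# port is permeable. The port-to-device mapping is hypothetical.

DEVICE_OUTPUTS = {
    11: {35, 36},
    12: {37, 38},
    13: {31, 32},
    14: {33, 34},
}

def select_device(destination_port):
    for device, outputs in DEVICE_OUTPUTS.items():
        if destination_port in outputs:
            return device        # only this device's filter passes the packet
    raise ValueError("unknown destination port")

print(select_device(37))  # 12
```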
  • the first arbitrator 45 here comprises the first intermediate buffer 81.
  • the second arbitrator 46 comprises the second intermediate buffer 82.
  • the third arbitrator 47 here comprises the third intermediate buffer 83.
  • the fourth arbitrator 48 here comprises the fourth intermediate buffer 84.
  • the second switch 12 is not suffering immediately from this blockage and can hence continue operation particularly with packets heading for destinations that can be reached via the second arbitrator 46.
  • since the blocked second switch 12 can effect a blocking in the third and fourth selector means 43, 44, which hence also affects the fourth switch 14, it is conceivable that the simple blockage of one switch, even in a single-stage arrangement, can have a disastrous effect on the overall arrangement performance. The probability of a deadlock situation is here also reduced, namely e.g.
  • the first arbitrator 45 waiting for a packet from the first switch 11, hence blocking the second switch 12, which is then unable to deliver packets to the second arbitrator 46, which is waiting for packets from that second switch 12 and hence blocks the first switch 11, which consequently cannot send packets to the first arbitrator 45.
  • a mask is generated in the preceding switch reflecting the actual state of the following switch, more precisely its overflow state. Only if this state shows free availability of an output queue is a packet allowed to be sent. When a packet is sent, it can be immediately deleted in the preceding switch since its acceptance is granted. Since the mask always shows a somewhat outdated state, because during transmission of the mask its state may already have changed, the mask is transmitted with a security margin. This means that the output queue is already marked as full when, say, 95% of its places are full. This avoids packet loss.
  • the backpressure mechanism here is in general identical to the backpressure mechanism of the concept which sends packets without looking at a grant mask; only the threshold is slightly decreased and anticipated in time. Hence, with the granting-process concept, the intermediate buffer effect is the same.
  • the intermediate buffer can use the information about which queues are free in the following switch and sort out packets that suffer from head-of-the-line blocking, i.e. packets which could be processed if they were not blocked by another packet, and lead them directly to the subsequent switch which offers a free queue for that packet. For this, some intelligence in the intermediate buffer is needed for monitoring the destinations of buffered, waiting packets and for comparing the free queues with these destinations.
  • FIG. 3 shows a first embodiment of the buffering means 81.
  • the buffering means 81 comprises a queue means 15 which is connected to an output means 20 which has another input connected to a backpressure means 17.
  • the queue means 15 is further connected to a threshold means 16 which has one output.
  • the backpressure means 17 has one input.
  • the queue means 15 has several queuing places which are all in parallel connected to a sorting means 39.
  • Incoming packets are queued in the queue means 15 which serves as a waiting queue, also called FIFO.
  • the output means 20 takes packet after packet from the queue means 15 and transmits it to the output of the buffering means 81.
  • when a predetermined number of packets is buffered in the queue means 15, the threshold means 16 generates a threshold signal which generates backpressure via the output of the threshold means 16.
  • the sorting means 39 is able to change the order of the queued packets.
  • a sorting method is applicable which puts packets with a higher priority at places which are served before places where packets with a lower priority are stored. It is possible that packets have different priorities, i.e. a packet with a higher priority is more important to be transmitted in time than a packet with a lower priority.
  • Another sorting mechanism could re-sort the packets each time a backpressure signal is received. By this method, head-of-the-line queueing can be avoided, when the re-sorting continues with various permutations until a packet can be delivered.
  • Figure 4 shows an alternative embodiment of the buffering means 81 which compared to figure 3 has the following differences:
  • the sorting means 39 is replaced by a selection means 49 which then provides the input for the output means 20.
  • the selection means 49 selects one packet out of a group of packets buffered in the queue means 15 and directly delivers it to the output means 20 which transmits the packet to the subsequent stage or receiver.
  • the selection can again be done taking into account the packet priority and/or the impossibility to deliver one packet which leads to the selection of another packet. Again, head-of-the-line queueing can therewith be avoided.
  • this embodiment has a means for rearranging the buffered packets according to their priorities, or a means that organizes a resequenced transmittal in that it transmits higher-priority packets before lower-priority packets. To ensure a minimum sending rate for lower-priority packets, a minimum transmitting rate for them can be established.
  • the buffering means can have a number of different queues for different priorities, e.g. 4 queues for 4 different priorities. Also mixed arrangements are possible, e.g. grouping two priorities together in one queue and having a priority-sorting means for this queue.
  • the buffering means 81 here comprises the queue means 15 which now has two independent buffering queues P1, P2.
  • the buffering means 81 is here arranged in a so-called link-paralleling arrangement.
  • the buffering means 81 comprises a collecting means 76 which receives input from two output ports 35, 65 of the first switching means 11. The combining of the output ports pairwise effects a performance increase, in that the processing speed per packet is increased, because the packet payload can be split up in two halves which hence need only half of the number of cycles to be pushed through the first switching means 11.
  • the link-paralleling arrangement can generally be used in that one part of each packet is received through one of the paired input ports, even using a number of 3 or more ports for link-paralleling. Hence only a smaller part of the packet needs to be processed serially, while all parts can be processed simultaneously in parallel. With pairwise link-paralleling, the number of available input/output ports is however halved, unless it is doubled again by using twice the number of switches.
  • a deadlock is a situation where several switches block each other, such that the blocking goes around in a circle and sustains itself, i.e. persists even when the original cause has disappeared. This effect is hence fatal because it does not resolve itself.
  • for deadlock prevention, reserved space in output queues is one solution.
  • with an intermediate buffer, the probability of deadlock is also significantly reduced.
  • the switching device's architectural principle is suitable for an arbitrary choice of dimension, which is also called scalability. This means that by varying the dimension of some or all components of the switching device, the wanted number of input ports, output ports and/or storage groups can be chosen.
  • the incrementing section need not be connected to the input controllers. It can also be connected to the outputs of the output group and then receive only the bitmap-derived increment value for the input ports which are processed there at one point in time. Further, the bitmap need not be received by the switch controller but can also be received by the input groups. Then an interpretation of the bitmap's content can be performed in the output group.
  • the interconnections which are depicted as being connected via the same line with several components may also be realized as a bus connection or as separated lines.
  • the switching device is suited best for broadband communications, based on ATM.
  • Single-stage or multi-stage switch fabrics can be constructed in a modular way.
  • the shared memory can also be expanded by linking multiple switching devices.
  • the storage packet number need not be transmitted to the output queue access manager by the group input controller but can get there in different equivalent ways. It suffices that this storage packet number is stored in the corresponding output queue to complete the payload address.
  • the function of the selectors 41-44 need not be concentrated in the selectors 41-44 but can be distributed and even be integrated with the corresponding part of the selector 41-44 on a PCB or even inside of a switching device 11-14.
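The intermediate-buffer behaviour described in the points above, a backpressure threshold with a security margin plus a selection step that passes over head-of-the-line-blocked packets, can be sketched as follows. This is a minimal illustration; the class name, parameters and packet representation are assumptions, not taken from the document:

```python
from collections import deque

class IntermediateBuffer:
    """Sketch of the buffering means: a queue that raises backpressure at a
    security-margin threshold and selects deliverable packets out of order."""

    def __init__(self, capacity=20, threshold_fraction=0.95):
        self.capacity = capacity
        # Mark the queue as "full" before it really is (e.g. at 95%), so the
        # state signalled upstream stays safe despite its transmission delay.
        self.threshold = int(capacity * threshold_fraction)
        self.queue = deque()

    def backpressure(self):
        """True once the fill level reaches the security-margin threshold."""
        return len(self.queue) >= self.threshold

    def enqueue(self, packet):
        if len(self.queue) >= self.capacity:
            raise OverflowError("packet loss - the threshold should prevent this")
        self.queue.append(packet)

    def select(self, free_outputs):
        """Pick the first buffered packet whose destination queue in the
        following switch is free, bypassing a blocked head-of-line packet."""
        for i, packet in enumerate(self.queue):
            if packet["dest"] in free_outputs:
                del self.queue[i]
                return packet
        return None

buf = IntermediateBuffer(capacity=4, threshold_fraction=0.75)
buf.enqueue({"dest": 0, "payload": "A"})   # head of line, output 0 blocked
buf.enqueue({"dest": 1, "payload": "B"})
# Output 0 is blocked downstream, output 1 is free:
sent = buf.select(free_outputs={1})        # packet B bypasses packet A
```

The selection step is what avoids head-of-the-line queueing: packet B is delivered although the older packet A is still waiting for its blocked output.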

Abstract

A switching arrangement comprising a switching device for transporting incoming packets of data containing a data destination part and a data content part, from a plurality of input ports to a plurality of output ports is proposed. The switching device comprises input means for transporting the data content parts of the incoming packets to storing means which contains a plurality of storage packets. It further comprises output means for reading out the stored data content parts and delivering them to a selection of the output ports, which is determined by the data destination part. Additionally, the switching device comprises, for at least one of the output ports, a buffering means which is arranged after the output port.

Description

SWITCHING ARRANGEMENT
TECHNICAL FIELD
The present invention relates to a switching arrangement for packets of data, particularly for fixed-size packets like ATM-packets. More particularly, it relates to a switching arrangement with several input ports and several output ports, which is designed for transporting incoming packets, according to their headers, to one or more designated output ports and from there to a subsequent device.
BACKGROUND OF THE INVENTION
Fast switching of information, be it samples of analog signals or alphanumeric data, is an important task in a communication network. The network nodes in which lines or transmission links from various directions are interconnected for exchanging information between them are often the cause of delay in the transmission. If much traffic is concentrated in a node, and if in particular most of the traffic passes through only few of the links, increased delays or even loss of information are often encountered. It is therefore desirable to have switching nodes which allow fast routing and are at least partially non-blocking.
EP 312628 describes a switching apparatus for interconnecting a plurality of incoming and outgoing transmission links of a communication network, or for exchanging data between incoming and outgoing computer and workstation connection links. Furthermore, known packet formats are described there.
An overview over prior art switching technology is given on the Internet page www.zurich.ibm.com/Technology/ATM/SWOCPWP, wherein an introduction into the PRIZMA Chip is illustrated. Another source for information about this topic is the publication "A flexible shared-buffer switch for ATM at Gbit/s rates" by W.E. Denzel, A.P.J. Engbersen, I. Iliadis in Computer Networks and ISDN Systems, (0169-7552/94), Elsevier Science B.V., Vol. 27, No. 4, pp. 611-624.
The PRIZMA chip has 16 input ports and 16 output ports which provide a port speed of 300-400 Mbit/s. The switch's principle is first to route incoming packets through a fully parallel I/O routing tree and then to queue the routed packets in an output buffer. In addition to this, the chip uses a separation between data (payload) and control (header) flow. Only the payloads are stored in a dynamically shared output buffering storage. With this architecture head-of-the-line-queueing is avoided. The PRIZMA chip has a scalable architecture and hence offers multiple expansion capabilities with which the port speed, the number of ports and the data throughput can be increased. These expansions can be realized based on a modular use of the PRIZMA. Also single-stage or multi-stage switch fabrics can be constructed in a modular way.
The PRIZMA chip is especially suited for broadband telecommunications, based on ATM, i.e. the Asynchronous Transfer Mode. However, the concept is not restricted to ATM-oriented architectural environments. ATM is based on short, fixed-length packets, often called cells and is supposed to be applied as the integrated switching and transmission standard for the future public Broadband Integrated Services Digital Network (BISDN). PRIZMA's topology and queuing arrangement for contention resolution employs a high degree of parallelism. The routing function is performed in a distributed way at the hardware level, referred to as self-routing. ATM packets are classified into several packet types, particularly packet types with different payload sizes, and the PRIZMA chip is dedicated to handle packets with a payload up to 64 bytes. However, also packet payloads with 12, 16, 32 or 48 bytes are often to be transported.
The performance of the PRIZMA chip can be increased in various ways. To increase the number of input and output ports, the chip can be arranged in a multi-stage or in a single-stage arrangement. In the multi-stage arrangement, the number of needed switches grows more slowly than in a comparable single-stage arrangement, i.e. with a growing number of ports a multi-stage arrangement needs fewer switches than a single-stage arrangement.
However, the performance of a multi-stage arrangement is lower because of increased latency and because of the possibility of backpressure, due either to total use of an output queue by one connection, which prevents processing of cells with other destinations, or to total use of the packet memory, which blocks all switch inputs and propagates towards the preceding stage. This lower performance can to a certain extent be compensated by a speedup factor; this means that the switch runs at a higher speed than its environment. Then, an output buffer is needed behind the switch to queue the faster incoming packets which are sent out from the last stage and are to be passed over to the following hardware environment at a lower speed. Another possibility is to increase switch-internal memory, such that total use is less likely. Such bigger memory is however extremely expensive and to some extent also physically limited. Increasing switch memory by the memory-expansion mode avoids the physical limit but is nevertheless expensive.
If in a multi-stage arrangement a subsequent switch is crowded (its memory section is full, i.e. no address is available, or an output queue is full), a backpressure signal is generated for all inputs, which is again transferred to all preceding chips. In the case of full output queues, the backpressure can selectively block only cells heading for the full output queue. In the case of full packet memory, all inputs are to be blocked. The backpressure signal blocks the preceding switch in that this switch can no longer send cells.
SUMMARY OF THE INVENTION
Backpressure is a mechanism which occurs between two serially connected stages, be it two switches, e.g. in a multi-stage environment, or an adapter and a switch. Backpressure signalisation is the mechanism that signalizes to the preceding device that the subsequent device is busy and not able to handle further packets.
In a switch there are two sources of backpressure: either the memory of the switch is full, or an output queue is full. In the first case, the switch is totally busy; in the second case, the switch can no longer handle packets destined for the respective output. In the memory-full case, hence all inputs of this switch have to be blocked irrespective of the destinations of the packets. This is also called a master backpressure. In the output-queue-full case, in principle not all inputs need be blocked.
Two mechanisms for the output-queue-full case exist, a first being the blocking of the input means in total when an output queue is blocked, which leads to a rapid increase in blocking upstream. In the PRIZMA design, the backpressure is selective and blocks only an input which wants to send a packet to the busy output. However, this input is then blocked even if it receives further packets which are heading for another output. This is called head-of-the-line blocking. When the input is blocked, the corresponding output of the preceding switch is blocked via its output controller. If that output in the preceding switch is blocked, its corresponding output queue is also blocked and no longer processed. This means that the blocked output queue is not emptied but meanwhile still filled from several inputs. This again leads to backpressure, namely when the blocked output queue is also full.
Backpressure from an output queue worsens performance the longer the output queue is full, because the probability that an input of that switch delivers a packet which is heading for that output rises with time. This means that the longer an output queue is blocked, the more inputs will be blocked irrespective of arriving packets with other destinations, hence suffering from the head-of-the-line effect. Such backpressure may then spread backwards again and block other output queues in other switches.
If an input is blocked, it cannot deliver any packet to any output. The fewer inputs are allowed to send packets due to one single output-queue-full condition, the fewer inputs can deliver their packets to other outputs, which leads to unused resources in the switch. The other output queues are not used as much as they could be, since inputs are blocked as a consequence of the head-of-the-line effect. As a consequence, more and more outputs get fewer packets and deliver fewer packets to the following devices. Hence, the backpressure spreads forward again and influences other switching devices downstream.
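The head-of-the-line effect described above can be illustrated with a minimal sketch; the FIFO model and names are illustrative, not from the document:

```python
from collections import deque

def deliverable(fifo, blocked_outputs):
    """With strict FIFO service, nothing behind a blocked head packet can be
    delivered, even if its own destination output is free."""
    delivered = []
    while fifo and fifo[0] not in blocked_outputs:
        delivered.append(fifo.popleft())
    return delivered

fifo = deque([2, 0, 1])   # destinations of queued packets, output 2 at the head
result = deliverable(fifo, blocked_outputs={2})
print(result)   # -> [] : outputs 0 and 1 starve behind the blocked head packet
```

One blocked output thus idles every output reachable from this input, which is exactly the resource waste the paragraph describes.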
With multicast, the effect is even worse, because the blocking of only one copy of the packet causes the packet not to be deleted and to remain in the queue. Blocking is hence more probable.
The bottleneck arises out of the fact that the output queues are filled from several inputs and hence spread backpressure upstream to these several inputs. This multiplication factor increases the negative impact of backpressure.
For solving the above-explained problem, an intermediate buffer is arranged preferably after each output port of the switching device. Since this buffer relieves backpressure from the switch-internal structure, particularly the output queues, detrimental multiplicative backspreading of backpressure is reduced.
OBJECT AND ADVANTAGES OF THE INVENTION
It is an object of the invention to provide a switching device with an improved performance. If a buffering means, also called intermediate buffer, is arranged at the output of a switch, backpressure from a subsequent switch does not so easily block this switch but is buffered in the queue.
The intermediate buffer can be dedicated to one output of the preceding switch. It hence prevents blocking of the corresponding output queue in the preceding switch, which output queue is hence still receptive for cells.
When the buffering means comprises several queues for buffering incoming packets with different priority and/or order means for reordering buffered packets according to their priority and/or priority means for transmitting a packet with a higher priority before a packet with a lower priority, the upstream-blocking effect of backpressure is reduced, particularly the head-of-the-line blocking effect is reduced, because packets that are routable through the switch to non-blocked outputs can pass by blocked packets.
When at least another output port of the switching device is leading to the buffering means, the number of buffering means can be reduced in that buffering means are shared between output ports.
A link-paralleled arrangement has the advantage of bigger packet processing speed, because parts of the packets can be processed in parallel.
A threshold means for signalizing to the switching device that a predetermined number of packets is buffered in the buffering means serves as backpressure signalizer for the switching device, which hence can react accordingly by reducing outgoing traffic in order to avoid packet loss.
DESCRIPTION OF THE DRAWINGS
Examples of the invention are depicted in the drawings and described in detail below by way of example. It is shown in
Fig. 1, a double-stage arrangement with four switching devices,
Fig. 2, a single-stage arrangement with four switching devices,
Fig. 3, a first embodiment of a buffering means,
Fig. 4, a second embodiment of a buffering means,
Fig. 5, a third embodiment of a buffering means in a link-paralleling arrangement.
All the figures are for sake of clarity not shown in real dimensions, nor are the relations between the dimensions shown in a realistic scale. The various embodiments can be combined in total or in part.
DETAILED DESCRIPTION OF THE INVENTION
In the following the various exemplary embodiments of the invention are described.
In figure 1 is depicted a double-stage arrangement with a first stage comprising a first switching device 11 and a third switching device 12, and with a second stage comprising a second switching device 13 and a fourth switching device 14.
The first switching device 11 has eight input ports 21, 22 of which for sake of clarity only two are depicted, which all lead to an input means 86 which is connected to an output queue access manager 18, a storage input controller 1500 and a storing means 87. The output queue access manager 18 is connected to eight output queues 261-268, of which for sake of clarity only two are depicted. The storing means 87 is controlled by a storage output controller 1600 and delivers output to eight output routers 171-178, of which again for sake of clarity only two are depicted and which all are connected to an address manager 71 which is itself connected to the input means 86. The output routers 171-178 each have an output which leads to a respective output port 35, 36, of which only two are shown. All other switching devices 12, 13, 14 are built the same way, whereby the second switching device 13 has eight input ports 25, 26 and eight output ports 31, 32, the third switching device 12 has eight input ports 23, 24 and eight output ports 37, 38 and the fourth switching device 14 has eight input ports 27, 28 and eight output ports 33, 34.
Between the output port 35 of the first switching device 11 and the input port 25 of the second switching device 13, a first intermediate buffer 81 is arranged. A second intermediate buffer 82 is arranged between the output port 37 of the third switching device 12 and the input port 26 of the second switching device 13. The output port 36 of the first switching device 11 is connected to the input port 27 of the fourth switching device 14 via a third intermediate buffer 83. The output port 38 of the third switching device 12 and the input port 28 of the fourth switching device 14 are connected via a fourth intermediate buffer 84.
The input means 86 here comprises a number of input controllers which incorporate the function of a translation means. Each one of said input controllers is connected to one corresponding of a number of ensuing input routers. Every input router is connected via one router connection line to every one of a number of input selectors. The input routers, the router connection lines and the input selectors together form the input means 86.
Each of the input selectors is connected to one corresponding storage group in the storing means 87. Each of the storage groups has as switch control means its assigned group input controller and as group output controlling means its assigned group output controller. Every group input controller is connected via a bidirectional line to each input controller. Each storage group comprises a group of e.g. four storage cells and has four corresponding storage outputs. Hence there are 128 storage cells, divided up in groups of four. The storage input controller 1500 comprises the group input controllers. The storage output controller 1600 comprises the group output controllers. The storage groups together form the storing means 87. Every output of every storage group is connected to every one of the eight output routers 171-178 which together are defined as output means. Each of the eight output routers 171-178 is connected to one of eight output controllers.
The address manager 71 serves as bookkeeping means and comprises an address lookup table which contains a storage matrix with 32 rows and 3 columns. The first column contains comparators which are dedicated to the storage group addresses of the storage groups. The second column contains a first counter and the third column a second counter.
The address manager 71 further comprises an incrementing section and a decrementing section. The address lookup table, the incrementing section and the decrementing section together form a counting means. The incrementing section is connected to all eight input controllers 111-118 and to all eight input routers. The decrementing section has also eight inputs being connected via eight decrementer input lines to the eight output controllers and connected to the eight output routers 171-178. The eight decrementer input lines between the decrementing section and the output routers 171-178, respectively the output controllers are also connected to each of the 32 group output controllers. The decrementing section as well as the incrementing section are connected to the address lookup table which itself is connected to an address queue with 32 queuing places. The address queue has an output which leads to every input controller.
The output queue access manager 18 serves as queue controlling means and has eight inputs which are connected respectively to the eight input controllers. The eight inputs are separated into two input groups of four inputs each. Every input group is connected to a group selection switch which has four output lines grouped together as one output group. The output queue access manager 18 further comprises a switch controller which controls the group selection switch and has eight inputs deriving from the input controllers.
Eight output queues 261-268 have each four inputs which are connected all in parallel to the four output lines. Each output queue 261-268 is dedicated and linked via one of eight queue output lines to one of the output controllers. Data outputs of the output controllers each lead directly to one of eight output ports. The group output controllers provide each a read pointer which is dedicated in common to all four storage cells in the corresponding storage group. The storage cells all have the same dimensions, e.g. a size of 16 bytes. The packets to be handled with this arrangement can have several sizes, referred to as packet types, e.g. a small packet has 12 bytes of payload length, a medium packet has 32 bytes and a big packet has 64 bytes as payload length.
An incoming fixed-size packet, e.g. an ATM-packet, arrives for example at one input port and enters the corresponding input controller. The packet contains information comprising a header, also referred to as data destination part, and a payload, also called data content part. The header contains target information about to which of the output ports 35-36 this packet is to be sent. This target information is encoded in the packet header as a number. The corresponding input controller acts as translation means and therefor contains a list of numbers and a list of corresponding data patterns, e.g. bitmaps. The number of the incoming target information is compared with the stored list of numbers until a matching number has been found. The corresponding bitmap is read out from the list and assigned to the received packet. The target information is by this procedure hence translated into the dedicated bitmap. This bitmap contains eight bits, one for each output port 35-36. The contained bits indicate in binary form whether the respective output port shall receive this packet or not. Every logical 1 in this bitmap means that the respective output port shall receive a copy of the packet. By this bitmap hence a selection of output ports is designated. As will be explained below, this is a sophisticated way to handle multicast packets. The assigned bitmap is sent to the switch controller of the output queue access manager 18. The payload of the incoming packet is delivered to the corresponding input router.
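The translation step just described can be sketched as follows; the table entries and function name are invented for illustration and are not taken from the document:

```python
# The input controller holds a list of target numbers and corresponding
# bitmaps; the header's target number is looked up and replaced by an
# 8-bit bitmap, one bit per output port. Table contents are hypothetical.
TRANSLATION_TABLE = {
    5:  0b00000001,   # unicast: only the first output port (bit 0)
    9:  0b10000001,   # multicast: first and eighth output ports
    12: 0b11111111,   # broadcast: all eight output ports
}

def translate(target_number):
    bitmap = TRANSLATION_TABLE[target_number]
    # Every logical 1 designates an output port that receives a copy.
    designated_ports = [port for port in range(8) if bitmap >> port & 1]
    return bitmap, designated_ports

bitmap, ports = translate(9)   # multicast example: ports 0 and 7
```

Because a single bitmap designates any subset of the output ports, unicast, multicast and broadcast are handled by one uniform mechanism.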
The address manager's address queue offers to each input controller the number of a free storage group address, i.e. an address of a storage group which is not occupied by undelivered packet payloads. One free storage group address is delivered to each input controller by the address queue wherefrom it is at the same time removed. For a high performance, every input controller has already received a storage group address before a packet has arrived.
The receiving input controller delivers the received storage group address to the corresponding input router and also to the corresponding input group in the output queue access manager 18.
The receiving input controller further sends the assigned bitmap to the incrementing section. In the incrementing section, the sum of logical 1's in the received bitmap is calculated and sent to the address lookup table. There the first counter of the respective storage group address is set to the received value.
When the storage group addresses have been assigned to the input controllers before packets arrive, it is possible to set the corresponding counter already to the number of storage cells in that storage group, e.g. 4, so that only in case of a multicast packet an incrementing step for the incrementing section is needed. This brings the advantage that without increasing the hardware complexity for such waiting storage groups, the comparison of the counter positions delivers a nonequal result which prevents the storage group address from being erroneously reused.
The receiving input router has already set up a connection to the corresponding input selector of the storage group whose storage group address it has received. The input selector has automatically switched the incoming connection to the corresponding storage group. The connections to the input selectors are for a high performance all already set up when a packet arrives.
The corresponding group input controller which controls the corresponding storage group with the storage group address that the receiving input controller has received, receives from the receiving input controller a signal that this storage group is to be written to. The group input controller of the receiving storage group controls the storing of the payloads in this storage group. Since the addressed storage group contains four storage cells, it is able to store four packet payloads of the small-sized packet type.
The output queue access manager 18 connects the input groups one after the other to the output group and hence to all output queues 261-268. This is done by the switch controller which controls the switching process of the group selector switch. The payload address, consisting of the received storage cell number and of the received storage group address, is then written to the output queues 261-268 in accordance with the bitmap received by the switch controller. Only the designated ones of the output queues 261-268, i.e. the ones which have a logical 1 in the bitmap, receive the payload address. The payload address is stored in the respective output queue 261-268 of every output port which has to receive a copy of the incoming packet.
Hence with the above arrangement, respectively method, the payload of a packet is stored in the storage cell and its destination is determined in that the output queues 261-268 assigned to the designated output ports contain the entries of the payload address in the corresponding output queues 261-268.
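The shared-buffer principle of the preceding paragraphs, storing the payload only once and entering its payload address into the output queue of every designated output port, can be sketched as follows. The structures are simplified assumptions: storage groups and cells are collapsed into a single payload address, and the class name is invented:

```python
from collections import deque

class SharedBufferSwitch:
    """Sketch: one shared payload store, one address queue per output port."""

    def __init__(self, n_ports=8):
        self.storage = {}                       # payload address -> payload
        self.free_addresses = deque(range(32))  # address queue of free places
        self.output_queues = [deque() for _ in range(n_ports)]

    def store(self, payload, bitmap):
        addr = self.free_addresses.popleft()
        self.storage[addr] = payload            # payload stored exactly once
        for port in range(len(self.output_queues)):
            if bitmap >> port & 1:              # designated output ports only
                self.output_queues[port].append(addr)
        return addr

sw = SharedBufferSwitch()
addr = sw.store("payload-X", bitmap=0b00000101)   # multicast to ports 0 and 2
```

A multicast packet thus costs one storage place plus one queue entry per copy, rather than one full payload copy per destination.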
The addressed storage group remains active until all four storage cells have been used. Thereby the stored payloads need not be heading for the same output ports. Then a new storage group is selected from the address queue for the storage of the next four packet payloads. The storing process within one storage group is always performed sequentially.
To read a packet payload from a storage cell and transport it to one of the designated output ports, every output controller receives from its corresponding output queue 261-268 the next payload address, which contains the address of the corresponding storage group and the number of the corresponding storage cell where the next payload for this output port is stored. The receiving output controller signals to the group output controller of the storage group with the received storage group address that it has to prepare to transmit the stored packet payload. The output controller thus receives the payload address from the output queue 268 in the form of the storage group and the storage cell inside that storage group. The corresponding output router 178 also receives the payload address from the output controller and sets up a connection between the storage cell with the respective storage cell number and the output controller. Then the group output controller provides a read pointer, resets this read pointer to the first byte, and simultaneously transmits all packets in the storage group to its storage outputs.
When for reading out from the storage group only one storage cell is connected to an output controller, only the content of this storage cell is read out. However, since only copies are made during the reading procedure, this being called nondestructive reading, the other packet payloads in the same storage group are not lost but can be read in a later readout process. When receiving the packet payload, the output controller sends a signal to the decrementing section. The second counter is then incremented by one.
When the first counter and the second counter have equal values, the comparator detects this and the storage group address of the corresponding storage group is entered again into the address queue.
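The two-counter mechanism for recycling storage group addresses can be sketched as follows. This is an illustrative model only; the class and method names are invented for the sketch, and it assumes, as described above, that the first counter is preset to the number of storage cells per group.

```python
class StorageGroupTracker:
    """Decide when a storage group's address may re-enter the address queue.

    The first counter is preset to the number of storage cells (e.g. 4)
    when the address is handed out; only multicast packets add further
    increments, one per extra copy.  The second counter advances once per
    performed readout.  Equality of both counters means every copy of
    every payload has been read, so the address can safely be reused.
    """
    def __init__(self, cells_per_group=4):
        self.first = cells_per_group   # expected readouts (preset)
        self.second = 0                # performed readouts

    def on_multicast_copy(self, extra_copies):
        # A payload destined for n output ports needs n readouts, not 1.
        self.first += extra_copies

    def on_readout(self):
        self.second += 1
        return self.first == self.second  # True: address may be recycled

tracker = StorageGroupTracker(cells_per_group=4)
tracker.on_multicast_copy(1)                      # one payload goes to 2 ports
done = [tracker.on_readout() for _ in range(5)]   # 5 readouts in total
```

The address is released only after the fifth readout, i.e. after the multicast copy has also left the group.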
All storage groups are independent and can receive packet payloads independently, hence asynchronously. However, only one payload can be written at once into a storage group.
The described arrangement can perform within each storage group a synchronous readout of the contained storage cells. This means that a second output controller willing to read a packet payload from the same storage group must wait until the packet payload currently being read has been read out entirely, i.e. the read pointer has again reached the first byte of the storage cells. This fact may serve as a criterion for dimensioning the size of the storage groups. A small number of storage cells grouped in one storage group means a low probability that several output ports simultaneously request to receive packet payloads from the same storage group. On the other hand, a high number of storage cells in one storage group means less expenditure on hardware, namely on the input routers and the input selectors, because one storage group has only one input.
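The dimensioning trade-off can be made quantitative under a strong simplifying assumption, namely that simultaneous readouts hit storage groups uniformly and independently. The numbers below are purely hypothetical and only illustrate the tendency: fewer cells per group means more groups and hence a lower collision probability.

```python
def p_no_collision(num_groups, num_readers):
    """Probability that all simultaneous readouts hit distinct storage
    groups, assuming uniform independent group choices (birthday-style)."""
    p = 1.0
    for i in range(num_readers):
        p *= (num_groups - i) / num_groups
    return p

# 128 storage cells arranged as 32 groups of 4 versus 64 groups of 2,
# with 8 output ports reading simultaneously (hypothetical figures):
p_collide_32 = 1 - p_no_collision(32, 8)
p_collide_64 = 1 - p_no_collision(64, 8)
```

The variant with more, smaller groups shows the lower collision probability, at the cost of more input routers and input selectors.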
To keep the waiting-time for a blocked output port low, the storage cells within one storage group should all be filled up to an equal extent. Then, during the readout procedure the read pointer from the group output controller is reset to the first byte of the storage cells immediately after having reached the last occupied byte in all storage cells, which is faster than waiting for the read pointer to reach the last byte of the storage cells. The readout process and the storing process are generally independent from each other. Hence, a readout process for a certain packet payload can already be started when its first byte has been stored.
The readout of payloads from the various storage groups can be performed asynchronously as well as synchronously. In synchronous transmission mode, all storage groups are synchronized for reading. This means that there is an additional latency in synchronous transmission mode due to waiting for the synchronization point, which here is defined as the point of time when the write pointer points to the first byte of all corresponding storage packets in its corresponding storage group.
The output routers 171-178 can be realized as blocking output routers. This means that the output routers 171-178, while performing readout of a payload from a selected storage group 1401-1432, prevent all accesses of other output routers 171-178 to the same storage group 1401-1432. Then the output routers 171-178 can be decreased in size by arranging behind every storage group a multiplexing device with one output and four inputs, the output being connected to every output router 171-178, the inputs being connected to the outputs of the respective storage group. This multiplexing device then acts like an inverted version of the packet selector switch and allows access to only one of the storage packets of a storage group at one moment in time. Longer access times for blocked output ports hence become more likely. However, as already explained, the probability of coincidental simultaneous access of several output ports to the same storage group is low, which keeps this delay acceptably low.
In the output queue access manager several data patterns, respectively bitmaps, can be processed in parallel, and the corresponding received payload addresses can likewise be processed in parallel. The following numeric example illustrates the background. The storing procedure for a packet payload with a size of 12 bytes, assuming a clock cycle of 8 ns and a processing speed of one byte per clock cycle, takes a total storing time of 96 ns. This is the maximal acceptable time during which the storage group address queuing procedure for all input ports 101-108 has to be completed. With 32 input ports and the same number of output ports and output queues 261-268, the input ports being divided into groups of four and each group being processed during one clock cycle, the routing of the payload addresses into the output queues 261-268 takes 8 clock cycles, hence 64 ns, which is clearly shorter than 96 ns. Hence a parallelism of four simultaneously processed payload addresses is here the minimum needed to perform the payload address queuing in time.
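The timing budget of this numeric example can be checked directly; the calculation below simply restates the figures given above (8 ns cycle, 12-byte payload, 32 ports, four addresses processed per cycle).

```python
CYCLE_NS = 8          # clock cycle assumed in the example
PAYLOAD_BYTES = 12    # small-sized packet payload
PORTS = 32            # input ports = output ports = output queues
PARALLELISM = 4       # payload addresses processed per clock cycle

# Storing one payload at one byte per cycle sets the time budget:
store_time_ns = PAYLOAD_BYTES * CYCLE_NS            # 12 * 8 = 96 ns

# Queuing the payload addresses of all ports, four per cycle:
cycles_needed = PORTS // PARALLELISM                # 8 cycles
queueing_time_ns = cycles_needed * CYCLE_NS         # 64 ns
```

The 64 ns queuing time fits within the 96 ns storing time, so the address queuing never becomes the bottleneck.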
Function of intermediate buffer
Assuming a backpressure is occurring in the first output queue of the second switching device 13, this backpressure blocks, via the output queue controller of the same switching device 13, its input means, in that the input ports 25, 26 which deliver packets heading for the first output port of the second switching device 13 are blocked. The longer the blocking persists, the higher is the probability that more of the input ports 25, 26 are blocked, assuming a somewhat random or real traffic distribution. This is a head-of-the-line problem because input ports will be blocked according to the first packet to be delivered, regardless of packets that may be waiting behind it and might be heading to an output port whose output queue is not full.
Hence, the backpressure of one output queue already spreads upstream and influences a plurality of input ports 25, 26. Without the first intermediate buffer 81, the backpressure on the input port 25 would directly influence the first switching device 11 in that the corresponding output router 171 is not allowed to send any more packets to that input port 25. The output router 171 is hence not allowed to process the corresponding output queue 261. This can lead to a second head-of-the-line problem in the case of multicast packets. Since at least one destination of a multicast packet is blocked, the whole delivery cannot be performed and the packet blocks a place in the storing section 87 as well as one in the address manager 71. This blocking also occurs in the other output queues where this packet has an entry for its destination. The blocked output queue 531 again can transfer the backpressure upstream because it is prevented from being emptied but still receives entries from the output queue access manager 18. The same effect as in the second switching device occurs, namely that the input means 86 blocks input ports which deliver packets heading for the first output queue 261. Again, with time, more and more input ports 21, 22 will be blocked.
The more input ports get blocked, the graver the effect. Assuming the input port 26 of the second switching device 13 also gets blocked, this backpressure spreads to the third switching device 12.
Multicast packets are realized by storing only once their payload and entering the respective payload address into several output queues 261-268. The choice of the output queues 261-268 is determined by the bitmap which gives e.g. a 1 for each output that has to receive a copy of the packet. The counting means provides control that the corresponding storage cells and the corresponding storage groups are not used again until the last copy of the multicast packet payload stored therein has been read.
When all or some of the input ports 21, 22 of the first switching device are blocked, these input ports 21, 22 cannot deliver any packet to any output port 171, 178. Since the output queues 531, 538 receive fewer packets, if any at all, this leads to unused resources in the output queues and hence in the switching device 11. The other output queues 263 are not used as much as they might be, since the input ports 21, 22 are blocked as a consequence of the head-of-the-line effect. As a consequence, more and more of the output ports 35, 36 receive fewer packets and deliver fewer packets to the following switching devices 13, 14. Hence, the backpressure spreads forward again and influences the other switching devices 13, 14 downstream.
However, since the intermediate buffers 81, 82, 83, 84 are present, they prevent backpressure from rapidly extending upstream. Since all effects described before have some multiplicative factor in their effect, the buffer effects not only a linear performance increase but a much higher one. The basic reason for this is that the buffer is designed to prevent backpressure from only one source from reaching elements which, when receiving backpressure, have a negative impact on many switching device elements connected therewith, e.g. the input means 86.
In other words, with the intermediate buffers 81, 82, 83, 84, the backpressure is led to a queue which is dedicated to only one output of the preceding switch. Since this queue serves as a decoupling element in that it buffers the backpressure to a certain extent, the probability that backpressure extends to the preceding stage is reduced. Hence not only the upstream-spreading backpressure effect but also the forward-spreading backpressure effect is reduced. The buffering means acts as if it were not there when normal traffic is processed.
In the following section, an arrangement will be described which incorporates several of the above switching devices. For the sake of clarity, all switching devices have been depicted with only two input ports and two output ports, but of course the examples work identically with a larger number of input ports and output ports.
The switching device is well suited to be scaled up to enhance its performance. It is therefore usable as a scalable module for a scaled system. Different modes of expansion exist: size expansion, i.e. an increase of the number of ports; memory expansion for achieving a higher data throughput; and speed expansion for achieving a higher port speed.
For size expansion, a single-stage and a multistage arrangement are possible. The single-stage version is depicted in figure 2. This design has a shorter delay than a multistage network, while the number of switching devices grows with n², n being the multiplication factor for the expansion of input ports.
In figure 2 four switching devices 11, 12, 13, 14 are combined. The first switching device 11 has two input ports 21, 22 and two output ports 35, 36. The second switching device 12 has two input ports 23, 24 and two output ports 37, 38. The third switching device 13 has two input ports 25, 26 and two output ports 31, 32. The fourth switching device 14 has two input ports 27, 28 and two output ports 33, 34. A first system input 51 is connected to a first selector means 41 which has two outputs, one connected to input port 21 and one to input port 23. A second system input 52 is connected to a second selector means 42 which has two outputs, one connected to input port 22 and one to input port 24. A third system input 53 is connected to a third selector means 43 which has two outputs, one connected to input port 25 and one to input port 27. A fourth system input 54 is connected to a fourth selector means 44 which has two outputs, one connected to input port 26 and one to input port 28.
A first arbitrator 45 has as first input output port 35 and as second input output port 37. Its output is defined as a first system output 55. A second arbitrator 46 has as first input output port 36 and as second input output port 38. Its output is defined as a second system output 56. A third arbitrator 47 has as first input output port 31 and as second input output port 33. Its output is defined as a third system output 57. A fourth arbitrator 48 has as first input output port 32 and as second input output port 34. Its output is defined as a fourth system output 58.
The whole arrangement now has four input ports, namely the system inputs 51-54 instead of two, and also four output ports, namely the system output ports 55-58, but provides full connectability between all input ports 21-28 and output ports 31-38. The selectors 41, 42, 43, 44 serve as address filters. The purpose of such an address filter is to choose to which of the switching devices 11, 12, 13, 14 an incoming packet has to be sent. This is done by using the header of the incoming packet. For instance, the whole packet can be duplicated and sent to a filter unit for each switching device 11, 12, 13, 14, whereby only one of the filters is permeable for the packet. Such a filter unit can also be located in the switching devices 11-14. In the case of a multicast packet it may be necessary to store the payload in several of the switching devices 11-14, here particularly in not more than two of them. Furthermore, the arbitrators 45, 46, 47, 48 choose which of the switching devices 11, 12, 13, 14 has the right to pass its packet from its output port 31-38 to one of the system outputs 55, 56, 57, 58. Destination control with this arrangement is mainly restricted to an automatism using directly the destination information from the packet header, respectively the data destination part.
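The address-filter principle can be sketched as follows. The destination table below is hypothetical and only loosely follows the numbering of figure 2; the point is merely that each filter unit is permeable only for packets whose destination its switching device can reach.

```python
def address_filter(device_outputs, packet_dest):
    """Filter unit in front of one switching device: permeable only for
    packets whose header destination this device can reach."""
    return packet_dest in device_outputs

# Hypothetical reachability table for two of the 2x2 devices of figure 2:
reachable = {11: {55, 56}, 13: {57, 58}}

# The duplicated packet (destination: system output 57) passes exactly
# one filter, so exactly one switching device accepts it:
accepting = [dev for dev, outs in reachable.items()
             if address_filter(outs, 57)]
```

Only device 13's filter is permeable here; the copies sent to the other filter units are discarded.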
The first arbitrator 45 here comprises the first intermediate buffer 81. Correspondingly, the second arbitrator 46 comprises the second intermediate buffer 82, the third arbitrator 47 the third intermediate buffer 83, and the fourth arbitrator 48 the fourth intermediate buffer 84. This arrangement proves useful when combined with a subsequent stage of switches or receivers, but also when no backpressure generation threatens. The fact that outgoing packets are queued in the arbitrators 45-48 takes load off the switches 11-14. Consider an arrangement where e.g. the first arbitrator 45 for some reason only accepts a packet from one particular one of its connected switches 11, 12. When the arbitrator 45 is waiting for a packet from the first switch 11, the other switch 12 might thereby get blocked. With the buffering means integrated in the arbitrator 45, however, the second switch 12 does not suffer immediately from this blockage and can hence continue operation, particularly with packets heading for destinations that can be reached via the second arbitrator 46. Considering further that the blocked second switch 12 can cause blocking in the third and fourth selector means 43, 44, which hence also affects the fourth switch 14, it is conceivable that the simple blockage of one switch, even in a single-stage arrangement, can have a disastrous effect on the overall arrangement performance. The probability of a deadlock situation is also reduced here, namely e.g. the first arbitrator 45 waiting for a packet from the first switch 11, hence blocking the second switch 12, which is then unable to deliver packets to the second arbitrator 46, which is waiting for packets from that second switch 12 and hence blocks the first switch 11, which therefore cannot send packets to the first arbitrator 45.
Since in the PRIZMA design in the case of backpressure the loss of a packet is to be avoided, the deletion of a packet in the preceding switch is postponed until it has been accepted by the subsequent switch. This means that the preceding switch is already decelerated by the need to wait for packet acceptance in the subsequent switch.
Another concept exists where the backpressure mechanism is substituted by a granting process. To this end, a mask is generated in the preceding switch reflecting the actual state of the following switch, more precisely its overflow state. Only if this state shows free availability of an output queue is a packet allowed to be sent. When a packet is sent, it can be immediately deleted in the preceding switch since its acceptance is granted. Since the mask always shows in some sense an outdated state, because during transmission of the mask its state may already have changed, the mask is transmitted with a security margin. This means that the output queue is already marked as full when, say, 95% of its places are full. This avoids packet loss. The backpressure mechanism is herein in general identical to the backpressure mechanism of the concept which sends packets without looking at a grant mask, since only the threshold is slightly decreased and anticipated in time. Hence, with the granting process concept, the intermediate buffer effect is the same.
However, with the principle of updating an availability mask in the preceding stage, a further possibility exists in combination with the intermediate buffer. The intermediate buffer can use the information about which queues are free in the following switch, sort out packets that suffer from head-of-the-line blocking, i.e. packets which could be processed if they were not blocked by another packet, and lead them directly to the subsequent switch which offers a free queue for that packet. To this end, some intelligence in the intermediate buffer is needed for monitoring the destinations of buffered, waiting packets and for comparing the free queues with these destinations.
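The grant mask with security margin and the head-of-the-line bypass it enables can be sketched as follows. This is an illustrative model: the 95% margin comes from the text above, while the function names, the queue fill levels and the packet records are invented for the sketch.

```python
def availability_mask(queue_fill, queue_capacity, margin=0.95):
    """Grant mask of the following switch: a queue is already reported
    full at 95% of capacity (the security margin), so packets granted on
    a slightly stale mask still find room and no packet is lost."""
    return [fill < margin * cap
            for fill, cap in zip(queue_fill, queue_capacity)]

def pick_deliverable(waiting, mask):
    """Head-of-the-line bypass in the intermediate buffer: forward the
    first waiting packet whose destination queue the mask reports free."""
    for i, pkt in enumerate(waiting):
        if mask[pkt["dest"]]:
            return waiting.pop(i)
    return None

mask = availability_mask([96, 40], [100, 100])   # queue 0 "full", queue 1 free
waiting = [{"dest": 0}, {"dest": 1}]             # head packet is blocked
sent = pick_deliverable(waiting, mask)           # bypasses the blocked head
```

The packet for the free queue 1 is forwarded although it was queued behind the blocked packet for queue 0.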
Figure 3 shows a first embodiment of the buffering means 81. The buffering means 81 comprises a queue means 15 which is connected to an output means 20 which has another input connected to a backpressure means 17. The queue means 15 is further connected to a threshold means 16 which has one output. The backpressure means 17 has one input. The queue means 15 has several queuing places which are all in parallel connected to a sorting means 39.
Incoming packets are queued in the queue means 15 which serves as a waiting queue, also called a FIFO. As long as the backpressure means 17 is not triggered via its input to exert backpressure on the output means 20, the output means 20 takes packet after packet from the queue means 15 and transmits it to the output of the buffering means 81. When a predetermined number of places in the queue means 15 is filled with packets, the threshold means 16 generates a threshold signal which exerts backpressure via its output.
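The interplay of queue means, threshold means and backpressure means can be sketched as follows. This is a minimal behavioral model, not the hardware; the class and attribute names are chosen for the sketch.

```python
from collections import deque

class IntermediateBuffer:
    """FIFO buffering means: forwards packets while no backpressure is
    asserted, and raises its own threshold signal when nearly full."""
    def __init__(self, threshold):
        self.fifo = deque()
        self.threshold = threshold
        self.backpressure_in = False   # set via the backpressure means 17

    def enqueue(self, packet):
        self.fifo.append(packet)

    def threshold_signal(self):
        # Corresponds to the threshold means 16: tells the preceding
        # switch to stop sending once enough places are filled.
        return len(self.fifo) >= self.threshold

    def dequeue(self):
        # Corresponds to the output means 20: blocked by backpressure.
        if self.backpressure_in or not self.fifo:
            return None
        return self.fifo.popleft()

buf = IntermediateBuffer(threshold=3)
for p in ("a", "b", "c"):
    buf.enqueue(p)
full = buf.threshold_signal()   # threshold reached: signal upstream
buf.backpressure_in = True
held = buf.dequeue()            # downstream not ready: nothing leaves
buf.backpressure_in = False
first = buf.dequeue()           # backpressure released: FIFO order resumes
```

While backpressure from the subsequent stage is asserted, the buffer absorbs traffic; once released, it drains in arrival order.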
The sorting means 39 is able to change the order of the queued packets. A sorting method is applicable which puts packets with a higher priority at places which are served before the places where packets with a lower priority are stored. Packets may have different priorities, i.e. a packet with a higher priority is more important to be transmitted in time than a packet with a lower priority. Another sorting mechanism could re-sort the packets each time a backpressure signal is received. By this method, head-of-the-line queueing can be avoided, when the re-sorting continues with various permutations until a packet can be delivered.
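A priority-based sorting step as performed by the sorting means can be sketched as follows; the packet records and field names are invented for the illustration. A stable sort is used so that packets of equal priority keep their arrival order.

```python
def resort_by_priority(queued):
    """Sorting means: move higher-priority packets to the places that are
    served first, keeping arrival order among equal priorities
    (Python's sorted() is guaranteed stable)."""
    return sorted(queued, key=lambda pkt: -pkt["prio"])

queued = [{"id": 1, "prio": 0},   # low-priority packet arrived first
          {"id": 2, "prio": 2},
          {"id": 3, "prio": 2}]
ordered = resort_by_priority(queued)
```

The two high-priority packets move to the front in their original relative order, so the low-priority packet no longer blocks them at the head of the line.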
Figure 4 shows an alternative embodiment of the buffering means 81 which compared to figure 3 has the following differences: The sorting means 39 is replaced by a selection means 49 which then provides the input for the output means 20.
The selection means 49 selects one packet out of a group of packets buffered in the queue means 15 and directly delivers it to the output means 20 which transmits the packet to the subsequent stage or receiver. The selection can again be done taking into account the packet priority and/or the impossibility to deliver one packet, which leads to the selection of another packet. Again, head-of-the-line queueing can thereby be avoided.
Hence, this embodiment has means for rearranging the buffered packets according to their priorities, or respectively a means that organizes a resequenced transmittal in that it transmits higher-priority packets before lower-priority packets. To ensure a minimum sending rate for lower-priority packets, a minimum transmitting rate can be established for them.
The buffering means can have a number of different queues for different priorities, e.g. 4 queues for 4 different priorities. Also mixed arrangements are possible, e.g. grouping two priorities together in one queue and having a priority-sorting means for this queue.
In fig. 5, a third embodiment of the buffering means 81 is shown. The buffering means 81 here comprises the queue means 15 which now has two independent buffering queues P1, P2 for packets with different priorities P1, P2. The arrangement of the threshold means 16, output means 20 and backpressure means 17 is the same as in fig. 3. Furthermore, the buffering means 81 is here arranged in a so-called link-paralleling arrangement. For this purpose, the buffering means 81 comprises a collecting means 76 which receives input from two output ports 35, 65 of the first switching means 11. Combining the output ports pairwise effects a performance increase, in that the processing speed per packet is increased, because the packet payload can be split up into two halves which hence need only half the number of cycles to be pushed through the first switching means 11. The link-paralleling arrangement can generally be used in that one part of each packet is received through one of the paired input ports, even using a number of 3 or more ports for link-paralleling. Hence only a smaller part of the packet needs to be processed serially, while all parts can be processed simultaneously in parallel. With pairwise link-paralleling, the number of available input/output ports is however halved, unless it is doubled again by using twice the number of switches.
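The splitting and recombining of a payload in the link-paralleling arrangement can be sketched as follows; the function names are invented, and the reassembly function plays the role of the collecting means 76.

```python
def split_for_link_paralleling(payload, lanes=2):
    """Split a packet payload into `lanes` parts that travel through the
    paired ports in parallel; for lanes=2 each part needs only half the
    serial cycles of the full payload."""
    n = -(-len(payload) // lanes)          # ceiling division: bytes per lane
    return [payload[i * n:(i + 1) * n] for i in range(lanes)]

def reassemble(parts):
    """Collecting means: concatenate the lane parts back into one payload."""
    return b"".join(parts)

payload = b"0123456789AB"                  # 12-byte small packet payload
halves = split_for_link_paralleling(payload)
restored = reassemble(halves)
```

Each 6-byte half is pushed through the switch in parallel, so the per-packet serial processing time is halved at the cost of pairing two ports.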
In bidirectional networks, the possibility of deadlock is a particular threat. A deadlock is a situation where several switches block each other, such that the blocking goes around in a circle and sustains itself, i.e. persists even when the original cause has disappeared. This effect is hence fatal because it does not resolve itself. Reserving space in the output queues is one solution for deadlock prevention. With an intermediate buffer, the probability of deadlock is also significantly reduced.
Regarding the synchronous readout of packet payloads in one storage group, situations can be imagined where small differences in the arrival time of the packets cause a bigger latency during the synchronous readout procedure. Nevertheless, such problems are minimized since for asynchronously received packets which are queueing in different output queues 261-268 an automatic synchronization occurs when the first packet of such a queue has been queued for a waiting time while being blocked.
The switching device's architectural principle is suitable for an arbitrary choice of dimension, which is also called scalability. This means that by varying the dimension of some or all components of the switching device, the wanted number of input ports, output ports and/or storage groups can be chosen.
The above-described hardware arrangement can be varied while still maintaining the basic principle of the invention. For example, the incrementing sections need not be connected to the input controllers. They can also be connected to the outputs of the output group and then receive only the bitmap-derived increment value for the input ports which are processed there at one point in time. Further, the bitmap need not be received by the switch controller but can also be received by the input groups. Then an interpretation of the bitmap's content can be performed in the output group. Generally, the interconnections which are depicted as being connected via the same line to several components may also be realized as a bus connection or as separate lines. The switching device is best suited for broadband communications based on ATM.
However with appropriate adapters, the concept can be applied for non-ATM architectural environments too. Single-stage or multi-stage switch fabrics can be constructed in a modular way. The shared memory can also be expanded by linking multiple switching devices.
It should be noted that all functions described above need not be performed by exactly the arrangement of elements described. For example, the storage packet number need not be transmitted to the output queue access manager by the group input controller but can get there in different, equivalent ways. It suffices that this storage packet number is stored in the corresponding output queue to complete the payload address. Also, the function of the selectors 41-44 need not be concentrated in the selectors 41-44 but can be distributed and even be integrated with the corresponding part of the selector 41-44 on a PCB or even inside a switching device 11-14.

Claims

1. Switching arrangement comprising a switching device (11) for transporting incoming packets of data containing a data destination part and a data content part, from a plurality of input ports (21-22) to a plurality of output ports (35-36), comprising input means (86) for transporting said data content parts of said incoming packets to storing means (87) which contains a plurality of storage packets and comprising output means (171-178) for reading out said stored data content parts and delivering them to a selection of said output ports (35-36), which is determined by said data destination part and comprising for at least one of said output ports (35-36) buffering means (81, 82) which is arranged after said output port (35-36).
2. Switching arrangement according to claim 1, characterized in that the buffering means (81, 82) comprises several queues (P1, P2) for buffering incoming packets with different priority and/or order means (39) for reordering buffered packets according to their priority and/or priority means (49) for transmitting a packet with a higher priority before a packet with a lower priority.
3. Switching arrangement according to claim 1 or 2, characterized in that at least a second of the output ports (35-36, 65) of the switching device (11) is leading to the buffering means (81, 82).
4. Switching arrangement according to claim 3, characterized in that it is arranged as a link-paralleled arrangement.
5. Switching arrangement according to one of claims 1 to 4, characterized in that it comprises threshold means (16) for signalizing to the switching device (11) that a predetermined number of packets is buffered in the buffering means (81, 82).
6. Switching arrangement according to one of claims 1 to 5, characterized in that the buffering means (81, 82) comprises output means (20) for transmitting a buffered packet to a subsequent second switching device (13) which is ready to receive said buffered packet.
7. Switching arrangement according to claim 6, characterized in that it comprises backpressure means (17) for receiving a backpressure signal from the subsequent second switching device (13).
8. Switching arrangement according to claim 7, characterized in that it comprises the second switching device (13) arranged subsequently to the switching device (11).
9. Switching arrangement according to one of claims 1 to 8, characterized in that it comprises a third switching device (12) arranged in parallel to the switching device (11).
PCT/IB1999/001970 1999-01-11 1999-12-10 Switching arrangement WO2000042745A1 (en)

Publications (1)

Publication Number Publication Date
WO2000042745A1 (en) 2000-07-20
