EP1310065A2 - High-performance switches and routers having parallel switching domains with sub-unity speedup - Google Patents

High-performance switches and routers having parallel switching domains with sub-unity speedup

Info

Publication number
EP1310065A2
Authority
EP
European Patent Office
Prior art keywords
link
switch fabric
crossbar
chip
links
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP01956728A
Other languages
German (de)
English (en)
Inventor
Yuanlong Wang
Kewei Yang
Feng Chen Lin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mindspeed Technologies LLC
Original Assignee
Conexant Systems LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Conexant Systems LLC filed Critical Conexant Systems LLC
Publication of EP1310065A2 publication Critical patent/EP1310065A2/fr
Withdrawn legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/04Selecting arrangements for multiplex systems for time-division multiplexing
    • H04Q11/0428Integrated services digital network, i.e. systems for transmission of different types of digitised signals, e.g. speech, data, telecentral, television signals
    • H04Q11/0478Provisions for broadband connections
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/25Routing or path finding in a switch fabric
    • H04L49/253Routing or path finding in a switch fabric using establishment or release of connections between ports
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/15Interconnection of switching modules
    • H04L49/1515Non-blocking multistage, e.g. Clos
    • H04L49/1523Parallel switch fabric planes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/20Support for services
    • H04L49/201Multicast operation; Broadcast operation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/20Support for services
    • H04L49/205Quality of Service based
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/25Routing or path finding in a switch fabric
    • H04L49/253Routing or path finding in a switch fabric using establishment or release of connections between ports
    • H04L49/254Centralised controller, i.e. arbitration or scheduling
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/30Peripheral units, e.g. input or output ports
    • H04L49/3045Virtual queuing

Definitions

  • Packet as applied to digital systems and networks has multiple connotations.
  • packet transmission, packet-messaging, and packetized signaling all refer to a class of techniques for transmitting collections of digital bits as a data unit having a well-defined beginning and end.
  • Packet- messaging techniques are broadly taught in "Frames, Packets and Cells in Broadband Networking," by William A. Flanagan, published by Telecom Library Incorporated, 1991; and in “Computer Networks: a Systems Approach,” Second Edition, by Larry Peterson and Bruce Davie, published by Morgan Kaufmann, 2000.
  • While data units at a given level may be of fixed length, or have a relatively short maximum variable length, a data unit at a higher level may in fact be segmented and transmitted from source to destination using many lower level data units.
  • The data units associated with the various abstraction layers are also loosely referred to as packets, a second usage of the term.
  • a complete set of abstraction layers is referred to as a protocol-stack.
  • Different protocol-stacks exist for various purposes, using different abstractions as appropriate.
  • The different protocol-stacks generally have different definitions and names for each of their layers.
  • a particularly well-known protocol-stack is the OSI Reference Model.
  • In this model, the lowest layer of abstraction (layer-1) is referred to as the physical layer.
  • the next higher layers of abstraction are the data link layer (layer-2), the network layer (layer-3), and a transport layer (layer-4). Other higher layers also exist, but we are not interested in them here.
  • At each layer, the data unit has a particular formal name.
  • the OSI data link layer data unit is formally referred to as a frame.
  • the OSI network layer data unit is (confusingly) formally referred to as a packet. Thus we have the third usage of this term.
  • Network data units may be of variable or fixed length.
  • In some contexts the term packet is used to refer to a fixed-length network data unit.
  • In a fourth usage, however, packet is taken to refer to a variable-length network data unit.
  • Typical practice is to redundantly refer to "fixed-length cells” vs. "variable-length packets.”
  • The use of fixed-length cells generally enables hardware optimizations not available to variable-length packets.
  • Asynchronous Transfer Mode (ATM) is a particular packet-switched technology (in the first and broadest sense of "packet") that only uses cells, and is thus commonly referred to as a cell-switching technology.
  • In general, the particular meaning of the term "packet" must be inferred from the context of its use.
  • To minimize confusion, this specification will avoid the second and third usages of the term packet as given above, unless explicitly indicated otherwise. This leaves the first and fourth usages, which are generally easy to distinguish between.
  • This specification will also sometimes use the term "message" to broadly refer to a data unit from a set that includes fixed-length, variable-length, layer-2, and layer-3 data units.
  • Layer-2 is loosely referred to herein as the transport layer.
  • The interconnect-side of these link macros is generally characterized by a minimalist high-performance full-duplex I/O interface of relatively narrow data-width.
  • The I/O interface is optimized for a basic transfer type and has minimal control signals that are focused on error detection and retry for the basic transfer type. Control for higher order functions beyond the basic transfer type is implemented via control data fields defined within the (layer-2) frames.
  • Fig. 1 illustrates a system 15,000 using packet-messaging techniques to communicate between Chip A 15,100 and Chip B 15,300 using a generic (serial or parallel) bi-directional point-to-point channel (link) 15,200.
  • Chip A 15,100 includes core logic 15,120 and a link macro 15,110.
  • the link macro 15,110 includes physical layer circuits 15,140 and transport layer logic 15,130.
  • Chip B 15,300 includes core logic 15,320 and a link macro 15,310.
  • the link macro 15,310 includes physical layer circuits 15,340 and transport layer logic 15,330.
  • link and “channel” are synonymously used to refer to serial (one data wire or complementary wire-pair) and parallel (multiple data wires or wire-pairs) groupings of point-to-point inter-chip interconnect that are managed using variations on packet-messaging techniques as described herein.
  • The point-to-point link 15,200 could be either a serial link, hereinafter referred to by the term S-Link, or a parallel link, hereinafter referred to by the term P-Link.
  • a clock rate is defined for the link macros and a plurality of bits is transferred (using multiple-data rate techniques) over each individual wire (bit) in each clock cycle. All of the bits transferred each cycle, either for the single wire in a serial link, or for multiple wires in a parallel link, are mapped into a predefined unit, or block, referred to as a "frame.”
  • the frame is often scaled-up into a super-frame by the use of multiple links in parallel.
  • the frame includes a data-unit (DU) and framing bits (F).
  • the DU may be used for either Control Data (CD) or Payload Data (PD).
  • The control data generally includes at least one address identifying the ultimate destination for the packet.
  • the framing bits are required overhead information that generally includes error detection and correction bits and "flag" bits provided to define the boundaries of the frame or to facilitate timing alignment or calibration.
  • the term "cell” as used herein refers to a packet-messaging unit of a predetermined (fixed) number of frames (or super-frames). With reference to packet-messaging units, the term “packet” as used herein refers to a dynamically determined (variable-length) number of frames (or super-frames). Within a cell or packet, each frame (or super-frame) is typically allocated into multiple fields, the definition of each generally varying on a cycle-to-cycle basis.
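  • As a purely illustrative restatement of this terminology, the following sketch composes frames from a data unit plus framing bits and then builds a fixed-length cell or a variable-length packet from those frames; the field sizes and the parity stand-in are assumptions, not the link formats defined later in this specification.
```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    """One link-layer transfer unit: a data unit (DU) plus framing bits (F)."""
    du: bytes        # data unit: carries either control data (CD) or payload data (PD)
    framing: bytes   # overhead: error check bits, flags for frame boundaries/alignment

def make_frames(payload: bytes, du_size: int = 8) -> List[Frame]:
    """Split a payload into DUs and wrap each DU with (dummy) framing bits."""
    frames = []
    for i in range(0, len(payload), du_size):
        du = payload[i:i + du_size].ljust(du_size, b'\x00')   # pad the last DU
        parity = bytes([sum(du) & 0xFF])                       # stand-in for real CRC/flag bits
        frames.append(Frame(du=du, framing=parity))
    return frames

# A "cell" is a predetermined (fixed) number of frames; a "packet" is a
# dynamically determined (variable) number of frames.
CELL_FRAMES = 9                          # hypothetical fixed cell size, in frames
cell = make_frames(bytes(72))            # 72 bytes / 8-byte DUs -> exactly 9 frames
packet = make_frames(b"variable-length payload of any size")

assert len(cell) == CELL_FRAMES
print(len(cell), len(packet))            # 9, 5
```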
  • Figs. 2A through 2E are abstract drawings illustrating in concept the components of a generic packet or cell message, transferred using the generic point-to-point link of Fig. 1.
  • Fig. 2 A illustrates a data unit prior to the addition of framing bits.
  • Fig. 2B illustrates the composite frame after framing bits have been added to the data unit.
  • Fig. 2C illustrates a complete cell or packet built from multiple frames. As shown, some of the data units have been designated as control data fields and others have been designated as payload data fields.
  • Fig. 2D conceptually shows the control data and payload data with the overhead framing bits stripped away.
  • Fig. 2E conceptually shows the assembly of the control data and payload data fields into larger respective control data and payload data words, as might be employed by the core logic of either chip.
  • Fig. 2F is a conceptual timing diagram of a data transfer from chip A to chip B for the abstract generic link of Fig. 1. Time is shown increasing to the right on the x-axis, the vertical grid delineating cycle boundaries. Each of the waveforms A through E corresponds to a respective point in the system 15,000 of Fig. 1, as identified by the matching encircled letter.
  • the core logic of Chip A and the core logic of Chip B may pipeline or stream the control and payload data as though the two were directly coupled via a small multi-stage latency.
  • The two cores may inter-operate largely transparent to the fact that their interaction is actually occurring across chip (or board) boundaries and chip-to-chip (or board-to-board) interconnect. It will be appreciated by those skilled in the art that this has been an abstract example, intended to establish the terminology to be used herein, and that the point-to-point links of the present invention will differ significantly in detail from the foregoing discussion.
  • Switches are defined as devices that perform packet-forwarding functions to transport layer-2 data units (e.g., Token Ring, AppleTalk, Ethernet frames) across a switched network that is layer-2 homogenous (i.e., the same protocol is used throughout the network).
  • Routers are defined as devices that perform packet-forwarding functions to transport layer-3 data units (e.g., IP and IPX datagrams) across a network that is layer-3 homogeneous, but may be layer-2 heterogeneous (i.e., different protocols are used in various parts of the network).
  • the architectures described in the following paragraphs, while characterized as switch architectures, are the architectural foundation for both switches and routers. While exact implementation details will vary, at a purely abstract level, the operation of both of these packet-forwarding devices may be viewed as follows.
  • the switch fabric core forwards (switches) packets between one incoming and at least one outgoing network interface. (In high-end systems, the interfaces are generally modular and referred to as line cards.) For each data unit received by a line card, the data unit is briefly buffered on the input side of the switch fabric. The destination address specified by the data unit is looked up in the line card's copy of a forwarding table. Under control of resource scheduling fabric logic the data unit is subsequently transferred over the switch fabric to the destination line card.
  • a crossbar packet switch consists of a crossbar switch fabric core having a multiple-port input side and multiple-port output side.
  • the packet switch must interface externally at specified line rates for the input and output ports.
  • the addition of input and output queues (often implemented in shared- memory), at the input and output respectively of the switch fabric core, impacts both the service performance of the switch and the rate at which the switch fabric core must operate, for a given external line rate.
  • crossbar switches are referred to as Input-Queued (IQ) switches, Output-Queued (OQ) switches, or Combined Input and Output Queued (CIOQ) switches.
  • Variable-length packets will be formatted as cells. Subsequent to being switched across the switch fabric core, reassembly of the variable-length packets will be performed. In this way, the switch fabric core need only handle fixed-length cells. From the perspective of the switch fabric core, a "time slot" is defined to be the interval between cell arrivals.
  • Switches are characterized by their internal "speedup". Speedup is the (maximum) number of cells that are transferred across the switch fabric core each time slot. The switch fabric core must operate faster than the line rate by a factor equal to the speedup.
  • OQ switches must operate with a speedup of N, where N is the number of ports on each of the input and output sides of the switch.
  • OQ switches offer the maximum possible throughput and can provide quality-of-service (QoS) guarantees.
  • Fig. 3 is an abstract drawing of a prior art enhanced IQ crossbar-switch that employs Virtual Output Queues (VOQs) to eliminate head-of-line (HOL) blocking.
  • This and other architectures for Internet routers are overviewed in "Fast Switched Backplane for a Gigabit Switched Router," by Nick McKeown, in Business Communications Review, volume 27, No. 12, December 1997.
  • At each input, a separate FIFO queue is maintained for each output.
  • Each of these FIFO queues is a VOQ for a respective output. After an initial forwarding decision is made, an arriving cell is placed in the VOQ for the output port to which it is to be forwarded.
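  • A minimal sketch of the VOQ arrangement just described (hypothetical cell and port types, not the Qchip's actual queue structures): one FIFO per output at each input, so a cell headed for a congested output cannot block cells behind it that are headed elsewhere.
```python
from collections import deque

class VOQInput:
    """One switch input maintaining a separate FIFO (VOQ) per output port."""
    def __init__(self, num_outputs: int):
        self.voqs = [deque() for _ in range(num_outputs)]

    def enqueue(self, cell, output_port: int) -> None:
        # After the forwarding decision, the arriving cell joins the VOQ
        # for the output port it is destined to, eliminating HOL blocking.
        self.voqs[output_port].append(cell)

    def head(self, output_port: int):
        q = self.voqs[output_port]
        return q[0] if q else None

    def dequeue(self, output_port: int):
        q = self.voqs[output_port]
        return q.popleft() if q else None

# Usage: cells for output 3 queue up without delaying a cell for output 0.
inp = VOQInput(num_outputs=4)
inp.enqueue("cell-A", output_port=3)
inp.enqueue("cell-B", output_port=0)
print(inp.dequeue(0))   # cell-B leaves even though output 3 is still queued
```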
  • Because CIOQ switches use buffering at both the inputs and outputs, they have speedup values between 1 and N. It has been shown in simulation that the average delay of practical CIOQ switches with a speedup of 2 approximates the average delay of OQ switches. Thus, CIOQ switches should provide much better delay control than IQ switches, while requiring only a modest speedup. Reducing the required speedup accordingly reduces bandwidth requirements and costs associated with memory and internal links. From a theoretical perspective, then, CIOQ switches would appear to be a promising underlying architecture for high capacity switches that approximate the performance of an OQ switch without requiring high speedup.
  • prior art CIOQ architectures necessitate a large (and thereby expensive) number of semi-custom or ASIC chips to provide the requisite aggregate bandwidth.
  • Prior art CIOQ architectures have problematically short cell times, even at the low levels of speedup required for a CIOQ approach. That is, a straightforward implementation of a CIOQ crossbar with a speedup of 2 would cause the cell time for an ATM cell (at a 10Gbps link rate) to be only 25ns. There is no practical prior art crossbar scheduler that can operate this fast for switch sizes up to 64x64. Such a short cell time also presents challenges to other aspects of the switch fabric design.
  • The present invention teaches crossbar and queuing chips with integrated point-to-point packet-based channel interfaces and resulting high internal aggregate bandwidths (on the order of 256Gbps in current technology), designed as modules of a scalable CIOQ-based switch fabric that supports high-capacity fixed-length cell switching. By aggregating large amounts of traffic onto a single switching chip, the system pin-count and chip-count are dramatically reduced.
  • the switch fabric offers improved switching capacity while operating at sub-unity speedup to relax cell-time requirements.
  • the switch fabric consists of multiple (eight in an illustrative embodiment) switching domains operating in parallel, for an overall effective speedup of 2, but with sub-unity speedup within each domain.
  • Each domain contains one or more non-buffered crossbar chips operating in bit-sliced fashion (plus a stand-alone scheduler chip if more than one slice is used).
  • the incoming traffic is queued at the ingress port in VOQs and then dispatched uniformly to all switching domains. Traffic coming out of the switching domains will then be aggregated at the egress port with shared-memory based OQs.
  • the switch fabric approximates a CIOQ crossbar switch with a speedup factor of 2 and at the same time doubles the cell time for ATM cells to 100ns.
  • The relatively long cell time makes the design of the crossbar scheduler much easier.
  • the multiple sub-unity domains allow much easier system implementation of the switch fabric than a fully bit-sliced architecture.
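  • The cell-time figures above can be checked with simple arithmetic. The sketch below assumes cells of roughly 64 bytes on a 10Gbps link (an assumption chosen to reproduce the quoted 25ns and 100ns figures approximately; the internal cell format is defined later in this specification).
```python
def cell_time_ns(cell_bytes: int, line_rate_gbps: float, relative_rate: float) -> float:
    """Time to transfer one cell at (relative_rate x line rate), in nanoseconds."""
    cell_bits = cell_bytes * 8
    return cell_bits / (line_rate_gbps * relative_rate)  # Gbps == bits per nanosecond

LINE_RATE = 10.0      # Gbps (OC-192)
CELL = 64             # assumed cell size in bytes

# Monolithic CIOQ crossbar with speedup 2: the core must move cells twice as
# fast as the line rate, so the scheduler has only ~25ns per decision.
print(round(cell_time_ns(CELL, LINE_RATE, 2.0), 1))    # ~25.6 ns

# One of eight parallel domains, each running at half the link rate: the
# per-domain cell time stretches to ~100ns, while the eight domains together
# still deliver the fabric's overall effective speedup of 2 (part of each
# P-Link's bandwidth carries request/grant/backpressure overhead, not payload).
print(round(cell_time_ns(CELL, LINE_RATE, 0.5), 1))    # ~102.4 ns
```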
  • The multiple non-buffered switching domains are totally independent and any switch domain can switch any cell from any ingress port to any egress port.
  • the status of each switch domain (Xchip or crossbar card) will be sent to the Qchips in conjunction with its handling of all the ingress and egress ports.
  • The Qchips monitor the returned status and automatically redirect cells and requests to available switching domains, avoiding any disabled or malfunctioning domains (whether due to link, chip, or other cause). Thus, there is no need to provide extra redundant switching capacity. Also, since there is no buffer within the switching domains, there is no need for cell reordering at the egress ports. This both simplifies the design and improves the switch fabric performance.
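  • The dispatch-and-avoid behavior can be pictured with the following sketch (hypothetical status handling; the real Qchip logic is hardware): traffic is spread round-robin over the switching domains, skipping any domain whose returned status marks it unavailable.
```python
class DomainDispatcher:
    """Round-robin dispatch of cells/requests over parallel switching domains,
    skipping domains reported as disabled or malfunctioning."""
    def __init__(self, num_domains: int = 8):
        self.available = [True] * num_domains
        self._next = 0

    def update_status(self, domain: int, ok: bool) -> None:
        # Status returned by each Xchip / crossbar card (link, chip, or other fault).
        self.available[domain] = ok

    def pick_domain(self) -> int:
        n = len(self.available)
        for _ in range(n):
            d = self._next
            self._next = (self._next + 1) % n
            if self.available[d]:
                return d
        raise RuntimeError("no switching domain available")

disp = DomainDispatcher()
disp.update_status(5, ok=False)          # one domain fails...
picks = [disp.pick_domain() for _ in range(8)]
print(picks)                             # ...traffic is spread over the remaining seven
```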
  • the switch fabric is protocol independent, scalable, and may be implemented as a set of ASIC chips.
  • The switch fabric is composed of building blocks (modules) including queuing chips (Qchips), crossbar chips (Xchips), and MUX chips (Mchips).
  • The transceivers include 8Gbps parallel channels and 2.5Gbps serial links that allow for low chip-count, low pin-count implementations.
  • the chipset uses sideband
  • The switch fabric of the present invention scales to larger port counts.
  • The transceivers can provide an aggregate throughput of 256Gbps with a pin-count of less than 650 pins.
  • Edge and core switches and routers are exemplary system applications of the present invention.
  • More specifically, the present invention provides a high-performance protocol-independent switch fabric for packet-forwarding devices.
  • Exemplary devices within the scope of the invention include, but are not limited to, switches, routers, layer-3 switches, and routing switches.
  • Fig. 1 is an abstract drawing of a generic point-to-point link used for chip-to-chip message transfers, as found in the prior art.
  • Figs. 2A through 2E are abstract drawings illustrating the components of a message transferred using the generic point-to-point link of Fig. 1.
  • Fig. 2F is a timing diagram of a data transfer from chip A to chip B for the abstract generic link of Fig. 1.
  • Fig. 3 illustrates a prior art crossbar switch.
  • Fig. 4 illustrates a switch fabric 9200 in accordance with the present invention, having a capacity of 320Gbps using current technology, implemented using P-Link interconnect network 9250, and a particular number of Qchips 2000 and Xchips 1000.
  • Fig. 5 illustrates a router/switch 9000 using the switch fabric 9200 of Fig. 4, in which the network interface 9100 and the Qchip 2000 are both implemented on Line Card 9150.
  • Fig. 6 illustrates the system environment in which the router/switch of Fig. 5 finds application.
  • Fig. 7 illustrates a more general configuration of the switch fabric configuration 9200 of Fig. 4, emphasizing that within the scope of the invention different numbers of Qchips and Xchips are possible.
  • Fig. 8 illustrates a more general configuration of router/switch 9000 of Fig. 5, emphasizing that within the scope of the invention the Qchips need not be implemented on the line cards.
  • Fig. 9A illustrates the internal architecture of the Qchips of Fig. 4, for specific numbers of OC-192 ports, S-Links, and P-Links.
  • Fig. 9B illustrates the outgoing logic 2200 of Fig. 9A, for a specific configuration.
  • Fig. 9C illustrates the incoming logic 2100 of Fig. 9A, for a specific configuration.
  • Fig. 10A illustrates a more general configuration of the Qchip of Fig. 9A, emphasizing that within the scope of the invention different numbers of ports, S-Links, and P-Links are possible.
  • Fig. 10B illustrates a more general configuration of the outgoing logic 2200 of Fig. 9B, emphasizing that within the scope of the invention different queue configurations are possible.
  • Fig. 10C illustrates a more general configuration of the incoming logic 2100 of Fig. 9C, emphasizing that within the scope of the invention different queue configurations are possible.
  • Fig. 11A illustrates the internal architecture of each Xchip of Fig. 4, for specific numbers of crossbar ports and associated P-Links.
  • Fig. 11B illustrates the internal structure of logic 1400 of Fig. 11A.
  • Fig. 12 illustrates a more general configuration of the Xchip of Fig. 11A, emphasizing that within the scope of the invention different numbers of crossbar ports and P-Links are possible.
  • Fig. 13A is an abstract drawing illustrating one functional view of the S-Link macro 4100 of Fig. 5A.
  • Fig. 13B illustrates the minimum length S-Link packet format.
  • Fig. 13C illustrates the full-length S-Link packet format.
  • Fig. 14A is an abstract drawing illustrating one functional view of the P-Link macro 3100 of Fig. 5A.
  • Fig. 14B illustrates the P-Link cell format.
  • Fig. 15 illustrates the logic of the S-Link macro 4100 of Fig. 13 A.
  • Figs. 16A through 16I detail logic, circuitry, and behavioral aspects of the P-Link macro 3100 of Fig. 14A.
  • Fig. 16A illustrates the logic of the P-Link macro 3100 of Fig. 14A.
  • Fig. 16B is a different view of the transmitter section 10,100 of the P-Link macro 3100 of Fig. 16A.
  • Fig. 16C illustrates the internal circuitry of the differential transceiver 10,127 of the transmitter section 10,100 of Fig. 16B.
  • Fig. 16D illustrates the voltage waveform for the link as observed from the transmitter output.
  • Fig. 16E illustrates the voltage waveform for the link as observed from the receiver input at the opposite end of the link relative to the observation-point of Fig. 16D.
  • FIG. 16F is a different view of the receiver section 10,200 of the P-Link macro 3100 of Fig. 16A.
  • Fig. 16G illustrates the internal circuitry of the receiver section 10,200 of Fig. 16F.
  • Fig. 16H illustrates the logic within the receiver synchronization circuit 10,221 of Figs. 16A and 16F.
  • Fig. 16I illustrates a detail of the operation of the synchronization circuit 10,221 of Fig. 16H.
  • Fig. 17 illustrates switch fabric 9202, a reduced-scale variation of the switch fabric of Fig. 4 within the scope of the present invention, having a P-Link interconnect network 9252 and a capacity of 160Gbps using current technology, using a particular number of Qchips 2000 and Xchips 1000.
  • Fig. 18 is a drawing of a router/switch 9000 in accordance with the present invention using the switch fabric configuration 9202 of Fig. 17.
  • Fig. 19 illustrates a more general configuration of the switch fabric configuration 9202 of Fig. 17, emphasizing that within the scope of the invention different numbers of Qchips and Xchips are possible.
  • Fig. 20 illustrates switch fabric 9205, an expanded-scale variation of the switch fabric of Fig. 4 within the scope of the present invention, having a P-Link interconnect network 9255 and a capacity of 640Gbps using current technology, using a particular number of Qchips 2000 and Crossbar Cards 5000.
  • Fig. 21 illustrates a router/switch 9000 using the switch fabric 9205 of Fig. 20, in which the network interface 9100 and the Qchip 2000 are both implemented on Line Card 9150.
  • Fig. 22 illustrates a more general configuration of the switch fabric configuration 9205 of Fig. 20, emphasizing that within the scope of the invention different numbers of Qchips and Xchips are possible.
  • Fig. 23 illustrates a more general configuration of router/switch 9000 of Fig. 21; emphasizing that within the scope of the invention the Qchips need not be implemented on the line cards.
  • Fig. 24 illustrates the internal architecture of the crossbar cards 5000 of Fig. 20, using a particular number of Mchips 6000 and Xchips 1000.
  • Fig. 25 illustrates a more general configuration of the crossbar card 5000 of Fig. 24, emphasizing that within the scope of the invention different numbers of Mchips 6000 and Xchips 1000 are possible.
  • Fig. 26A illustrates the internal architecture of Mchip 6000 of Fig. 24.
  • Fig. 26B provides additional detail of logic block 6100 of Mchip 6000 of Fig. 26A.
  • Fig. 26C provides additional detail of logic block 6300 of Mchip 6000 of Fig. 26A.
  • Fig. 26D provides additional detail of logic block 6400 of Mchip 6000 of Fig. 26A.
  • Fig. 4 illustrates a switch fabric 9200 in accordance with the present invention, having a capacity of 320Gbps using current technology, implemented using P-Link interconnect network 9250, and a particular number of Qchips 2000 and Xchips 1000.
  • The switch fabric of this particular illustrative embodiment is designed to implement the CIOQ crossbar architecture with 8 slower speed (sub-unity speedup) switching domains operating in parallel. Although the switch fabric as a whole has an internal speedup of 2 for better QoS, the 8-switching-domain design allows each domain to operate at only half the link rate.
  • The raw switching capacity of the switch fabric is 3.2 times the link data rate. For example, for every 10Gbps link, the switch fabric allocates 32Gbps internal bandwidth. Of that, 20Gbps is used for the switching of payload; the other 12Gbps bandwidth is used for overhead, which includes requests, grants, backpressure information, and other control information.
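  • The 3.2x allocation decomposes as follows; this is simply a restatement of the numbers in the preceding paragraph, using the 8 P-Links of 8Gbps shared by a Qchip's two OC-192 ports.
```python
LINK_RATE_GBPS = 10.0                      # one OC-192 port
P_LINKS_PER_QCHIP = 8                      # one P-Link per switching domain
P_LINK_RATE_GBPS = 8.0
PORTS_PER_QCHIP = 2

internal_per_port = P_LINKS_PER_QCHIP * P_LINK_RATE_GBPS / PORTS_PER_QCHIP
print(internal_per_port)                   # 32.0 Gbps per 10Gbps link -> 3.2x raw capacity

payload_bw = 2 * LINK_RATE_GBPS            # internal speedup of 2 for payload switching
overhead_bw = internal_per_port - payload_bw
print(payload_bw, overhead_bw)             # 20.0 Gbps payload, 12.0 Gbps for requests,
                                           # grants, backpressure, and other control overhead
```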
  • The switch fabric supports 8 priorities (classes) with per-port per-class based delay control. The switch fabric naturally supports fault tolerance without requiring additional redundancy logic.
  • The multiple non-buffered switching domains are totally independent and any switch domain can switch any cell from any ingress port to any egress port.
  • The status of each switch domain (Xchip or crossbar card) will be sent to the Qchips in conjunction with its handling of all the ingress and egress ports.
  • the Qchips monitor the returned status and automatically redirect cells and requests to available switching domains, avoiding any disabled or malfunctioning domains (whether due to link, chip, or other cause). Even when one switching domain is disabled, the remaining seven switching domains can still provide a speed up of 1.8 times that of the link data rate. In this case, the switch fabric continues to provide good performance. Thus, there is no need to provide extra redundant switching capacity.
  • Fig. 5 illustrates a router/switch 9000 using the switch fabric 9200 of Fig. 4, in which the network interface 9100 and the Qchip 2000 are both implemented on Line Card 9150.
  • forwarding-table management is performed in a master network processor, a designation given to the network processor in the first line card.
  • the master network processor is responsible for dynamically maintaining forwarding tables for the network topology and switch configuration and updating the forwarding tables of the network processors on the other line cards.
  • the master network processor is also responsible for system administration functions typical for a switch, including initialization (e.g., loading of the switch operating system software), configuration, console, and maintenance.
  • Implementations are possible in which all network processors have identical functionality, and a separate unit, a switch processor, performs the forwarding-table management and system administration functionality.
  • forwarding-table management is performed in a master network processor, a designation given to the network processor in the first line card.
  • the master network processor is responsible for running the routing protocols, building routing and forwarding tables for the network topology and router configuration and distributing the forwarding tables to the network processors on the other line cards.
  • the master network processor is also responsible for system administration functions typical for a router, including initialization (e.g., loading of the router operating system software), configuration, console, and maintenance.
  • implementations are possible in which all network processors have identical functionality, and a separate unit, a route processor, performs the forwarding-table management and system administration functionality.
  • Fig. 6 illustrates the system environment in which the router/switch of Fig. 5 finds application.
  • Fig. 7 illustrates a more general configuration of the switch fabric configuration 9200 of Fig. 4, emphasizing that within the scope of the invention different numbers of Qchips and Xchips are possible.
  • Fig. 8 illustrates a more general configuration of router/switch 9000 of Fig. 5, emphasizing that within the scope of the invention the Qchips need not be implemented on the line cards.
  • Fig. 9A illustrates the internal architecture of the Qchips of Fig. 4, for specific numbers of OC-192 ports, S-Links, and P-Links.
  • Each Qchip can support a) 2 OC-192 ports (10Gbps), b) 8 OC-48 ports (2.5Gbps), or c) 1 OC-192 port and 4 OC-48 ports.
  • the Qchip interfaces to a line card or the network processor with 16 high-speed serial links providing a total of 32Gbps bandwidth. Each serial link runs at 2.5Gbps and provides effective bandwidth of 2Gbps.
  • the Qchip interfaces with the 8 switching domains with 8 parallel links providing a total of 64Gbps bandwidth. Each parallel link can provide 8Gbps bandwidth.
  • Qchip Ingress Processing
  • Fig. 9B illustrates the outgoing logic 2200 of Fig. 9A, for a specific configuration. Ingress processing is performed here.
  • the Qchip maintains 64 unicasting VOQs 2250, shared by all the ingress ports, each of these 64 VOQs targeting a respective egress port.
  • Each VOQ consists of 8 sub-VOQs, one for each of the 8 priorities. Thus, there are 512 sub-VOQs.
  • the Qchip also maintains a multicasting queue 2260 for each OC-192 port, so there are 2 multicasting queues in this particular Qchip embodiment. In the case that an OC-192 port is configured as 4 OC-48 ports, all 4 OC-48 ports will share the same multicasting queue.
  • Each multicasting queue also consists of 8 sub-queues for the 8 priorities.
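  • The ingress queue arrangement just described (64 VOQs of 8 priorities each, plus one multicasting queue per OC-192 port) can be summarized structurally as follows; buffer sizing and the shared-SRAM implementation are intentionally not modeled.
```python
from collections import deque

NUM_EGRESS_PORTS = 64
NUM_PRIORITIES = 8
NUM_OC192_PORTS = 2     # per Qchip in this embodiment

# Unicast: one VOQ per egress port, each split into one sub-VOQ per priority.
unicast_voqs = [[deque() for _ in range(NUM_PRIORITIES)]
                for _ in range(NUM_EGRESS_PORTS)]

# Multicast: one queue per OC-192 port (shared by its 4 OC-48 sub-ports),
# again with one sub-queue per priority.
multicast_queues = [[deque() for _ in range(NUM_PRIORITIES)]
                    for _ in range(NUM_OC192_PORTS)]

total_sub_voqs = sum(len(per_port) for per_port in unicast_voqs)
print(total_sub_voqs)    # 512 unicast sub-VOQs, as stated above
```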
  • the buffers for the unicasting queues and multicasting queues are implemented by on-chip SRAMs and are managed with an adaptive dynamic threshold algorithm for better adaptation to different traffic patterns and for efficient use of buffer space.
  • the ingress portions of the SRAMs have a bandwidth of 80Gbps.
  • The ingress port scheduler 2240 sits between the queues and the outgoing P-Links. It controls the dispatch of requests, which are sent to the Xchips from the queues via the 8 outgoing P-Links. The real data transfer happens only after the crossbar schedulers in the Xchips have granted a request. For every grant, a cell will be forwarded to the Xchip to be switched to the appropriate egress port.
  • the ingress port scheduler is based on a round-robin algorithm.
  • the pointers used for the scheduler can be reset to a random value periodically.
  • the scheduling mechanism is fully programmable and can support either strict or weighted priority.
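  • A minimal sketch of a round-robin request scheduler with a periodically randomized pointer, in the spirit of the description above; the actual arbiter is hardware and also supports strict and weighted priority, which this sketch omits.
```python
import random
from typing import List, Optional

class RoundRobinScheduler:
    """Selects the next non-empty VOQ to issue a request, starting from a
    rotating pointer that may be reset to a random value periodically."""
    def __init__(self, num_queues: int):
        self.num_queues = num_queues
        self.pointer = 0

    def randomize_pointer(self) -> None:
        # Periodic randomization helps avoid persistent unfairness patterns.
        self.pointer = random.randrange(self.num_queues)

    def select(self, has_cells: List[bool]) -> Optional[int]:
        """has_cells[q] is True if VOQ q has a cell waiting to request."""
        for offset in range(self.num_queues):
            q = (self.pointer + offset) % self.num_queues
            if has_cells[q]:
                self.pointer = (q + 1) % self.num_queues   # move past the winner
                return q
        return None

sched = RoundRobinScheduler(num_queues=64)
occupancy = [False] * 64
occupancy[10] = occupancy[40] = True
print(sched.select(occupancy), sched.select(occupancy))   # 10 then 40
```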
  • Fig. 9C illustrates the incoming logic 2100 of Fig. 9A, for a specific configuration. Egress processing is performed here.
  • the Qchip maintains 8 OQs, corresponding to one OQ for every potential OC-48 egress port on the Qchip.
  • each OC-48 port will have its own OQ.
  • Each OQ 2120 corresponds to a port and further has one sub-OQ per priority.
  • When an OC-192 port is configured as 4 OC-48 ports, all 4 OC-48 ports will share the same multicasting queue.
  • the OQs and multicasting queues are shared-memory based and are implemented as on-chip SRAMs and managed with an adaptive dynamic threshold algorithm for better adaptation to different traffic patterns.
  • the egress portions of the SRAMs also have a bandwidth of 80Gbps.
  • the egress port scheduler 2140 sits between the OQs and multicasting queues, and the egress port 2145. It supports both strict priority and weighted round-robin algorithms to control delays of the cells for each of the priorities.
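  • The two egress scheduling policies mentioned above can be sketched as follows over per-priority sub-queues; the weights and queue contents are illustrative assumptions, not the programmed values.
```python
from collections import deque
from typing import List, Optional

def strict_priority_pick(queues: List[deque]) -> Optional[int]:
    """Always serve the highest (lowest-numbered) non-empty priority."""
    for prio, q in enumerate(queues):
        if q:
            return prio
    return None

class WeightedRoundRobin:
    """Serve non-empty priorities in proportion to configured weights (credit-based)."""
    def __init__(self, weights: List[int]):
        self.weights = weights
        self.credit = [0] * len(weights)

    def pick(self, queues: List[deque]) -> Optional[int]:
        for prio, q in enumerate(queues):
            if q:
                self.credit[prio] += self.weights[prio]
        eligible = [p for p, q in enumerate(queues) if q]
        if not eligible:
            return None
        winner = max(eligible, key=lambda p: self.credit[p])
        self.credit[winner] -= sum(self.weights)    # charge the winner one service slot
        return winner

queues = [deque(["hi"]), deque(), deque(["lo1", "lo2"]), deque()]
print(strict_priority_pick(queues))                 # 0 (highest non-empty priority)
wrr = WeightedRoundRobin(weights=[4, 3, 2, 1])
print(wrr.pick(queues), wrr.pick(queues))           # 0 then 2 with this occupancy
```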
  • Fig. 10A illustrates a more general configuration of the Qchip of Fig. 9A, emphasizing that within the scope of the invention different numbers of ports, S-Links, and P-Links are possible.
  • the Qchip interfaces to the line card or the network processor via multiple S-Link links (reference No. 4000 individually, 2050 collectively). Each serial link runs at 2.5Gbps and provides effective bandwidth of only 2Gbps due to 8b-10b encoding.
  • a single Qchip can support N OC-192 ports or 4xN OC-48 ports.
  • a group of 8 S-Links can be programmed to support either a single OC-192 port or 4 OC-48 ports.
  • an external mux is required to mux OC-48 cells on to the 8 S-Links.
  • The Qchip interfaces with each of the switching domains via a respective 8Gbps bandwidth P-Link. All the ports on a Qchip share the multiple P-Links interfacing to the switching domains.
  • Fig. 10B illustrates a more general configuration of the incoming logic 2100 of Fig. 9C, emphasizing that within the scope of the invention different queue configurations are possible.
  • Fig. 10C illustrates a more general configuration of the outgoing logic 2200 of Fig. 9B, emphasizing that within the scope of the invention different queue configurations are possible.
  • CROSSBAR CHIP INTERNAL ARCHITECTURE
  • Fig. 11A illustrates the internal architecture of each Xchip of Fig. 4, for specific numbers of crossbar ports and associated P-Links.
  • Fig. 11B provides additional detail of logic 1400 of Fig. 11A.
  • the Xchip has an integral crossbar 1100 as the data path for the switch fabric. It also has an integral crossbar scheduler 1200.
  • Each Xchip has 16 P-Link interfaces. With 8Gbps per parallel channel (P- Link), per direction, an Xchip has raw switching capacity of 128Gbps. The aggregate throughput of a single Xchip is 256Gbps.
  • The Xchip also supports 1->N multicasting. In one cell time, multiple targets can receive the cell from a single ingress port.
  • Each Xchip used alone, or each combination of Xchips on a crossbar card constitutes an independent switching domain.
  • the real cell switching is done in these domains.
  • In the switch fabric 9200 of Fig. 4 there are eight such domains. Each switching domain operates independently of the others and at half the link rate.
  • the cell time is 100ns.
  • a scheduler using commonly available CMOS technology can finish one cell scheduling for a 64x64 crossbar within one cell time.
  • Fig. 12 illustrates a more general configuration of the Xchip of Fig. 11A, emphasizing that within the scope of the invention different numbers of crossbar ports and P-Links are possible.
  • Serial and parallel point-to-point packet-based channels and parallel multi-drop channels have been integrated into the chips of the switch fabric. The following paragraphs will describe, for each of these channel interfaces, their overall functionality, their data unit protocols, and aspects of their circuit design. Additional aspects regarding their implementation are provided in the previously referenced applications: Ser. No. 09/350,414 and Ser. No. 09/349,832. These low power, low pin-count, high reliability, high-speed CMOS transceivers are suitable for chip-to-chip, box-to-box, and a variety of backplane implementations.
  • the S-Link is a serial transceiver optimized for ease-of-use and efficiency in high-performance data transmission systems.
  • The S-Link contains on-chip PLL circuitry for synthesis of the baud-rate transmit clock and the extraction of the clock from the received serial stream.
  • The S-Link can operate at bit data rates of up to 3.125Gbps.
  • The P-Link uses a parallel transceiver that provides 16Gbps of aggregated bandwidth at a data rate of 1.6Gbps per differential signal pair with a BER of less than 10^-15.
  • The data width is 5 bits.
  • The transceiver utilizes 24 signal pins and 6 power and ground pins. It can drive a 39-inch, 50-ohm PCB trace with two connector crossings. It has a built-in self-calibration circuit that optimizes the data transfer rate and corrects up to 1.2ns of line-to-line data skew. Multiple transceivers can be integrated onto a single ASIC chip to dramatically improve the per-chip bandwidth. Furthermore, the P-Link transceiver requires no external termination resistors. The latency through the link is less than 8ns because there is no need for data encoding and decoding.
  • P-Link MD refers to the P-Link multi-drop parallel channel.
  • Fig. 13A is an abstract drawing illustrating one functional view of the S-Link macro 4100 of Fig. 5A.
  • Table 1 itemizes the four link-side signal wires per S-Link transceiver, consisting of a complementary transmit pair and a complementary receive pair.
  • The link-side interface uses a CMOS transceiver designed in a standard 0.25µm CMOS process with a power dissipation of 350mW.
  • The transceiver has a transmitter pre-equalization circuit for compensating high-frequency attenuation.
  • the transceiver supports both AC and DC coupled transmission, and both 50
  • the S-Link macro has built-in comma detection and framing
  • Table 2 details the parallel core-side interface, which includes 20-bits of payload, running at
  • the transceiver can also operate in 1.25Gbps mode with the parallel interface running at
  • Multiple S-Link macros can be integrated into a single ASIC chip to achieve any desired aggregate bandwidth.
  • TXP O Transmit Differential Signal ( + polarity)
  • TXM O Transmit Differential Signal ( - polarity)
  • RXP I Receiver Differential Signal ( + polarity)
  • RXM I Receiver Differential Signal ( - polarity)
  • Fig. 13B illustrates the minimum length S-Link packet format.
  • Fig. 13C illustrates the full-length S- Link packet format. Variable-length packets are used for carrying payload to and from the switch fabric, via the S-Links.
  • The packet payload can have variable length.
  • The length of the packet payload (PP) could vary from 0 to the cell payload length in 8-byte granularity. PP will be addressed with byte granularity. For example, PP0 will be byte 0 of the PP.
  • The cell payload length is 72 bytes.
  • Packet payloads less than the cell payload length will first be patched to the full cell payload length before being switched within the switch fabric.
  • The packet control header (PCH) will be addressed with byte and bit granularity: PCH0 will be byte 0 of the PCH, and PCH[7:0] will be bits 0-7 of the PCH.
  • Tables 3 and 4 provide detailed definitions for the PCH fields.
  • Table 4. Type 2 Packet Control Header Field Definitions (Outgoing S-Link from Qchip to Line Card Network Interface)
  • VALID Indicating the packet is a valid packet.
  • L[3:0] Length. 4-bit field used to indicate the length of the payload in 8-byte granularity.
  • MBP_MAP [7:0] Multicasting backpressure bit map. Used to indicate the backpressure information of the multicasting queues at an ingress port. Note that each priority of an OC-192 (or 4 OC-48 ports) has a separate multicasting queue and can be backpressured by setting a bit in the bitmap.
  • UBP_BASE[2:0] Unicasting backpressure base address. Used together with the 64-bit backpressure map to indicate which egress ports (out of a maximum of
  • UBP_MAP[63:0] Unicasting backpressure bit map. Used to indicate the backpressure information to an ingress port. Together with BP_BASE[2:0], the switch fabric can broadcast system-wide 512-bit backpressure information in 8 cell times. Each bit in the bitmap indicates the backpressure information in an output queue. The detailed encoding is defined in a later section.
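  • The base-plus-bitmap encoding described above can be illustrated as follows; this sketches only the addressing arithmetic, with the field placement in the PCH as defined in the tables.
```python
def decode_ubp(ubp_base: int, ubp_map: int, state: list) -> None:
    """Apply one 64-bit unicasting backpressure map to a 512-entry state table.

    ubp_base (3 bits) selects which 64-queue chunk the map refers to (0..7),
    so eight consecutive transfers cover 512 per-port/per-class backpressure bits.
    """
    for bit in range(64):
        state[ubp_base * 64 + bit] = bool((ubp_map >> bit) & 1)

backpressure = [False] * 512
decode_ubp(ubp_base=3, ubp_map=(1 << 5) | (1 << 63), state=backpressure)
print(backpressure[3 * 64 + 5], backpressure[3 * 64 + 63])   # True True
```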
  • P-Link Macro
  • Fig. 14A is an abstract drawing illustrating one functional view of the P-Link macro 3100 of Fig. 5A.
  • Table 5 provides detailed definitions of the P-Link side signals.
  • Table 6 provides detailed definitions of the core-side signals.
  • the P-Link macro is a scalable parallel data transceiver with an aggregate bandwidth of 32Gbps, or 16Gbps in each direction.
  • the transmitter serializes 200MHz data into five pairs of differential data lines.
  • a 200MHz transmitter phase-locked loop (PLL) generates equally distributed eight-phase clocks to multiplex 200MHz data into a 1.6Gbps data stream.
  • the data is then transmitted at a data rate eight times the system clock over differential signal lines.
  • a delay-lock loop (DLL) at the receiver retrieves the clock and data, latching incoming data through high-speed differential receiver/latch modules using the eight phase clocks.
  • The channel is optimized for ease-of-use and efficient, low bit-error-rate data transmission in high-performance systems.
  • An on-chip timing calibration circuit performs data de-skewing and timing optimization. Low swing differential signaling further reduces noise and error probability and therefore relaxes the restrictions of board design.
  • Fig. 14B illustrates the P-Link cell format.
  • Fixed length cells are used for carrying payload from ingress ports to egress ports within the switch fabric via the P-Links.
  • When transferred on a link within the switch fabric, two types of information are transferred for a cell: information that is independent of the physical links and information that is dependent on the physical links.
  • the physical link independent portion of the cell is divided into 3 fields while the physical link dependent portion of the cell defines how the 3 fields of a cell are transferred on a physical link.
  • The cell payload (CP) is the payload that will be transferred within the switch fabric. CP will be addressed with byte granularity. For example, CP0 will be byte 0 of the CP.
  • The length of the cell payload is fixed at 72 bytes. The packet payload can be of variable length, varying from 0 to the cell payload length of 72 bytes. Packet payloads less than the cell payload length will first be patched to the full cell payload length before being switched within the switch fabric.
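  • A small sketch of the padding ("patching") rule for short packet payloads, assuming the 72-byte cell payload and 8-byte granularity stated here; the pad value is an assumption.
```python
CELL_PAYLOAD_BYTES = 72
GRANULARITY = 8

def patch_to_cell(packet_payload: bytes) -> bytes:
    """Pad a 0..72-byte packet payload (multiple of 8 bytes) up to a full cell payload."""
    if len(packet_payload) % GRANULARITY or len(packet_payload) > CELL_PAYLOAD_BYTES:
        raise ValueError("packet payload must be 0..72 bytes in 8-byte granularity")
    return packet_payload + b"\x00" * (CELL_PAYLOAD_BYTES - len(packet_payload))

cell_payload = patch_to_cell(b"A" * 16)
print(len(cell_payload))          # 72: always a full cell payload inside the fabric
```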
  • The ingress processing logic of the Qchip will add a 32-bit cell payload header (CPH) to the cell payload.
  • CPH will be addressed with bit granularity. For example, CPH[7:0] will be bits 0-7 of the CPH.
  • the switch fabric will then switch the cell payload and cell payload header to the target egress port as pure data without being looked at or modified.
  • the egress processing logic of the Qchip needs the information in the cell payload header for processing the cell. Table 7 provides detailed field definitions for the CPH.
  • a cell also contains the cell control header (CCH) for request, grant, backpressure control and other control purpose.
  • CCH will be addressed with bit granularity. For example, CCH[7:0] will be bits 0-7 of the CCH.
  • The cell control header is 89 bits long. Tables 8 and 9 provide detailed field definitions for the CCH.
  • The SOC bit is the cell-framing signal. It is used to indicate the start of a cell. The E bit is used to indicate whether the cell being transferred is an erroneous cell. This bit will be updated along the path. When set to 1, the E bit indicates that the cell contains an error even if there is no parity error detected on the cell. Odd horizontal parity is used for error protection.
  • PAR[0] covers P-Link bits [18:0], PAR[1]
  • MREQ_SRC Multicasting Request Source. Used to indicate which OC-192 port of a Dual-port Qchip the multicasting request comes from.
  • UBP 32 Unicasting backpressure. Used to indicate the status of the unicasting queues in the egress portion of the Qchip.
  • UREQO Unicasting Request 0. Set to 1 to indicate a valid unicasting request.
  • CMD 6 GNT. Indicating the CCH contains unicasting and multicasting grants.
  • IDLE Indicating the CCH is an IDLE CCH.
  • MGNT MAP 16 Multicasting Grant Bitmap Bit map indicating a multicasting grant.
  • MGNT PRI Multicasting Grant Priority Used to indicate the priority of the multicasting grant.
  • MGNT SRC Multicasting Grant Source Used to indicate which OC-192 port of a Dual-port Qchip the multicasting grant is for.
  • Fig. 15 illustrates the logic of the S-Link macro 4100 of Fig. 13A.
  • Figs. 16A through 16I detail logic, circuitry, and behavioral aspects of the P-Link macro 3100 of Fig. 14A.
  • Fig. 16A illustrates the logic of the P-Link macro 3100 of Fig. 14A.
  • the transmitter 10,100 serializes 40-bit 200MHz data into 5 pairs of differential data lines.
  • a 200MHz transmitter phase-locked loop (PLL) 10,110 generates equally distributed 8 phase clocks to multiplex 200MHz data into a 1.6Gb/s data stream.
  • In an alternative embodiment, 16 phases are used.
  • In that embodiment, phase 16, phase 13, and phase 3 are respectively used instead of phase 8, phase 7, and phase 2 of the 8-phase embodiment illustrated herein.
  • Fig. 16B is a different view of the transmitter section 10,100 of the P-Link macro 3100 of Fig. 16A.
  • Fig. 16C illustrates the internal circuitry of the differential transceiver 10,127 of the transmitter section 10,100 of Fig. 16B.
  • a single stage differential 8-to-l multiplexer/predriver 10,124 is used.
  • the data driver 10,128 is a constant current, differential current steering driver with matched impedance termination resistors 10,134 to termination voltage (Vt).
  • Vt is typically the same as Vdd, and can be lower as long as it meets the receiver input common-mode range.
  • A process/voltage/temperature (PVT) compensated current source driven by TXBIAS 3130 is used to generate the output bias current. Because of the constant current drive, power and ground noise due to simultaneous output switching is greatly reduced; therefore, the number of power and ground pins required is reduced compared to other implementations.
  • a dynamic impedance matching mechanism is utilized to obtain the best match between the termination resistor and the transmission line impedance.
  • the transmitter also has a 1-bit pre-equalization circuit 10,129 that amplifies signal swing when it switches.
  • The pre-equalization circuit provides compensation for the high-frequency loss through board traces or cables. This maximizes the data eye opening at the receiver inputs.
  • Fig. 16D illustrates the voltage waveform for the link as observed from the transmitter output.
  • Fig. 16E illustrates the voltage waveform for the link as observed from the receiver input at the opposite end of the link relative to the observation-point of Fig. 16D.
  • A 2-bit transmitter swing control 3125 provides 4 levels of signal swing to compensate for signal attenuation through transmission lines. The transmitter forwards a 200MHz clock 10,115 in addition to transmitting 10-bit data.
  • Fig. 16F is a different view of the receiver section 10,200 of the P-Link macro 3100 of Fig. 16A.
  • Fig. 16G illustrates the internal circuitry of the receiver section 10,200 of Fig. 16F.
  • Signals are terminated at the receiver in addition to the transmitter.
  • a delay-locked loop (DLL) at the receiver regenerates 8 phase clocks from the incoming 200MHz transmitted clock.
  • Incoming data is latched through highspeed differential receiver/latch modules using these 8 phase clocks. Synchronization with the 200MHz receiving chip's core clock is performed after the data capture.
  • The receiver cell, shown in Fig. 16G, is a high-speed differential latch. It senses and latches incoming data at the capture_clock's rising edge. Q and Q-bar are pre-charged to Vdd while the capture_clock is low. This receiver is capable of sensing a differential input of 80mV and requires a very small data valid window.
  • the main function of calibration is data de-skewing. This function is achieved through two steps.
  • the first step is bit clock optimization. 8 phase clocks are globally distributed to 10 receiver cells, as shown in Figure X. At each receiver, the clock can be delayed up to 1-bit time (625ps at 1.6Gb/s data rate) before the input data latch uses it. 4-bit control provides a bit clock adjustment resolution of 40ps.
  • a 90-degree phase shifted version of 8 phase clocks are also used during the calibration to achieve 2X over sampling.
  • A dedicated training pattern is used. The receiver captures 16 data samples within one 200MHz clock cycle.
  • This data is used to determine the receiver timing and allows the calibration logic to optimize the local clock delay in order to place the capture clock in the center of the data eye.
  • control logic determines the byte alignment for each receiver cell and between different receiver cells. This assures that 80-bit data output at the receiver matches the 80-bit data input at the transmitter. Overall, the calibration can correct up to 1.2ns data skew at 1.6Gb/s data transfer rate.
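  • The de-skewing step can be pictured as follows: oversample a known training pattern across one bit time, find where the received transitions fall, and delay the local bit clock so the capture edge lands mid-eye. This is a behavioral sketch with simplified units, not the calibration hardware.
```python
def center_capture_clock(samples, step_ps: float = 40.0, bit_time_ps: float = 625.0):
    """Given 2x-oversampled training samples across one bit time, return the
    clock delay (in ps, quantized to the 40ps adjustment step) that places
    the capture edge in the middle of the data eye."""
    n = len(samples)
    # Sample positions where the value changes bracket the eye edges.
    edges = [i for i in range(1, n) if samples[i] != samples[i - 1]]
    if not edges:
        return 0.0                                   # no transitions: nothing to correct
    left, right = edges[0], edges[-1]
    eye_center = (left + right) / 2.0 / n * bit_time_ps
    return round(eye_center / step_ps) * step_ps     # quantize to the 40ps resolution

# 16 samples across one 625ps bit time (2x oversampling of the 8 phases):
training = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(center_capture_clock(training))                # 320.0 ps delay for this pattern
```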
  • Fig. 16H illustrates the logic within the receiver synchronization circuit 10,221 of Figs. 16A and 16F.
  • Fig. 16I illustrates a detail of the operation of the synchronization circuit 10,221 of Fig. 16H.
  • the received data is de-serialized and latched using phase 8.
  • a clock phase detector is used to determine the phase relationship between the core clock and the received clock. If the core clock rising edge is between phase 7 and phase 2, the data is delayed using a phase 7 latch, and then registered using the next rising edge of the core clock. If the core clock rising edge is between phase 2 and phase 7, the data is registered into the core clock domain without a delay. This circuit achieves minimum latency with the link.
  • Fig. 17 illustrates switch fabric 9202, a reduced-scale variation of the switch fabric of Fig. 4 within the scope of the present invention, having a P-Link interconnect network 9252 and a capacity of 160Gbps using current technology, using a particular number of Qchips 2000 and Xchips 1000.
  • Fig. 18 is a drawing of a router/switch 9000 in accordance with the present invention using the switch fabric configuration 9202 of Fig. 17.
  • Fig. 19 illustrates a more general configuration of the switch fabric configuration 9202 of Fig. 17, emphasizing that within the scope of the invention different numbers of Qchips and Xchips are possible.
  • a preferred illustrative embodiment uses Mux chips (Mchips) for doing bit slicing and protocol conversion between the Qchips and Xchips.
  • Fig. 20 illustrates switch fabric 9205, an expanded-scale variation of the switch fabric of Fig. 4 within the scope of the present invention, having a P-Link interconnect network 9255 and a capacity of 640Gbps using current technology, using a particular number of Qchips 2000 and Crossbar Cards 5000.
  • Fig. 21 illustrates a router/switch 9000 using the switch fabric 9205 of Fig. 20, in which the network interface 9100 and the Qchip 2000 are both implemented on Line Card 9150.
  • Fig. 22 illustrates a more general configuration of the switch fabric configuration 9205 of Fig. 20, emphasizing that within the scope of the invention different numbers of Qchips and Xchips are possible.
  • Fig. 23 illustrates a more general configuration of router/switch 9000 of Fig. 21, emphasizing that within the scope of the invention the Qchips need not be implemented on the line cards.
  • Fig. 24 illustrates the internal architecture of the crossbar cards 5000 of Fig. 20, using a particular number of Mchips 6000 and Xchips 1000.
  • Each crossbar card constitutes an independent switching domain. It includes multiple sliced crossbars and an external centralized scheduler.
  • Xchip 1000-3 acts as an external centralized scheduler for the other two Xchips 1000-1 and 1000-2 on the card.
  • External scheduler 1000-3 sends crossbar configuration information to the Xchips 1000-1 and 1000-2 synchronously every cell time via the dedicated high-speed crossbar configuration bus 5030, implemented using a P-Link MD channel.
  • Fig. 25 illustrates a more general configuration of the crossbar card 5000 of Fig. 24, emphasizing that within the scope of the invention different numbers of Mchips 6000 and Xchips 1000 are possible.
  • Fig. 26A illustrates the internal architecture of Mchip 6000 of Fig. 24.
  • Fig. 26B provides additional detail of logic block 6100 of Mchip 6000 of Fig. 26A.
  • Fig. 26C provides additional detail of logic block 6300 of Mchip 6000 of Fig. 26A.
  • Fig. 26D provides additional detail of logic block 6400 of Mchip 6000 of Fig. 26A.
  • On the Qchip-side of the Mchip there are N P-Links; on the Xchip-side of the Mchip there are N+1 P-Links. N of them are used to connect to the 4 crossbar chips (Xchips), and the (N+1)th P-Link is used to connect to an Xchip dedicated for use as a scheduler chip for the other Xchips on the card (within the switching domain).
  • In the Mchip->Xchip direction, the Mchip receives cells from the 4 P-Links on the Qchip-side and forwards them across the 5 P-Links on the Xchip-side.
  • the CP and CPH portion of the cells are forwarded in a bit-sliced fashion across the 4 P-Links to the crossbar chips.
  • the CCH portions of the cells are forwarded on the 5th P-Link to the scheduler chip.
  • In the Xchip->Mchip direction, the Mchip receives the CP and CPH portions of the cells from the crossbar chips on the 4 P-Links and the CCH portion of the cells from the scheduler chip on the 5th P-Link.
  • the Mchip assembles a complete cell from the CP, CPH and CCH received and forwards the cell to the appropriate Qchip via one of the 4 P-Links on the Qchip-side.
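  • The slice-and-reassemble behavior just described can be sketched as follows; the byte-interleaved slicing and the header sizes used here are illustrative assumptions (the actual slicing across the P-Links is at the bit level).
```python
from typing import Dict, List, Tuple

NUM_SLICES = 4   # P-Links toward the crossbar Xchips; a 5th P-Link carries the CCH

def slice_cell(cp_and_cph: bytes, cch: bytes) -> Tuple[List[bytes], bytes]:
    """Qchip->Xchip direction: spread CP+CPH over the 4 data P-Links,
    send the CCH on the 5th P-Link to the scheduler Xchip."""
    slices = [cp_and_cph[i::NUM_SLICES] for i in range(NUM_SLICES)]
    return slices, cch

def reassemble_cell(slices: List[bytes], cch: bytes) -> Dict[str, bytes]:
    """Xchip->Mchip direction: interleave the slices back into CP+CPH and
    reattach the CCH before forwarding the complete cell to a Qchip."""
    total = sum(len(s) for s in slices)
    data = bytearray(total)
    for i, s in enumerate(slices):
        data[i::NUM_SLICES] = s
    return {"cp_cph": bytes(data), "cch": cch}

payload = bytes(range(76))                 # 72-byte CP + 4-byte CPH
slices, cch = slice_cell(payload, b"\xAA" * 12)
cell = reassemble_cell(slices, cch)
assert cell["cp_cph"] == payload
print(len(slices), [len(s) for s in slices])   # 4 slices of 19 bytes each
```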
  • the switch fabric supports a lossless backpressure protocol on a per-class, per-port basis.
  • the backpressure information can be generated by the line card (network processor) at the egress port and by the switch fabric internal queues.
  • the unicasting and multicasting backpressure information can be transferred all the way back to the line card (network processor) at the ingress port.
  • the switch fabric utilizes a lossless backpressure protocol for flow control.
  • the backpressure can come from either the line card at the egress port or the switch fabric internal queues.
  • the line card backpressure is 8 bits per port (the BP[7:0] field in the PCH; the LSRC[1:0] field identifies which OC-48 port is sending the BP[7:0], and LSRC[1:0] should be set to 0 for an OC-192 port), indicating the status of the 8 priority queues on the line card.
  • the 8-bit backpressure information is applied directly to the OQ (in the Qchip) for the specific port.
  • the backpressured sub-OQs will stop sending data to the egress port (a minimal sketch of this masking appears after this list).
  • the OQs are egress-port specific (an OQ serves either an OC-192 port or an OC-48 port).
  • VIQs Virtual Input Queues
  • while VIQs can be used to achieve fairness among ingress ports at an egress port, the backpressure protocol and design complexity would go up dramatically and unnecessarily because of the large number of queues that would need to be maintained on a single Qchip. Instead of using VIQs to achieve fairness, the user can map different types of traffic to different priorities for the same purpose.
  • telnet traffic and ftp traffic can be mapped to different priorities (the switch fabric has a total of 8 priorities) to guarantee a timely response to the telnet traffic under a heavy ftp traffic load.
  • fairness can also be achieved by higher-level traffic management protocols.
  • Another argument for not implementing VIQs is that although fairness among ingress ports at an egress port is better achieved with VIQs within a switch fabric, the problem still exists once the cells (packets) get out of the switch fabric. This is because, once they leave the switch fabric, packets are no longer distinguishable by ingress port number.
  • the switch fabric uses OQs for the unicasting traffic. For each OC-192 egress port, the switch fabric also has an MOQ for multicasting traffic. When an OC-192 port is configured as 4 OC-48 ports, the OC-48 ports will share the same MOQ. The BP[7:0] will be applied to both the OQ and MOQ for the specific egress port. Note that there may be head-of-line blocking in the MOQ if an OC-192 port is configured to be quad-OC-48. In this case, a backpressured OC-48 port will block the entire MOQ (for the specific priority).
  • the line card backpressure information does not directly result in the generation of internal backpressure.
  • the internal backpressure from the egress portion of the Qchip to the Schedulers (Xchip) will come from the OQs and MOQs.
  • the Qchip to Scheduler backpressure information is carried in the MBP[7:0], UBP[31:0] and BP_BAS fields of the CCH and transferred from the Qchip to the Scheduler in two cell times.
  • the MBP and UBP fields of the CCH are simply the status of the OQ and MOQ.
  • a Qchip will always generate the unicasting backpressure (UBP) information for the OQs.
  • the Qchip can be programmed through CSRs on a per-OC-192-port basis to select whether to generate multicasting backpressure (MBP) information to the Scheduler.
  • MBP multicasting backpressure
  • the reason behind this is that there could be head-of-line blocking in the multicasting queues once an egress port is blocked. Disabling multicasting backpressure generation on all egress ports can eliminate the head-of-line blocking in the multicasting queues; however, disabling the multicasting backpressure on an egress port makes the multicasting traffic to that port lossy.
  • the Qchip will drop any multicasting cells towards an MOQ if the MOQ is backpressured and the generation of backpressure information to the Scheduler for that MOQ is disabled. There could be on-the-fly cells coming to a Qchip even after the backpressure information has been generated and transferred to the Scheduler.
  • the maximum number of on-the-fly cells towards an OQ or MOQ is the Qchip-to-Qchip round-trip delay (in cell times) multiplied by the number of switching domains (a worked example follows this list).
  • the adaptive dynamic threshold logic for the OQs and MOQs will always reserve a certain number of entries in the two on-chip SRAM blocks. The number of reserved entries from each SRAM block is programmable through CSRs to be 0, 32, 64 or 128 entries.
  • the Scheduler in each switching domain maintains the complete backpressure information of the entire switch fabric.
  • the system wide backpressure information will be updated every 2 cell-times by UBP and MBP fields in the CCH.
  • the requests in the RVOQs and RMIQs of the Xchips will be selectively masked by the backpressure information; as a result, the backpressured requests will not be served (or scheduled) by the crossbar scheduler (a masking sketch appears after this list).
  • the backpressure information is applied to the unicasting requests in units of sub-VOQs.
  • Multiple ingress ports may share the same VOQ in different configurations. In such configurations, multiple ingress ports may be backpressured by a single egress port.
  • the backpressure information is applied to the multicasting requests in units of OC-192 ports. As in the MOQ case, there could be head-of-line blocking in the multicasting RMIQs.
  • the backpressure protocol is achieved by broadcasting the UBP and MBP in the CCH from the Qchip to the Scheduler.
  • the backpressure protocol from the Scheduler to the ingress portion of the Qchip is different and is designed to be credit based.
  • the P-Link interfaces in the Qchips keep track of pending requests in the Xchip; the Qchip will stop sending requests once there is no room in the Xchip request queues (a credit-counter sketch follows this list).
  • the Qchip will broadcast the 512-bit unicasting backpressure information (64 VOQs, 8 sub-VOQs per VOQ) to the line card in the UBP_MAP[63:0] and UBP_BASE[2:0] fields of the PCH over 8 cell times (sketched after this list).
  • the multicasting backpressure is generated from the MIQs and broadcast to the line card in the MBP_MAP[7:0] field of the PCH every cell time.
  • Each bit of the MBP_MAP[7:0] corresponds to a priority of an OC-192 port.
  • the unicasting backpressure information is VOQ based.
  • the multicasting backpressure is per-OC-192-port based. If an OC-192 port is configured as quad OC-48 ports, the MBP_MAP[7:0] field will backpressure all 4 OC-48 ports.
  • the adaptive dynamic threshold logic will reserve entries for on-the-fly cells (packets) from the line card to the Qchip.
  • the number of reserved entries in the SRAM block from VOQs and MIQs can be programmed via CSRs to be 0, 32, 64 or 128.
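
The following minimal C sketches illustrate some of the mechanisms described above; all structure names, field widths and constants in them are illustrative assumptions rather than values taken from the specification. This first sketch shows the Mchip-side slicing of a cell's CP/CPH portions across the crossbar P-Links and of its CCH onto the scheduler P-Link, and the reassembly in the reverse direction (the hardware is described as bit-sliced; the sketch slices by bytes for clarity).

    #include <stdint.h>
    #include <string.h>

    #define NUM_XBAR_LINKS 4                   /* P-Links carrying sliced CP+CPH       */
    #define SCHED_LINK     NUM_XBAR_LINKS      /* the (N+1)th P-Link, to the scheduler */
    #define SLICE_BYTES    16                  /* assumed per-link slice size          */
    #define CCH_BYTES      8                   /* assumed CCH size                     */

    typedef struct {
        uint8_t cp_cph[NUM_XBAR_LINKS * SLICE_BYTES];  /* cell payload + payload header */
        uint8_t cch[CCH_BYTES];                        /* cell control header           */
    } cell_t;

    typedef struct {
        uint8_t data[NUM_XBAR_LINKS + 1][SLICE_BYTES]; /* one slice per Xchip-side P-Link */
    } plink_frames_t;

    /* Mchip -> Xchip: slice CP/CPH across the crossbar links, send CCH to the scheduler. */
    static void mchip_slice(const cell_t *c, plink_frames_t *out)
    {
        for (int link = 0; link < NUM_XBAR_LINKS; link++)
            memcpy(out->data[link], &c->cp_cph[link * SLICE_BYTES], SLICE_BYTES);
        memcpy(out->data[SCHED_LINK], c->cch, CCH_BYTES);
    }

    /* Xchip -> Mchip: reassemble a complete cell from the sliced links plus the scheduler link. */
    static void mchip_reassemble(const plink_frames_t *in, cell_t *c)
    {
        for (int link = 0; link < NUM_XBAR_LINKS; link++)
            memcpy(&c->cp_cph[link * SLICE_BYTES], in->data[link], SLICE_BYTES);
        memcpy(c->cch, in->data[SCHED_LINK], CCH_BYTES);
    }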
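
A sketch, under the same assumptions, of applying the 8-bit line-card backpressure BP[7:0] to the eight priority sub-OQs of one egress port: a sub-OQ whose priority bit is set stops sending toward that port.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_PRIORITIES 8

    typedef struct {
        uint8_t bp_mask;   /* latest BP[7:0] received from the line card for this port */
        /* ... per-priority sub-OQ state would live here ... */
    } egress_oq_t;

    /* latch the BP[7:0] field carried in the PCH for this egress port */
    static void oq_apply_line_card_bp(egress_oq_t *oq, uint8_t bp7_0)
    {
        oq->bp_mask = bp7_0;
    }

    /* a sub-OQ may send toward the egress port only while its priority bit is clear */
    static bool suboq_may_send(const egress_oq_t *oq, unsigned priority)
    {
        return priority < NUM_PRIORITIES && ((oq->bp_mask >> priority) & 1u) == 0;
    }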
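
A worked example of the on-the-fly bound stated above, using assumed (not specified) numbers: with a Qchip-to-Qchip round-trip delay of 16 cell times and 4 switching domains, up to 16 * 4 = 64 cells may still arrive after backpressure has been asserted, which is consistent with the 0/32/64/128 reserved-entry choices exposed through the CSRs.

    /* on-the-fly bound: round-trip delay (in cell times) times the number of switching domains */
    static unsigned max_on_the_fly_cells(unsigned rtt_cell_times, unsigned num_domains)
    {
        return rtt_cell_times * num_domains;   /* e.g. 16 * 4 = 64 */
    }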
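
A sketch of how the Scheduler could mask pending unicast requests with the system-wide backpressure map before arbitration, so that backpressured requests are not served; the 64-VOQ-by-8-sub-VOQ layout follows the text, everything else is assumed.

    #include <stdint.h>

    #define NUM_VOQS 64

    typedef struct {
        uint8_t pending[NUM_VOQS];   /* one request bit per sub-VOQ (priority)             */
        uint8_t ubp[NUM_VOQS];       /* unicast backpressure, one bit per sub-VOQ, per VOQ */
    } sched_req_state_t;

    /* requests eligible for scheduling this cell time: a backpressured
     * sub-VOQ is masked out and will not be served by the crossbar scheduler */
    static uint8_t eligible_requests(const sched_req_state_t *s, unsigned voq)
    {
        return (uint8_t)(s->pending[voq] & (uint8_t)~s->ubp[voq]);
    }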
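
A sketch of the credit-based flow control from the Scheduler back to the ingress portion of the Qchip: the P-Link interface counts requests outstanding in the Xchip request queue and stops issuing new ones when no room remains. The queue depth shown is an assumption.

    #include <stdbool.h>

    #define XCHIP_REQ_QUEUE_DEPTH 16   /* assumed depth of the Xchip request queue */

    typedef struct {
        unsigned outstanding;          /* requests sent but not yet retired by the Xchip */
    } plink_credit_t;

    static bool plink_can_send_request(const plink_credit_t *c)
    {
        return c->outstanding < XCHIP_REQ_QUEUE_DEPTH;
    }

    static void plink_request_sent(plink_credit_t *c)    { c->outstanding++; }

    static void plink_request_retired(plink_credit_t *c) { if (c->outstanding) c->outstanding--; }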
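
Finally, a sketch of broadcasting the 512-bit unicast backpressure map to the line card 64 bits at a time over 8 cell times, with UBP_BASE[2:0] selecting the chunk and MBP_MAP[7:0] carried every cell time; the grouping of the PCH fields into a struct is illustrative.

    #include <stdint.h>

    typedef struct {
        uint64_t ubp_map;   /* UBP_MAP[63:0]: one 64-bit chunk of the 512-bit unicast map  */
        uint8_t  ubp_base;  /* UBP_BASE[2:0]: which of the 8 chunks this cell time carries */
        uint8_t  mbp_map;   /* MBP_MAP[7:0]: one bit per priority of the OC-192 port       */
    } pch_bp_fields_t;

    /* emit one cell time's worth of backpressure fields; calling this with an
     * incrementing cell_time covers the full 512-bit unicast map every 8 cell times */
    static pch_bp_fields_t make_pch_bp(const uint64_t ubp512[8], uint8_t mbp, unsigned cell_time)
    {
        pch_bp_fields_t f;
        f.ubp_base = (uint8_t)(cell_time & 7u);
        f.ubp_map  = ubp512[f.ubp_base];
        f.mbp_map  = mbp;
        return f;
    }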

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

According to this invention, crossbar and queuing chips with point-to-point, packet-based channel interfaces, and the resulting high aggregate internal bandwidths, are designed as building blocks for a CIOQ-based switch fabric that supports high-capacity fixed-length cell switching. By concentrating all of the traffic, even at large volumes, onto a single switching chip, the pin count and chip count of the system are reduced considerably. This switch fabric offers improved switching capacity while operating with increased sub-unit speedup to relax cell-time constraints.
EP01956728A 2000-08-15 2001-08-14 Commutateurs et routeurs haute performance possedant des domaines de commutation paralleles avec acceleration des sous-unites Withdrawn EP1310065A2 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US640462 1984-08-13
US64046200A 2000-08-15 2000-08-15
PCT/IB2001/001451 WO2002015489A2 (fr) 2000-08-15 2001-08-14 Commutateurs et routeurs haute performance possedant des domaines de commutation paralleles avec acceleration des sous-unites

Publications (1)

Publication Number Publication Date
EP1310065A2 true EP1310065A2 (fr) 2003-05-14

Family

ID=24568350

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01956728A Withdrawn EP1310065A2 (fr) 2000-08-15 2001-08-14 Commutateurs et routeurs haute performance possedant des domaines de commutation paralleles avec acceleration des sous-unites

Country Status (3)

Country Link
EP (1) EP1310065A2 (fr)
AU (1) AU2001278644A1 (fr)
WO (1) WO2002015489A2 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ATE387828T1 (de) 2004-04-05 2008-03-15 Alcatel Lucent Time-division multiplex link connections between a switching matrix and a port in a network element
US8989009B2 (en) 2011-04-29 2015-03-24 Futurewei Technologies, Inc. Port and priority based flow control mechanism for lossless ethernet
US9612992B2 (en) 2012-04-18 2017-04-04 Zomojo Pty Ltd Networking apparatus and a method for networking

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5367520A (en) * 1992-11-25 1994-11-22 Bell Communcations Research, Inc. Method and system for routing cells in an ATM switch

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0497097B1 (fr) * 1991-01-08 1996-11-06 Nec Corporation Système de commutation avec un étage d'entrée pour diffuser des paquets avec estampille temporelle et avec un étage de sortie pour mise de paquets en séquence
US5440550A (en) * 1991-07-01 1995-08-08 Telstra Corporation Limited High speed switching architecture
US5832303A (en) * 1994-08-22 1998-11-03 Hitachi, Ltd. Large scale interconnecting switch using communication controller groups with multiple input-to-one output signal lines and adaptable crossbar unit using plurality of selectors
US5905729A (en) * 1995-07-19 1999-05-18 Fujitsu Network Communications, Inc. Mapping a data cell in a communication switch
US6052373A (en) * 1996-10-07 2000-04-18 Lau; Peter S. Y. Fault tolerant multicast ATM switch fabric, scalable speed and port expansion configurations

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5367520A (en) * 1992-11-25 1994-11-22 Bell Communcations Research, Inc. Method and system for routing cells in an ATM switch

Also Published As

Publication number Publication date
WO2002015489A2 (fr) 2002-02-21
WO2002015489A3 (fr) 2002-12-12
AU2001278644A1 (en) 2002-02-25

Similar Documents

Publication Publication Date Title
US7221650B1 (en) System and method for checking data accumulators for consistency
US7298739B1 (en) System and method for communicating switch fabric control information
US8964754B2 (en) Backplane interface adapter with error control and redundant fabric
US5467347A (en) Controlled access ATM switch
US4875206A (en) High bandwidth interleaved buffer memory and control
US4872159A (en) Packet network architecture for providing rapid response time
US4893302A (en) Arrangement for switching concentrated telecommunications packet traffic
US7206283B2 (en) High-performance network switch
US6798784B2 (en) Concurrent switching of synchronous and asynchronous traffic
US7352694B1 (en) System and method for tolerating data link faults in a packet communications switch fabric
US20050207436A1 (en) Switching device based on aggregation of packets
US7193994B1 (en) Crossbar synchronization technique
US7212525B2 (en) Packet communication system
US20020089972A1 (en) High-performance network switch
US20020091884A1 (en) Method and system for translating data formats
US7324537B2 (en) Switching device with asymmetric port speeds
JPH0213043 (ja) Simultaneous resource request resolution mechanism
JP2002533994A (ja) Data exchange method and apparatus therefor
CN1440608A (zh) System having a meshed backplane and process for transmitting data through the system
KR100204203B1 (ko) Voice and data packet switching network and method therefor
US6721310B2 (en) Multiport non-blocking high capacity ATM and packet switch
US6301269B1 (en) ATM switching system and method with propagation delay compensation
US6973072B1 (en) High performance protocol for an interconnect system of an intermediate network node
Yun A terabit multiservice switch
EP1310065A2 (fr) Commutateurs et routeurs haute performance possedant des domaines de commutation paralleles avec acceleration des sous-unites

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20030303

AK Designated contracting states

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

17Q First examination report despatched

Effective date: 20031009

RBV Designated contracting states (corrected)

Designated state(s): DE FR GB

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: MINDSPEED TECHNOLOGIES, INC.

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20041125