EP1183833A1 - Apparatus and method for traffic shaping in a network switch - Google Patents

Apparatus and method for traffic shaping in a network switch

Info

Publication number
EP1183833A1
Authority
EP
European Patent Office
Prior art keywords
shapeid
processing block
token
unit
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00941149A
Other languages
German (de)
French (fr)
Inventor
Stanley M. Reynolds
Todd L. Lowpensky
Richard Lemyre
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Network Equipment Technologies Inc
Original Assignee
Network Equipment Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Network Equipment Technologies Inc filed Critical Network Equipment Technologies Inc
Publication of EP1183833A1 publication Critical patent/EP1183833A1/en
Withdrawn legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/04Selecting arrangements for multiplex systems for time-division multiplexing
    • H04Q11/0428Integrated services digital network, i.e. systems for transmission of different types of digitised signals, e.g. speech, data, telecentral, television signals
    • H04Q11/0478Provisions for broadband connections
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/54Store-and-forward switching systems 
    • H04L12/56Packet switching systems
    • H04L12/5601Transfer mode dependent, e.g. ATM
    • H04L2012/5629Admission control
    • H04L2012/5631Resource management and allocation
    • H04L2012/5636Monitoring or policing, e.g. compliance with allocated rate, corrective actions
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/54Store-and-forward switching systems 
    • H04L12/56Packet switching systems
    • H04L12/5601Transfer mode dependent, e.g. ATM
    • H04L2012/5678Traffic aspects, e.g. arbitration, load balancing, smoothing, buffer management
    • H04L2012/5679Arbitration or scheduling
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/54Store-and-forward switching systems 
    • H04L12/56Packet switching systems
    • H04L12/5601Transfer mode dependent, e.g. ATM
    • H04L2012/5678Traffic aspects, e.g. arbitration, load balancing, smoothing, buffer management
    • H04L2012/568Load balancing, smoothing or shaping

Definitions

  • FIG 4 one typical embodiment of a node having the signal paths of Figure 3 is shown.
  • the node 5 includes N links 18-0, 18-1, ... , 18-n, ... 18-(N-1).
  • Each of the links 18 of Figure 4 are analogous to the bi-directional links 8 of Figure 2.
  • the links 18-0, 18-1, ..., 18-n, ..., 18-(N-1) connect to port controllers 11-0, 11-1, ..., 11-n, ..., 11-(N-1).
  • the node of Figure 4 is used in connection with the information transfer of Figure 3, for example, by having one of the links 18, for example, input link 18-0 in Figure 4, connect through switch fabric 10 to another one of the links 18, for example, link 18-n.
  • the switch fabric 10 functions to connect the link 18-0 to the link 18-n.
  • the node of Figure 4 represents the node 5-1 in Figure 2
  • the link 8-1 in Figure 2 is the link 18-0 in Figure 4
  • the link 8-2 in Figure 2 is the link 18-n in Figure 4.
  • the node of Figure 4 connects information in one direction, for example, from link 18-0 to link 18-n, and connects information in the opposite direction from the link 18-n to the link 18-0.
  • the links 18-0 and 18-n were arbitrarily selected for purposes of explanation. Any of the N links 18 might have been selected in the Figure 2 circuit for connection to any of the other links 18.
  • port controllers (PC) 11-0, 11-1, ..., 11-n, ..., 11-(N-1) have input controllers 14-0, 14-1, ..., 14-n, ..., 14-(N-1), respectively and have output controllers (OC) 15-0, 15-1, ... 15-n, ..., 15-(N-1), respectively.
  • forward information cells from the source 4-S of Figure 3 sent to the destination 4-(D) of Figure 3 connect from the bus 18-01 through the input controller 14-0 to the bus 20-nO through the switch fabric 10 to the bus 20-nl through the controller 15-n to the bus 18-nO.
  • the port controllers share a common buffer storage located in shared queuing unit 51 and are bidirectionally connected to unit 51 over buses 41-0, 41-1, ..., 41-n, ..., 41-(N-1).
  • the queuing unit 51 of Figure 4 includes a data queue unit 52 and a queue control unit 53.
  • the data queue unit 52 and the queue control unit 53 each connect to the bi-directional buses 41-0, 41-1, ..., 41-n, ..., 41-(N-1).
  • the control information on the buses 41 connects to the queue control unit 53 and the data on the buses 41 connects to the data queue unit 52.
  • the queue control unit 53 includes a queue manager 54 which controls data queue unit 52 and the overall operation of the queuing unit 51.
  • the queue manager typically includes a processing unit capable of executing software.
  • Upon detection that input information on the buses 41 requires storage in the data queue unit 52, the queue manager 54 detects an available buffer location from the free buffer list unit 59 and assigns the available data location in the data queue unit 52.
  • the general function and operation of queue managers are well known. In addition to queuing, and in order to operate with the methods of the present invention, certain cells may need to be discarded from time to time to promote efficient operation of the overall communication network.
  • the discard unit 55 under control of the queue manager 54 determines when to discard queue assignments previously allocated.
  • a shaper block 60 "re-shapes" the cells, which usually arrive in bursts, and evenly spaces out the cells, as illustrated in Figure 6.
  • the results of the queuing operation are stored in the per port queue unit 56, which in turn activates the de-queue unit 57, which in turn operates through the multicast server 58 to remove buffer locations that have been previously allocated. Once removed, the de-queued buffer locations are added back to the free buffer list in the unit 59 and are available for reassignment.
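The buffer bookkeeping sketched in the bullets above (a queue manager that assigns locations from a free buffer list and a de-queue path that returns them for reassignment) can be illustrated with a few lines of code. This is only a toy model under assumed names, not the actual structure of units 54, 57 and 59.

```python
from collections import deque

class BufferPool:
    """Toy model of a free buffer list: the manager takes a free location
    when a cell must be stored and returns it after de-queuing."""

    def __init__(self, size):
        self.free = deque(range(size))   # free buffer list analogue
        self.cells = {}                  # buffer location -> stored cell

    def enqueue(self, cell):
        if not self.free:                # no free buffers: caller must discard the cell
            return None
        loc = self.free.popleft()        # assign an available location
        self.cells[loc] = cell
        return loc

    def dequeue(self, loc):
        cell = self.cells.pop(loc)       # remove the previously allocated entry
        self.free.append(loc)            # location becomes available for reassignment
        return cell

pool = BufferPool(4)
locs = [pool.enqueue(f"cell-{i}") for i in range(5)]
print(locs)                  # [0, 1, 2, 3, None]: the fifth cell finds no free buffer
print(pool.dequeue(locs[0])) # cell-0; location 0 returns to the free list
```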
  • the discard unit 55 comprises three units: FIFO unit 61 (including sub-units 61-1 and 61-2), discard unit 62, and pointer integrity unit 63. Discard unit 55 is responsible for:
  • QoS (Quality of Service) based discard decisions.
  • Pointer integrity verification (verify that no pointer duplication occurs).
  • Figure 6(A) illustrates a sample transmission stream having cells spaced 1 ms apart that are bunched together in groups known as bursts, with irregular delays between bursts.
  • a shaper takes the cell bursts and evenly distributes the cells, such that the cells are transmitted in even 3 ms intervals, as shown in Figure 6(B).
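As a rough illustration of the re-spacing shown in Figures 6(A) and 6(B), the sketch below emits each cell no earlier than its arrival time and no closer than a fixed interval to the previous departure. It models only the general idea of shaping; the per-connection scheduling machinery of the patent is described in the bullets that follow.

```python
def shape(arrival_times_ms, interval_ms=3.0):
    """Re-space bursty arrivals so departures are at least `interval_ms` apart."""
    departures = []
    next_ok = 0.0
    for t in arrival_times_ms:
        depart = max(t, next_ok)      # never send before arrival, nor before spacing allows
        departures.append(depart)
        next_ok = depart + interval_ms
    return departures

# A burst of cells 1 ms apart followed by a second burst.
arrivals = [0, 1, 2, 3, 20, 21, 22]
print(shape(arrivals))  # [0, 3.0, 6.0, 9.0, 20, 23.0, 26.0]
```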
  • a shaper 60 configured according to the present invention comprises two functional blocks: a Cell Descriptor (CD)-processing block 70, and a ShapeID-processing block 72.
  • the functional blocks may be implemented as separate ASICs, or on the same chip.
  • the CD-processing block 70 is referred to as the DALEK 70 and the ShapeID-processing block 72 is referred to as the TARDIS 72.
  • a Cell Descriptor (CD) is a descriptor representing each cell. The CD for each cell is routed through the control path, instead of each cell, in order to provide more efficient processing. Once the discard subsystem 55 and shaper 60 process the CD, the corresponding cell is output from memory.
  • An example of a CD format is shown in Figure 8.
  • the DALEK 70 stores the CDs and generates a token (ShapeID).
  • the ShapeID is basically a pre-defined "shape" that specifies the rate at which the cells can be transmitted. In operation, the shaper of the present invention allows a user to specify the shaped cell rates, or the user can defer the decision to software control.
  • a token is output from the DALEK 70 to the TARDIS 72.
  • the TARDIS 72 processes the ShapeID, and returns a token to the DALEK 70, which in turn outputs the appropriate CD, as described in further detail below.
  • the DALEK 70 determines the appropriate ShapeID.
  • the TARDIS 72 contains tables that specify, for each unique ShapeID, the minimum time interval between cells.
  • a token is sent back to the DALEK 70.
  • the DALEK determines exactly which VC has priority, and sends out a cell.
  • a cell on a higher priority VC gets sent, even if it did not originally generate the token.
  • the present invention allows a specific connection to be shaped independently of other connections. Also, numerous different connections may be shaped according to the same ShapeID. High and low priority traffic can thus be sent in the same physical connection.
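A very small model of the division of labour described above: a CD store keyed by ShapeID (the DALEK's role) and a pacing check that either returns the token at once or holds it (the TARDIS's role). All names and the single-token-per-call flow are illustrative simplifications; the actual blocks exchange up to three tokens each way per Main Timing Sequence and use the RAM structures described below.

```python
from collections import deque

cd_store = {}            # ShapeID -> deque of Cell Descriptors (DALEK-side storage)
last_sent = {}           # ShapeID -> time the last token was released (TARDIS-side state)
MIN_INTERVAL = {7: 3}    # per-ShapeID minimum spacing, in arbitrary time units

def receive_cd(shape_id, cd, now):
    """DALEK side: store the CD and hand a token to the pacing logic."""
    cd_store.setdefault(shape_id, deque()).append(cd)
    return check_token(shape_id, now)

def check_token(shape_id, now):
    """TARDIS side: a conforming token comes straight back, otherwise it waits."""
    due = last_sent.get(shape_id, -10**9) + MIN_INTERVAL[shape_id]
    if now >= due:                       # conforming: release immediately
        last_sent[shape_id] = now
        return release_cd(shape_id)
    return None                          # non-conforming: would be scheduled for `due`

def release_cd(shape_id):
    """DALEK side: the returned token selects which stored CD goes out."""
    return cd_store[shape_id].popleft()

print(receive_cd(7, "CD-A", now=0))   # CD-A (first cell conforms)
print(receive_cd(7, "CD-B", now=1))   # None (too soon; held until time 3)
```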
  • FIG. 9 is a more detailed block diagram of one implementation of the present invention.
  • the DALEK 70 utilizes three separate memory arrays: a SHAPE RAM 701, a COIN RAM 702, and a DATA BUFFER 703.
  • the TARDIS interacts with three arrays: a GCRA (Generic Cell-Rate Algorithm) RAM 721, a LINK RAM 722, and a MINT RAM 723.
  • the DALEK 70 and the TARDIS 72 together with their associated RAM arrays, implement the complete logic functionality of the shaper 60.
  • the relationship between the TARDIS 72 and DALEK 70 is one of master and slave, respectively.
  • the TARDIS 72 controls the interface connecting the two blocks, and provides Main Timing Sequence signals to the DALEK 70.
  • ShapeIDs are exchanged between TARDIS 72 and DALEK 70, de-coupling the management of CDs from the scheduling of CD output times.
  • the former is the responsibility of the DALEK 70, while the latter is the responsibility of the TARDIS 72.
  • Up to six ShapeIDs may pass between DALEK 70 and TARDIS 72 in each Main Timing Sequence - three in each direction.
  • the DALEK 70 is managed by an external CPU, via the TARDIS 72.
  • the TARDIS 72 reads all DALEK 70 read registers once every Main Timing Sequence, keeping local copies which may be read by the CPU.
  • CPU write data intended for the DALEK 70 is transferred from the TARDIS 72 to the DALEK 70 within one Main Timing Sequence of arrival from the CPU.
  • Some bits of the DALEK 70 Status Register can assert the Interrupt output of the TARDIS 72. Each such interrupt source is individually enabled. All event flags transferred from the DALEK 70 to the TARDIS 72 are captured and held until read by the CPU.
  • Communication between the DALEK 70 and TARDIS 72 is accomplished using a shared data bus plus control signals. Both ShapeID and management data share the same bus. Time division multiplexing based on the Main Timing Sequence ensures the necessary timing and bandwidth for transfer of all required data.
  • FIG 10 is a block diagram of the TARDIS 72 (and associated RAMs) illustrating the data flow of the ShapelD tokens through the block.
  • the ShapeID token is received from the DALEK 70, and its conformance is checked.
  • a conforming ShapeID token is transmitted immediately back to the DALEK 70, whereas a non-conforming ShapeID token is inserted in the Calendar Queue.
  • the ShapeID token is transferred from the Calendar Queue to the "mature" list, and then the ShapeID token is transmitted to the DALEK 70.
  • the TARDIS 72 operates using sequences (described below) synchronized to a Main Timing Sequence, and provides sequence synchronization to the DALEK 70.
  • Data structures managed by the TARDIS 72 include a set of GCRA configuration and state data, a Calendar Queue linked list array of scheduled ShapeIDs and a "mature" linked list of ShapeIDs queued for immediate output to the DALEK 70.
  • the per-shape GCRA configuration and state data is maintained by the TARDIS 72 in the GCRA RAM 721.
  • Configuration data includes the Minimum Cell Interval, defining the rate of the shape.
  • State data includes Schedule Time and Count fields.
  • Schedule Time is the output time of the next ShapeID token.
  • Count is the number of ShapeID tokens currently resident in the TARDIS 72.
  • the Minimum Cell Interval is accessible from the main CPU.
  • the GCRA data is used to schedule output times of ShapeID tokens up to six times in each Main Timing Sequence. Some scheduled ShapeIDs (as described below) are inserted into the Calendar Queue, while others are held in the Count field of the shape.
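The per-shape record described above (Minimum Cell Interval as configuration, Schedule Time and Count as state) might be modelled as follows, with a conformance test in the spirit of a GCRA virtual-scheduling check. The exact field widths and update rules of the patent are not reproduced; this only illustrates how such state can decide between returning a token immediately and scheduling it for later.

```python
from dataclasses import dataclass

@dataclass
class ShapeState:
    min_cell_interval: float    # configuration: 1 / peak cell rate, in Main Timing Sequences
    schedule_time: float = 0.0  # state: earliest conforming output time of the next token
    count: int = 0              # state: tokens currently resident for this shape

def on_token(shape, current_time):
    """Return ('conforming', None) or ('scheduled', insertion_time)."""
    shape.count += 1
    if current_time >= shape.schedule_time:           # conforming token
        shape.schedule_time = current_time + shape.min_cell_interval
        return "conforming", None
    insertion = shape.schedule_time                    # non-conforming: queue it for later
    shape.schedule_time += shape.min_cell_interval
    return "scheduled", insertion

s = ShapeState(min_cell_interval=4.0)
print(on_token(s, current_time=10.0))   # ('conforming', None)
print(on_token(s, current_time=11.0))   # ('scheduled', 14.0)
```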
  • the Calendar Queue linked list array is maintained by the TARDIS 72.
  • This structure is an array of 64K linked lists, one for each Calendar Time. Implementing the Calendar Queue as an array of linked lists allows ShapeID tokens on multiple shapes to be scheduled at the same time.
  • the MINT RAM 723 holds the heads and tails of the linked lists.
  • Each scheduled ShapeID token is usually appended to the Calendar Queue list for the calculated Schedule Time. Under some circumstances the ShapeID is appended to the list for the Current Time plus one.
  • ShapeID tokens can be transferred to the DALEK 70 in each Main Timing Sequence. Precedence is given to conforming ShapeID tokens received in the Sequence, then ShapeID tokens from the "mature" linked list. This ensures congestion has minimum impact on conforming cell streams.
  • the links for the Calendar Queue and "mature" linked lists both use the LINK RAM 722. Since only a single ShapeID token from each shape may be scheduled - i.e. present in either of the list structures - only 16K links are needed.
  • the address of the LINK RAM 722 is the ShapeID and the data returned is the next ShapeID token in the same list.
  • Figure 11 illustrates the Calendar Queue and Figure 12 shows the "mature" linked list structure.
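A compact model of the two structures in Figures 11 and 12: an array of time slots, each holding the head and tail of a linked list (the MINT RAM analogue), a single next pointer per ShapeID (the LINK RAM analogue, which suffices because each shape has at most one token in the structures), and a "mature" list drained from the current slot. Sizes are shrunk for illustration; the patent describes 64K Calendar Times and 16K shapes.

```python
SLOTS = 8                      # patent: 64K Calendar Times
heads = [None] * SLOTS         # MINT RAM analogue: head of each slot's list
tails = [None] * SLOTS
next_link = {}                 # LINK RAM analogue: ShapeID -> next ShapeID in the same list
mature = []                    # "mature" list: tokens ready to return to the CD block

def insert(shape_id, schedule_time):
    slot = schedule_time % SLOTS           # the pointer wraps around regularly
    next_link[shape_id] = None
    if heads[slot] is None:
        heads[slot] = shape_id
    else:
        next_link[tails[slot]] = shape_id  # append to the existing list for that time
    tails[slot] = shape_id

def advance(current_time):
    """At each Calendar Time, move that slot's whole list onto the mature list."""
    slot = current_time % SLOTS
    sid = heads[slot]
    heads[slot] = tails[slot] = None       # clear the slot
    while sid is not None:
        mature.append(sid)
        sid = next_link[sid]

insert(3, schedule_time=5)
insert(9, schedule_time=5)     # two shapes scheduled for the same time share one list
insert(4, schedule_time=6)
advance(5)
print(mature)                  # [3, 9]
```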
  • time is represented in a 16-bit binary field, giving a resolution of one Main Timing Sequence and a range of 64K Main Timing Sequences. Current Time increments once at the start of every Main Timing Sequence.
  • the Minimum Cell Intervals are represented in a 24-bit binary field, giving a resolution of 1/256th of a Main Timing Sequence and a range of 64K Main Timing Sequences.
  • the 16 most significant bits of an interval are known as the "integer part.”
  • the 8 least significant bits of an interval are known as the "fractional part.”
  • the Peak Cell Rate (PCR) of each shape is defined in terms of the Minimum Cell Interval, which is the inverse of the rate.
  • the minimum and maximum allowed rates are given in the table of Figure 13.
  • the high bandwidth limit is not enforced by the TARDIS 72. ShapeIDs with higher bandwidth (i.e. smaller Minimum Cell Intervals) are therefore not guaranteed to be shaped correctly. Such ShapeIDs are likely to suffer significant cell delay variation in the presence of other shaped connections due to the limited output bandwidth of the shaper 60.
  • the low bandwidth limit is enforced by the TARDIS 72.
  • a ShapeID configured with a Minimum Cell Interval greater than the limit is not shaped (i.e. it is treated as if its Minimum Cell Interval is 0001:00).
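The 24-bit interval format described above (16 integer bits plus 8 fractional bits, i.e. a resolution of 1/256 of a Main Timing Sequence) can be illustrated with a small encoder. The 2.5-sequence example is hypothetical, not a value from Figure 13 or 14; the pair (1, 0) corresponds to the 0001:00 interval mentioned above.

```python
def mci_to_fixed(interval_in_sequences):
    """Encode a Minimum Cell Interval as 16.8 fixed point (integer and fractional parts)."""
    return round(interval_in_sequences * 256) & 0xFFFFFF

def fixed_to_parts(mci):
    return mci >> 8, mci & 0xFF            # (integer part, fractional part)

# Hypothetical example: shaping to one cell every 2.5 Main Timing Sequences.
mci = mci_to_fixed(2.5)
print(fixed_to_parts(mci))                  # (2, 128): integer part 2, fraction 128/256
print(fixed_to_parts(mci_to_fixed(1.0)))    # (1, 0): the 0001:00 value mentioned above
```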
  • Figure 14 shows examples of Minimum Cell Intervals that can be configured in the TARDIS 72, according to one embodiment of the present invention. Scheduling in the TARDIS 72 is carried out when:
  • a ShapeID token is received from the DALEK 70 (up to three in a Main Timing Sequence).
  • a ShapeID token at the head of the "mature" list is transmitted to the DALEK 70 (up to three in a Main Timing Sequence).
  • Figure 15 is a truth table for the scheduling operation. In the following discussion of the table it should be noted that the ShapeID tokens mentioned belong to a single shape.
  • the 16K shapes supported by the TARDIS 72 are processed independently.
  • a scheduler result of "FirstIn" occurs when a ShapeID token is received from the DALEK 70 and there are no ShapeID tokens in the TARDIS 72 - indicated by a Count of zero. "FirstIn" results in the ShapeID token being both returned to the DALEK 70 and inserted in the Calendar Queue, and the Count is set to one.
  • a scheduler result of "NextOut" occurs when the ShapeID token at the head of the "mature" list is sent to the DALEK 70, and there are multiple ShapeID tokens in the TARDIS 72 - indicated by a Count greater than one. "NextOut" results in insertion of the ShapeID token in the Calendar Queue and the Count is decremented.
  • a scheduler result of "GhostOut" occurs when the ShapeID token at the head of the "mature" list is sent to the DALEK 70, and there is only a "ghost" ShapeID token in the TARDIS 72 - indicated by a Count of one. "GhostOut" results in the Count being set to zero. This extra "ghost" ShapeID is ignored by the DALEK 70 since it finds no CD to output to the system.
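A compact restatement of the three scheduler results just described, driven only by the per-shape Count. It is a paraphrase of what can be reconstructed from the text, not a copy of the Figure 15 truth table; in particular, the case of a token received while others are already resident is not named in this excerpt, so "NextIn" below is a placeholder.

```python
def schedule_event(count, event):
    """event is 'receive' (token in from the CD block) or 'transmit'
    (token at the head of the mature list sent back to the CD block).
    Returns (scheduler result, new count)."""
    if event == "receive" and count == 0:
        # FirstIn: token returned to the CD block immediately and also entered
        # in the Calendar Queue; the entry left behind is the "ghost".
        return "FirstIn", 1
    if event == "receive":
        return "NextIn", count + 1          # placeholder name: held behind earlier tokens
    if event == "transmit" and count > 1:
        return "NextOut", count - 1         # another token exists: reinsert in the queue
    if event == "transmit" and count == 1:
        return "GhostOut", 0                # only the ghost remains: nothing to output
    raise ValueError("transmit with count == 0 should not occur")

count = 0
for ev in ["receive", "receive", "transmit", "transmit"]:
    result, count = schedule_event(count, ev)
    print(ev, "->", result, "count =", count)
```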
  • the Calendar Queue has 64K entries, so the pointer wraps around regularly.
  • the operation sequences carried out by the TARDIS 72 are tightly coupled to the Main Timing Sequence.
  • the sequences are named Schedule, Mature and Management.
  • Schedule Sequence: This sequence carries out scheduling of a ShapeID. It is initiated either by reception of a ShapeID token from the DALEK 70 or by transmission of a ShapeID token to the DALEK 70 from the "mature" list. It inserts a ShapeID entry in the Calendar Queue and updates the Deferred Count.
  • the table of Figure 17 illustrates this sequence:
  • GCRA RAM Read current GCRA Configuration and State for the ShapeID.
  • GCRA RAM Write updated GCRA Configuration and State.
  • MINT RAM Read the current Head/Tail of the Schedule Time Calendar Queue.
  • MINT RAM Write updated Head/Tail of the Schedule Time Calendar Queue.
  • LINK RAM Write the link from the old Calendar Queue Tail to the new Tail.
  • the MINT RAM and LINK RAM operations are only performed if the scheduling algorithm returns a result of "FirstIn" or "NextOut."
  • MINT RAM Read the Current Time list from the Calendar Queue.
  • MINT RAM Clear the Current Time list in the Calendar Queue.
  • LINK RAM Read the next (third) ShapeID token in the "mature" list.
  • Management Sequence: This sequence writes or reads a Minimum Cell Interval to/from the GCRA RAM.
  • the address (ShapeID) pointed to by the Write Register WR_SID is read, and the data (MCI) is placed in the Read Registers RR_MCI_INT and RR_MCI_FRA. The Read Registers are only loaded for a Read Request.
  • the address (ShapeID) pointed to by Write Register WR_SID is written using the data (MCI) in Write Registers WR_MCI_INT and WR_MCI_FRA. This step only occurs for a Write Request.
  • the DALEK controls storage of the Cell Descriptors (CDs) currently residing in the shaper, including the management of linked lists for each Connection ID.
  • Figure 21 illustrates the flow of a CD and associated ShapeID token into and out of the CD-processing functional block, or DALEK 70.
  • the ShapeID look-up is first performed.
  • the CD is stored in a "later" list, and the ShapeID token is output to the TARDIS 72.
  • the ShapeID token is input to the DALEK 70 from the TARDIS 72.
  • the CD is moved to the "now" list, and the CD is transmitted back to the system.
  • the DALEK 70 operates using sequences synchronized to the system Main Timing Sequence. Sequence synchronization is provided by the TARDIS 72.
  • the Main Timing Sequence is 37 clock periods in length. This is approximately 685 ns or one-cell time in a STS-12c based system.
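The figure of 37 clock periods, approximately 685 ns or one cell time on an STS-12c link, can be sanity-checked with a line of arithmetic: an ATM cell is 53 bytes (424 bits) and the STS-12c line rate is 622.08 Mbit/s (SONET overhead ignored here). The implied clock rate of roughly 54 MHz is an inference, not a figure stated in the text.

```python
cell_bits = 53 * 8                 # ATM cell: 53 bytes
sts12c_rate = 622.08e6             # STS-12c line rate in bit/s (overhead ignored)
cell_time_ns = cell_bits / sts12c_rate * 1e9
print(round(cell_time_ns, 1))      # ~681.6 ns, consistent with the ~685 ns figure

clock_period_ns = 685 / 37         # implied clock period if 37 clocks span one cell time
print(round(clock_period_ns, 2), "ns ->", round(1000 / clock_period_ns, 1), "MHz")
```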
  • A CLP Option field allows each CD to be processed as either "CLP clear" or "CLP unchanged".
  • CDs on "CLP clear" ConnectionIDs have their CLP bit reset on entry to the DALEK 70.
  • CDs on "CLP unchanged" ConnectionIDs have their CLP bit passed unchanged.
  • the CLP bit and its associated parity bit are the only fields of CDs modified by the DALEK 70.
  • each CD in the DALEK 70 is stored in one of two linked list structures.
  • a set of "later" linked lists, one for each ShapeID, holds CDs from when they are received until they are ready for transmission.
  • a “now” linked list holds all CDs that are ready for transmission.
  • Each CD includes a ToShape bit and a ConnectionID field.
  • Each CD with the ToShape bit set, for which a valid ConnectionID to ShapeID mapping exists, is stored by the DALEK 70 in an external RAM array - the DATA BUFFER 703. Once stored, a CD is not moved when transferred between lists, instead the links are manipulated. Links are stored as part of the CD in the DATA BUFFER 703.
  • An external RAM array called the SHAPE RAM 701 holds the mapping table from ConnectionID to ShapeID. Shaping is carried out on ShapeIDs. Multiple ConnectionIDs may be mapped to a single ShapeID.
  • the CLP Option field for a ConnectionID is stored in the SHAPE RAM 701 alongside its ShapeID.
  • CDs with the ToShape bit set are appended to one of 16K “later” linked lists.
  • the "later” lists are priority-based, applying a 4-level priority from a field in the CD. This field defines priority within the shaped connection - usually the VC priority.
  • Heads and Tails of the "later” lists are stored in a separate external RAM array called the COIN RAM 702.
  • the DALEK 70 sends the ShapeID token to the TARDIS 72 for GCRA evaluation.
  • the CD remains in the "later" list until it reaches the head of the list and the ShapeID is input from the TARDIS 72.
  • a ShapeID token input from the TARDIS 72 indicates that a CD with that ShapeID may be output to the system.
  • the CD chosen is the one at the head of the highest-priority occupied list for that ShapeID. It is transferred from the head of the "later" list to the tail of the "now" list.
  • the "now" list provides an output queue to accommodate CDs which are ready for immediate output. This list is necessary since only one CD may be output to the system in each Main Timing Sequence, while up to three ShapeIDs may be input from the TARDIS 72.
  • the "now” list is priority-based, applying 4-level priority from a field in the CD. This field defines priority between the shaped connections - usually the VP priority.
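The two list structures can be modelled as priority-bucketed FIFOs: one "later" list per ShapeID (4 levels, the VC priority) and a single "now" list (4 levels, the VP priority). The selection rule sketched here - take the head of the highest-priority non-empty bucket - follows the description above; the bucket ordering (0 = highest) and all names are assumptions.

```python
from collections import deque

LEVELS = 4                                   # 4-level priority fields in the CD

def new_plist():
    return [deque() for _ in range(LEVELS)]  # bucket 0 = highest priority (assumption)

later = {}                                   # ShapeID -> "later" list (priority = VC priority)
now = new_plist()                            # single "now" list (priority = VP priority)

def receive_cd(shape_id, cd, vc_prio):
    later.setdefault(shape_id, new_plist())[vc_prio].append(cd)

def token_returned(shape_id, vp_prio):
    """A ShapeID token back from the scheduler releases one CD of that shape."""
    for bucket in later[shape_id]:           # highest-priority occupied "later" bucket
        if bucket:
            now[vp_prio].append(bucket.popleft())
            return

def transmit_one():
    """Only one CD may be output to the system per Main Timing Sequence."""
    for bucket in now:
        if bucket:
            return bucket.popleft()
    return None

receive_cd(5, "CD-low", vc_prio=2)
receive_cd(5, "CD-high", vc_prio=0)
token_returned(5, vp_prio=1)
print(transmit_one())                        # CD-high: priority within the shape is honoured
```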
  • Receive Sequence: This sequence accepts a CD from the system, decodes the ShapeID and appends the CD to the ShapeID "later" linked list.
  • a ShapeID token is passed to the TARDIS 72 during this sequence.
  • the table of Figure 23 illustrates this sequence:
  • SHAPE RAM Read ShapeID, decoded from the CD ConnectionID field.
  • COIN RAM Read Head/Tail of ShapelD/Priority list, then write updated data.
  • DATA BUFFER Write CD and null link, then write link to old Tail of list.
  • Transfer Sequence: This sequence transfers a CD from the "later" linked list to the "now" linked list.
  • COIN RAM Write new Head/Tail of "later” list (from Data Buffer link).
  • Transmit Sequence: This sequence reads a CD from the "now" linked list and outputs the CD to the system.
  • the table of Figure 25 illustrates this sequence:
  • Management Sequence: This sequence writes a ShapeID to the SHAPE RAM (if requested), and reads a ShapeID from the SHAPE RAM. These operations allow the configuration and monitoring of ConnectionID to ShapeID mappings in the DALEK 70.
  • Figure 26 illustrates this sequence: 1. The address (ConnectionID) pointed to by write register CPU_WR_CID is written using the data (ShapeID) in write register CPU_WR_SID.
  • Example Overall Sequence Figure 27 illustrates an example overall sequence carried out by the DALEK 70.
  • the example overall sequence chosen here illustrates the worst case scenario in which: 1. Three CDs received from the system, initiating three Receive Sequences.

Abstract

An apparatus and method for traffic shaping in a network switch, which provides for per-connection shaping. A Cell Descriptor (CD)-processing block and a ShapeID processing block operate to de-couple the management of the CDs from the scheduling of the CD output times. The CD-processing block outputs a token (ShapeID) to the ShapeID block. If the token is conforming, it is immediately passed back to the CD-processing block, otherwise it is processed. When the token is 'mature' the token is passed back to the CD-processing block. Use of 'now' and 'later' lists with per-connection ShapeIDs provides priority within a virtual connection (VC) and a virtual path (VP), respectively. This effectively preserves the relative priority for connections being shaped within a VP. Also, the use of a Calendar Queue reduces the complexity of a 'virtual finishing time' (VFT) calculation.

Description

APPARATUS AND METHOD FOR TRAFFIC SHAPING IN A NETWORK SWITCH BACKGROUND OF THE INVENTION
1. Field of the Invention The present invention relates generally to the field of network communications, and more particularly to an apparatus and method for traffic shaping in a network switch.
2. Description of the Related Art
In general, network communication systems interconnect many users in a network. Each user is connected to the network through a port. The network is formed by the interconnection of many nodes, whereby information input at an input port from one user at a source is passed from node to node through the network to an output port and to another user at a destination. The information transferred from source to destination is packetized and each node switches incoming packets at incoming ports to outgoing packets at outgoing ports. For ATM (Asynchronous Transfer Mode) networks, the packets are further divided into cells.
Using current technology, fast packet switches transfer hundreds of thousands of packets per second at every switch port. Each switch port is typically designed to transfer information at a rate from 50 Mbit/s to 2.4 Gbit/s for a broadband integrated service digital network (BISDN). Switch sizes range from a few ports to thousands of ports.
The term "fast packet switch" includes switches capable of handling both variable length packets and fixed length packets. Use of fixed-length packets can simplify the switch design. Fast packet switches using short, fixed-length packets (cells) are referred to as ATM switches. Fast packet switches handle different types of communications services in a single integrated network where such services may include voice, video and data communications. Since voice and video services can tolerate only a limited amount of delay and delay variance through a network, ATM switches are suitable for such services. The ATM standard for broadband ISDN networks defines a cell having a length of 53 bytes with a header of 5 bytes and data of 48 bytes. The ATM Forum Traffic Management Specification has specified a number of Service Class Definitions as follows: CBR: Continuous Bit Rate. For real-time applications requiring tightly constrained delay and delay variation such as voice and video. The CBR service class requires the consistent availability of a fixed quantity of bandwidth.
RT-VBR: Realtime Variable Bit Rate. For applications where sources transmit at a rate which varies with time (referred to in the art as "bursty"), yet still must receive service with tightly constrained delay and delay variation. NRT-VBR: Non-Realtime Variable Bit Rate. For bursty applications, having no service requirements related to delay or its variance, but having sensitivity to loss. UBR: Unspecified Bit Rate. For non-real-time applications, such as file transfer and e-mail, that transmit non-continuous bursts of cells without related service guarantees and therefore without allocated bandwidth resource, without guarantee as to cell loss ratio or cell transfer delay, and without explicit feedback regarding current level of network congestion. GFR: Guaranteed Frame Rate. Also for non-real-time applications, this service category provides loss guarantees for sources transmitting traffic at or below a contracted minimum rate. Once a source exceeds the contracted minimum rate, traffic above that rate does not receive any loss guarantees.
ABR: Available Bit Rate. For non-real-time applications that permit variation in information transfer rate depending on the amount of bandwidth available in the network. In a typical ATM switch, the cell processing functions are performed within the nodes of a network. Each node is an ATM switch which includes input controllers (IC's), a switch fabric (SF), output controllers (OC's) and a node control (C). The node control is used for functions including connection establishment and release, bandwidth reservation, buffering control, congestion control, maintenance and network management.
In each switch, the input controllers are typically synchronized so that all cells from input controllers arrive at the switch fabric at the same time and cells can be accepted or rejected according to their priority. The traffic through the switch fabric is slotted and the switch fabric delay equals the sum of the timeslot duration, pipeline delay and the queuing delay.
The node control communicates with the input controllers and the output controllers either by a direct communication path which by-passes the switch fabric or via control cells transmitted through the switch fabric.
External connections to the switch are generally bi-directional. Bi-directional connections are formed by grouping an input controller (IC) and an output controller (OC) together to form a port controller (PC). The input sequence of cells in a virtual channel is preserved across the switch fabric so that the output sequence of cells on each virtual channel is the same as the input sequence. Cells contain a virtual channel identifier (VCI) in the cell header which identifies the connection to which the cell belongs. Each incoming VCI in the header of each cell is translated in an input controller to specify the outgoing VCI identifier. This translation is performed in the input controller typically by table look-up using the incoming VCI to address a connection table. This connection table also contains a routing field to specify the output port of the switch fabric to which the connection is routed. Other information may be included in the connection table on a per connection basis such as the priority, class of service, and traffic type of the connection. In an ATM switch, cell arrivals are not scheduled. In a typical operation, a number of cells may arrive simultaneously at different input ports, each requesting the same output port. Operations in which requests exceed the output capacity of the output port are referred to as output contention. Since an output port can only transmit a fixed number (for example, one) cell at a time, only the fixed number of cells can be accepted for transmission so that any other cells routed to that port must either be discarded or must be buffered in a queue. Different methods are employed for routing cells through a switch module, for example, self-routing and label routing. A self-routing network operates with an input controller prefixing a routing tag to every cell. Typically, the input controller uses a table look-up from a routing table to obtain the routing tag. The routing tag specifies the output port to which the cell is to be delivered. Each switching element is able to make a fast routing decision by inspecting the routing tag. The self-routing network ensures that each cell will arrive at the required destination regardless of the switch port at which it enters.
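The per-connection translation described earlier in this paragraph - an incoming VCI indexing a connection table that yields the outgoing VCI, the routing field (output port of the switch fabric) and other per-connection attributes - is essentially a keyed lookup. A minimal sketch with made-up table contents:

```python
# Connection table indexed by incoming VCI (values are illustrative only).
connection_table = {
    42: {"out_vci": 17, "output_port": 3, "priority": 1, "service_class": "CBR"},
    43: {"out_vci": 99, "output_port": 0, "priority": 3, "service_class": "UBR"},
}

def translate(cell):
    entry = connection_table[cell["vci"]]
    cell["vci"] = entry["out_vci"]            # header VCI rewritten at the input controller
    cell["route"] = entry["output_port"]      # routing field used by the switch fabric
    return cell

print(translate({"vci": 42, "payload": b"\x00" * 48}))
```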
A label routing network operates with a label in each cell referencing translation tables in each switching element. The label is translated in each switching element and hence any arbitrary network of switching elements may be employed. Switches have two principal designs, time-division and space division. In a time-division switch fabric, all cells flow through a single communication channel shared in common by all input and output ports. In a space division switch, a plurality of paths are provided between the input and output ports. These paths operate concurrently so that many cells may be transmitted across the switch fabric at the same time. The total capacity of the switch fabric is thus the product of the bandwidth of each path and the average number of paths that can transmit a cell concurrently.
When the traffic load exceeds the available system resources in a network, congestion is present and performance degrades. When the number of cells is within the carrying capacity of the network, all cells can be delivered so that the number of cells delivered equals the number of cells sent without congestion. However, if cell traffic is increased to the level that nodes cannot handle the traffic, congestion results. Congestion can be brought about by several factors. If nodes in a network are too slow to perform the various tasks required of them (queuing buffers, updating tables, etc.), queues build up, even though excess line capacity exists. On the other hand, even if nodes are infinitely fast, queues will build up whenever the input traffic rate exceeds the capacity of the output traffic rate for any particular group of outputs.
If a node has no free buffers for queuing cells, the node must discard newly arriving cells. For packet data traffic, when a cell is discarded, the packet from which the discarded cell came will be retransmitted, perhaps many times, further extending the congestion epoch.
In an ATM switch, in order to guarantee a certain service rate, the flow of incoming data needs to be predictable, thereby allowing a designer to provide adequate buffer space. One problem which arises is that the cells do not arrive with a uniform distribution. In fact, most traffic arrives in "bursts" - with a cell group, having a random size, transmitted in between delays of random duration. In order to provide for a more predictable data stream, the cell bursts are shaped by a device known in the art as a "shaper." The shaper takes the cell bursts and distributes the cells evenly, according to a predefined "shape." Different virtual channels (VCs) may require different shapes, and therefore it would be desirable to have a shaper that shapes each VC independently.
SUMMARY OF THE INVENTION In general, the present invention is an apparatus and method for traffic shaping in a network switch, which provides for per-connection shaping. A shaper according to the present invention comprises two functional blocks: a Cell Descriptor (CD) processing block, and a ShapeID processing block. The CD processing block and the ShapeID processing block operate to de-couple the management of the CDs from the scheduling of the CD output times. The CD-processing block outputs a token (ShapeID) to the ShapeID block. If the token is conforming, it is immediately passed back to the CD-processing block, otherwise it is processed. When the token is "mature" the token is passed back to the CD-processing block. The CD processing block then outputs a CD.
Use of "now" and "later" lists with per-connection ShapeIDs provides priority within a virtual connection (VC) and a virtual path (VP), respectively. This effectively preserves the relative priority for connections being shaped within a VP. In other words, a higher priority VC may be sent first, even if it did not generate the token, thus preserving cell priority. Also, the use of a Calendar Queue reduces the complexity of a "virtual finishing time" (VFT) calculation. BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:
Figure 1 is a schematic block diagram of a plurality of source/destination (S/D) users connected through a multi-node network;
Figure 2 is a schematic representation of a circuit with one S/D user connected to another S/D user through a sequence of nodes in the network of Figure 1 ; Figure 3 is a schematic representation of the Figure 2 circuit with a virtual channel connection of the source (S) sending information in a forward direction (F) to a destination (D) and with a reverse direction (R) for transmitting control signals to the source (S); Figure 4 is a schematic representation of a typical one of the nodes (N) in the
Figure 1 network;
Figure 5 is a schematic representation of the queuing unit in the Figure 4 node;
Figure 6(A) is an illustration of cell traffic, with each cell spaced 1ms apart, and "bursts" of traffic randomly spaced; Figure 6(B) is an illustration of the cell traffic of Figure 6(A) after the cells have been "shaped" with a uniform spacing of 3 ms;
Figure 7 is a block diagram of the functional blocks of a shaper configured according to the present invention;
Figure 8 is an example of a Cell Descriptor (CD) format; Figure 9 is a block diagram of one implementation of a shaper configured according to the present invention;
Figure 10 is a diagram illustrating the data flow of the ShapeID through the ShapeID processing block;
Figure 11 is a diagram of a Calendar Queue configured according to the present invention;
Figure 12 is a diagram of a "mature" linked list of ShapeIDs;
Figure 13 is a table of the minimum and maximum cell intervals according to one embodiment of the present invention;
Figure 14 is a table of examples of minimum cell intervals; Figure 15 is a truth table for the scheduling operation;
Figure 16 is a truth table of the Calendar Queue insertion time calculation;
Figure 17 illustrates the schedule sequence for scheduling a ShapeID;
Figure 18 illustrates the operation of the "mature" sequence for the ShapeID processing block; Figure 19 illustrates the operation of the management sequence of the ShapeID processing block;
Figure 20 illustrates an example of an overall sequence performed by the ShapeID processing block;
Figure 21 is a diagram illustrating the data flow of the CD and ShapeID through the CD-processing block;
Figure 22 is a diagram of the data structures and data flow in the CD-processing block;
Figure 23 illustrates the operation of the receive sequence for the CD-processing block;
Figure 24 illustrates the operation of the transfer sequence for the CD-processing block; Figure 25 illustrates the operation of the transmit sequence for the CD-processing block;
Figure 26 illustrates the operation of the management sequence for the CD-processing block; and
Figure 27 illustrates an example of an overall sequence performed by the CD-processing block.
DETAILED DESCRIPTION OF THE INVENTION The following description is provided to enable any person skilled in the art to make and use the invention and sets forth the best modes contemplated by the inventor for carrying out the invention. Various modifications, however, will remain readily apparent to those skilled in the art, since the basic principles of the present invention have been defined herein specifically to provide an apparatus and method for traffic shaping in a network switch. Any and all such modifications, equivalents and alternatives are intended to fall within the spirit and scope of the present invention. Referring first to Figure 1, a plurality of network users are represented as the source/destination (S/D) 4. Each user typically sends information as a source (S) and receives information as a destination (D). The source (S) of an S/D unit 4 will send information to the destination (D) of some other S/D unit 4. In order for information to be transferred from a source to a destination, each S/D unit 4 connects through a multi- node (N) network 1. The network 1 includes many nodes (N) 5. The nodes are connected from node to node so that, in general, any particular one of the S/D units 4 can connect to any one of the other S/D units 4 by forming a chain of nodes 5 in the network 1. In general, the connections between the S/D units 4 and a node 5, and the connections between nodes 5, are by bi-directional links 8 which enable information to be transferred in both directions.
In Figure 1, the number of nodes (N) 5 shown is for clarity a relatively small number, but the network may include hundreds or more of nodes. Also, the S/D units 4 include S users 4-0, 4-1, 4-2, 4-3, 4-4, ..., 4-(S-2), 4-(S-l). The value of S can be any integer, although S is typically equal to hundreds or higher.
In a typical embodiment, the Figure 1 communication system is an ATM network in which the unit of transfer of information is a cell. A plurality of cells form packets of information. The network 1 communicates cells and packets so as to support different types of information including images, voice and data.
In Figure 2, the S/D unit 4-x connects through a plurality C of nodes (N) 5-0, 5-1, ..., 5-(C-1) to the S/D unit 4-y. The S/D unit 4-x is typical of any of the S/D units 4 of Figure 1. For example, the S/D unit 4-x may represent the S/D unit 4-2 in Figure 1. Similarly, the S/D unit 4-y in Figure 2 may represent any of the S/D units 4 in Figure 1. For example, S/D unit 4-y may represent the S/D unit 4-4 in Figure 1. In such an example, the nodes 5-0, 5-1, ..., 5-(C-1) represent the C nodes in the network 1 of Figure 1 which are used to connect the S/D unit 4-2 to the S/D unit 4-4.
In Figure 2, the bi-directional links 8-0, 8-1, ..., 8-(C-1), 8-(C) connect from the S/D unit 4-x through the nodes 5-0, 5-1, ..., 5-(C-1) to the S/D unit 4-y. In Figure 2, information may be transferred from the source (S) in the S/D unit 4-x to the destination (D) in the S/D unit 4-y. Similarly, information from the source (S) in the S/D unit 4-y can be transferred to the destination (D) in the S/D unit 4-x. While information may be transferred in either direction in Figure 2, it is convenient, for purposes of explanation, to consider transfers between a source (S) and a destination (D), whether that be from the S/D unit 4-x to the S/D unit 4-y or from the S/D unit 4-y to the S/D unit 4-x. Regardless of the direction, each transfer is from a source (S) to a destination (D).
In Figure 3, a schematic representation of the circuitry used for a source (S) to destination (D) transfer in the virtual channel of Figure 2 is shown. In Figure 3, the source unit 4-(S) in the S/D unit 4-x of Figure 2 connects to the destination unit 4-(D) in the S/D unit 4-y of Figure 2.
In Figure 3, each of the links 8-0, 8-1, ..., 8-(C-1), 8-(C) includes a forward (F) channel for transferring information in the forward direction and a reverse (R) channel for transferring information in the reverse direction. The forward channel in Figure 3 is associated with the transfer of information from the source unit 4-(S) to the destination unit 4-(D). The reverse channel in Figure 3 is for the purpose of sending control information used in connection with the network of Figure 1. The reverse channel (R) is distinguished from the forward channel (F) used for the transfer of information in the forward direction from S/D unit 4-y to S/D unit 4-x, as discussed in connection with Figure 2. Both the forward (F) and the reverse (R) channels are associated with the source unit 4-(S) transfer to the destination unit 4-(D). Each of the nodes in Figure 3 includes forward (F) circuitry 6 and reverse (R) circuitry 7. In Figure 3, the forward channels 8-0F, 8-1F, ..., 8-(C-1)F connect as inputs respectively to the forward circuits 6-0, 6-1, ..., 6-(C-1). The forward channel 8-(C)F connects from the node 6-(C-1) to the D unit 4-(D). Similarly, the reverse channels 8-0R, 8-1R, ..., 8-(C-1)R connect from the reverse circuits 7-0, 7-1, ..., 7-(C-1). The reverse channel 8-(C)R connects from the D unit 4-(D) to the reverse circuit 7-(C-1). In Figure 3, each of the nodes 5 has a feedback connection 9 connecting from the forward (F) circuit 6 to the reverse (R) circuit 7. Specifically, the feedback channels 9-0, 9-1, ..., 9-(C-1) connect from the forward (F) circuits 6 to the reverse (R) circuits 7 in the nodes 5-0, 5-1, ..., 5-(C-1), respectively. In the Figure 3 circuit, a virtual channel connection is made along the forward channel, setting up a communication path in the forward direction between the S unit 4-(S) and the D unit 4-(D). Because other virtual channels are also established in the network 1 of Figure 1, buffering is required at each node and destination, including the nodes of Figure 3.
In Figure 4, one typical embodiment of a node having the signal paths of Figure 3 is shown. In Figure 4, the node 5 includes N links 18-0, 18-1, ..., 18-n, ..., 18-(N-1). Each of the links 18 of Figure 4 is analogous to the bi-directional links 8 of Figure 2. In Figure 4, the links 18-0, 18-1, ..., 18-n, ..., 18-(N-1) connect to port controllers 11-0, 11-1, ..., 11-n, ..., 11-(N-1).
The node of Figure 4 is used in connection with the information transfer of Figure 3, for example, by having one of the links 18, for example, input link 18-0 in Figure 4, connect through switch fabric 10 to another one of the links 18, for example, link 18-n. In the example described, the switch fabric 10 functions to connect the link 18-0 to the link 18-n. In an example where the node of Figure 4 represents the node 5-1 in Figure 2, the link 8-1 in Figure 2 is the link 18-0 in Figure 4 and the link 8-2 in Figure 2 is the link 18-n in Figure 4. With such a connection, the node of Figure 4 connects information in one direction, for example, from link 18-0 to link 18-n, and connects information in the opposite direction from the link 18-n to the link 18-0. The links 18-0 and 18-n were arbitrarily selected for purposes of explanation. Any of the N links 18 might have been selected in the Figure 2 circuit for connection to any of the other links 18.
When the node of Figure 4 is used in the virtual channel connection of Figure 3 with the source (S) on the left and the destination (D) on the right, then for purposes of explanation it is assumed that the link 18-0 is an input to the node 5 in the forward direction and the link 18-n is an output from the node in the forward direction.
In Figure 4, port controllers (PC) 11-0, 11-1, ..., 11-n, ..., 11-(N-1) have input controllers 14-0, 14-1, ..., 14-n, ..., 14-(N-1), respectively, and have output controllers (OC) 15-0, 15-1, ..., 15-n, ..., 15-(N-1), respectively. In Figure 4, forward information cells from the source 4-S of Figure 3 sent to the destination 4-(D) of Figure 3 connect from the bus 18-01 through the input controller 14-0 to the bus 20-nO through the switch fabric 10 to the bus 20-nl through the output controller 15-n to the bus 18-nO. The port controllers share a common buffer storage located in the shared queuing unit 51 and are bidirectionally connected to unit 51 over buses 41-0, 41-1, ..., 41-n, ..., 41-(N-1).
In Figure 5, the queuing unit 51 of Figure 4 is shown in greater detail. The queuing unit 51 includes a data queue unit 52 and a queue control unit 53. The data queue unit 52 and the queue control unit 53 each connect to the bi-directional buses 41-0, 41-1, ..., 41-n, ..., 41-(N-1). The control information on the buses 41 connects to the queue control unit 53, and the data on the buses 41 connects to the data queue unit 52. In Figure 5, the queue control unit 53 includes a queue manager 54 which controls the data queue unit 52 and the overall operation of the queuing unit 51. The queue manager typically includes a processing unit capable of executing software. Upon detection that input information on the buses 41 requires storage in the data queue unit 52, the queue manager 54 detects an available buffer location from the free buffer list unit 59 and assigns the available data location in the data queue unit 52. The general function and operation of queue managers are well known. In addition to queuing, and in order to operate with the methods of the present invention, certain cells may need to be discarded from time to time to promote efficient operation of the overall communication network. The discard unit 55, under control of the queue manager 54, determines when to discard queue assignments previously allocated. A shaper block 60 "re-shapes" the cells, which usually arrive in bursts, and evenly spaces out the cells, as illustrated in Figure 6. The results of the queuing operation are stored in the per port queue unit 56, which in turn activates the de-queue unit 57, which in turn operates through the multicast server 58 to remove buffer locations that have been previously allocated. Once removed, the de-queued buffer locations are added back to the free buffer list in the unit 59 and are available for reassignment.
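For illustration only, the following Python sketch models the free buffer list handling just described (the queue manager takes a location for an arriving cell, and the de-queue unit returns it afterwards); the class and method names are assumptions made for this sketch, not part of the disclosed hardware.

```python
from collections import deque

class BufferPool:
    """Toy model of the free buffer list (unit 59) and its use by the queue manager."""

    def __init__(self, size):
        self.free = deque(range(size))      # available buffer locations

    def allocate(self):
        """Queue manager: take an available location for an arriving cell, if any."""
        return self.free.popleft() if self.free else None   # None -> no room; a discard candidate

    def release(self, location):
        """De-queue unit: a transmitted (or discarded) location becomes reusable."""
        self.free.append(location)

# Example: allocate two locations, free one, and make it available for reassignment.
pool = BufferPool(size=8)
a = pool.allocate()
b = pool.allocate()
pool.release(a)
```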
The discard unit 55 comprises three units: FIFO unit 61 (including sub-units 61-1 and 61-2), discard unit 62, and pointer integrity unit 63. Discard unit 55 is responsible for:
1. Guaranteeing the contracted Quality of Service (QoS) of all the connections (by discarding non-conforming cells).
2. Surveillance and control of buffer congestion.
3. Performing Explicit Forward Congestion Indication (EFCI) tagging in the ATM header when the buffer starts to become congested.
4. Performing a per connection cell and frame discard when the congestion becomes excessive.
5. Ensuring fairness between the non-guaranteed connections (ABR, GFR, and UBR).
6. Providing different quality for ABR, GFR, and UBR traffic, by supporting various EFCI and discard thresholds.
7. Pointer integrity verification (verify that no pointer duplication occurs).
As mentioned above, the shaper block 60 spaces out cell bursts and evenly distributes the cells. Figure 6(A) illustrates a sample transmission stream having cells spaced 1 ms apart that are bunched together in groups known as bursts, with irregular delays between bursts. A shaper takes the cell bursts and evenly distributes the cells, such that the cells are transmitted at even 3 ms intervals, as shown in Figure 6(B).
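As a purely numerical illustration of this re-spacing (not the device's algorithm), the short sketch below computes earliest conforming departure times for bursty arrivals such as those of Figure 6, assuming a 3 ms shaping interval.

```python
def shape(arrival_times_ms, interval_ms=3.0):
    """Each cell departs no earlier than it arrived and no earlier than
    one shaping interval after the previous departure."""
    departures, earliest = [], 0.0
    for t in arrival_times_ms:
        out = max(t, earliest)
        departures.append(out)
        earliest = out + interval_ms
    return departures

# A burst of cells arriving 1 ms apart leaves evenly spaced:
print(shape([0, 1, 2, 3, 20, 21, 22]))   # departures at 0, 3, 6, 9, 20, 23 and 26 ms
```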
In general, as shown in Figure 7, a shaper 60 configured according to the present invention comprises two functional blocks: a Cell Descriptor (CD)-processing block 70, and a ShapelD-processing block 72. The functional blocks may be implemented as separate ASICs, or on the same chip. As described herein, the CD-processing block 70 is referred to as the DALEK 70 and the ShapelD-processing block 72 is referred to as the TARDIS 72. A Cell Descriptor (CD), as is known in the art, is a descriptor representing each cell. The CD for each cell, rather than the cell itself, is routed through the control path in order to provide more efficient processing. Once the discard subsystem 55 and shaper 60 process the CD, the corresponding cell is output from memory. An example of a CD format is shown in Figure 8.
The DALEK 70 stores the CDs and generates a token (ShapelD). The ShapelD is basically a pre-defined "shape" that specifies the rate at which the cells can be transmitted. In operation, the shaper of the present invention allows a user to specify the shaped cell rates, or the user can defer the decision to software control. A token is output from the DALEK 70 to the TARDIS 72. The TARDIS 72 processes the ShapelD, and returns a token to the DALEK 70, which in turn outputs the appropriate CD, as described in further detail below.
From the connection identifier (ConnectionID) in the CD for each cell, the DALEK 70 determines the appropriate ShapelD. The TARDIS 72 contains tables that specify, for each unique ShapelD, the minimum time interval between cells. When a token "matures" (i.e., a cell can go out for a specific connection), a token is sent back to the DALEK 70. The DALEK then determines exactly which VC has priority, and sends out a cell. Thus, a cell on a higher priority VC gets sent, even if it did not originally generate the token. The present invention allows a specific connection to be shaped independently of other connections. Also, numerous different connections may be shaped according to the same ShapelD. High and low priority traffic can thus be sent in the same physical connection.
Figure 9 is a more detailed block diagram of one implementation of the present invention. The DALEK 70 utilizes three separate memory arrays: a SHAPE RAM 701, a COIN RAM 702, and a DATA BUFFER 703. Similarly, the TARDIS interacts with three arrays: a GCRA (Generic Cell-Rate Algorithm) RAM 721, a LINK RAM 722, and a MINT RAM 723. The DALEK 70 and the TARDIS 72, together with their associated RAM arrays, implement the complete logic functionality of the shaper 60. The relationship between the TARDIS 72 and DALEK 70 is one of master and slave, respectively. The TARDIS 72 controls the interface connecting the two blocks, and provides Main Timing Sequence signals to the DALEK 70. Interaction involves ShapelDs and management data. ShapelDs are exchanged between TARDIS 72 and DALEK 70, de-coupling the management of CDs from the scheduling of CD output times. The former is the responsibility of the DALEK 70, while the latter is the responsibility of the TARDIS 72. Up to six ShapelDs may pass between DALEK 70 and TARDIS 72 in each Main Timing Sequence - three in each direction.
The DALEK 70 is managed by an external CPU, via the TARDIS 72. The TARDIS 72 reads all DALEK 70 read registers once every Main Timing Sequence, keeping local copies which may be read by the CPU. Similarly, CPU write data intended for the DALEK 70 is transferred from the TARDIS 72 to the DALEK 70 within one Main Timing Sequence of arrival from the CPU. Some bits of the DALEK 70 Status Register can assert the Interrupt output of the TARDIS 72. Each such interrupt source is individually enabled. All event flags transferred from the DALEK 70 to the TARDIS 72 are captured and held until read by the CPU. Communication between the DALEK 70 and TARDIS 72 is accomplished using a shared data bus plus control signals. Both ShapelD and management data share the same bus. Time division multiplexing based on the Main Timing Sequence ensures the necessary timing and bandwidth for transfer of all required data.
TARDIS block
Figure 10 is a block diagram of the TARDIS 72 (and associated RAMs) illustrating the data flow of the ShapelD tokens through the block. First, the ShapelD token is received from the DALEK 70, and its conformance is checked. A conforming ShapelD token is transmitted immediately back to the DALEK 70, whereas a non-conforming ShapelD token is inserted in the Calendar Queue. The ShapelD token is transferred from the Calendar Queue to the "mature" list, and then the ShapelD token is transmitted to the DALEK 70. The TARDIS 72 operates using sequences (described below) synchronized to a Main Timing Sequence, and provides sequence synchronization to the DALEK 70. Data structures managed by the TARDIS 72 include a set of GCRA configuration and state data, a Calendar Queue linked list array of scheduled ShapelDs, and a "mature" linked list of ShapelDs queued for immediate output to the DALEK 70.
The per-shape GCRA configuration and state data is maintained by the TARDIS 72 in the GCRA RAM 721. Configuration data includes the Minimum Cell Interval, defining the rate of the shape. State data includes Schedule Time and Count fields. Schedule Time is the output time of the next ShapelD token. Count is the number of ShapelD tokens currently resident in the TARDIS 72. The Minimum Cell Interval is accessible from the main CPU. The GCRA data is used to schedule output times of ShapelD tokens up to six times in each Main Timing Sequence. Some scheduled ShapelDs (as described below) are inserted into the Calendar Queue, while others are held in the Count field of the shape. The Calendar Queue linked list array is maintained by the TARDIS 72 in the
MINT RAM 723 and LINK RAM 722. This structure is an array of 64K linked lists, one for each Calendar Time. Implementing the Calendar Queue as an array of linked lists allows ShapelD tokens on multiple shapes to be scheduled at the same time. The MINT RAM 723 holds the heads and tails of the linked lists. Each scheduled ShapelD token is usually appended to the Calendar Queue list for the calculated Schedule Time. Under some circumstances the ShapelD is appended to the list for the Current Time plus one.
In each Main Timing Sequence the Calendar Time is advanced. The Calendar Queue list for the new Current Time is transferred to the tail of the "mature" linked list. In this way, Calendar Queue lists for "old" Calendar Times are automatically emptied. The "mature" linked list is maintained by the TARDIS 72 using internal logic and the LINK RAM 722. This structure is a single linked list of ShapelDs queued for immediate output to the DALEK 70.
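A minimal Python sketch of these two list structures follows. The 64K calendar slots, the per-ShapelD link array, and the transfer of the Current Time list to the tail of the "mature" list mirror the description above; the class and method names, and the use of Python lists in place of the MINT RAM and LINK RAM, are illustrative assumptions.

```python
class CalendarQueue:
    """Sketch of the Calendar Queue (64K slot lists) plus the single "mature" list.
    Per-slot heads/tails stand in for the MINT RAM; the per-ShapelD link array
    stands in for the LINK RAM."""

    SLOTS = 1 << 16                               # 64K Calendar Times

    def __init__(self, num_shapes=1 << 14):       # 16K ShapelDs -> 16K links
        self.head = [None] * self.SLOTS
        self.tail = [None] * self.SLOTS
        self.link = [None] * num_shapes           # next ShapelD in the same list
        self.mature_head = None
        self.mature_tail = None
        self.current_time = 0

    def append(self, slot, shape_id):
        """Append a ShapelD to the list for a given Calendar Time."""
        self.link[shape_id] = None
        if self.head[slot] is None:
            self.head[slot] = self.tail[slot] = shape_id
        else:
            self.link[self.tail[slot]] = shape_id
            self.tail[slot] = shape_id

    def advance(self):
        """Advance Calendar Time by one Main Timing Sequence and move the new
        Current Time list onto the tail of the "mature" list."""
        self.current_time = (self.current_time + 1) & 0xFFFF
        slot = self.current_time
        if self.head[slot] is not None:
            if self.mature_head is None:
                self.mature_head = self.head[slot]
            else:
                self.link[self.mature_tail] = self.head[slot]
            self.mature_tail = self.tail[slot]
            self.head[slot] = self.tail[slot] = None

    def pop_mature(self):
        """Take the ShapelD at the head of the "mature" list, if one is queued."""
        sid = self.mature_head
        if sid is not None:
            self.mature_head = self.link[sid]
            if self.mature_head is None:
                self.mature_tail = None
        return sid
```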
Up to three ShapelD tokens can be transferred to the DALEK 70 in each Main Timing Sequence. Precedence is given to conforming ShapelD tokens received in the Sequence, then ShapelD tokens from the "mature" linked list. This ensures congestion has minimum impact on conforming cell streams. The links for the Calendar Queue and "mature" linked lists both use the LINK RAM 722. Since only a single ShapelD token from each shape may be scheduled (i.e., present in either of the list structures), only 16K links are needed. The address of the LINK RAM 722 is the ShapelD and the data returned is the next ShapelD token in the same list. Figure 11 illustrates the Calendar Queue and Figure 12 shows the "mature" linked list structure.
In the TARDIS 72, time is represented in a 16-bit binary field, giving a resolution of one Main Timing Sequence and a range of 64K Main Timing Sequences. Current Time increments once at the start of every Main Timing Sequence. The Minimum Cell Intervals are represented in a 24-bit binary field, giving a resolution of 1/256th of a Main Timing Sequence and a range of 64K Main Timing Sequences. The 16 most significant bits of an interval are known as the "integer part." The 8 least significant bits of an interval are known as the "fractional part." The Peak Cell Rate (PCR) of each shape is defined in terms of the Minimum Cell Interval, which is the inverse of the rate. The minimum and maximum allowed rates are given in the table of Figure 13.
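The 16.8 fixed-point interval format can be illustrated with a short conversion sketch. The Main Timing Sequence duration of roughly 685 ns is taken from the description of the CD-processing block later in this specification, and the function name, the rounding choice and the hexadecimal "integer:fraction" rendering are illustrative assumptions rather than configuration software for the device.

```python
def minimum_cell_interval(cell_rate_per_s, mts_seconds=685e-9):
    """Convert a desired peak cell rate into the 24-bit Minimum Cell Interval:
    a 16-bit integer part and an 8-bit fractional part (resolution of 1/256
    of a Main Timing Sequence)."""
    interval_in_mts = 1.0 / (cell_rate_per_s * mts_seconds)   # interval = 1 / rate
    raw = int(round(interval_in_mts * 256))                    # quantise to 1/256 MTS
    integer_part = (raw >> 8) & 0xFFFF
    fractional_part = raw & 0xFF
    return integer_part, fractional_part

# Example: render an interval in an "integer:fraction" form (one plausible
# reading of the 0001:00 style notation used in the text).
i, f = minimum_cell_interval(10_000)        # 10,000 cells per second
print(f"{i:04X}:{f:02X}")
```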
The high bandwidth limit is not enforced by the TARDIS 72. ShapelDs with higher bandwidth (i.e., smaller Minimum Cell Intervals) are therefore not guaranteed to be shaped correctly. Such ShapelDs are likely to suffer significant cell delay variation in the presence of other shaped connections due to the limited output bandwidth of the shaper 60. The low bandwidth limit is enforced by the TARDIS 72. A ShapelD configured with a Minimum Cell Interval greater than the limit is not shaped (i.e., it is treated as if its Minimum Cell Interval is 0001:00). Figure 14 shows examples of Minimum Cell Intervals that can be configured in the TARDIS 72, according to one embodiment of the present invention. Scheduling in the TARDIS 72 is carried out when:
1. ShapelD token received from the DALEK 70 (up to three in a Main Timing Sequence).
2. ShapelD token at head of "mature" list is transmitted to the DALEK 70 (up to three in a Main Timing Sequence).
Figure 15 is a truth table for the scheduling operation. In the following discussion of the table it should be noted that the ShapelD tokens mentioned belong to a single shape. The 16K shapes supported by the TARDIS 72 are processed independently.
A scheduler result of "Firstln" occurs when a ShapelD token is received from the DALEK 70 and there are no ShapelD tokens in the TARDIS 72 - indicated by a Count of zero. "Firstln" results in the ShapelD token being both returned to the
DALEK 70, since it is conforming, and inserted into the Calendar Queue. In addition, the Count is incremented. This shows an important characteristic of the algorithm - a "ghost" ShapelD token remains in the TARDIS 72 although no "real" ShapelD is present. The Count is actually the number of "real" ShapelD tokens plus one "ghost." A scheduler result of "Nextln" occurs when a ShapelD token is received from the DALEK 70 and there are already ShapelD token(s) in the TARDIS 72 - indicated by the Count being non-zero. "Nextln" results in the ShapelD token being held in the TARDIS 72 in the form of an increment to the Count. The ShapelD token is not returned to the DALEK 70 because the shape is currently non-conforming. Nor is it inserted in a Calendar Queue because a ShapelD token is already present.
A scheduler result of "NextOut" occurs when the ShapelD token at the head of the "mature" list is sent to the DALEK 70, and there are multiple ShapelD tokens in the TARDIS 72 - indicated by a Count greater than one. "NextOut" results in insertion of the ShapelD token in the Calendar Queue and the Count is decremented. A scheduler result of "GhostOut" occurs when the ShapelD token at the head of the "mature" list is sent to the DALEK 70, and there is only a "ghost" ShapelD token in the TARDIS 72 - indicated by a Count of one. "GhostOut" results in the Count being set to zero. This extra "ghost" ShapelD is ignored by the DALEK 70 since it finds no CD to output to the system.
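The four scheduler results can be modelled in a few lines of Python. This is a behavioural sketch only: the Count handling follows the description above, while the Schedule Time update and the slot selection are simplified stand-ins for the tables of Figures 15 and 16, and all names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass

calendar = defaultdict(list)        # stand-in for the Calendar Queue slot lists

@dataclass
class ShapeState:
    interval: int                   # Minimum Cell Interval in Main Timing Sequences (integer part only)
    schedule_time: int = 0          # output time of the next ShapelD token
    count: int = 0                  # tokens resident, including one "ghost"

def schedule(shape_id, s, now):
    # Simplified: the next output is one interval later, but never in the past.
    s.schedule_time = max(s.schedule_time + s.interval, now + 1)
    calendar[s.schedule_time & 0xFFFF].append(shape_id)

def token_from_dalek(shape_id, s, now):
    """A ShapelD token arrives from the CD-processing block."""
    if s.count == 0:                # "Firstln": conforming
        s.count = 1                 # a "ghost" token now resides in the scheduler
        schedule(shape_id, s, now)
        return True                 # echo the token back immediately
    s.count += 1                    # "Nextln": hold the token as a count only
    return False

def mature_token_sent(shape_id, s, now):
    """The ShapelD token at the head of the "mature" list was just sent back."""
    if s.count > 1:                 # "NextOut": more tokens are held
        s.count -= 1
        schedule(shape_id, s, now)  # reschedule the next output
    else:                           # "GhostOut": only the ghost remained
        s.count = 0                 # nothing is rescheduled; the DALEK finds no CD and ignores it
```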
Following "Firstln" and "NextOut" scheduling results, the ShapelD token must be appended to a Calendar Queue list - the list for the Schedule Time. The decision of exactly where to place each ShapelD is complicated by two factors:
1. The Calendar Queue has 64K entries, so the pointer wraps around regularly.
2. Congestion in the "mature" list can put the Schedule Time in the "past." The table of Figure 16 defines the truth table for Calendar Queue insertion time calculations. If "Current Time" is selected then the ShapelD token is placed in the (Current Time + 1) Calendar Queue. It is then appended to the "mature" list in the next
Main Timing Sequence.
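One plausible way to resolve these two factors in software is sketched below; the half-range wraparound test is an assumption made for illustration, the authoritative rule being the Figure 16 truth table.

```python
def insertion_slot(schedule_time, current_time):
    """Choose the Calendar Queue list for a scheduled ShapelD (16-bit calendar)."""
    ahead = (schedule_time - current_time) & 0xFFFF
    # A distance of zero, or more than half the calendar range, is read here as
    # a Schedule Time already in the past (for example because the "mature"
    # list is congested): such tokens go into the (Current Time + 1) list and
    # mature in the next Main Timing Sequence.
    if ahead == 0 or ahead >= 0x8000:
        return (current_time + 1) & 0xFFFF
    return schedule_time & 0xFFFF

# Example: a Schedule Time that has already passed is pushed to Current Time + 1.
print(insertion_slot(schedule_time=0x0005, current_time=0x0010))   # 0x0011 (17)
```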
The operation sequences carried out by the TARDIS 72 are tightly coupled to the Main Timing Sequence. The sequences are named Schedule, Mature, and Management.
Schedule Sequence
This sequence carries out scheduling of a ShapelD. It is initiated either by reception of a ShapelD token from the DALEK 70 or by transmission of a ShapelD token to the DALEK 70 from the "mature" list. It inserts a ShapelD entry in the Calendar Queue and updates the Deferred Count. The table of Figure 17 illustrates this sequence:
1. GCRA RAM: Read current GCRA Configuration and State for the ShapelD.
2. Execution of the Scheduling Algorithm in internal logic.
3. GCRA RAM: Write updated GCRA Configuration and State.
4. MINT RAM: Read the current Head/Tail of the Schedule Time Calendar Queue.
5. MINT RAM: Write updated Head/Tail of the Schedule Time Calendar Queue.
6. LINK RAM: Write the link from the old Calendar Queue Tail to the new Tail.
The MINT RAM and LINK RAM operations are only performed if the scheduling algorithm returns a result of "Firstln" or "NextOut."
Mature Sequence
This sequence transfers a list of ShapelD tokens from the Current Time Calendar Queue to the tail of the "mature" linked list and loads the first three ShapelD tokens into the TARDIS 72. It is initiated once in each Main Timing Sequence. The table of Figure 18 shows the sequence of:
1. MINT RAM: Read the Current Time list from the Calendar Queue.
2. MINT RAM: Clear the Current Time list in the Calendar Queue.
3. LINK RAM: Link the Current Time list to the tail of the "mature" list.
4. LINK RAM: Read the next (second) ShapelD token in the "mature" list.
5. LINK RAM: Read the next (third) ShapelD token in the "mature" list.
Management Sequence
This sequence writes or reads a Minimum Cell Interval to/from the GCRA RAM. These operations allow the configuration and monitoring of Minimum Cell Intervals by the CPU. The table of Figure 19 illustrates this sequence. The table shows the sequence of:
1. The address (ShapelD) pointed to by the Write Register WR_SID is read, and the data (MCI) is placed in the Read Registers RR_MCI_INT and RR_MCI_FRA. The Read Registers are only loaded for a Read Request.
2. The address (ShapelD) pointed to by Write Register WR_SID is written using the data (MCI) in Write Registers WR_MCI_INT and WR_MCI_FRA. This step only occurs for a Write Request.
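For illustration, the register traffic of this sequence can be modelled as below; the register names are those used above, while the surrounding class and default values are assumptions made for the sketch.

```python
class MciManagement:
    """Toy model of the Management Sequence: CPU access to per-ShapelD
    Minimum Cell Intervals held in the GCRA RAM."""

    def __init__(self, num_shapes=1 << 14):
        self.gcra_mci = [(1, 0)] * num_shapes    # (integer, fraction) per ShapelD
        self.WR_SID = 0
        self.WR_MCI_INT = 0
        self.WR_MCI_FRA = 0
        self.RR_MCI_INT = 0
        self.RR_MCI_FRA = 0

    def run(self, read_request=False, write_request=False):
        if read_request:                          # step 1: load the Read Registers
            self.RR_MCI_INT, self.RR_MCI_FRA = self.gcra_mci[self.WR_SID]
        if write_request:                         # step 2: write the addressed entry
            self.gcra_mci[self.WR_SID] = (self.WR_MCI_INT, self.WR_MCI_FRA)
```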
Example Overall Sequence
An example overall sequence carried out by the TARDIS 72 is shown in Figure 20. Such a sequence is run in each Main Timing Sequence. Each overall sequence combines the Schedule, Mature and Management sequences described above. The example in Figure 20 illustrates a worst case scenario in which:
1. Three ShapelD tokens from the DALEK 70, all with a Schedule result of "Firstln".
2. Three "mature" ShapelD tokens to the DALEK 70, all with a Schedule result of "NextOut".
3. CPU-requested GCRA RAM Configuration Write.
DALEK block
The DALEK controls storage of the Cell Descriptors (CDs) currently residing in the shaper, including the management of linked lists for each Connection ID. Figure 21 illustrates the flow of a CD and associated ShapelD token into and out of the CD- processing functional block, or DALEK 70. When a CD is received from the system, the ShapelD look-up is first performed. The CD is stored in a "later" list, and the ShapelD token is output to the TARDIS 72. When the shape conforms, the ShapelD token is input to the DALEK 70 from the TARDIS 72. The CD is moved to the "now" list, and the CD is transmitted back to the system.
The DALEK 70 operates using sequences synchronized to the system Main Timing Sequence. Sequence synchronization is provided by the TARDIS 72. The Main Timing Sequence is 37 clock periods in length. This is approximately 685 ns or one-cell time in a STS-12c based system. A per-ConnectionlD configurable CLP
Option field allows each CD to be processed as either "CLP clear" or "CLP unchanged". CDs on "CLP clear" ConnectionlDs have their CLP bit reset on entry to the DALEK 70. CDs on "CLP unchanged" ConnectionlDs have their CLP bit passed unchanged. The CLP, and its associated parity bit, are the only fields of CDs modified by the DALEK 70.
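A one-line sketch of the CLP Option behaviour (illustrative only; the accompanying parity update is omitted):

```python
def apply_clp_option(clp_bit, clp_option):
    """Connections configured as "CLP clear" have the bit reset on entry;
    "CLP unchanged" connections pass it through untouched."""
    return 0 if clp_option == "clear" else clp_bit
```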
The data structures managed by the DALEK 70 and the flow of data through the DALEK 70 will now be described. At any time, each CD in the DALEK 70 is stored in one of two linked list structures. A set of "later" linked lists, one for each ShapelD, holds CDs from when they are received until they are ready for transmission. A "now" linked list holds all CDs that are ready for transmission.
Up to three CDs may be received from the system in each Main Timing Sequence. Each CD includes a ToShape bit and a ConnectionlD field. Each CD with the ToShape bit set, for which a valid ConnectionlD to ShapelD mapping exists, is stored by the DALEK 70 in an external RAM array - the DATA BUFFER 703. Once stored, a CD is not moved when transferred between lists; instead, the links are manipulated. Links are stored as part of the CD in the DATA BUFFER 703. An external RAM array called the SHAPE RAM 701 holds the mapping table from ConnectionlD to ShapelD. Shaping is carried out on ShapelDs. Multiple ConnectionlDs may be mapped to a single ShapelD. The CLP Option field for each ConnectionlD is stored in the SHAPE RAM 701 alongside its ShapelD. CDs with the ToShape bit set are appended to one of 16K "later" linked lists. The "later" lists are priority-based, applying a 4-level priority from a field in the CD. This field defines priority within the shaped connection - usually the VC priority. Heads and Tails of the "later" lists are stored in a separate external RAM array called the COIN RAM 702. Concurrently with storing a received CD, the DALEK 70 sends the ShapelD token to the TARDIS 72 for GCRA evaluation. The CD remains in the "later" list until it reaches the head of the list and the ShapelD is input from the TARDIS 72. A ShapelD token input from the TARDIS 72 indicates that a CD with that ShapelD may be output to the system. The CD chosen is the one at the head of the highest-priority occupied list for that ShapelD. It is transferred from the head of the "later" list to the tail of the "now" list. The "now" list provides an output queue to accommodate CDs which are ready for immediate output. This list is necessary since only one CD may be output to the system in each Main Timing Sequence, while up to three ShapelDs may be input from the TARDIS 72. The "now" list is priority-based, applying a 4-level priority from a field in the CD. This field defines priority between the shaped connections - usually the VP priority. Heads and Tails of the "now" list are stored within the DALEK 70 since only one "now" list exists.
The data held in all three external RAM arrays is protected by parity bits. Parity is checked following every memory read operation and any error flagged. Similarly, the parity of CDs received from the system is checked and the errors flagged. Figure 22 illustrates these data structures and data flow through the DALEK 70.
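The following Python sketch mirrors the "later"/"now" structures just described, with priority index 0 taken as highest. It is illustrative only: the device keeps list heads and tails in the COIN RAM, stores the links inside the CDs in the DATA BUFFER, and the dictionary/field names here are assumptions.

```python
from collections import defaultdict, deque

class CdLists:
    """Sketch of the per-ShapelD "later" lists and the single "now" list."""

    def __init__(self, conn_to_shape):
        self.conn_to_shape = conn_to_shape                               # SHAPE RAM analogue
        self.later = defaultdict(lambda: [deque() for _ in range(4)])    # 4 VC priorities per ShapelD
        self.now = [deque() for _ in range(4)]                           # one list, 4 VP priorities

    def receive(self, cd):
        """Store an arriving CD; return the ShapelD token to hand to the scheduler."""
        shape_id = self.conn_to_shape[cd["connection_id"]]
        self.later[shape_id][cd["vc_priority"]].append(cd)
        return shape_id

    def token_returned(self, shape_id):
        """A conforming or matured token came back: promote one CD to the "now" list."""
        for queue in self.later[shape_id]:               # highest-priority occupied list wins
            if queue:
                cd = queue.popleft()
                self.now[cd["vp_priority"]].append(cd)
                return

    def transmit(self):
        """Output at most one CD per Main Timing Sequence."""
        for queue in self.now:
            if queue:
                return queue.popleft()
        return None

# Example: two connections mapped to the same shape; the higher-priority CD goes first.
lists = CdLists({7: 3, 9: 3})
lists.receive({"connection_id": 7, "vc_priority": 2, "vp_priority": 1})
lists.receive({"connection_id": 9, "vc_priority": 0, "vp_priority": 1})
lists.token_returned(3)
print(lists.transmit())
```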
The operation sequences performed by the DALEK 70 are tightly coupled with the Main Timing Sequence. The sequences are named Receive, Transfer, Transmit, and Management.
Receive Sequence
This sequence accepts a CD from the system, decodes the ShapelD and appends the CD to the ShapelD "later" linked list. A ShapelD token is passed to the TARDIS 72 during this sequence. The table of Figure 23 illustrates this sequence:
1. SHAPE RAM: Read ShapelD, decoded from the CD ConnectionlD field.
2. COIN RAM: Read Head/Tail of ShapelD/Priority list, then write updated data.
3. DATA BUFFER: Write CD and null link, then write link to old Tail of list.
Transfer Sequence
This sequence transfers a CD from the "later" linked list to the "now" linked list.
The transfer is initiated by the receipt of a ShapelD token from the TARDIS 72. The table of Figure 24 illustrates this sequence:
1. COIN RAM: Read Head/Tail of all 4 priority "later" lists.
2. DATA BUFFER: Read "now" Priority and LINK at Head of chosen "later" list.
3. COIN RAM: Write new Head/Tail of "later" list (from Data Buffer link).
4. DATA BUFFER: Write link to new Tail of "now" list.
Transmit Sequence
This sequence reads a CD from the "now" linked list and outputs the CD to the system. The table of Figure 25 illustrates this sequence:
1. DATA BUFFER: Read the CD word by word.
2. CD_Data bus driven.
3. CD_SHP_RDY asserted.
Management Sequence
This sequence writes a ShapelD to the SHAPE RAM (if requested), and reads a ShapelD from the SHAPE RAM. These operations allow the configuration and monitoring of ConnectionlD to ShapelD mappings in the DALEK 70. The table of Figure 26 illustrates this sequence:
1. The address (ConnectionlD) pointed to by write register CPU_WR_CID is written using the data (ShapelD) in write register CPU_WR_SID.
2. The address (ConnectionlD) pointed to by CPU_WR_CID is read, the data (ShapelD) being placed in read register CPU_RD_SID.
Example Overall Sequence
Figure 27 illustrates an example overall sequence carried out by the DALEK 70.
Such a sequence is run in each Main Timing Sequence. Each overall sequence combines the Receive, Transfer, Transmit and Management sequences described in the preceding section.
The example overall sequence chosen here illustrates the worst case scenario in which:
1. Three CDs received from the system, initiating three Receive Sequences.
2. Three ShapelD tokens returned from the TARDIS 72, initiating three Transfer Sequences.
3. "Now" list occupied, initiating a Transmit Sequence.
4. CPU SR WRREQ bit asserted, initiating a Management Sequence.
As defined herein, the present invention's use of "now" and "later" lists with per-connection ShapelDs provides priority within a virtual connection (VC) and a virtual path (VP), respectively. This effectively preserves the relative priority for connections being shaped within a VP. Also, the use of a Calendar Queue reduces the complexity of a "virtual finishing time" (VFT) calculation, such that the resultant VFT has a constant-time bound on its algorithmic complexity [O(1) versus O(N log N)]. Finally, the use of an "active list" reduces the complexity of the per-connection scheduling.
Those skilled in the art will appreciate that various adaptations and modifications of the just-described embodiments can be configured without departing from the scope and spirit of the invention. Therefore, it is to be understood that, within the scope of the appended claims, the invention may be practiced other than as specifically described herein.

Claims

CLAIMS
What is claimed is:
1. A shaper unit comprising: a Cell Descriptor (CD) processing block; and a ShapelD processing block; wherein the CD processing block outputs a token to the ShapelD processing block corresponding to each CD, and the ShapelD processing block processes the tokens to control the scheduling of the CDs out of the shaper unit.
2. The shaper unit of Claim 1, wherein the CD processing block comprises: a processing block; a SHAPE RAM that holds a mapping table from a ConnectionlD to a ShapelD; a COIN RAM that stores heads and tails of "later" lists; and a DATA BUFFER array that stores cell descriptors.
3. The shaper unit of Claim 2, wherein the ShapelD processing block comprises: a processing block; a Generic Cell-Rate Algorithm (GCRA) RAM that stores per-shape GCRA configuration and state data; and a LINK RAM that stores a Calendar Queue linked list array.
4. The shaper unit of Claim 3, wherein the ShapelD processing block further comprises a MINT RAM that stores a Calendar Queue linked list array.
5. A network switch that receives cell bursts, the network switch comprising: a queue control unit, the queue control unit comprising: a queue manager; a discard block; a shaper to shape the cell bursts, the shaper comprising: a Cell Descriptor (CD) processing block; and a ShapelD processing block; wherein the CD processing block outputs a token to the ShapelD processing block corresponding to each CD, and the ShapelD processing block processes the tokens to control the scheduling of the CDs out of the shaper unit; a per port queue unit; a de-queue unit; a multicast server; and a free buffer list unit.
6. A method for shaping cell traffic in a network switch, the method comprising: receiving a Cell Descriptor (CD) in a CD processing block; decoding a ShapelD from the CD and storing the CD in a "later" list; outputting the ShapelD token to a ShapelD processing block; checking the conformance of the ShapelD token; if the ShapelD token is conforming, then transferring the ShapelD token back to the CD processing block; if the ShapelD token is not conforming, then inserting the ShapelD token into a Calendar Queue, and when the ShapelD token is mature, transferring the ShapelD token from the Calendar Queue to a mature list, and then transferring the ShapelD token back to the CD processing block; moving a CD to a "now" list, when a corresponding ShapelD token is received by the CD processing block; and outputting a CD from the CD processing block.
7. The method of Claim 6, wherein the use of the "now" and "later" lists with per-connection ShapelDs provides priority within a virtual connection (VC).
8. The method of Claim 7, wherein when a token matures, the CD processing block determines which VC to send out, such that a higher priority VC is sent before a lower priority VC, even if the higher priority VC did not generate the token.
9. The method of Claim 6, wherein each connection is shaped to a different rate.
10. The method of Claim 6, wherein a plurality of connections are all set to a same ShapelD.
11. A communications system comprising: a plurality of sources for supplying information; a plurality of destinations for receiving the information from the sources; one or more nodes forming a network connecting the sources to the destinations, the network having a plurality of channels for transporting the information, wherein each node includes a queuing control unit comprising: a queue manager; a discard block; a shaper comprising: a Cell Descriptor (CD) processing block; and a ShapelD processing block; wherein the CD processing block outputs a token to the ShapelD processing block corresponding to each CD, and the ShapelD processing block processes the tokens to control the scheduling of the CDs out of the shaper unit; a per port queue unit; a de-queue unit; a multicast server; and a free buffer list unit.
12. The communications system of Claim 11, wherein the CD processing block: receives a Cell Descriptor (CD); decodes a ShapelD from the CD and stores the CD in a "later" list; and outputs the ShapelD token to the ShapelD processing block.
13. The communications system of Claim 12, wherein the ShapelD processing block: checks the conformance of the ShapelD; if the ShapelD token conforms, then transfers the ShapelD token back to the CD processing block; if the ShapelD token does not conform, then inserts the ShapelD token into a Calendar Queue, and when the ShapelD token is mature, transfers the ShapelD token from the Calendar Queue to a mature list, and then transfers the ShapelD token back to the CD processing block.
14. The communications system of Claim 13, wherein when the CD processing block receives the ShapelD token from the ShapelD processing block, the CD processing block: moves a CD to a "now" list, and outputs a CD.
EP00941149A 1999-05-28 2000-05-26 Apparatus and method for traffic shaping in a network switch Withdrawn EP1183833A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13695399P 1999-05-28 1999-05-28
US136953P 1999-05-28
PCT/US2000/014549 WO2000074321A1 (en) 1999-05-28 2000-05-26 Apparatus and method for traffic shaping in a network switch

Publications (1)

Publication Number Publication Date
EP1183833A1 true EP1183833A1 (en) 2002-03-06

Family

ID=22475176

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00941149A Withdrawn EP1183833A1 (en) 1999-05-28 2000-05-26 Apparatus and method for traffic shaping in a network switch

Country Status (6)

Country Link
EP (1) EP1183833A1 (en)
JP (1) JP4504606B2 (en)
AU (1) AU5589800A (en)
IL (2) IL146767A0 (en)
RU (1) RU2001135829A (en)
WO (1) WO2000074321A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7583664B2 (en) 2004-12-28 2009-09-01 Michael Ho Techniques for transmitting and receiving traffic over advanced switching compatible switch fabrics

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7379420B2 (en) * 2001-12-28 2008-05-27 Network Equipment Technologies, Inc. Method and apparatus for multiple qualities of service to different network connections of a single network path

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3157113B2 (en) * 1996-10-25 2001-04-16 沖電気工業株式会社 Traffic shaper device
JPH1155276A (en) * 1997-08-01 1999-02-26 Oki Electric Ind Co Ltd Shaping device
JPH1168771A (en) * 1997-08-08 1999-03-09 Nec Corp Grouping upc system
US6011798A (en) * 1997-08-15 2000-01-04 Intel Corporation Adaptive transmit rate control scheduler

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0074321A1 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7583664B2 (en) 2004-12-28 2009-09-01 Michael Ho Techniques for transmitting and receiving traffic over advanced switching compatible switch fabrics

Also Published As

Publication number Publication date
RU2001135829A (en) 2003-08-27
JP4504606B2 (en) 2010-07-14
WO2000074321A9 (en) 2002-06-27
IL146767A0 (en) 2002-07-25
JP2003501885A (en) 2003-01-14
WO2000074321A1 (en) 2000-12-07
IL146767A (en) 2007-05-15
AU5589800A (en) 2000-12-18

Similar Documents

Publication Publication Date Title
US6768717B1 (en) Apparatus and method for traffic shaping in a network switch
US6122279A (en) Asynchronous transfer mode switch
US5933429A (en) Multipoint-to-multipoint echo processing in a network switch
US7023841B2 (en) Three-stage switch fabric with buffered crossbar devices
US7161906B2 (en) Three-stage switch fabric with input device features
US6377583B1 (en) Rate shaping in per-flow output queued routing mechanisms for unspecified bit rate service
EP0473330B1 (en) Serving constant bit rate traffic in a broadband data switch
US5926459A (en) Rate shaping in per-flow queued routing mechanisms for available bit rate service
US6295295B1 (en) Scheduler for an information packet switch
JP3359499B2 (en) Outgoing traffic control device
US6011775A (en) Method and apparatus for integrated traffic shaping in a packet-switched network
US5790545A (en) Efficient output-request packet switch and method
US6717912B1 (en) Fair discard system
EP1111858A2 (en) A weighted round robin scheduling engine
JPH09512683A (en) ATM architecture and switching elements
US6292491B1 (en) Distributed FIFO queuing for ATM systems
US6430152B1 (en) Scheduler system for scheduling the distribution of ATM cells
US20020150047A1 (en) System and method for scheduling transmission of asynchronous transfer mode cells
WO2000074321A1 (en) Apparatus and method for traffic shaping in a network switch
EP0817435B1 (en) A switch for a packet communication system
EP0817431A2 (en) A packet switched communication system
Obara et al. High speed transport processor for broad-band burst transport system
US7130267B1 (en) System and method for allocating bandwidth in a network node
EP0817432B1 (en) A packet switched communication system
EP0817434B1 (en) A packet switched communication system and traffic shaping process

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20011219

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

D17D Deferred search report published (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20030901