EP2783277A1 - Queuing apparatus - Google Patents

Queuing apparatus

Info

Publication number
EP2783277A1
Authority
EP
European Patent Office
Prior art keywords
queuing
queue
memory
queues
sub
Legal status
Withdrawn
Application number
EP11794131.0A
Other languages
German (de)
French (fr)
Inventor
Yaron Shachar
Rami Zecharia
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Publication of EP2783277A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/06 Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
    • G06F5/065 Partitioned buffers, e.g. allowing multiple independent queues, bidirectional FIFO's
    • G06F5/10 Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor having a sequence of storage locations each being individually accessible for both enqueue and dequeue operations, e.g. using random access memory
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling
    • H04L47/56 Queue scheduling implementing delay-aware scheduling
    • H04L47/564 Attaching a deadline to packets, e.g. earliest due date first
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/901 Buffering arrangements using storage descriptor, e.g. read or write pointers

Definitions

  • the invention relates to a queuing apparatus having a queuing engine and to a method for queuing Packet Descriptors.
  • High-speed queuing systems are required in many applications, especially in high-speed routers and switches of networks. When many inputs are connected to one output, it is necessary to provide a queuing engine especially in cases where the ingress data rate becomes higher than the egress data rate.
  • Conventional queuing engines provide either ingress queuing or use higher-frequency queuing systems. However, both approaches have disadvantages. Ingress queues use much more memory and have disadvantages like higher head-of-line blocking (HOL blocking) and the necessity of providing a complex traffic management.
  • HOL blocking: head-of-line blocking
  • the use of high-frequency queuing systems using a high-frequency clock causes an increase of power consumption and moreover leads to scalability problems in terms of timing closure.
  • a queuing apparatus having a queuing engine comprising a predetermined number, K, of queues, Q, wherein
  • each queue has a number, N, of sub-queues, SQ, associated to a corresponding number, N, of input lanes of said queuing engine,
  • each sub-queue, SQ, is used for storing Packet Descriptors applied by each associated input lane of said queuing engine and
  • a shared descriptor memory adapted to store all Descriptors of the sub-queues, SQ, of the predetermined number, K, of queues, Q, of this queuing engine,
  • each sub-queue, SQ, is adapted to store a maximum number, M, of Packet Descriptors applied to the associated input lane of said sub-queue, SQ, and wherein each sub-queue, SQ, can use the entire shared descriptor memory.
  • each enqueuing unit is adapted to enqueue the Packet Descriptors applied to the respective input lane into the corresponding sub-queue, SQ, of that input lane during each system clock cycle of the system clock signal, CLK, applied to the queuing engine.
  • the queuing engine further comprises a dequeuing unit adapted to dequeue a descriptor from any of the queues, Q, during the system clock cycle of the system clock signal, CLK, applied to said queuing engine.
  • the queuing apparatus further comprises a scheduler adapted to select a queue, Q, for being dequeued by the dequeuing unit of the queuing engine.
  • one of the sub-queues, SQ, of the selected queue, Q is selected by the memory controller using a sequencing function, SF.
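  • the sequencing function, SF, used by the memory controller comprises a First-in-First-Out, FIFO, sequencing function, wherein, among the N sub-queues, SQ, of the selected queue, Q, the sub-queue, SQ, whose head-of-queue descriptor carries the minimum time stamp, TS, is selected.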
  • the sequencing function, SF used by the memory controller comprises a Round Robin sequencing function, wherein a different sub-queue, SQ, is selected among the N sub-queues, SQ, of a queue, Q, each time a queue, Q, is selected by the scheduler for being dequeued by the dequeuing unit of the queuing engine.
  • the sequencing function, SF used by the memory controller comprises a deficit Round Robin sequencing function.
  • a shared descriptor memory comprises a multi-write-single read memory system.
  • the multi-write-single read memory system comprises a control logic adapted to receive a number, n, of write requests and to receive a read request within a clock cycle of the system clock signal, CLK.
  • the multi-write-single read memory system further comprises n+1 memory banks adapted to store data, wherein n is an integer number.
  • a control logic of the multi-write-single read memory system is adapted to control the memory bank occupancy levels, MBOLs, of the memory banks such that the differences between the memory bank occupancy levels, MBOLs, of the memory banks are minimized.
  • the queuing engine further comprises a memory controller adapted to receive queued data comprising Descriptors from said enqueuing unit and to supply dequeuing data comprising Descriptors to the dequeuing unit.
  • the memory controller is connected to the control logic of the multi-write-single read memory system forming the shared descriptor memory of the queuing engine.
  • a memory bank of the multi-write-single read memory system is formed by a single port random access memory RAM.
  • the queuing engine comprises a next pointer memory formed by a multi-write-single read memory system connected to the memory controller.
  • the queuing engine further comprises a free buffer pool memory formed by a multi-write-single read memory system connected to the memory controller.
  • a queuing engine comprises a time stamp memory formed by a multi-write-single read memory system connected to the memory controller.
  • a time stamp memory is provided for storing time stamps, TS, attached to the Packet Descriptors of packets received via an input lane of the queuing engine.
  • the enqueuing units and the dequeuing units of the queuing engine have access to a queue information memory which stores information data of queues, Q.
  • a time stamp, TS, is attached to each Packet Descriptor, wherein the time stamp, TS, indicates the arrival time of the respective packet.
  • a Packet Descriptor comprises a memory address to address payload data of the received packet stored in a shared data memory of the queuing apparatus.
  • a traffic management device comprising a queuing apparatus having a queuing engine according to the first aspect of the present invention.
  • an integrated circuit comprising a queuing apparatus having a queuing engine according to the first aspect of the present invention.
  • Fig. 1 shows a block diagram of a possible implementation of a queuing apparatus having a queuing engine according to the first aspect of the present invention;
  • Fig. 2 shows a block diagram of a possible implementation of a queuing engine provided in the queuing apparatus as shown in Fig. 1;
  • Fig. 3 shows a diagram for illustrating a logical view of a single queue employed by the queuing apparatus according to the first aspect of the present invention;
  • Fig. 4 shows a diagram for illustrating an aggregation of queues of a butterfly queuing mechanism as employed by a queuing apparatus according to the first aspect of the present invention.
  • the queuing apparatus 1 comprises in the shown implementation as its core element a queuing engine 2 connected to a scheduler 3 via a data bus.
  • the queuing apparatus 1 further comprises a shared data memory 4 connected to a data storage unit 5 and to a data retrieval unit 6.
  • the queuing engine 2 is adapted to receive N Descriptors from the data storage unit 5 and to output one descriptor to the data retrieval unit 6.
  • a Packet Descriptor is a set of information that describes the packet. Packet Descriptors can hold all kinds of information on a data packet. For instance, the Packet Descriptor comprises a pointer to a data memory in which the packet is stored.
  • instead of the packet itself, only its Packet Descriptor is sent to the queuing engine 2.
  • the packet itself is stored in a data memory. In some systems this kind of descriptor is sufficient. In other cases, however, the descriptor can hold more information, e.g. a packet ID. In systems in which the packet length varies from one packet to another, the packet length might be added to the Packet Descriptor to enable the scheduler 3 to compute exactly the number of bytes actually scheduled.
  • a Packet Descriptor can include the header of a packet itself.
  • the Packet Descriptor can include one or several data fields of the header within the packet. There are many other variants of Packet Descriptors; the above are merely examples.
  • the data storage unit 5 is connected via N input lanes of the queuing engine 2 to supply N Descriptors, e.g. Packet Descriptors, to the queuing engine 2, as shown in Fig. 1.
  • the data packets are stored in the shared data memory 4 and a reference to the stored data (usually a pointer) is put inside a Packet Descriptor which is sent to the queuing engine 2.
  • the queuing engine 2 comprises a predetermined number, K, of queues Q, wherein each queue Q has a number, N, of sub-queues, SQ, associated to the corresponding number N of input lanes connected to the queuing engine 2.
  • Each sub-queue, SQ is used for storing Packet Descriptors applied by each associated input lane of the queuing engine 2.
  • the scheduler 3 is adapted to select a queue among the K queues for dequeuing. A person skilled in the art will recognize that there are many state-of-the-art implementations of the scheduler 3. On each cycle, the queuing engine 2 pops the Packet Descriptor at the head of the queue that is selected by the scheduler 3. This Packet Descriptor is then sent to the data retrieval unit 6, which looks up the data packet of the popped Packet Descriptor in the shared data memory 4 and sends it to the output lane.
  • a queue is a particular kind of collection in which the entities are kept in order and the principal operations on the collection are the addition of entities to the queue and the removal of entities from the queue. For instance, in a First-In-First-Out (FIFO) queue, the first entity added to the queue is the first one to be removed. This is equivalent to the requirement that once an entity is added, all entities that were added before it have to be removed before the new entity can be removed.
  • a queue is an example of a linear data structure. Taking the FIFO queue as an example, dequeuing removes one or more entities from the front (head) of the queue, while enqueuing adds one or more entities at the rear (tail) of the queue. It is clear to a person skilled in the art that the operations of dequeuing and enqueuing apply to any type of queue and are not limited to FIFO queues.
  • an input lane is a data bus over which data, in the form of packets, streams into the queuing engine 2.
  • each path of enqueuing is called an input lane.
  • An implementation of the queuing engine 2 shown in Fig. 1 is illustrated in more detail by the block diagram of Fig. 2.
  • for each of the N input lanes, a corresponding enqueuing unit 7-1, 7-2 . . . 7-N is provided.
  • Each enqueuing unit 7-i is adapted to enqueue the Packet Descriptors applied to the respective input lane into a corresponding sub-queue SQ of that input lane during each system clock cycle of the system clock signal CLK applied to the queuing engine 2.
  • the queuing engine 2 further comprises a dequeuing unit 8 adapted to dequeue a descriptor from any of the queues Q during the system clock cycle of the system clock signal CLK applied to the queuing engine 2.
  • the queuing engine 2 comprises in the shown implementation of Fig. 2 a memory controller 9 adapted to receive queued data comprising Descriptors from the enqueuing units 7 and to supply dequeuing data comprising Descriptors to the dequeuing unit 8.
  • the queuing engine 2 comprises a shared descriptor memory 10 adapted to store all Descriptors of the sub-queues SQ of the predetermined number, K, of queues, Q, of the queuing engine 2, wherein during each system clock cycle of the system clock signal CLK applied to the queuing engine 2 up to N input lanes can request a write operation to queues Q and up to one read operation from any queue Q.
  • the shared descriptor memory 10 can comprise a multi-write-single read memory system comprising a control logic adapted to receive a number, n, of write requests WRs and to receive a read request RR within a clock cycle of the system clock signal CLK.
  • the multi-write-single read memory system can comprise n+1 memory banks adapted to store data, wherein n is an integer number.
  • the control logic of the multi-write-single read memory system can be adapted to control the memory bank occupancy levels MBOLs of the memory banks such that the differences between the memory bank occupancy levels MBOLs of the memory banks are minimized.
  • the memory controller 9 as shown in Fig. 2 can be connected to the control logic of the multi-write-single read memory system forming the shared descriptor memory 10 of the queuing engine 2.
  • Each memory bank of the multi-write-single read memory system can be formed by a single port random access memory RAM.
  • the enqueuing units 7-i and the dequeuing unit 8 of the queuing engine 2 can have access to a queue information memory 11 which stores information on the queues Q.
  • the queue information memory 11 can hold details on each of the sub-queues SQ.
  • these details can be a combination of the size of the sub-queue SQ, a read pointer and a write pointer.
  • the queue information memory 11 comprises at least the head of queue and tail of queue pointers.
  • the queue information memory 11 can further comprise information about the queue size and state information.
  • the queue information memory 11 can store further information about each sub-queue SQ.
  • Each enqueuing unit 7-i of the queuing engine 2 receives a descriptor, determines the queue, Q, into which the descriptor is to be inserted, and reads the queue control information, such as head of queue, tail of queue and queue size, from the queue information memory 11.
  • the enqueuing unit 7-i then sends a command to the memory controller 9 requesting it to store the descriptor and to update the next pointer memory at the same address as the newly stored descriptor, and receives in return the vacant address that was used for the descriptor and the next pointer.
  • the enqueuing unit 7-i can then update the queue information memory 11 with the latest data. Concurrently, if the queue, Q, changes its state, for example from empty to non-empty, the enqueuing unit 7-i updates the scheduler 3 accordingly.
  • the dequeuing unit 8 receives in a possible implementation a queue number from the scheduler 3 and reads the queue information memory 11. Based on the received queue information, such as the queue head, the dequeuing unit 8 can request the memory controller 9 to retrieve the Packet Descriptor at the head of the queue from the shared descriptor memory 10 and send it to the output of the queuing engine 2. Further, the dequeuing unit 8 can update the scheduler 3 about the dequeue process. In a further possible embodiment, further memories are connected to the memory controller 9. In a possible implementation, the queuing engine 2 comprises a next pointer memory that can also be formed by a multi-write-single-read memory system connected to the memory controller 9. The next pointer memory is used to maintain multiple linked lists in a single memory.
  • the queuing engine 2 further comprises a free buffer pool memory which can be formed by a multi-write-single-read memory system connected to the memory controller 9.
  • the free buffer pool memory can be used to track descriptor buffers that are vacant and can be used for incoming Packet Descriptors. Upon enqueuing, a buffer is marked as occupied; upon dequeuing, it is marked as vacant again.
  • the queuing engine further comprises a time stamp memory which can be formed by a multi-write-single-read memory system connected to the memory controller 9.
  • the time stamp memory is provided for storing time stamps TS attached to the Packet Descriptors of packets received via an input lane of the queuing engine 2.
  • a time stamp TS is attached to its Packet Descriptor, wherein the time stamp TS indicates an arrival time of the respective packet.
  • the Packet Descriptor can comprise a memory address to address the received packet stored in the shared data memory 4 of the queuing apparatus 1. Preferably, the Packet Descriptor holds the pointer to the data packet inside the shared data memory; otherwise, there is no way to retrieve the data packet when the Packet Descriptor is taken out of the queue.
  • a queuing mechanism provided by the queuing apparatus 1 as shown in Fig. 1 fulfils a function to receive control words or Packet Descriptors describing each data segment such as a packet and to store them in a set of queues Q. Under control of a decision engine, i.e. the scheduler 3, Packet Descriptors are dequeued from the head of one of the queues Q out of the queuing apparatus. The data segments themselves can be stored in the shared data memory 4 prior to the enqueuing process.
  • the scheduler 3 as shown in Fig. 1 is adapted to select a queue Q for being dequeued by the dequeuing unit 8 of the queuing engine 2.
  • the scheduler 3 is configured to select a queue Q.
  • the scheduler 3 does not even need to know that a queue is represented by a set of sub-queues SQ.
  • the memory controller 9 inside the queuing engine 2 chooses the correct sub-queue SQ of the queue that was decided by the scheduler 3.
  • the queuing engine 2 selects the sub-queue SQ according to one of the following sequencing functions SF: a First-in-First-Out FIFO sequencing function, a Round Robin sequencing function or a deficit Round Robin sequencing function.
  • the sequencing function SF comprises a First-in-First-Out FIFO sequencing function, wherein, among the N sub-queues SQ of the selected queue Q, the sub-queue SQ whose head-of-queue descriptor carries the minimum time stamp TS is selected.
  • This mechanism allows for packets to be selected based on the arrival time. The first arriving packet to any sub-queue SQ is selected to be dequeued. In this mode, as shown in Fig. 4, a time stamp TS is required to be attached to each arriving Packet Descriptor of any sub-queue SQ.
  • This implementation mimics a logical FIFO for each of the queues Q.
  • the sequencing function SF comprises a Round Robin sequencing function, wherein a different sub-queue SQ is selected among the N sub-queues of a queue Q each time a queue Q is selected by the scheduler 3 for being dequeued by the dequeuing unit 8 of the queuing engine 2.
  • This implementation can be used if there is a need to sequence the arriving data traffic between different sources.
  • a different sub-queue SQ is selected each time the queue Q is selected for being dequeued.
  • the sequencing function SF comprises a deficit Round Robin sequencing function. If the sub-queues SQ contain packets of different sizes, a plain Round Robin mechanism may not be fair. The deficit Round Robin mechanism performs Round Robin in a fair way by having each sub-queue SQ count bytes, i.e. data volume, rather than packets.
  • the queuing mechanism employed by the queuing engine 2 allows several Packet Descriptors to be inserted into the same or different queues Q within the same clock cycle, while also allowing a dequeue operation in that same clock cycle.
  • the number of lanes, i.e. the maximum number of enqueues during the same clock cycle, is denoted by the parameter N.
  • the number of queues Q in the system is denoted by the parameter K.
  • a butterfly queuing engine 2 is provided having a shared pool of memory for data and replicated control information.
  • Each queue Q in the queuing mechanism provided by the queuing engine 2 is represented by a set of N sub-queues SQ.
  • Each sub-queue SQ represents a source such as an input port of the queuing apparatus 1.
  • the Packet Descriptors are stored in the relevant sub-queue SQ such that N arriving packets can be stored at the same time into N separate sub-queues SQ.
  • when a queue Q is selected by the scheduler 3 for a dequeue, one of the sub-queues SQ of that queue Q can be selected by the memory controller 9 inside the queuing engine 2.
  • the selection between the sub-queues SQ of a queue Q can be performed by using different methods or sequencing functions SF comprising a First-In-First-Out FIFO or a Round Robin sequencing function.
  • the queuing engine 2 can use other selection functions SF for selection of a sub-queue SQ within a queue Q as well.
  • Fig. 3 shows a diagram for illustrating a logical view on a single queue Qi.
  • for each input lane, a corresponding sub-queue SQ within a queue Q is provided for storing Packet Descriptors D, wherein each sub-queue SQ is connected to a multiplexing logic MUX controlled by a sub-queue select logic SQSL, as shown in Fig. 3, which evaluates, for example, time stamps TS associated with the Descriptors D.
  • the MUX and the SQSL are logical elements of the logical view shown in Fig. 3; they illustrate the functionality performed by the memory controller 9 of Fig. 2.
  • Fig. 4 is a logical view (conceptual view) of the scheme that is implemented by the queuing engine 2 as shown in Fig. 2.
  • Fig. 2 is an implementation of the logical view illustrated in Figure 4.
  • An aggregation of K queues Q each having N sub-queues SQ is illustrated by the diagram of Fig. 4.
  • the input lane i corresponds to the sub-queue SQ i of each queue Q.
  • a corresponding sub-queue SQ i within each queue Q is provided for storing Packet Descriptors D, wherein each sub-queue SQ is connected to a multiplexing logic MUX controlled by a sub-queue select logic SQSL as shown in Fig. 3.
  • a time stamp TS is attached to each sub-queue SQ within each queue Q.
  • the time stamps TS attached to each sub-queue SQ constitute inputs of the SQSL.
  • the SQSL can evaluate time stamps TS associated with the Packet Descriptors D.
  • An output of each queue Q is coupled to a further MUX.
  • Figure 4 shows that each sub-queue SQ is selected according to a certain function SQSL, while each queue Q is selected according to the decision that is passed to the queuing engine 2 from the scheduler 3.
  • the queuing engine 2 uses N x K sub-queues which are implemented using a shared descriptor memory. This high-level division into the different queues Q enables a descriptor memory to be shared between the different queues Q.
  • the shared descriptor memory is formed using a plurality of multi-write-single-read memories.
  • the multi-write-single-read memories can be used to hold a Packet Descriptor database which forms a major part of the stored data.
  • the rest of the data, i.e. the queue control information such as the queue read and write pointers, is duplicated per lane.
  • the queue control information forms only a minor part of the stored data, has a constant data volume that does not depend on the data traffic, and is scalable.
  • when a queue Q is selected by the scheduler 3 for a dequeue phase, there can be packets ready to be delivered from more than one sub-queue SQ of the selected queue Q. Therefore, one sub-queue SQ is selected by the memory controller 9 inside the queuing engine 2, and the selection between the sub-queues SQ can be performed by a predetermined sequencing function SF.
  • the sequencing function SF used by the memory controller 9 can be changed in response to a control signal, e.g. to change an operation mode of the queuing apparatus 1.
  • the queuing apparatus 1 shown in Fig. 1 can be implemented in a traffic management device according to a second aspect of the present invention.
  • the traffic management device can be, for example, a switch device or a router device of a network.
  • the queuing apparatus 1 as shown in Fig. 1 can be integrated in an integrated circuit forming a third aspect of the present invention.
  • the integrated circuit can be a chip comprising a queuing apparatus 1 as shown in Fig. 1.
  • the invention further provides according to a fourth aspect a method for queuing Packet Descriptors received on each system clock cycle of the system clock signal CLK from a number N of input lanes concurrently using a queuing engine 2, wherein the queuing engine 2 comprises a predetermined number K of queues Q each having a number N of sub-queues SQ associated to the corresponding input lanes of the queuing engine 2.
  • All Descriptors of the sub-queues SQ of the predetermined number K of queues Q are stored in a shared descriptor memory 10 of the queuing engine 2 being adapted to store up to N Descriptors and to retrieve one descriptor per system clock cycle of the system clock signal CLK.
  • in each system clock cycle of the system clock signal CLK, the Packet Descriptors applied to the N input lanes are written in parallel into the shared descriptor memory 10 and a single descriptor is read from the shared descriptor memory 10.
  • the method for queuing Descriptors can be performed by executing a control program having instructions to perform the method according to the fourth aspect of the present invention.
  • This control program can be stored in a possible implementation in a program memory.
  • this control program can be loaded from a data carrier storing such a control program.
  • a method and apparatus for sharing memory in a queuing system that can withstand multiple writes and a single read in every system clock cycle can be used in a VLSI application.
  • the queuing apparatus 1 uses a multi-port shared memory to create a queuing system. It also reduces the silicon area needed when integrating the queuing apparatus 1 on a chip. Furthermore, a lower power consumption is achieved by using a lower frequency.
  • the queue size is limited only by the total descriptor memory.
  • a single queue Q can use the entire descriptor memory.
  • Different inputs of the queuing apparatus 1 can send data to the same queue Q within the same clock cycle.
  • a single queue Q is represented by N sub-queues SQ.
  • the input lane is connected to a specific sub-queue SQ.
  • a queue Q is selected by the scheduler 3 and a sub-queue SQ is selected using a predetermined sequencing function SF.
  • the queuing apparatus 1 allows a less complex scheduler 3 to be used, since the scheduler 3 only sees K queues Q rather than N x K sub-queues SQ.
  • each of the lanes inserts its incoming packets into its own set of sub-queues SQ.
  • a time stamp TS can be used to define an order between sub-queues SQ of the same queue Q.
  • in a possible implementation, each queue Q comprises four sub-queues SQ.
  • all four head time stamps TS are read and compared by an evaluation unit controlling a multiplexing logic as shown in Fig. 4.
  • the lane number can serve in a possible implementation as a differentiator.
  • in the queuing apparatus 1 according to the present invention, only the queue head/tail memory is duplicated, whereas the main Packet Descriptor memory is not duplicated, thus saving memory space.
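The Round Robin and deficit Round Robin sequencing functions described above can be sketched in Python as selectors over the N sub-queues of the queue Q chosen by the scheduler. This is an illustrative sketch, not the memory controller's actual logic: each sub-queue is modelled as a `deque` of packet lengths in bytes (the description notes that the packet length may be carried in the Packet Descriptor), and the class names, the `lanes_served` helper and the `quantum` parameter are hypothetical.

```python
from collections import deque


class RoundRobin:
    """Packet-based Round Robin over the N sub-queues of one queue Q."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # so that the first search starts at lane 0

    def dequeue(self, sub_queues):
        # Start after the sub-queue served last and wrap around once.
        for step in range(1, self.n + 1):
            lane = (self.last + step) % self.n
            if sub_queues[lane]:
                self.last = lane
                return lane, sub_queues[lane].popleft()
        return None


class DeficitRoundRobin:
    """Deficit Round Robin: each sub-queue earns credit in bytes, not packets."""

    def __init__(self, n: int, quantum: int) -> None:
        if quantum <= 0:
            raise ValueError("quantum must be positive")
        self.n, self.quantum = n, quantum
        self.deficit = [0] * n
        self.current = n - 1  # the first turn (and quantum) goes to lane 0

    def dequeue(self, sub_queues):
        if not any(sub_queues):
            return None
        while True:
            sq = sub_queues[self.current]
            if sq and sq[0] <= self.deficit[self.current]:
                length = sq.popleft()
                self.deficit[self.current] -= length
                if not sq:
                    self.deficit[self.current] = 0  # an emptied sub-queue keeps no credit
                return self.current, length
            if not sq:
                self.deficit[self.current] = 0
            # This sub-queue's turn is over: move on and grant the next one a quantum.
            self.current = (self.current + 1) % self.n
            self.deficit[self.current] += self.quantum


def lanes_served(selector, sub_queues):
    # Drain all sub-queues and record which lane was served each time.
    order = []
    while (picked := selector.dequeue(sub_queues)) is not None:
        order.append(picked[0])
    return order


def example_traffic():
    # Lane 0 sends ten 100-byte packets, lane 1 sends two 1000-byte packets.
    return [deque([100] * 10), deque([1000] * 2)]


print(lanes_served(RoundRobin(2), example_traffic()))
# [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(lanes_served(DeficitRoundRobin(2, quantum=1000), example_traffic()))
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

With packet-based Round Robin, lane 1's 1000-byte packets alternate with lane 0's 100-byte packets, so lane 1 receives far more bandwidth; the deficit variant grants each sub-queue the same byte quantum per round, so both lanes are served 1000 bytes per round.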
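The multi-write-single read memory described above (n write requests and one read request per clock cycle, served by n+1 single-port banks) can be approximated by the following Python model. Because each single-port bank handles at most one access per cycle, the read occupies the bank that holds the requested entry and the remaining n banks absorb the n writes. Assigning writes to the least-occupied banks first is an assumed policy for keeping the memory bank occupancy levels MBOLs close together; the source does not specify the policy, and the class name and the `(bank, row)` addresses are hypothetical.

```python
class MultiWriteSingleReadMemory:
    """n writes + 1 read per cycle using n + 1 single-port banks (sketch)."""

    def __init__(self, n: int, rows_per_bank: int) -> None:
        self.n = n
        self.banks = [{} for _ in range(n + 1)]  # per bank: row -> stored data
        self.free_rows = [list(range(rows_per_bank)) for _ in range(n + 1)]

    def occupancy(self):
        # Memory bank occupancy levels (MBOLs), one entry per bank.
        return [len(bank) for bank in self.banks]

    def cycle(self, write_data, read_addr=None):
        """Serve up to n writes and at most one read in one clock cycle.

        write_data: list of items to store; read_addr: (bank, row) or None.
        Returns (list of (bank, row) addresses for the writes, data read or None).
        """
        if len(write_data) > self.n:
            raise ValueError("at most n write requests per cycle")
        busy = set()
        read_value = None
        if read_addr is not None:
            bank, row = read_addr
            read_value = self.banks[bank].pop(row)
            self.free_rows[bank].append(row)
            busy.add(bank)  # this single-port bank is used up for the cycle
        # At least n banks remain idle; try the least occupied ones first.
        idle = sorted((b for b in range(self.n + 1) if b not in busy),
                      key=lambda b: len(self.banks[b]))
        addresses = []
        for data, bank in zip(write_data, idle):
            if not self.free_rows[bank]:
                raise MemoryError(f"bank {bank} is full")
            row = self.free_rows[bank].pop()
            self.banks[bank][row] = data
            addresses.append((bank, row))
        return addresses, read_value


mem = MultiWriteSingleReadMemory(n=3, rows_per_bank=4)
first, _ = mem.cycle(["a", "b", "c"])  # three writes, no read
second, value = mem.cycle(["d", "e", "f"], read_addr=first[0])
print(value, second, mem.occupancy())
# a [(3, 3), (1, 2), (2, 2)] [0, 2, 2, 1]
```

A hardware implementation would additionally need to map the descriptor addresses handed out by the memory controller onto such bank/row locations; that bookkeeping is omitted here.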

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A queuing apparatus (1) having a queuing engine (2) comprising: a predetermined number, K, of queues, Q, wherein each queue Q has a number, N, of sub-queues, SQ, associated to a corresponding number, N, of input lanes of said queuing engine (2), wherein each sub-queue, SQ, is used for storing Packet Descriptors applied by each associated input lane of said queuing engine (2); and a shared descriptor memory (10) adapted to store all Descriptors of the sub-queues, SQ, of the predetermined number, K, of queues, Q, of said queuing engine (2), wherein in each system clock cycle of a system clock signal (CLK) applied to said queuing engine (2), up to N input lanes can request a write operation, WR, to queues, Q, and up to one read operation, RR, from any queue, Q.
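The per-cycle rule stated in the abstract (up to N writes, one per input lane, plus at most one read) can be illustrated with a small behavioral model in Python. This is a sketch under stated assumptions, not the patented hardware: descriptors are plain integers standing in for shared-data-memory pointers, the cycle counter serves as the arrival time stamp TS, the single read is served from the state at the start of the cycle, and the sub-queue is chosen with the FIFO sequencing function (smallest head time stamp), with ties broken by lane number. The class and method names (`QueuingEngineModel`, `cycle`) are hypothetical.

```python
from collections import deque


class QueuingEngineModel:
    """Behavioral sketch: K queues, each split into N sub-queues (one per lane).

    Descriptors are plain ints standing in for shared-data-memory pointers;
    the engine attaches the current cycle number as the time stamp TS.
    """

    def __init__(self, k: int, n: int) -> None:
        self.k, self.n = k, n
        self.now = 0  # cycle counter, used as the arrival time stamp TS
        # sub_queues[q][lane] holds (ts, descriptor) pairs written by that lane.
        self.sub_queues = [[deque() for _ in range(n)] for _ in range(k)]

    def cycle(self, writes=(), read_queue=None):
        """One clock cycle: up to N writes (one per lane) and at most one read.

        writes: iterable of (lane, queue, descriptor) tuples.
        read_queue: queue index selected by the scheduler, or None.
        Returns the dequeued descriptor, or None.
        """
        writes = list(writes)
        lanes = [lane for lane, _, _ in writes]
        if len(set(lanes)) != len(lanes) or not all(0 <= ln < self.n for ln in lanes):
            raise ValueError("invalid or duplicate input lane in this cycle")
        # Assumption: the read is served from the state at the start of the cycle.
        out = self._dequeue(read_queue) if read_queue is not None else None
        for lane, q, desc in writes:
            # A lane only appends to its own sub-queue, so writes never collide.
            self.sub_queues[q][lane].append((self.now, desc))
        self.now += 1
        return out

    def _dequeue(self, q):
        # FIFO sequencing function: the non-empty sub-queue whose head has the
        # smallest TS wins; equal time stamps fall back to the lane number.
        heads = [(sq[0][0], lane) for lane, sq in enumerate(self.sub_queues[q]) if sq]
        if not heads:
            return None
        _, lane = min(heads)
        return self.sub_queues[q][lane].popleft()[1]


eng = QueuingEngineModel(k=2, n=3)
eng.cycle([(2, 1, 200), (0, 1, 100)])          # cycle 0: lanes 2 and 0 write queue 1
print(eng.cycle([(1, 1, 300)], read_queue=1))  # cycle 1: one write and one read
# 100
print([eng.cycle(read_queue=1) for _ in range(3)])
# [200, 300, None]
```

Because lane i only ever appends to sub-queue SQ i, the N parallel writes never contend for the same list, while the time stamps still let the dequeue side reproduce the arrival order of the queue as a whole.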

Description

TITLE
Queuing Apparatus
TECHNICAL BACKGROUND
The invention relates to a queuing apparatus having a queuing engine and to a method for queuing Packet Descriptors.
BACKGROUND OF THE INVENTION
High-speed queuing systems are required in many applications, especially in high-speed routers and switches of networks. When many inputs are connected to one output, it is necessary to provide a queuing engine especially in cases where the ingress data rate becomes higher than the egress data rate. Conventional queuing engines provide either ingress queuing or use higher-frequency queuing systems. However, both approaches have disadvantages. Ingress queues use much more memory and have disadvantages like higher head-of-line blocking (HOL blocking) and the necessity of providing a complex traffic management. The use of high-frequency queuing systems using a high-frequency clock causes an increase of power consumption and moreover leads to scalability problems in terms of timing closure.
Conventional queuing engines can become a bottleneck for the entire design of the high-speed queuing system. Although the high-speed queuing system can be designed to support a designated system standard bandwidth, a queuing apparatus is often required to support higher and sometimes even much higher bandwidths in order to enable differentiated services on the delayed information that dwells in the buffer. In high-speed queuing systems where an output queue architecture is employed, a specific output queue is required to absorb data traffic that can arrive at a data rate much higher than its output rate, especially when several input data sources send data traffic to said specific output queue.
This imposes a major requirement on any queuing apparatus: it must be able to absorb input traffic into a queue at a much higher data rate than the output rate of that queue. This requirement translates into many more write operations to a queue than read operations. Conventional queuing systems that must withstand more than one write operation during each system clock cycle actually have to replicate the control memories and descriptor memories of the input queues in order to perform several write accesses in parallel. However, since queuing systems require ever-growing queue depths, this replication of control and descriptor memories is a major hurdle to system design and wastes die space and power when these conventional queuing systems are integrated on a chip. Accordingly, there is a need for a queuing apparatus and a corresponding method which can withstand multiple write operations and one read operation during one system clock cycle without the necessity to replicate control and descriptor memories for the queues.
SUMMARY OF THE INVENTION
According to a first aspect of the present invention a queuing apparatus having a queuing engine is provided comprising
a predetermined number, K, of queues, Q, wherein each queue has a number, N, of sub-queues, SQ, associated to a corresponding number, N, of input lanes of said queuing engine,
wherein each sub-queue, SQ, is used for storing Packet Descriptors applied by the associated input lane of said queuing engine and
a shared descriptor memory adapted to store all Descriptors of the sub-queues, SQ, of the predetermined number, K, of queues, Q, of this queuing engine,
wherein in each system clock cycle of a system clock signal, CLK, applied to the queuing engine, up to N input lanes can request a write operation to queues, Q, and up to one read operation from any queue, Q.
According to a first possible implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention each sub-queue, SQ, is adapted to store a maximum number, M, of Packet Descriptors applied to the associated input lane of said sub-queue, SQ, and wherein each sub-queue, SQ, can use the entire shared descriptor memory,
wherein M is the number of Descriptors that the shared descriptor memory can hold.
In a possible second implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention a corresponding enqueuing unit is provided for each of the N input lanes of the queuing engine.
In a possible third implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention each enqueuing unit is adapted to enqueue Packet Descriptors applied to the respective input lane into the corresponding sub-queues, SQ, of the respective input lane during each system clock cycle of the system clock signal, CLK, applied to the queuing engine.
In a possible fourth implementation of the third implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention the queuing engine further comprises a dequeuing unit adapted to dequeue a descriptor from any of the queues, Q, during the system clock cycle of the system clock signal, CLK, applied to said queuing engine.
In a further possible fifth implementation of the first to fourth implementations of the queuing apparatus having a queuing engine according to the first aspect of the present invention the queuing apparatus further comprises a scheduler adapted to select a queue, Q, for being dequeued by the dequeuing unit of the queuing engine.
In a possible sixth implementation of the fifth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention one of the sub-queues, SQ, of the selected queue, Q, is selected by the memory controller using a sequencing function, SF.
In a possible seventh implementation of the sixth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention the sequencing function, SF, used by the memory controller comprises a First-in-First-Out, FIFO, sequencing function,
wherein, among the N sub-queues, SQ, of the selected queue, Q, the sub-queue, SQ, whose head-of-queue descriptor carries the minimum time stamp, TS, is selected.
In a possible eighth implementation of the sixth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention the sequencing function, SF, used by the memory controller comprises a Round Robin sequencing function, wherein a different sub-queue, SQ, is selected among the N sub-queues, SQ, of a queue, Q, each time the queue, Q, is selected by the scheduler for being dequeued by the dequeuing unit of the queuing engine.
In a further possible ninth implementation of the sixth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention the sequencing function, SF, used by the memory controller comprises a deficit Round Robin sequencing function.
In a possible tenth implementation of the first to ninth implementations of the queuing apparatus having a queuing engine according to the first aspect of the present invention the shared descriptor memory comprises a multi-write-single read memory system.
In a possible eleventh implementation of the tenth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention the multi-write-single read memory system comprises a control logic adapted to receive a number, n, of write requests and to receive a read request within a clock cycle of the system clock signal, CLK.
In a possible twelfth implementation of the eleventh implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention the multi-write-single read memory system further comprises n+1 memory banks adapted to store data, wherein n is an integer number.
In a further possible thirteenth implementation of the eleventh or twelfth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention the control logic of the multi-write-single read memory system is adapted to control memory bank occupancy levels, MBOLs, of each memory bank such that the differences between the memory bank occupancy levels, MBOLs, of the memory banks are minimized.
In a possible fourteenth implementation of the second to fourth implementations of the queuing apparatus having a queuing engine according to the first aspect of the present invention the queuing engine further comprises a memory controller adapted to receive queued data comprising Descriptors from said enqueuing unit and to supply dequeuing data comprising Descriptors to the dequeuing unit.
In a possible fifteenth implementation of the tenth to fourteenth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention the memory controller is connected to the control logic of the multi-write-single read memory system forming the shared descriptor memory of the queuing engine.
In a possible sixteenth implementation of the twelfth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention a memory bank of the multi-write-single read memory system is formed by a single port random access memory RAM.
In a possible seventeenth implementation of the fourteenth to sixteenth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention the queuing engine comprises a next pointer memory formed by a multi-write-single read memory system connected to the memory controller.
In a possible eighteenth implementation of the fourteenth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention the queuing engine further comprises a free buffer pool memory formed by a multi-write-single read memory system connected to the memory controller.
In a possible nineteenth implementation of the fourteenth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention the queuing engine comprises a time stamp memory formed by a multi-write-single read memory system connected to the memory controller.
In a possible twentieth implementation of the nineteenth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention a time stamp memory is provided for storing time stamps, TS, attached to the Packet Descriptors of packets received via an input lane of the queuing engine.
In a possible twenty-first implementation of the second to fourth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention the enqueuing units and the dequeuing units of the queuing engine have access to a queue information memory which stores information data of queues, Q.
In a possible twenty-second implementation of the nineteenth or twentieth implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention for each packet received via an input lane by said queuing engine a time stamp, TS, is attached to its Packet Descriptor wherein the time stamp, TS, indicates an arrival time of the respective packet.
In a possible twenty-third implementation of the twenty-second implementation of the queuing apparatus having a queuing engine according to the first aspect of the present invention a Packet Descriptor comprises a memory address to address payload data of the received packet stored in a shared data memory of the queuing apparatus.
According to a second aspect of the present invention a traffic management device is provided comprising a queuing apparatus having a queuing engine according to the first aspect of the present invention.
According to a third aspect of the present invention an integrated circuit is provided comprising a queuing apparatus having a queuing engine according to the first aspect of the present invention.
According to a fourth aspect of the present invention a method is provided for queuing Packet Descriptors received on each system clock cycle of a system clock signal, CLK, from a number N of input lanes concurrently, using a queuing engine comprising a predetermined number, K, of queues, Q, each having a number, N, of sub-queues, SQ, associated with the corresponding input lanes of the queuing engine, wherein all Descriptors of the sub-queues, SQ, of the predetermined number, K, of queues, Q, are stored in a shared descriptor memory of the queuing engine which is adapted, per system clock cycle of the system clock signal, CLK, to store up to N Descriptors and to retrieve one descriptor.
In a possible implementation of the method for queuing Descriptors according to the fourth aspect of the present invention in the same system clock cycle of the system clock signal, CLK, Packet Descriptors applied to the N input lanes are written in parallel into the shared descriptor memory and a single descriptor is read from the shared descriptor memory.
BRIEF DESCRIPTION OF FIGURES
In the following, exemplary implementations of different aspects of the present invention are described in more detail with reference to the enclosed figures.
Fig. 1 shows a block diagram of a possible implementation of a queuing apparatus having a queuing engine according to the first aspect of the present invention;
Fig. 2 shows a block diagram of a possible implementation of a queuing engine provided in the queuing apparatus as shown in Fig. 1;
Fig. 3 shows a diagram illustrating a logical view of a single queue employed by the queuing apparatus according to the first aspect of the present invention;
Fig. 4 shows a diagram illustrating an aggregation of queues of a butterfly queuing mechanism as employed by a queuing apparatus according to the first aspect of the present invention.
DETAILED DESCRIPTION OF EMBODIMENTS
As can be seen in Fig. 1, in the shown implementation the queuing apparatus 1 comprises as its core element a queuing engine 2 connected to a scheduler 3 via a data bus. The queuing apparatus 1 further comprises a shared data memory 4 connected to a data storage unit 5 and to a data retrieval unit 6. As can be seen in Fig. 1, the queuing engine 2 is adapted to receive N Descriptors from the data storage unit 5 and to output one descriptor to the data retrieval unit 6. A Packet Descriptor is a set of information that describes a packet and can hold all kinds of information on it. For instance, the Packet Descriptor comprises a pointer to a data memory in which the packet is stored. This decreases the amount of information that is inserted into the queuing engine: instead of sending the entire packet to the queuing system, only the Packet Descriptor is sent, while the packet itself is stored in the data memory. In some systems this kind of descriptor is sufficient. In other cases the descriptor can hold more information, e.g. a packet ID. In systems in which the packet length varies from one packet to another, the packet length might be added to the Packet Descriptor to enable the scheduler 3 to compute exactly the number of bytes actually scheduled. In a possible implementation a Packet Descriptor can include the header of the packet itself. In another possible implementation the Packet Descriptor can include one or several data fields of the packet header. Many variants of Packet Descriptors exist; the above are merely examples.
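The descriptor described above can be sketched as a small record. The following Python sketch is illustrative only: the field names (data_ptr, length, packet_id, timestamp) and the helper bytes_scheduled are hypothetical and not taken from the source; only the pointer into the shared data memory is treated as mandatory, the other fields being the optional extras mentioned in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PacketDescriptor:
    """Hypothetical descriptor layout; only data_ptr is treated as mandatory."""
    data_ptr: int                    # address of the packet in the shared data memory
    length: Optional[int] = None     # packet length in bytes, if lengths vary
    packet_id: Optional[int] = None  # optional extra identification
    timestamp: Optional[int] = None  # optional arrival time stamp TS


def bytes_scheduled(descriptors: Iterable[PacketDescriptor]) -> int:
    """Total bytes a scheduler could account for from descriptor lengths."""
    return sum(d.length for d in descriptors if d.length is not None)


if __name__ == "__main__":
    # The payloads stay in the shared data memory; only these small
    # records are handed to the queuing engine.
    descs = [PacketDescriptor(data_ptr=0x100, length=64),
             PacketDescriptor(data_ptr=0x140, length=1500, packet_id=7)]
    print(bytes_scheduled(descs))  # 1564
```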
The data storage unit 5 is connected via the N input lanes of the queuing engine 2 to supply N Descriptors, e.g. Packet Descriptors, to the queuing engine 2, as shown in Fig. 1.
The data packets are stored in the shared data memory 4 and a reference to the stored data (usually a pointer) is put inside a Packet Descriptor which is sent to the queuing engine 2.
The queuing engine 2 comprises a predetermined number, K, of queues Q, wherein each queue Q has a number, N, of sub-queues, SQ, associated to the corresponding number N of input lanes connected to the queuing engine 2. Each sub-queue, SQ, is used for storing Packet Descriptors applied by each associated input lane of the queuing engine 2.
The scheduler 3 is adapted to select a queue among the K queues for dequeue. A person skilled in the art will recognize that there are many state-of-the-art implementations of the scheduler 3. On each cycle the queuing engine 2 pops the Packet Descriptor at the head of the queue selected by the scheduler 3. This Packet Descriptor is then sent to the data retrieval unit 6, which looks up the data packet of the popped Packet Descriptor in the shared data memory 4 and sends it to the output lane.
A queue is a particular kind of collection in which the entities are kept in order and the principal operations on the collection are the addition of entities to the queue and the removal of entities from the queue. For instance, in a First-In-First-Out (FIFO) queue, the first entity added to the queue is the first one to be removed. This is equivalent to the requirement that once an entity is added, all entities that were added before it have to be removed before the new entity can be removed. A queue is an example of a linear data structure. Taking the FIFO queue as an example, dequeuing removes one or more entities from the front terminal position of the queue, while enqueuing adds one or more entities at the rear terminal position of the queue. It is clear to a person skilled in the art that the operations of dequeuing and enqueuing apply to all types of queues and are not limited to FIFO queues.
An input lane is a data bus over which data, in the form of packets, streams into the queuing engine 2. In other words, each enqueuing path is called an input lane.
An implementation of the queuing engine 2 shown in Fig. 1 is illustrated in more detail by the block diagram of Fig. 2. As can be seen in Fig. 2, a corresponding enqueuing unit 7-1, 7-2 . . . 7-N is provided for each of the N input lanes of the queuing engine 2. Each enqueuing unit 7-i is adapted to enqueue Packet Descriptors applied to the respective input lane into a corresponding sub-queue SQ of that input lane during each system clock cycle of the system clock signal CLK applied to the queuing engine 2. As can be seen in Fig. 2, the queuing engine 2 further comprises a dequeuing unit 8 adapted to dequeue a descriptor from any of the queues Q during the system clock cycle of the system clock signal CLK applied to the queuing engine 2. In the implementation shown in Fig. 2, the queuing engine 2 comprises a memory controller 9 adapted to receive queued data comprising Descriptors from the enqueuing units 7-1 . . . 7-N and to supply dequeuing data comprising Descriptors to the dequeuing unit 8.
The queuing engine 2 comprises a shared descriptor memory 10 adapted to store all Descriptors of the sub-queues SQ of the predetermined number, K, of queues, Q, of the queuing engine 2, wherein during each system clock cycle of the system clock signal CLK applied to the queuing engine 2 up to N input lanes can request a write operation to queues Q and up to one read operation from any queue Q can be requested. In a possible implementation, the shared descriptor memory 10 can comprise a multi-write-single read memory system comprising a control logic adapted to receive a number, n, of write requests WRs and a read request RR within a clock cycle of the system clock signal CLK. In this possible implementation the multi-write-single read memory system can comprise n+1 memory banks adapted to store data, wherein n is an integer number. The control logic of the multi-write-single read memory system can be adapted to control the memory bank occupancy levels MBOLs of each memory bank such that the differences between the memory bank occupancy levels MBOLs of the memory banks are minimized. In a possible implementation the memory controller 9 shown in Fig. 2 can be connected to the control logic of the multi-write-single read memory system forming the shared descriptor memory 10 of the queuing engine 2. Each memory bank of the multi-write-single read memory system can be formed by a single port random access memory RAM.
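One possible way to realize a multi-write-single read memory system with n+1 single-port banks can be sketched as follows. This is an assumed scheme, not the control logic of the source: a mapping table records which bank and row currently hold each logical address, the bank holding the word being read is considered busy for that cycle, and each write is placed in a different, still idle bank with the lowest memory bank occupancy level (MBOL). The class and method names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


class MultiWriteSingleReadMemory:
    """Sketch: up to n writes plus one read per cycle on n+1 single-port banks.

    Assumed scheme (hypothetical, not taken from the source): a mapping table
    tracks the bank/row of every logical address. The bank holding the word
    being read is busy for the cycle, so each write goes to a different,
    still idle bank, preferring the lowest memory bank occupancy level (MBOL).
    """

    def __init__(self, n_writes: int, rows_per_bank: int):
        self.n = n_writes
        self.rows = rows_per_bank
        self.banks: List[List[Any]] = [[None] * rows_per_bank for _ in range(n_writes + 1)]
        self.free_rows = [list(range(rows_per_bank)) for _ in range(n_writes + 1)]
        self.where: Dict[int, Tuple[int, int]] = {}  # logical address -> (bank, row)

    def mbol(self, bank: int) -> int:
        """Number of occupied rows in a bank."""
        return self.rows - len(self.free_rows[bank])

    def cycle(self, writes: Sequence[Tuple[int, Any]], read_addr: Optional[int] = None) -> Any:
        """Perform one clock cycle; returns the word read, or None."""
        if len(writes) > self.n:
            raise ValueError("at most n writes per cycle")
        busy = set()  # banks already accessed in this cycle
        data = None
        if read_addr is not None:
            bank, row = self.where[read_addr]
            data = self.banks[bank][row]
            busy.add(bank)
        for addr, value in writes:
            candidates = [b for b in range(self.n + 1) if b not in busy and self.free_rows[b]]
            if not candidates:
                raise MemoryError("no idle bank with a free row")
            bank = min(candidates, key=self.mbol)  # balance occupancy levels
            row = self.free_rows[bank].pop()
            self.banks[bank][row] = value
            busy.add(bank)
            old = self.where.get(addr)
            if old is not None:
                # Bookkeeping only: the stale row becomes free without a bank access.
                self.free_rows[old[0]].append(old[1])
            self.where[addr] = (bank, row)
        return data
```

Because the read blocks at most one bank, the n remaining banks can always absorb n writes in one cycle as long as each still has a free row; choosing the least-occupied idle bank keeps the MBOLs close to one another.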
As can be seen in Fig. 2, the enqueuing units 7-i and the dequeuing unit 8 of the queuing engine 2 can access a queue information memory 11 which stores information on the queues Q. The queue information memory 11 can hold details on each of the sub-queues SQ. For example, these details can be a combination of the size of the sub-queue SQ, a read pointer and a write pointer. The queue information memory 11 comprises at least the head-of-queue and tail-of-queue pointers. In a further possible implementation the queue information memory 11 can further comprise information about the queue size and state information. In a further possible implementation the queue information memory 11 can store further information about each sub-queue SQ. Each enqueuing unit 7-i of the queuing engine 2 receives a descriptor, determines the queue Q into which it must insert the descriptor, and reads the queue control information, such as head of queue, tail of queue and queue size, from the queue information memory 11. The enqueuing unit 7-i then sends a command to the memory controller 9 requesting it to store the descriptor and to update the next pointer memory at the same address as the newly stored descriptor; in return, the enqueuing unit 7-i receives the vacant address that was used for both the descriptor and the next pointer. The enqueuing unit 7-i can then update the queue information memory 11 with the latest data. Concurrently, if the queue Q changed its state, for example from empty to non-empty, the enqueuing unit 7-i updates the scheduler 3 accordingly.
In a possible implementation, the dequeuing unit 8 receives a queue number from the scheduler 3 and reads the queue information memory 11. Based on the received queue information, such as the queue head, the dequeuing unit 8 can request the memory controller 9 to retrieve the Packet Descriptor at the head of the queue from the shared descriptor memory 10 and send it to the output of the queuing engine 2. Further, the dequeuing unit 8 can update the scheduler 3 about the dequeue process. In a further possible embodiment, further memories are connected to the memory controller 9. In a possible implementation the queuing engine 2 comprises a next pointer memory that can also be formed by a multi-write-single-read memory system connected to the memory controller 9. The next pointer memory is used to maintain multiple linked lists in one memory. In a further possible implementation the queuing engine 2 further comprises a free buffer pool memory which can be formed by a multi-write-single-read memory system connected to the memory controller 9. The free buffer pool memory can be used to track descriptor buffers that are vacant and can be used for incoming Packet Descriptors. Upon enqueuing, a buffer is marked as occupied; upon dequeuing, it is marked as vacant.
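The enqueue and dequeue steps above, with a next pointer memory, a free buffer pool and a queue information memory, can be sketched with conventional singly linked lists. This is a software sketch under assumptions: the names (LinkedSubQueues, SubQueueInfo, NIL) are hypothetical, the pointer-update order follows a common linked-list technique and may differ from the source, and the per-cycle hardware parallelism is not modeled.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

NIL = -1  # hypothetical "no successor" marker in the next pointer memory


@dataclass
class SubQueueInfo:
    """One queue information memory entry: head, tail and size of a sub-queue."""
    head: int = NIL
    tail: int = NIL
    size: int = 0


class LinkedSubQueues:
    """Sub-queues stored as linked lists in one shared descriptor memory.

    descriptor_mem and next_ptr are indexed by the same buffer address;
    free_pool lists vacant buffers. Names are illustrative only.
    """

    def __init__(self, capacity: int, num_subqueues: int):
        self.descriptor_mem: List[Any] = [None] * capacity
        self.next_ptr = [NIL] * capacity
        self.free_pool = list(range(capacity))        # vacant buffer addresses
        self.info = [SubQueueInfo() for _ in range(num_subqueues)]
        self.became_nonempty: List[int] = []          # notifications for the scheduler

    def enqueue(self, sq: int, descriptor: Any) -> int:
        if not self.free_pool:
            raise MemoryError("shared descriptor memory is full")
        addr = self.free_pool.pop(0)                  # buffer becomes occupied
        self.descriptor_mem[addr] = descriptor
        self.next_ptr[addr] = NIL                     # new tail has no successor
        q = self.info[sq]
        if q.size == 0:
            q.head = addr
            self.became_nonempty.append(sq)           # empty -> non-empty
        else:
            self.next_ptr[q.tail] = addr              # link previous tail to new entry
        q.tail = addr
        q.size += 1
        return addr

    def dequeue(self, sq: int) -> Optional[Any]:
        q = self.info[sq]
        if q.size == 0:
            return None
        addr = q.head
        descriptor = self.descriptor_mem[addr]
        q.head = self.next_ptr[addr]
        q.size -= 1
        if q.size == 0:
            q.tail = NIL
        self.free_pool.append(addr)                   # buffer becomes vacant
        return descriptor
```

Since all sub-queues draw buffers from one free pool, a single sub-queue can grow until it occupies the entire descriptor memory.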
In a further possible implementation the queuing engine further comprises a time stamp memory which can be formed by a multi-write-single-read memory system connected to the memory controller 9. In a possible implementation the time stamp memory is provided for storing time stamps TS attached to the Packet Descriptors of packets received via an input lane by the queuing engine 2. In a possible implementation, for each packet received via an input lane by the queuing engine, a time stamp TS is attached to its Packet Descriptor, wherein the time stamp TS indicates the arrival time of the respective packet.
In a possible implementation the Packet Descriptor can comprise a memory address to address the received packet stored in the shared data memory 4 of the queuing apparatus 1. Preferably, the Packet Descriptor holds the pointer to the data packet inside the shared data memory; otherwise, there is no way to retrieve the data packet when the Packet Descriptor is taken out of the queue. The queuing mechanism provided by the queuing apparatus 1 as shown in Fig. 1 receives control words or Packet Descriptors describing each data segment, such as a packet, and stores them in a set of queues Q. Under control of a decision engine, i.e. the scheduler 3, Packet Descriptors are dequeued from the head of one of the queues Q and output from the queuing apparatus. The data segments themselves can be stored in the shared data memory 4 prior to the enqueuing process.
The scheduler 3 as shown in Fig. 1 is adapted to select a queue Q for being dequeued by the dequeuing unit 8 of the queuing engine 2. The scheduler 3 only selects a queue Q; it does not even need to know that a queue is represented by a set of sub-queues SQ. The memory controller 9 inside the queuing engine 2 chooses the correct sub-queue SQ of the queue selected by the scheduler 3. The queuing engine 2 selects the sub-queue SQ according to one of the following sequencing functions SF: a First-in-First-Out FIFO sequencing function, a Round Robin sequencing function or a deficit Round Robin sequencing function.
In a possible implementation the sequencing function SF comprises a First-in-First-Out FIFO sequencing function, wherein, among the N sub-queues SQ of the selected queue Q, the sub-queue SQ whose head-of-queue descriptor carries the minimum time stamp TS is selected. This mechanism allows packets to be selected based on their arrival time: the first packet to arrive at any sub-queue SQ is selected to be dequeued. In this mode, as shown in Fig. 4, a time stamp TS must be attached to each Packet Descriptor arriving at any sub-queue SQ. For each dequeue, the sub-queue SQ whose head-of-queue Packet Descriptor has the minimum time stamp TS, i.e. the oldest enqueue, is selected. This requires finding the minimum time stamp TS among the N sub-queues SQ. This implementation mimics a logical FIFO for each of the queues Q. In a further alternative implementation the sequencing function SF comprises a Round Robin sequencing function, wherein a different sub-queue SQ is selected among the N sub-queues of a queue Q each time the queue Q is selected by the scheduler 3 for being dequeued by the dequeuing unit 8 of the queuing engine 2. This implementation can be used if there is a need to sequence the arriving data traffic between different sources. In a further possible implementation the sequencing function SF comprises a deficit Round Robin sequencing function. If the sub-queues SQ contain packets of different sizes, a plain Round Robin mechanism may not be fair. The deficit Round Robin mechanism performs Round Robin in a fair way by having each sub-queue SQ account for bytes, i.e. data volume, rather than packets.
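The three sequencing functions can be sketched as small selection routines invoked each time the scheduler picks a queue. This is a sketch under assumptions: the tie-break by lane number (which the document later mentions as a possible differentiator), the byte quantum of the deficit Round Robin variant and all function and class names are hypothetical choices, not taken from the source.

```python
from typing import List, Optional, Sequence


def fifo_select(head_ts: Sequence[Optional[int]]) -> Optional[int]:
    """FIFO sequencing: index of the sub-queue whose head has the minimum TS.

    head_ts[i] is the time stamp of the head descriptor of sub-queue i, or
    None if that sub-queue is empty. Equal time stamps are resolved by the
    lower lane index (an assumed tie-break).
    """
    candidates = [(ts, lane) for lane, ts in enumerate(head_ts) if ts is not None]
    return min(candidates)[1] if candidates else None


class RoundRobin:
    """Round Robin sequencing: serve the next non-empty sub-queue after the last one."""

    def __init__(self, n: int):
        self.n = n
        self.last = n - 1

    def select(self, nonempty: Sequence[bool]) -> Optional[int]:
        for step in range(1, self.n + 1):
            lane = (self.last + step) % self.n
            if nonempty[lane]:
                self.last = lane
                return lane
        return None


class DeficitRoundRobin:
    """Deficit Round Robin: each sub-queue earns `quantum` bytes per visit."""

    def __init__(self, n: int, quantum: int):
        self.n = n
        self.quantum = quantum           # must be > 0
        self.deficit: List[int] = [0] * n
        self.current = 0
        self.fresh = True                # quantum not yet granted on this visit

    def _advance(self) -> None:
        self.current = (self.current + 1) % self.n
        self.fresh = True

    def select(self, head_len: Sequence[Optional[int]]) -> Optional[int]:
        """head_len[i] is the head packet length of sub-queue i, or None if empty."""
        if all(length is None for length in head_len):
            return None
        while True:
            lane = self.current
            if head_len[lane] is None:
                self.deficit[lane] = 0   # empty sub-queues keep no credit
                self._advance()
                continue
            if self.fresh:
                self.deficit[lane] += self.quantum
                self.fresh = False
            if head_len[lane] <= self.deficit[lane]:
                self.deficit[lane] -= head_len[lane]
                return lane
            self._advance()
```

With two sub-queues holding 300-byte and 1000-byte head packets and a quantum of 500 bytes, this deficit Round Robin sketch serves the first sub-queue three times (900 bytes) before serving the second once (1000 bytes), whereas plain Round Robin would alternate packet by packet regardless of size.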
Other ways of selecting sub-queues SQ from the selected queue Q can be used in other implementations of the queuing apparatus 1.
The queuing mechanism employed by the queuing engine 2 according to the present invention, as shown in Fig. 2, allows several Packet Descriptors to be inserted into the same or different queues Q within the same clock cycle while also allowing a dequeue in that same clock cycle. The number of lanes, i.e. the maximum number of enqueues during the same clock cycle, is denoted by the parameter N. The number of queues Q in the system is denoted by the parameter K. To support this queuing mechanism, a butterfly queuing engine 2 is provided having a shared pool of memory for data and replicated control information. Each queue Q in the queuing mechanism provided by the queuing engine 2 is represented by a set of N sub-queues SQ. Each sub-queue SQ represents a source, such as an input port of the queuing apparatus 1. The Packet Descriptors are stored in the relevant sub-queue SQ such that N arriving packets can be stored at the same time into N separate sub-queues SQ. When a queue Q is selected by the scheduler 3 for a dequeue, one of its sub-queues SQ is selected by the memory controller 9 inside the queuing engine 2. The selection between the sub-queues SQ of a queue Q can be performed using different methods or sequencing functions SF, such as a First-In-First-Out FIFO or a Round Robin sequencing function. The queuing engine 2 can use other sequencing functions SF for selecting a sub-queue SQ within a queue Q as well.
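A logical, cycle-level model of the K x N arrangement can illustrate why N writes per cycle never collide: lane i only ever writes into sub-queue i of its target queue. The model below is a sketch under assumptions: deques stand in for the shared descriptor memory, the time stamp is the arrival cycle, a dequeue in a given cycle is served before that cycle's arrivals, and FIFO sequencing with a lane-number tie-break is used. The class name is hypothetical.

```python
from collections import deque
from typing import Any, Dict, Optional, Tuple


class ButterflyQueuingEngine:
    """Logical model of K queues x N sub-queues (cf. Fig. 4).

    Lane i only ever writes into sub-queue i of its target queue, so up to N
    writes per cycle never collide, even when all lanes target one queue.
    Deques stand in for the shared descriptor memory (an assumption).
    """

    def __init__(self, num_queues: int, num_lanes: int):
        self.n = num_lanes
        self.sub = [[deque() for _ in range(num_lanes)] for _ in range(num_queues)]
        self.clock = 0

    def cycle(self, arrivals: Dict[int, Tuple[int, Any]],
              dequeue_queue: Optional[int] = None) -> Optional[Any]:
        """arrivals maps lane -> (queue, descriptor); dequeue_queue is the
        scheduler's choice for this cycle. Returns the dequeued descriptor."""
        out = None
        if dequeue_queue is not None:
            subqueues = self.sub[dequeue_queue]
            heads = [(sq[0][0], lane) for lane, sq in enumerate(subqueues) if sq]
            if heads:
                _, lane = min(heads)  # oldest TS first, lane number breaks ties
                out = subqueues[lane].popleft()[1]
        for lane, (queue, descriptor) in arrivals.items():
            if not 0 <= lane < self.n:
                raise ValueError("unknown input lane")
            self.sub[queue][lane].append((self.clock, descriptor))  # TS = arrival cycle
        self.clock += 1
        return out
```

In this model the scheduler only supplies a queue number; which sub-queue is served is decided inside the engine, mirroring the division of work between the scheduler 3 and the memory controller 9.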
Fig. 3 shows a diagram illustrating a logical view of a single queue Qi. As can be seen in Fig. 3, for each lane a corresponding sub-queue SQ within a queue Q is provided for storing Packet Descriptors D, wherein each sub-queue SQ is connected to a multiplexing logic MUX controlled by a sub-queue select logic SQSL, which evaluates, for example, time stamps TS associated with the Descriptors D. The MUX and the SQSL are logical elements of the logical view of Fig. 3; they represent the functionality performed by the memory controller 9 of Fig. 2.
Fig. 4 is a logical view (conceptual view) of the scheme that is implemented by the queuing engine 2 as shown in Fig. 2. In other words, Fig. 2 is an implementation of the logical view illustrated in Figure 4.
An aggregation of K queues Q, each having N sub-queues SQ, is illustrated by the diagram of Fig. 4. Input lane i corresponds to sub-queue SQ i of each queue Q. For each input lane i a corresponding sub-queue SQ i within each queue Q is provided for storing Packet Descriptors D, wherein each sub-queue SQ is connected to a multiplexing logic MUX controlled by a sub-queue select logic SQSL as shown in Fig. 3. The only difference between the queue shown in Fig. 4 and the queue shown in Fig. 3 is that in Fig. 4 a time stamp TS is attached to each sub-queue SQ within each queue Q. The time stamps TS attached to each sub-queue SQ constitute inputs of the SQSL, so that the SQSL can evaluate the time stamps TS associated with the Packet Descriptors D. The output of each queue Q is coupled to a further MUX. Figure 4 shows that each sub-queue SQ is selected according to a certain function SQSL, while each queue Q is selected according to the decision passed to the queuing engine 2 from the scheduler 3. As can be seen, the queuing engine 2 uses N x K sub-queues, which are implemented using a shared descriptor memory. This high-level division between the different queues Q enables a descriptor memory to be shared between the different queues Q. For instance, the shared descriptor memory can be formed by a plurality of multi-write-single-read memories. The multi-write-single-read memories can hold the Packet Descriptor database, which forms the major part of the stored data. The rest of the data, such as queue control information (e.g. a queue read and write pointer), is duplicated per lane. However, the queue control information forms only a minor part of the stored data; its volume is constant, independent of the data traffic, and scalable.
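The memory saving can be illustrated with a rough bit count. Apart from N=4 and K=1024, which the document gives for a specific implementation, the figures below are illustrative assumptions and not values from the source: a descriptor memory depth of 65536 entries, 64-bit descriptors, one next pointer per entry, and head, tail and size fields per sub-queue. The replicated case models the conventional approach of keeping one descriptor memory copy per input lane; the function name is hypothetical.

```python
def memory_bits(n_lanes: int, n_queues: int, depth: int, desc_bits: int,
                replicate_descriptors: bool) -> int:
    """Rough storage estimate in bits (illustrative assumptions only).

    depth:     number of descriptors the descriptor memory holds (M)
    desc_bits: width of one Packet Descriptor
    Per sub-queue control = head pointer + tail pointer + size counter.
    Each descriptor entry is paired with a next pointer (ptr_bits wide).
    """
    ptr_bits = (depth - 1).bit_length()   # bits to address one entry
    size_bits = depth.bit_length()        # counts 0..depth inclusive
    control = n_lanes * n_queues * (2 * ptr_bits + size_bits)
    copies = n_lanes if replicate_descriptors else 1
    entries = copies * depth * (desc_bits + ptr_bits)
    return control + entries


if __name__ == "__main__":
    shared = memory_bits(4, 1024, 65536, 64, replicate_descriptors=False)
    replicated = memory_bits(4, 1024, 65536, 64, replicate_descriptors=True)
    print(shared, replicated)  # 5443584 21172224
```

Under these assumptions the shared arrangement needs roughly 5.4 Mbit against roughly 21.2 Mbit for the replicated one; in the shared arrangement, the per-lane control information (about 0.2 Mbit) is the only part that grows with N.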
Once a queue Q is selected by the scheduler 3 for a dequeue phase, packets can be ready to be delivered from more than one sub-queue SQ of the selected queue Q. Therefore, one sub-queue SQ is selected by the memory controller 9 inside the queuing engine 2, and the selection between the sub-queues SQ can be performed by a predetermined sequencing function SF. In a possible implementation, the sequencing function SF used by the memory controller 9 can be changed in response to a control signal, e.g. to change an operation mode of the queuing apparatus 1.
The queuing apparatus 1 shown in Fig. 1 can be implemented in a traffic management device according to a second aspect of the present invention. The traffic management device can be, for example, a switch device or a router device of a network.
Further, the queuing apparatus 1 as shown in Fig. 1 can be integrated in an integrated circuit, forming a third aspect of the present invention. The integrated circuit can be a chip comprising a queuing apparatus 1 as shown in Fig. 1. According to a fourth aspect, the invention further provides a method for queuing Packet Descriptors received on each system clock cycle of the system clock signal CLK from a number N of input lanes concurrently using a queuing engine 2, wherein the queuing engine 2 comprises a predetermined number K of queues Q, each having a number N of sub-queues SQ associated with the corresponding input lanes of the queuing engine 2. All Descriptors of the sub-queues SQ of the predetermined number K of queues Q are stored in a shared descriptor memory 10 of the queuing engine 2, which is adapted, per system clock cycle of the system clock signal CLK, to store up to N Descriptors and to retrieve one descriptor. In a possible implementation of the method according to the fourth aspect of the present invention, in the same system clock cycle of the system clock signal CLK, Packet Descriptors applied to the N input lanes are written in parallel into the shared descriptor memory 10 and a single descriptor is read from the shared descriptor memory 10.
In a possible implementation the method for queuing Descriptors can be performed by executing a control program having instructions to perform the method according to the fourth aspect of the present invention. This control program can be stored in a possible implementation in a program memory. In a possible implementation this control program can be loaded from a data carrier storing such a control program. In a possible implementation a method and apparatus for sharing memory in a queuing system that can withstand multiple writes and a single read in every system clock cycle can be used in a VLSI application.
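The queuing apparatus 1 according to the present invention uses a multi-port shared memory to implement a queuing system. It also reduces the silicon area needed when the queuing apparatus 1 is integrated on a chip, since only the queue head/tail memory, and not the descriptor memory, is duplicated per input lane. Furthermore, lower power consumption is achieved by using a lower clock frequency: because up to N Descriptors can be written within a single system clock cycle, the memory does not have to be clocked N times faster to serve N input lanes.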
In a possible specific implementation, the number of input lanes is N=4 and the number of queues is K=1024. The queue size is limited only by the total descriptor memory; a single queue Q can use the entire descriptor memory. Different inputs of the queuing apparatus 1 can send data to the same queue Q within the same clock cycle. In the queuing apparatus 1 according to the present invention, a single queue Q is represented by N sub-queues SQ, and each input lane is connected to a specific sub-queue SQ. A queue Q is selected by the scheduler 3, and a sub-queue SQ of that queue is selected using a predetermined sequencing function SF. This allows a less complex scheduler 3 to be used, since the scheduler 3 only sees the K queues Q rather than the K × N sub-queues SQ.
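The split between the scheduler 3 (which picks one of K queues) and the sequencing function SF (which picks one of the N sub-queues of that queue) can be illustrated with the round-robin variant of SF named in claim 6. This is an illustrative sketch, not the patented circuit: the class name `RoundRobinSequencer`, the per-queue pointer, the rule of skipping empty sub-queues, and the helper `scheduler_view` are hypothetical choices. The scheduler is assumed to need only one eligibility flag per queue, which is the K-queue view described above.

```python
class RoundRobinSequencer:
    """Round-robin sequencing function SF (sketch): one pointer per queue Q
    that rotates over its N sub-queues SQ."""

    def __init__(self, k, n):
        self.n = n
        self.ptr = [0] * k  # next sub-queue (lane) to try, per queue

    def select(self, queue, nonempty):
        """nonempty[lane] tells whether that sub-queue holds a Descriptor.
        Returns the chosen lane, or None if the whole queue is empty."""
        start = self.ptr[queue]
        for i in range(self.n):
            lane = (start + i) % self.n
            if nonempty[lane]:
                self.ptr[queue] = (lane + 1) % self.n  # next time, start after it
                return lane
        return None


def scheduler_view(nonempty_per_queue):
    """What the scheduler sees: one eligibility flag per queue Q (K flags),
    not the K x N individual sub-queue states."""
    return [any(flags) for flags in nonempty_per_queue]
```

Round robin is simple, but it does not restore the arrival order of Descriptors across the sub-queues of one queue; the FIFO sequencing function based on time stamps TS is the variant that does.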
In a possible implementation, each input lane inserts its incoming packets into its own set of sub-queues SQ. Further, a time stamp TS can be used to define an order between the sub-queues SQ of the same queue Q.
In a possible implementation, each queue Q comprises four sub-queues SQ. When a queue Q is selected, the four head time stamps TS are read and compared by an evaluation unit controlling a multiplexing logic, as shown in Fig. 4. If the same queue Q receives Descriptors from more than one lane in the same clock cycle, their time stamps are equal, and in a possible implementation the lane number can serve as a tie-breaker. In the queuing apparatus 1 according to the present invention, only the queue head/tail memory is duplicated, while the main Packet Descriptor memory is shared and not duplicated, thus saving memory space.
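The head time stamp comparison for one selected queue can be sketched as follows. It is a minimal model of the FIFO sequencing function of claim 6, assuming that a smaller time stamp TS means an earlier arrival and that, on equal time stamps, the lower lane number wins (the tie-break order is an assumption). The function names `select_fifo` and `drain` are hypothetical, and the evaluation unit and multiplexing logic of Fig. 4 are reduced to a `min` over (TS, lane) pairs.

```python
from collections import deque


def select_fifo(head_ts):
    """FIFO sequencing function SF (sketch).

    head_ts[lane] is the time stamp TS of the Descriptor at the head of that
    sub-queue, or None if the sub-queue is empty. Returns the lane whose head
    is oldest; equal time stamps (same-cycle inserts from several lanes) are
    resolved by the lower lane number, since tuples compare element by element."""
    candidates = [(ts, lane) for lane, ts in enumerate(head_ts) if ts is not None]
    if not candidates:
        return None
    return min(candidates)[1]


def drain(subqueues):
    """Dequeue a whole queue, one Descriptor per call of select_fifo.
    Each sub-queue is a deque of (TS, descriptor) pairs in arrival order."""
    out = []
    while True:
        lane = select_fifo([sq[0][0] if sq else None for sq in subqueues])
        if lane is None:
            return out
        out.append(subqueues[lane].popleft()[1])
```

For example, with lane 0 holding `(1, "a"), (4, "d")`, lane 1 holding `(2, "b")`, lane 2 holding `(2, "c")` and lane 3 empty, `drain` returns `["a", "b", "c", "d"]`: the two Descriptors stamped 2 arrived in the same cycle and are ordered by lane number.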

Claims

1. A queuing apparatus (1) having a queuing engine (2) comprising: a predetermined number, K, of queues Q wherein each queue, Q, has a number, N, of sub-queues, SQ, associated to a corresponding number, N, of input lanes of said queuing engine (2),
wherein each sub-queue, SQ, is used for storing Packet Descriptors applied by each associated input lane of said queuing engine (2); and a shared descriptor memory (10) adapted to store all Packet Descriptors of the sub-queues, SQ, of the predetermined number, K, of queues, Q, of said queuing engine (2),
wherein in each system clock cycle of a system clock signal (CLK) applied to said queuing engine (2), up to N input lanes request a write operation, WR, to queues, Q, and up to one read operation, RR, is requested from any queue, Q.
2. The queuing apparatus (1) according to claim 1,
wherein each sub-queue, SQ, is adapted to store a maximum number, M, of Packet Descriptors applied to the associated input lane of said sub-queue, SQ, and wherein each sub-queue, SQ, can use the entire shared descriptor memory (10), wherein M is the number of Descriptors that the shared descriptor memory (10) can hold.
3. The queuing apparatus according to claim 1 or 2,
wherein for each of said N input lanes of said queuing engine (2) a corresponding enqueuing unit (7) is provided,
said enqueuing unit (7) being adapted to enqueue Packet Descriptors applied to the respective input lane during each system clock cycle of the system clock signal (CLK) into sub-queues, SQ, corresponding to the respective input lane.
4. The queuing apparatus according to claim 3,
wherein said queuing engine (2) further comprises a dequeuing unit (8) adapted to dequeue a descriptor from any of the queues, Q, during the system clock cycle of the system clock signal (CLK) applied to said queuing engine (2).
5. The queuing apparatus according to one of the preceding claims 1 to 4, further comprising a scheduler (3) adapted to select a queue, Q, for being dequeued by the dequeuing unit (8) of said queuing engine (2),
wherein said queuing engine (2) further comprises a memory controller (9) adapted to select one of the sub-queues, SQ, of the selected queue, Q, using a sequencing function, SF.
6. The queuing apparatus according to claim 5,
wherein said sequencing function, SF, used by said memory controller (9) comprises:
a First-in-First-Out, FIFO, sequencing function,
wherein, among the N sub-queues of the selected queue, Q, the sub-queue, SQ, is selected whose head-of-queue Descriptor carries the minimum time stamp, TS, or
wherein said sequencing function, SF, used by said memory controller (9) comprises:
a Round Robin sequencing function,
wherein a different sub-queue, SQ, is selected among the N sub-queues of a queue, Q, each time a queue is selected by said scheduler (3) for being dequeued by the dequeuing unit (8) of said queuing engine (2), or
wherein said sequencing function, SF, used by said memory controller (9) comprises a deficit Round Robin sequencing function.
7. The queuing apparatus according to one of the preceding claims 1 to 6,
wherein said shared descriptor memory (10) comprises a multi-write-single-read memory system comprising:
a control logic adapted to receive a number, n, of write requests, WRs, and to receive a read request, RR, within a system clock cycle of the system clock signal (CLK) and n+1 memory banks adapted to store data, n being an integer number;
wherein said control logic of said multi-write-single-read memory system is adapted to control memory bank occupancy levels, MBOLs, of each memory bank such that the differences between the memory bank occupancy levels, MBOLs, of the memory banks are minimized.
8. The queuing apparatus according to claim 3 or 4,
wherein said queuing engine (2) further comprises a memory controller (9) adapted to receive enqueued data comprising Descriptors from said enqueuing units (7) and to supply dequeued data comprising Descriptors to said dequeuing unit (8).
9. The queuing apparatus according to claim 7 or 8,
wherein said memory controller (9) is connected to the control logic of said multi-write-single-read memory system forming the shared descriptor memory (10) of said queuing engine (2).
10. The queuing apparatus according to claim 8 or 9,
wherein said queuing engine (2) comprises a next pointer memory, a free buffer pool memory and a time stamp memory which are formed by a multi-write-single-read memory system connected to said memory controller (9);
said time stamp memory being provided for storing time stamps, TS, attached to Packet Descriptors received via the input lane by said queuing engine (2).
11. The queuing apparatus according to claim 3 or 4,
wherein said enqueuing units (7) and said dequeuing unit (8) of said queuing engine (2) have access to a queue information memory (11) which stores information data of queues, Q.
12. The queuing apparatus according to claim 10,
wherein for each packet received via the input lane by said queuing engine (2) a time stamp, TS, is attached to its Packet Descriptor, wherein said time stamp, TS, indicates an arrival time of the respective packet.
13. The queuing engine according to claim 12,
wherein the Packet Descriptor comprises a memory address to address the payload of the received packet stored in a shared data memory (4) of said queuing apparatus (1).
14. A traffic management device comprising a queuing apparatus (1) according to one of the preceding claims 1 to 13.
15. A method for queuing Packet Descriptors received on each system clock cycle of a system clock signal (CLK) from a number N of input lanes concurrently using a queuing engine (2) comprising a predetermined number, K, of queues, Q, each having a number, N, of sub-queues, SQ, associated to the corresponding input lanes of said queuing engine (2),
wherein all Descriptors of the sub-queues, SQ, of the predetermined number, K, of queues, Q, are stored in a shared descriptor memory (10) of said queuing engine (2), which is adapted to store up to N Descriptors and retrieve one descriptor per system clock cycle of the system clock signal (CLK).
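The multi-write-single-read memory system of claim 7 can be illustrated with a behavioural sketch. The claim does not say how the control logic chooses banks; here each of the n+1 banks is assumed to allow one access (read or write) per system clock cycle, the bank being read is excluded from writing, and the writes go to the least-occupied remaining banks first, a greedy rule assumed to keep the memory bank occupancy levels MBOLs close together. The class name `MultiWriteSingleReadMemory` and the (bank, slot) address format are hypothetical.

```python
class MultiWriteSingleReadMemory:
    """Behavioural sketch of claim 7: n+1 banks, each assumed to allow a
    single access (read or write) per system clock cycle."""

    def __init__(self, n, bank_depth):
        self.n = n
        self.banks = [{} for _ in range(n + 1)]                 # slot -> data
        self.free = [list(range(bank_depth)) for _ in range(n + 1)]

    def occupancy(self):
        """Memory bank occupancy levels (MBOLs), one per bank."""
        return [len(bank) for bank in self.banks]

    def cycle(self, write_data, read_addr=None):
        """Serve up to n write requests and at most one read request.

        read_addr is a (bank, slot) pair returned by an earlier write.
        Returns (list of (bank, slot) for the writes, data read or None)."""
        if len(write_data) > self.n:
            raise ValueError("at most n write requests per cycle")
        read_value, busy = None, set()
        if read_addr is not None:
            bank, slot = read_addr
            read_value = self.banks[bank].pop(slot)
            self.free[bank].append(slot)
            busy.add(bank)                  # its single port is used by the read
        # Greedy balancing: write to the least-occupied available banks.
        candidates = sorted(
            (b for b in range(self.n + 1) if b not in busy and self.free[b]),
            key=lambda b: len(self.banks[b]),
        )
        if len(candidates) < len(write_data):
            raise MemoryError("not enough writable banks in this cycle")
        addrs = []
        for data, bank in zip(write_data, candidates):
            slot = self.free[bank].pop()
            self.banks[bank][slot] = data
            addrs.append((bank, slot))
        return addrs, read_value
```

With one read and n writes in the same cycle, the n writes must use exactly the n banks that are not being read, which is why n+1 banks suffice as long as none of those banks is full; keeping the occupancy levels balanced makes a full bank less likely. The occupancy-based choice also matters when fewer than n writes arrive or no read is issued.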

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2011/072086 WO2013083191A1 (en) 2011-12-07 2011-12-07 Queuing apparatus

Publications (1)

Publication Number Publication Date
EP2783277A1 (en) 2014-10-01

Family

ID=45315788

Family Applications (1)

Application Number Title Priority Date Filing Date
EP11794131.0A Withdrawn EP2783277A1 (en) 2011-12-07 2011-12-07 Queuing apparatus

Country Status (3)

Country Link
EP (1) EP2783277A1 (en)
CN (1) CN103988167A (en)
WO (1) WO2013083191A1 (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5524265A (en) * 1994-03-08 1996-06-04 Texas Instruments Incorporated Architecture of transfer processor
SE508050C2 (en) * 1995-11-09 1998-08-17 Ericsson Telefon Ab L M Device and method of packet delivery
US6134638A (en) * 1997-08-13 2000-10-17 Compaq Computer Corporation Memory controller supporting DRAM circuits with different operating speeds
US7236489B1 (en) * 2000-04-27 2007-06-26 Mosaid Technologies, Inc. Port packet queuing
US6947418B2 (en) * 2001-02-15 2005-09-20 3Com Corporation Logical multicast packet handling
US20070147404A1 (en) * 2005-12-27 2007-06-28 Lucent Technologies, Inc. Method and apparatus for policing connections using a leaky bucket algorithm with token bucket queuing
US9436432B2 (en) * 2005-12-30 2016-09-06 Stmicroelectronics International N.V. First-in first-out (FIFO) memory with multi-port functionality
US8542693B2 (en) * 2007-08-01 2013-09-24 Texas Instruments Incorporated Managing free packet descriptors in packet-based communications

Non-Patent Citations (1)

Title
See references of WO2013083191A1 *

Also Published As

Publication number Publication date
CN103988167A (en) 2014-08-13
WO2013083191A1 (en) 2013-06-13

Legal Events

PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase
Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed
Effective date: 2014-06-23

AK Designated contracting states
Kind code of ref document: A1
Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAX Request for extension of the European patent (deleted)

17Q First examination report despatched
Effective date: 2015-06-16

STAA Information on the status of an EP patent application or granted EP patent
Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn
Effective date: 2016-12-28