EP1523829A2 - Efficient pipelined packet processing apparatus and method

Efficient pipelined packet processing apparatus and method

Info

Publication number
EP1523829A2
Authority
EP
European Patent Office
Prior art keywords
packet
processing unit
data portion
processing element
processing
Prior art date
Legal status
Withdrawn
Application number
EP03726675A
Other languages
German (de)
English (en)
Other versions
EP1523829A4 (fr)
Inventor
Koen Deforche
Geert Verbruggen
Luc De Coster
Johan Wouters
Current Assignee
Transwitch Corp
Original Assignee
Transwitch Corp
Priority date
Filing date
Publication date
Application filed by Transwitch Corp filed Critical Transwitch Corp
Publication of EP1523829A2
Publication of EP1523829A4


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/30Peripheral units, e.g. input or output ports
    • H04L49/3072Packet splitting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/30Peripheral units, e.g. input or output ports
    • H04L49/3063Pipelined operation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/30Peripheral units, e.g. input or output ports

Definitions

  • the present invention relates to telecommunications networks, especially packet switched telecommunications networks and particularly to network elements and communication modules therefor, and methods of operating the same for processing packets, e.g. at nodes of the network.
  • Characteristic properties of packet processing are inherent parallelism in processing packets, high I/O (input/output) requirements in both the data plane and the control plane (on which a single processing thread can stall), and extremely small cycle budgets which need to be used as efficiently as possible.
  • Parallel processing is advantageous for packet processing in high throughput packet-switched telecommunications networks in order to increase processing power.
  • Each processing element can be carrying out an individual task which can be different from tasks carried out by any other processing element.
  • access to a shared resource may be necessary e.g. to a database to obtain relevant inline data.
  • accesses to shared resources by the processing elements generally have a large latency. If a processing element is halted until the reply from the shared resource is received, the efficiency is low. Also, resources requiring large storage space are normally located off-chip, so that access and retrieval times are significant.
  • optimizing processing on a processing element having, for example, a processing core involves context switching, that is, one processing thread is halted and all current data stored in registers are saved to memory in such a way that the same context can be recreated at a later time when the reply from the shared resource is received.
  • context switching takes up a large amount of processor resources or, alternatively, time if only a small amount of processor resources is allocated to this task.
  • the present invention solves this problem and achieves a very high efficiency while keeping a simple programming model, without requiring expensive multi-threading on the processing elements and with the possibility to tailor processing elements to a particular function.
  • the present invention relies in part on the fact that, with respect to context switching, typically there is little useful context, or useful context can be reduced to a minimum by judicious task programming, when a shared resource request is launched in a network element of a packet switched telecommunications network. Switching to process another packet does not necessarily require saving the complete state of a processing element.
  • the judicious programming can include organizing the program to be run on each processing element as a sequence of function calls, each call having a context when run on a processing element but requiring no interfunction calls, except for the data in the packet itself.
  • the present invention provides a method of processing data packets in a packet processing apparatus for use in a packet switched network, the packet processing apparatus comprising a plurality of parallel pipelines, each pipeline comprising at least one processing unit for processing a part of a data packet, the method further comprising: organizing the tasks performed by each processing unit into a plurality of functions such that there are substantially only function calls and no interfunction calls and that at the termination of each function called by the function call for one processing unit, the only context is a first data portion.
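  • As an illustrative sketch of this programming model (in C; all names are hypothetical, not the patent's literal code), each task can be written as a free-standing function whose only input and output is the head itself, so no register state needs to be saved when the processing element switches to another packet:

```c
typedef struct head head_t;           /* packet head; a possible layout is sketched below */

/* A task is a plain function: its only context is the head it operates on. */
typedef void (*task_fn)(head_t *head);

/* Example task chain; each task ends by recording its successor in the head. */
void task_parse_link_layer(head_t *head);   /* e.g. decapsulation                    */
void task_route_lookup(head_t *head);       /* issues a shared resource request      */
void task_write_descriptor(head_t *head);   /* builds the outgoing packet descriptor */
```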
  • the present invention provides a packet processing apparatus for use in a packet switched network, comprising: means for receiving a packet in the packet processing apparatus; means for adding to at least a first data portion of the packet administrative information including at least an indication of at least one process to be applied to the first data portion; a plurality of parallel pipelines, each pipeline comprising at least one processing unit, and the at least one processing unit carrying out the at least one process on the first data portion indicated by the administrative information to provide a modified first data portion.
  • the present invention also provides a communications module for use in a packet processing apparatus, comprising: means for receiving a packet in the communication module; means for adding to at least a first data portion of the packet administrative information including at least an indication of at least one process to be applied to the first data portion; a plurality of parallel communication pipelines, each communication pipeline being for use with at least one processing unit, and a memory device for storing the first data portion.
  • the present invention also provides a method of processing data packets in a packet processing apparatus for use in a packet switched network, the packet processing apparatus comprising a plurality of parallel pipelines, each pipeline comprising at least one processing unit, the method comprising: adding to at least a first data portion of the packet administrative information including at least an indication of at least one process to be applied to the first data portion; and the at least one processing unit carrying out the at least one process on the first data portion indicated by the administrative information to provide a modified first data portion.
  • the present invention also provides a packet processing apparatus for use in a packet switched network, comprising: means for receiving a packet in the packet processing apparatus; a module for splitting each packet received by the packet processing apparatus into a first data portion and a second data portion; means for processing at least the first data portion; and means for reassembling the first and second data portions.
  • the present invention also provides a method of processing data packets in a packet processing apparatus for use in a packet switched network, comprising splitting each packet received by the packet processing apparatus into a first data portion and a second data portion; processing at least the first data portion; and reassembling the first and second data portions.
  • the present invention also provides a packet processing apparatus for use in a packet switched network, comprising: means for receiving a packet in the packet processing apparatus; a plurality of parallel pipelines, each pipeline comprising at least one processing element, a communication engine linked to the at least one processing element by a two port memory unit, one port being connected to the communication engine and the other port being connected to the processing element.
  • the present invention also provides a communications module for use in a packet processing apparatus, comprising: means for receiving a packet in the communications module; a plurality of parallel communication pipelines, each communication pipeline comprising at least one communication engine for communication with a processing element for processing packets and a two port memory unit, one port of which being connected to the communication engine.
  • the present invention also provides a packet processing unit for use in a packet switched network, comprising: means for receiving a data packet in the packet processing unit; a plurality of parallel pipelines, each pipeline comprising at least one processing element for carrying out a process on at least a portion of a data packet, a communication engine connected to the processing element, and at least one shared resource, wherein the communication engine is adapted to receive a request for a shared resource from the processing element and transmit it to the shared resource. The communication engine is also adapted to receive a reply from the shared resource(s).
  • the present invention also provides a communication module for use with a packet processing unit, comprising: means for receiving a data packet in the communication module; a plurality of parallel pipelines, each pipeline comprising at least a communication engine having means for connection to a processing element, and at least one shared resource, wherein the communication engine is adapted to receive a request for a shared resource and transmit it to the shared resource and for receiving a reply from the shared resource and to transmit it to the means for connection to the processing element.
  • Figures 1a and 1b show a packet processing path in accordance with an embodiment of the present invention.
  • FIGS. 2a and 2b show dispatch operations on a packet in accordance with an embodiment of the present invention.
  • FIG. 3 shows details of one pipeline in accordance with an embodiment of the present invention.
  • Figure 4a shows the location of heads in a FIFO memory associated with a processing unit in accordance with an embodiment of the present invention.
  • Figure 4b shows a head in accordance with an embodiment of the present invention.
  • FIG. 5 shows a processing unit in accordance with an embodiment of the present invention.
  • Figure 6 shows how a packet is processed through a pipeline in accordance with an embodiment of the present invention.
  • Figure 7 shows packet realignment during transfer in accordance with an embodiment of the present invention.
  • FIG. 8 shows a communication engine in accordance with an embodiment of the present invention.
  • Figure 9 shows a pointer arrangement for controlling a head queue in a buffer in accordance with an embodiment of the present invention.
  • Figure 10 shows a shared resource arrangement in accordance with a further embodiment of the present invention.
  • Figure 11 shows a flow diagram of processing a packet head in accordance with an embodiment of the present invention.
  • the packet processing apparatus consists of a number of processing pipelines, each consisting of a number of processing units.
  • the processing units include processor elements, e.g. processors and associated memory.
  • the processors may be microprocessors or may be programmable digital logic elements such as Programmable Array Logic (PAL), Programmable Logic Arrays (PLA), Programmable Gate Arrays, especially Field Programmable Logic Arrays.
  • the packet processing communication module comprises pipelined communication engines which provide non-local communication facilities suitable for processing units. To complete a packet processing apparatus, processor cores and optionally other processing blocks are installed on the packet processing communication module. The processor cores do not need to have a built-in local hardware context switching facility.
  • the processing elements are preferably combined with a hardware block called the communication engine, which is responsible for non-local communication.
  • This hardware block may be implemented in a conventional way, e.g. as a logic array such as a gate array.
  • the present invention may be implemented by alternative arrangements, e.g. the communication engine may be implemented as a configurable block such as can be obtained by the use of programmable digital logic elements such as Programmable Array Logic (PAL), Programmable Logic Arrays (PLA), Programmable Gate Arrays, especially Field Programmable Logic Arrays.
  • the present invention includes an intelligent design strategy over two or more generations whereby in the first generation programmable devices are used which are replaced in later generations with dedicated hardware blocks.
  • Hardware blocks are preferably used for protocol independent functions.
  • protocol dependent functions it is preferred to use software blocks which allow reconfiguration and reprogramming if the protocol is changed.
  • a microprocessor may find advantageous use for such applications.
  • a completed packet processing apparatus 10 comprises a packet processing communication module with installed processors.
  • the processing apparatus 10 has a packet processing path as shown in Fig. 1a, consisting of a number of parallel processing pipelines 4, 5, 6. The number of pipelines depends on the processing capacity which is to be achieved.
  • the processing path comprises a dispatch unit 2 for receiving packets, e.g. from a telecommunications network 1 and for distributing the packets to one or more of the parallel processing pipelines, 4, 5, 6.
  • the telecommunications network 1 can be any packet switched network, e.g. a landline or mobile radio telecommunications network. Each received packet comprises a header and a payload.
  • Each pipeline 4, 5, 6 comprises a number of processing units 4b...e; 5b...e; 6b...e.
  • the processing units are adapted to process at least the headers of the packets.
  • a packet processing unit 4b...e, 5b...e, 6b...e may interface with a number of other circuit elements such as databases that are too big (or expensive) to be duplicated for each processing unit (e.g. routing tables).
  • some information needs to be updated or sampled by multiple pipelines (e.g. statistics or policing info). Therefore, a number of so called shared resources SR1-SR4 can be added with which the processing units can communicate.
  • a specific communications infrastructure is provided to let processing units communicate with shared resources.
  • the shared resources can be located at a distance from the processing units, and because they handle requests from multiple processors, the latencies between a request and an answer can be high.
  • at least one of the processing units 4b...e; 5b...e; 6b...e has access to one or more shared resources via a single bus 8a, 8b, 8c, 8d, 8e and 8f, e.g. processing units 4b, 5b, 6b with SR1 via bus 8a, and processing units 4b, 5b, 6b and 4c, 5c, 6c and 4e, 5e, 6e with SR2 via busses 8b, 8c and 8d, respectively.
  • the bus 8 may be any suitable bus and the form of the bus is not considered to be a limitation on the present invention.
  • ingress packet buffers 4a, 5a, 6a, and/or egress packet buffers 4f, 5f, 6f may precede and/or follow the processing pipelines, respectively.
  • One function of a packet buffer can be to adapt data path bandwidths.
  • a main task of a packet buffer is to convert the main data path communication bandwidth from the network 1 to the pipeline communication bandwidth.
  • some other functions may be provided in a packet buffer, such as overhead insertion/removal and task lookup.
  • the packet buffer has the ability to buffer a single head (which includes at least a packet header). It guarantees line speed data transfer at receive and transmit side for bursts as big as one head.
  • incoming packets, e.g. from a telecommunications network 1, are split into a head and a tail by a splitting and sequence number assigning means, which is preferably implemented in the dispatch unit 2.
  • the head includes the packet header, and the tail includes at least a part of the packet payload.
  • the head is fed into one of the pipelines 4-6 whereas the payload is stored (buffered) in a suitable memory device 9, e.g. a FIFO.
  • the header and payload are reassembled in a reassembly unit (packet merge) 3 before being output, e.g. where they can be buffered before being transmitted through the network 1 to another node thereof.
  • one or more shared resources SR1-4 are available to the processing path, which handle specific tasks for the processing units in a pipeline.
  • these shared resources can be dedicated lookup engines using data structures stored in off-chip resources, or dedicated hardware for specialized functions which need to access shared information.
  • the present invention is particularly advantageous in increasing efficiency when these shared resource engines which are to be used in a processing system respond to requests with a considerable latency, that is a latency such as to degrade the efficiency of the processing units of the pipeline if each processing unit is halted until the relevant shared resource responds.
  • Typical shared resources which can be used with the present invention are an IP forwarding table, an MPLS forwarding table, a policing database, and a statistics database.
  • the functions performed by the pipeline structure may be assisted by such shared resources.
  • One aspect of the use of shared resources is the stall time of processing units while waiting for answers to requests sent to shared resources.
  • In order for a processing unit to abandon one currently pending task, change to another and then return to the first, it is conventional to provide context switching, that is, to store the contents of registers of the processor element.
  • An aspect of the present invention is the use of hardware accelerated context switching. This also allows a processor core to be used for the processing element which is not provided with its own hardware switching facility.
  • This hardware is preferably provided in each processing node, e.g. in the form of a communication engine.
  • Each processing unit maintains a pool of packets to be processed. When a request to a shared resource is issued, a processing element of the relevant processing unit switches context to another packet, until the answer on the request has arrived.
  • One aspect of the present invention is to exploit packet processing parallelism in such a way that the processing units can be used as efficiently as possible doing useful processing, thus avoiding waiting for I/O (input/output) operations to complete.
  • I/O operations are, for example, requests to shared resources or copying packet information in and out of the processing element.
  • the present invention relies in part on the fact that typically there is little useful context, or useful context can be reduced to a minimum by judicious task programming, when a shared resource request is launched in a network element of a packet switched telecommunications network. Switching to process another packet does not necessarily require saving the complete state of a processing element.
  • the judicious programming can include organizing the program to be run on each processing element as a sequence of function calls, each call having a context when run on a processing element but requiring no interfunction calls.
  • the exception is context provided by the data in the packet itself or in a part of the packet.
  • the size of the head is chosen such that it contains all relevant headers that have been received with the packet. This can be done, for example, by splitting at a fixed point in the packet (after the maximum-sized header supported). This can result in some of the payload being split off to the head. Generally, this does not matter as the payload is usually not processed. However, the present invention includes the possibility of the payload being processed, for instance for network rate control.
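  • A minimal sketch of this fixed-point split (HEAD_SIZE and all names are assumptions for illustration): the first HEAD_SIZE bytes, covering the maximum supported header and possibly some payload, become the head; the remainder becomes the tail that is buffered unprocessed:

```c
#include <stddef.h>
#include <string.h>

enum { HEAD_SIZE = 128 };   /* assumed split point: the maximum supported header size */

/* Split a packet at a fixed offset; the tail is only buffered (e.g. in FIFO 9)
 * and rejoined with the modified head at reassembly. */
static void split_packet(const unsigned char *pkt, size_t len,
                         unsigned char head[HEAD_SIZE], size_t *head_len,
                         const unsigned char **tail, size_t *tail_len)
{
    size_t cut = (len < HEAD_SIZE) ? len : HEAD_SIZE;
    memcpy(head, pkt, cut);
    *head_len = cut;
    *tail     = pkt + cut;
    *tail_len = len - cut;
}
```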
  • when the packet data contains multi-resolution data, the data can, when allowed, be truncated to a lower resolution by the network depending upon the bandwidth of the network forward of the node.
  • the present invention includes within its scope more accurate evaluation of the packet to recognize header and payload and to split these cleanly at their junction.
  • the separated head (or header) is fed into a processing pipeline, while the tail (or payload) is buffered (and optionally processed using additional processing elements not shown) and reattached to the (modified) head after processing.
  • the head is then supplied to one of the processing pipelines, while the tail is stored into a memory such as a FIFO 9.
  • Each packet is preferably assigned a sequence number by the sequence number assigning module 15. This sequence number is copied into the head as well as into the tail of each packet and stored. It may be used for three purposes:
  • the sequence number can be generated, for example, by a counter included in the packet splitting and sequence number assigning means 15.
  • the counter increments with each incoming packet. In that way, the sequence number can be used to put packets in a specific order at the end of the pipelines.
  • An overhead generator provided in the packet dispatcher 2, or more preferably in the packet buffer 4a, 5a, 6a, generates new/additional overhead for each head and/or tail. After the complete head has been generated, the head is sent to one of the pipelines 4-6 that has buffer space available. The tail is sent to the tail FIFO 9.
  • the added overhead includes administrative data in both the head and/or the tail.
  • a process flow is shown schematically in Fig. 2a.
  • the new overhead preferably contains the sequence number and a length, i.e. the length of the payload, and may optionally include a reference to the pipeline used to process the corresponding head.
  • the added overhead preferably includes a Head Administration Field (HAF), and an area to store results and status generated by the packet processing pipeline.
  • a head can comprise a result store, a status store, and an administrative data store.
  • the HAF can contain head length, offset, sequence number and a number of fields necessary to perform FIFO maintenance and head selection.
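  • A possible HAF layout, assembled from the fields named in the text (head length, offset, sequence number, a task pointer and maintenance flags); the widths and ordering below are illustrative assumptions, not the patented format:

```c
#include <stdint.h>

typedef struct {
    uint16_t head_length;   /* length of the head in bytes                         */
    int16_t  offset;        /* first relevant byte; negative values insert space   */
    uint32_t sequence;      /* sequence number copied into both head and tail      */
    uint16_t task;          /* pointer/index of the next task to execute           */
    uint8_t  flags;         /* Drop, Transfer, SRRequest, Tunnel, ...              */
    uint8_t  success_bits;  /* success/failure results of shared resource requests */
} haf_t;
```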
  • Fig. 2b shows an alternative set of actions performed on a packet within the processing apparatus.
  • Each head processed by the pipeline may be preceded by a scratch area which can be used to store intermediate results. It may also be used to build a packet descriptor which can be used by processing devices downstream of the packet processing unit.
  • the packet buffer 4a, 5a, 6a at the beginning of each pipeline can add this scratch area to the packet head.
  • the packet buffer 4f, 5f, 6f at the end removes it (at least partially), as shown in Figure 2b.
  • the header contains some link layer information, defining the protocol of the packet. This has to be translated into a pointer to the first task to be executed on the packet by the packet processing unit. This lookup can be performed by the ingress packet buffer 4a, 5a, 6a.
  • the head when it is in the pipeline includes a reference to a task to be performed by the current and/or the next processing unit.
  • a part of the context of a processor element is stored in the head. That is, the current version of the HAF in a head is equivalent to the status of the processing including an indication of the next process to be performed on that head.
  • the head itself may also store in-line data, for example intermediate values of a variable can be stored in the scratch area. All information that is necessary to provide a processing unit with its context is therefore stored in the head.
  • the context moves with the head in the form of the data stored in the relevant parts of the head, e.g. HAF, scratch area.
  • a novel aspect of the present invention is that the context moves with the packet rather than the context being static with respect to a certain processor.
  • the packet reassembly module 3 reassembles the packet heads coming from the processing pipelines 4-6 and the corresponding tails coming from the tail FIFO 9.
  • Packet networks may be divided into those in which each packet can be routed independently at each node (datagram networks) and those in which virtual circuits are set up and packets between a source and a destination use one of these virtual circuits. Thus, depending upon the network there may be differing requirements on packet sequencing.
  • the reassembly module 3 assures packets leave in the order they arrive or, alternatively, in any other order as required.
  • the packet reassembly module 3 has means for keeping track of the sequence number of the last packet sent.
  • the reassembly module 3 searches the outputs of the different processing pipelines for the head having a sequence number which may be sent, as well as the end of the FIFO 9 to see which tail is available for transmission, e.g. the next sequence number.
  • the packets are processed in the pipelines strictly in accordance with sequence number so that the heads and their corresponding tails are available at the reassembly module 3 at the same time. Therefore, it is preferred if means for processing packets in the pipelines strictly in accordance with sequence number are provided.
  • the appropriate head is propagated to the output of the pipeline, it is added in the reassembly module 3 to the corresponding tail, which is preferably the first entry in the tail FIFO 9 at that moment.
  • the reassembly unit 3 or the egress packet buffer 4f, 5f, 6f removes the remaining HAF and other fields from the head.
  • When a packet must be dropped, a processing unit has a means for setting an indication in the head that the head is to be dropped, e.g. it can set a Drop flag in the packet overhead.
  • the reassembly module 3 is then responsible for dropping this head and the corresponding tail.
  • each processing unit 4b...4d comprises a processing element 14b-14d and a communication engine 11b-d.
  • the communication engine may be implemented in hardware, e.g. a configurable digital logic element and the processing element may include a programmable processing core although the present invention is not limited thereto.
  • Some dedicated memory is allocated to each processing unit 4b-d, respectively.
  • a part of the data memory of each processing element is preferably a dual port memory, e.g. a dual port RAM 7b ....7d or similar.
  • One port is used by the communication engine 11b...d and the other port is connected to the processing element of this processing unit.
  • the communication engine 11b...d operates with the heads stored in memory 7b...7d in some circumstances as if this memory is organized as a FIFO.
  • the heads may be stored logically or physically as in a FIFO. By this means the heads are pushed and popped from this memory in accordance with their arrival sequence.
  • the communication engine is not limited to using the memory 7b....7d in this way but may make use of any capability of this memory, e.g. as a two-port RAM, depending upon the application.
  • the advantage of keeping a first-in-first-out relationship among the headers as they are processed is that the packet input sequence will be maintained automatically which results in the same output packet sequence.
  • the present invention is not limited thereto and includes the data memory being accessed by the communication engine in a random manner.
  • the communication engines communicate with each other for transferring heads. Thus, when each communication engine is ready to receive new data, a ready signal is sent to the previous communication engine or other previous circuit element.
  • As shown in Fig. 4a, when moving from the output to the input port of a RAM 7b...7d, three areas of the memory are provided: one containing heads that are processed and ready to be sent to the next stage, another containing heads that are being processed, and a third containing a head that is partially received, but not yet ready to be processed.
  • the RAM 7b...7d is divided into a number of equally sized buffers 37a-h. Each buffer 37a-h contains only one head. As shown schematically in Fig. 4b, each head contains:
  • HAF: the Head Administration Field.
  • Scratch Area: an optional area to be used as a scratch pad, to communicate packet state between processors or to build the packet descriptors that will leave the system.
  • the buffers 37a-h each preferably have means for storing the data in the scratch area.
  • Packet Overhead: overhead to be removed from the packet (decapsulation) or to be added to the packet (encapsulation).
  • the buffers 37a-h each preferably have means for storing the packet overhead.
  • Head Packet Data: the actual head data of the packet.
  • the buffers 37a-h each preferably have means for storing the head packet data.
  • each buffer provides some space for shared resource requests at the end of the buffer.
  • the buffers 37a-h each preferably have means for storing the shared resource requests.
  • the HAF contains packet information (length), and the processing status as well as containing part of the "layer2" information, if present (being at least, for instance, a code indicating the physical interface type and a "layer3" protocol number).
  • a communication module in accordance with an embodiment of the present invention may comprise the dispatch unit 2, the packet assembly unit 3, the memory 9, the communication engines 11b...d, the dual port RAM 7b-d, optionally the packet buffers, as well as suitable connection points to the processing units and to the shared resources.
  • When the communications module is provided with its complement of processing elements, a functioning packet processing apparatus is formed.
  • a processing unit in accordance with an embodiment of the present invention is shown schematically in Fig. 5.
  • a processing unit 4b comprises a processing element 14b, a head buffer memory 7b preferably implemented as a dual-port RAM, a program memory 12b and a communications engine 11b.
  • a local memory 13b for the processing element may be provided.
  • the program memory 12b is connected to the processing element 14b via an instruction bus 16b and is used to store the programs running on the processing element 14b.
  • the buffer memory 7b is connected to the processing element 14b by a data bus 17b.
  • the communication engine 11b monitors the data bus via a monitoring bus 18b to detect write accesses from the processing element to any HAF in one of the buffers.
  • the communication engine 11b monitors and updates the status of each buffer in its internal registers.
  • the communication engine 11b is connected to the buffer memory 7b by a data memory bus 19b.
  • one or more processing blocks may be included with the processing element 14b, e.g. co-processing devices such as an encryption block in order to reduce load on the processing element 14b for repetitive data intensive tasks.
  • a processing element 14b in accordance with the present invention can efficiently be implemented using a processing core such as an Xtensa® core from Tensilica, Santa Clara, CA, USA.
  • a processing core with dedicated hardware instructions to accelerate the functions that will be mapped on this processing element makes a good trade-off between flexibility and performance.
  • the needed processing element hardware support can be added in such a processor core, i.e. the processor core does not require context switching hardware support.
  • the processing element 14b is connected to the communication engine 11b through a system bus 20b; resets and interrupts may be transmitted through a separate control bus (best shown in Fig. 8). From the processing element's point of view, the data memory 7b is not a FIFO, but merely a pool of packets, from which packets can be selected for processing using a number of different selection algorithms.
  • processing elements are synchronized in such a way that the buffers 37a-h do not over- or underflow. Processing of a head is done in place at a processing element. Packets are removed from the system as quickly as they arrive so processing will never create the need for extra buffer space. So, a processing element should not generate a buffer overflow. Processing a packet can only be started when enough data are available.
  • the hardware (communication engine) suspends the processing element when no heads are eligible for processing.
  • the RAM 7b...7d provides buffer storage space and allows the processing elements to be decoupled from the processing pace of the pipeline.
  • Each processing element can decide to drop a packet or to strip a part of the head or add something to a head.
  • a processing element simply sets the Drop flag in the HAF. This will have two effects: the head will not be eligible anymore for processing and only the HAF will be transferred to the next stage.
  • the packet reassembler 3 receives a head having the Drop bit set, it drops the corresponding tail.
  • the HAF has an offset field which indicates the location of the first relevant byte. On an incoming packet, this will always be equal to zero.
  • the processing element makes the Offset field point to the first byte after the part to be stripped.
  • the communication engine will remove the part to be stripped, realign the data to word boundaries, update the Length field in the HAF, and put the offset field back to zero. This is shown in figure 7.
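  • The strip-and-realign step could look as follows (a sketch using the haf_t layout assumed above; word-boundary realignment is simplified to a byte move): the stripped bytes disappear, Length is updated and Offset returns to zero, so the next stage always finds its status at the same place:

```c
#include <string.h>

static void strip_and_realign(haf_t *haf, unsigned char *data)
{
    if (haf->offset > 0) {
        /* drop the stripped bytes and move the rest back to the buffer start */
        memmove(data, data + haf->offset,
                (size_t)haf->head_length - (size_t)haf->offset);
        haf->head_length = (uint16_t)(haf->head_length - haf->offset);
        haf->offset = 0;   /* status is again at a fixed location for the next stage */
    }
}
```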
  • the advantage of this procedure is that the next status to be read by a communication engine is always located at a certain part of the HAF, hence the communication engines (and processing elements) can be configured to access the same location in the HAF to obtain the necessary status information. Also, more space may be inserted in a HAF by negative offset values. Such space is inserted at the front of the HAF.
  • the dispatching unit 2 can issue a Mark command by writing a non-zero value into a Mark register. This value will be assigned to the next incoming packet, i.e. placed in the head.
  • the mark value can result in generation of an interrupt.
  • One purpose of marking a packet is when performing table updates. It may be necessary to know when all packets received before a certain moment, have left the pipelines. Such packets need to be processed with old table data. New packets are to be processed with new table data. Since packet order remains unchanged through the pipelines, this can be accomplished by marking an incoming packet. In packet processing apparatus in which the order is not maintained, a timestamp may be added to each head instead of a mark to one head. Each head is then processed according to its timestamp. This may involve storing two versions of table information for an overlap time period.
  • Each processing element has access to a number of shared resources, used, for example, for a variety of tasks such as lookups, policing and statistics. This access is via the communications engine associated with each processing element.
  • a number of buses 8a-f are provided to connect the communication engines to the shared resources. The same buses 8a-f are used to transfer the requests as well as the answers.
  • each communication engine 11b is connected to such a bus 8 via a Shared Resource Bus Interface 24b (SRBI; see Fig. 8).
  • the communication engine and the data memory 7b can be configured via a configuration bus 21.
  • the communication engine 11b is preferably the only way for a processing element to communicate with resources other than its local memory 13b.
  • the communication engine 11b is controlled by the host processing element 14b via a control interface.
  • the main task of the communication engine 11b is to transfer packets from one pipeline stage to the next one. Besides this, it implements context switching and communication with the host processing element 14b and shared resources.
  • the communication engine 11b has a receive interface 22b (Rx) connected to the previous circuit element of the pipeline and a transmit interface 23b (Tx) connected to the next circuit element in the pipeline. Heads to be processed are transmitted from one processing unit to another via the communication engines and the Tx and Rx interfaces 23b, 22b. If a head is not to be processed in a specific processing unit, it can be provided with a tunneling field which defines the number of processing units to be skipped.
  • the ingress packet buffer receives one packet head at bus speed and then sends it to the first processor stage at its own speed. During that period, it is not able to receive a new packet head.
  • the egress packet buffer 4f, 5f, 6f receives a packet head from the last processor stage. When received, it sends the head to the packet reassembly unit 3 at bus speed.
  • the ingress packet buffer can have two additional tasks:
  • the packet "layer2" encapsulation contains a Protocol field, identifying the "layer3" protocol. However, the meaning of this field depends on the "layer2" protocol.
  • the ("layer2" protocol, "layer3" protocol field) pair needs to be translated into a pointer, pointing to the first task to be executed on the packet.
  • the egress packet buffer has one additional task:
  • the processing element can modify the read and write addresses, such that the packet appears to be located at a fixed address.
  • the processing element can fetch the necessary information using a single read access. This information has to be split into different target registers (FIFO location, head length, protocol, ...).
  • hardware such as the communication engine, may be provided to support a very simple multitasking scheme.
  • a "context switch" is done, for example, when a process running on a processing element has to wait for an answer from a shared resource or when a head is ready to be passed to the next stage.
  • the hardware is responsible for selecting a head that is ready to be processed, based on the HAF. Packets are transferred from one stage to another via a simple ready/available protocol or any other suitable protocol. Only the part of a buffer that contains relevant data is transferred. To achieve this the head is modified to contain the necessary information for directing the processing of the heads.
  • processing of a packet is split up into a number of tasks.
  • Each task typically handles the response to a request and generates a new request.
  • a pointer to the next task is stored in the head.
  • Each task first calculates and then stores the pointer to the next task.
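  • A hypothetical task illustrating this convention (TASK_UPDATE_STATS and FLAG_SRREQUEST are invented names; haf_t as sketched above): the task consumes the reply to its previous request, computes and stores the pointer to the next task, and flags the new request:

```c
enum { TASK_UPDATE_STATS = 7 };      /* hypothetical index of the successor task */
enum { FLAG_SRREQUEST    = 1 << 2 }; /* hypothetical HAF flag bit                */

static void task_ip_lookup(haf_t *haf)
{
    /* ... consume the lookup reply already stored in the packet buffer ... */
    haf->task   = TASK_UPDATE_STATS; /* next task, calculated first               */
    haf->flags |= FLAG_SRREQUEST;    /* a new request must complete before it runs */
}
```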
  • buffers containing a packet can be in three different states:
  • the communication engine maintains the packet state, e.g. by storing the relevant state in a register, and also provides packets in the Ready4Processing state to the processor with which it is associated. After being processed, a packet is in the Ready4Next or Waiting state. In the case of the Ready4Next state, the communication engine will transmit the packet to the next stage. When in the Waiting state, the state will automatically be changed by the communication engine to the Ready4Processing or Ready4Next state when the shared resource answer arrives.
  • the communication engine is provided to select a new packet head.
  • the selection of a new packet head is triggered by a processing element, e.g. by a processor read on the system bus.
  • a Current Buffer pointer is maintained in a register, indicating the current packet being processed by the processing element.
  • A schematic representation of a communication engine in accordance with one embodiment of the present invention is shown in Fig. 8. The five main tasks of the communication engine may be summarized as follows: buffer management, receiving (Rx, 22), transmitting (Tx, 23), shared resource transmission (SR TX) and shared resource reception (SR RX).
  • the five functions described above have been represented as four finite state machines (FSM; 32, 33, 34a, 34b) and a buffer manager 28 in Fig. 8. It should be understood that this is a functional description of the blocks of the communication engine and does not necessarily relate to actual physical elements.
  • the Finite State machine representation of the communications engine as shown in Fig. 8 can be implemented in a hardware block by standard processing techniques. For example, the representation may be converted into a hardware language such as Verilog or VHDL and a netlist for a hardware block, e.g. a gate array, may then be generated automatically from the VHDL source code.
  • Main data structures (listed after the most involved task) handled by the communication engine are:
  • NewPacketRegister: prepares the HAF and buffer location of the next packet to be processed by the processor.
  • the control interface 26 may be provided to configure the communication engine, e.g. the registers and random access memory size.
  • a port of the data memory 7 is connected to the communication engine 11 via the Data Memory (DM) RAM interface 27 and the bus 19.
  • this bus 19 is used to fill the packet buffers 37a-h in memory 7 with data arriving at the RX interface 22 of the communication engine 11, or to empty them to the TX interface 23, in both cases via the RAM arbiter 25.
  • the arbiter 25 organizes and prioritizes the access to DM RAM 7 between the functional units (FSMs): SR RX 34b, SR TX 34a, next packet selection 29, Receiving 32, Transmitting 33.
  • Each processor element 14 has access to a number of shared resources, used for lookups, policing and statistics.
  • a number of buses 8 are provided to connect processing elements 14 to the shared resources. The same bus 8 may be used to transfer the requests as well as the answers.
  • Each communication engine 11 is connected to such a bus via a Shared Resource Bus Interface 24 (SRBI).
  • Each communication engine 11 maintains a number of packet buffers 37a-h.
  • Each buffer can contain one packet, i.e. has means for storing one packet. With respect to packet reception and transmission, the buffers are dealt with as a FIFO, so packet order remains unaltered. Packets enter from the RX Interface 22 and leave through the TX Interface 23.
  • the number of buffers, buffer size and the start of the buffer area in the data memory 7 are configured via the control interface 26. Buffer size is always a power of 2, and the buffer start is always a multiple of the buffer size. In that way, each memory address can easily be split up in a buffer number and an offset in the buffer.
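  • Because the buffer size is a power of two and the buffer area is aligned to it, the split of a memory address into buffer number and in-buffer offset reduces to a shift and a mask, as in this sketch (the concrete size is an assumed configuration value):

```c
#define BUF_SIZE_LOG2 8                      /* e.g. 256-byte buffers (assumed) */
#define BUF_SIZE      (1u << BUF_SIZE_LOG2)

static unsigned buffer_number(unsigned addr) { return addr >> BUF_SIZE_LOG2; }
static unsigned buffer_offset(unsigned addr) { return addr & (BUF_SIZE - 1u); }
```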
  • Each buffer can contain the data of one packet.
  • a write access to a buffer by a processing element 14 is monitored by the communication engine 11 via the monitoring bus 18 and updates the buffer state in a buffer state register accordingly.
  • a buffer manager 28 maintains four pointers in registers 35, two of them pointing to a buffer and two of them pointing to a specific word in a buffer:
  • RXWritePointer: points to the next word that will be written when receiving data. After reset, it points to the first word of the first buffer.
  • TXReadPointer: points to the next word that will be read when transmitting data. After reset, it points to the first word of the first buffer.
  • LastTransmittedBuffer: points to the last transmitted buffer, or to the buffer that is being transmitted, i.e. it is updated to point to a buffer as soon as the first word of that buffer is being read. After reset, it points to the last buffer.
  • CurrentBuffer: points to the buffer that is currently in use by the processor. An associated CurrentBufferValid flag indicates whether the content of CurrentBuffer is valid or not. When a processing element is not processing any packet, CurrentBufferValid is cleared.
  • Each buffer is in one of five states. In the ReadyForProcessing state, the packet in the buffer can be selected for processing by the processor.
  • a WaitingLevel is maintained for each buffer in the registers 35.
  • a WaitingLevel different from zero indicates that the packet is waiting for some event, and should not be handed over to the processor, nor transmitted. Typically, WaitingLevel represents the number of ongoing shared resource requests. After reset, all buffers are in the Empty state. When a packet is received completely, the state of the buffer where it was stored, is updated to ReadyForProcessing state for packets that need to be processed, or to the ReadyForTransfer state for packets that need no processing (e.g. dropped packets). The WaitingLevel for a buffer is set to zero on any incoming packet.
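  • The five buffer states and the per-buffer WaitingLevel can be modelled as below (a sketch; the enum simply names the states listed in the text, and a non-zero WaitingLevel keeps the buffer away from both the processor and the transmitter):

```c
typedef enum {
    BUF_EMPTY,
    BUF_READY_FOR_PROCESSING,
    BUF_READY_FOR_TRANSFER,
    BUF_READY_FOR_PROCESSING_WSR_PENDING,
    BUF_READY_FOR_TRANSFER_WSR_PENDING
} buf_state_t;

typedef struct {
    buf_state_t state;
    unsigned    waiting_level;   /* number of ongoing shared resource requests */
} buf_status_t;
```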
  • the processor 14 After processing a packet, the processor 14 updates the buffer state of that packet, by writing the Transfer and SRRequest bit into the HAF, i.e. into the relevant buffer of the dual port RAM 7. This write is monitored by the communication engine 11 via the monitoring bus 18.
  • the processor 14 can put a buffer in the ReadyForProcessing or ReadyForTransfer state if there are no SR requests to be sent, or in the ReadyForTransferWSRPending or ReadyForProcessingWSRPending states if there are requests to be sent. From the ReadyForTransferWSRPending or ReadyForProcessingWSRPending states, the buffer state returns to ReadyForTransfer or ReadyForProcessing as soon as all requests are transmitted.
  • When the ReadPointer reaches the start of a new buffer, it waits until that buffer gets into the ReadyForTransfer state and has WaitingLevel equal to zero, before reading and transmitting the packet. As soon as the transmission starts, the buffer state is set to Empty. This guarantees that the packet cannot be selected anymore. (Untransmitted data cannot be overwritten even if the buffer is in the Empty state, because the WritePointer will never pass the ReadPointer.)
  • Packet transmission is triggered when the buffer the ReadPointer points to gets into the ReadyForTransfer state and has a WaitingLevel of zero.
  • the buffer state is set to Empty. Then the HAF and the scratch area are read from the RAM and transmitted. The words that contain only overhead to be stripped are skipped. Then the rest of the packet data is read and realigned before transmission, such that the remaining overhead bytes in the first word are removed. However, if a packet has its Drop flag set, the packet data is not read. After a packet is transmitted, the ReadPointer jumps to the start of the next buffer.
  • the communication engine maintains the CurrentBuffer pointer, pointing to the buffer of the packet currently being processed by the processing element.
  • An associated Valid flag indicates that the content of CurrentBuffer is valid. If the processor is not processing any packet, the Valid flag is set to false. Five different algorithms are provided to select a new buffer:
  • FirstPacket (0): returns the buffer containing the oldest packet.
  • NextPacket (1): returns the first buffer after the current buffer containing a packet. If there is no current buffer, behaves like FirstPacket.
  • FirstProcessablePacket (2): returns the buffer containing the oldest packet in the ReadyForProcessing state.
  • NextProcessablePacket (3): returns the first buffer after the current buffer containing a packet in the ReadyForProcessing state. If there is no current buffer, behaves like FirstProcessablePacket.
  • NextBuffer (4): returns the first buffer after the current buffer. If there is no current buffer, returns the first buffer.
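  • One of these algorithms, FirstProcessablePacket, might be sketched as follows (buf_status_t as above; NUM_BUFFERS and the oldest-first index are assumptions reflecting the FIFO discipline):

```c
#define NUM_BUFFERS 8   /* assumed configuration */

static int first_processable(const buf_status_t status[NUM_BUFFERS], int oldest)
{
    for (int i = 0; i < NUM_BUFFERS; i++) {
        int b = (oldest + i) % NUM_BUFFERS;   /* scan from the oldest packet */
        if (status[b].state == BUF_READY_FOR_PROCESSING)
            return b;                         /* oldest eligible packet wins */
    }
    return -1;   /* nothing eligible: the processing element is suspended */
}
```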
  • Task: a pointer to the next task.
  • Tunnel: set if the next task is not on this or on the next processor.
  • SRRequest: set if shared resource accesses have to be done before switching to the next task.
  • The Transfer and SRRequest bits are not only written into the memory, but also monitored by the communication engine via the XLMI interface. This is used to update the buffer state:
  • the communication engine 11 provides a generic interface 24 to shared resources.
  • a request consists of a header followed by a block of data sent to the shared resource.
  • the communication engine 11 generates the header in the SRTX 34a, but the data has to be provided by the processor 14.
  • three ways of assembling the request can be distinguished:
  • Immediate: the data to be sent are part of the RequestID. This works for requests containing only small amounts of data.
  • the reply to the request is stored at a position indicated by the Offset field in the RequestID (offset), or at a default offset (default).
  • Memory: the RequestID contains the location and size of the data. Two request types are provided: one where the data are located in the packet buffer (relative), and one where the location points to an absolute memory address (absolute). An offset field indicates where the reply must be stored in the buffer.
  • Sequencer: a small sequencer collects data from all over the packet and builds the request.
  • the RequestID contains a pointer to the start of the sequencer program.
  • An offset field indicates where the reply must be stored in the buffer.
  • Each RequestID contains the following fields:
  • RequestType: determines the type of the request, as discussed above.
  • SuccessBit: the index of the success bit to be used (see below).
  • Last: set for the last RequestID for the packet; cleared for other RequestIDs.
  • Offset: position in the buffer where the reply of the request must be stored. The offset is in bytes, starting from the beginning of the buffer.
  • Offset end flag: if set, indicates that Offset marks where the end of the reply must be positioned; Offset then points to the first byte after the reply. If cleared, Offset points to the position where the first byte of the reply must be stored.
  • Length: number of words to be transmitted, for a memory request.
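  • One way to pack these fields into the single 64-bit RequestID word (field widths and positions are illustrative assumptions only):

```c
#include <stdint.h>

typedef struct {
    uint8_t  request_type;   /* immediate / memory (relative, absolute) / sequencer */
    uint8_t  success_bit;    /* index of the HAF success bit to use                 */
    uint8_t  last;           /* 1 for the last RequestID of the packet              */
    uint8_t  offset_is_end;  /* 1: Offset marks the end of the reply                */
    uint16_t offset;         /* byte position of the reply in the buffer            */
    uint16_t length;         /* words to transmit, for a memory request             */
} request_id_t;              /* 8 bytes: fits the single 64-bit word of the text    */
```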
  • After putting the RequestIDs in the buffer memory 7, the processor indicates the presence of these IDs by setting the SRRequest bit in the HAF (this is typically done when the HAF is updated for the next task).
  • the SRRequest bit in the HAF is checked. This can be done by evaluating the buffer state. If set, the buffer number of this packet is pushed into a small FIFO, the SRRequest FIFO. When this FIFO is full, the Idle task is returned on a request for a new packet, to avoid overflow.
  • the SR TX state machine 34a (Fig. 8) pops buffer numbers from the SRRequest FIFO.
  • the WaitLevel field is incremented by one for each request transmitted; when a reply is received, it is decremented by one.
  • When the WaitLevel returns to zero, the buffer state is set to ReadyForTransfer (when coming from ReadyForTransferWSRPending) or ReadyForProcessing (when coming from ReadyForProcessingWSRPending). This mechanism guarantees that a packet can be transmitted or processed (using the Next/FirstProcessablePacket algorithm) no earlier than the moment when all outstanding replies have arrived.
  • the destination address of a reply is decoded by the shared resource bus socket.
  • Replies that match the local address are received by the communication engine over the SRBI RX interface 24b.
  • the reply header contains a buffer number and offset where the reply has to be stored. Based on this, the communication engine is able to calculate the absolute memory address.
  • the data part of the reply is received from the SRBI bus 8 and stored into the data memory 7.
  • the success bits are updated by performing a read-modify-write on the HAF in the addressed buffer, and finally the WaitLevel field of that buffer is decremented by one.
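  • The reply path could be sketched as below (types reused from the sketches above; the state transition on reaching a WaitLevel of zero is simplified to the two pending cases described in the text):

```c
#include <string.h>

static void on_sr_reply(haf_t *haf, buf_status_t *st, unsigned char *buf,
                        unsigned offset, const unsigned char *reply,
                        size_t len, unsigned success_bit, int success)
{
    memcpy(buf + offset, reply, len);                 /* store reply in the buffer */

    /* read-modify-write of the success bit in the HAF */
    if (success)
        haf->success_bits |= (uint8_t)(1u << success_bit);
    else
        haf->success_bits &= (uint8_t)~(1u << success_bit);

    if (--st->waiting_level == 0) {                   /* last outstanding reply? */
        if (st->state == BUF_READY_FOR_TRANSFER_WSR_PENDING)
            st->state = BUF_READY_FOR_TRANSFER;
        else if (st->state == BUF_READY_FOR_PROCESSING_WSR_PENDING)
            st->state = BUF_READY_FOR_PROCESSING;
    }
}
```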
  • Some of the shared resource requests can end with a success or failure status (e.g. an Exact Match resource compares an address to a list of addresses; a match returns an identifier, no match returns a failure status).
  • Means are added to propagate this to the HAF of the involved packet.
  • a number of bits, e.g. five, are provided in the HAF which can catch the results of different requests. Therefore it is necessary that a RequestID specifies which of the five bits has to be used.
  • Shared resources can also be put in a chain, i.e. the result of a first shared resource is the request for a second shared resource and so on. Each of these shared resources may have a success or failure status and thus may need its own success bit. It is important to note that the chain of requests is discontinued when a resource terminates with a failure status. In that case the failing resource sends its reply directly to the originating communication engine.
  • the processing element 14 associated with a communication engine 11 can make the communication engine 11 issue one or more requests to the shared resources, by writing the necessary RequestIDs into the relevant packet's buffer.
  • Each RequestID is, for example, a single 64-bit word, and will cause one shared resource request to be generated.
  • Replies from a shared resource are also stored in the packet's buffer.
  • the process of assembling and transmitting the requests to shared resources is preferably started when the packet is not being processed any more by the processor. The packet can only become selectable for processing again after all replies from the shared resources have arrived. This guarantees that a single buffer will never be modified by the processor and the communication engine at the same time.
  • a shared resource request is invoked by sending out the request information together with information for the next action from a processing element to the associated communications engine. This is a pointer identifying the next action that needs to be performed on this packet, and an option to indicate that the packet needs to be transferred to the next processing unit for that action.
  • the processing unit reads the pointer to the action that needs to be performed next. This selection is done by the same dedicated hardware, e.g. the communication engine, which regulates the copying of heads into and out of the buffer memory 7 for the processing element relating to the processing unit.
  • the communication engine also processes the answers from the shared resources.
  • a request to a shared resource preferably includes a reference to the processing element which made the request. When the answer returns from the shared resource, the answer includes this reference.
  • the processing model is that of a single thread of execution. There is no need for an expensive context switch that needs to save all processing element states, an operation that may either be expensive in time or in hardware. Moreover, it trims down the number of options for the selection of such a processing element.
  • the single thread of execution is in fact an endless loop of: selecting an eligible packet head, executing the task the head points to, issuing any shared resource requests, and handing the head over to the next stage or returning it to the pool.
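  • In C-like pseudocode the model reduces to the loop below (get_next_head and run_task are hypothetical stand-ins for the communication engine interface, not the patent's literal API; head_t as declared earlier):

```c
extern head_t *get_next_head(void);   /* suspends when no head is eligible    */
extern void    run_task(head_t *h);   /* executes the task the head points to */

void processing_element_main(void)
{
    for (;;) {
        head_t *h = get_next_head();  /* hardware-assisted "context switch"             */
        run_task(h);                  /* ends by writing next task + flags into the HAF */
        /* the communication engine then sends any shared resource requests
         * and/or hands the head to the next stage, based on those flags */
    }
}
```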
  • This programming model thus strictly defines the subsequent actions which will be performed on a single packet, together with the stage in which these actions will be performed. It does not define the order of (action, packet) tuples which are performed on a single processing element. This is a consequence of timing and latency of the shared resources, and exact behavior as such is transparent to the programming model.
  • the rigid definition of this programming model allows a verification of the programming code of the actions performed on the packets on a level which does not need to include the detail of these timing and latency figures.
  • a further embodiment of the present invention relates to how the shared resources are accessed.
  • Processing units and shared resources are connected via a number of busses, e.g. double 64-bit wide busses.
  • Each node (be it a processing unit or a shared resource) has a connection to one or more of these busses.
  • the number of busses and number of nodes connected to each bus are determined by the bandwidth requirements.
  • Each node preferably latches the bus, to avoid long connections. This allows a high-speed, but also relatively high-latency, bus.
  • All nodes have the same priority and arbitration is accomplished in a distributed manner in each node.
  • Each node can insert a packet whenever an end of packet is detected on the bus. While inserting a packet, it stalls the incoming traffic. This simple arbitration is assumed to be sufficient when the actual bandwidth is not too close to the available bandwidth and latency is less important; the latter is true for the packet processor, and the former can be achieved by a good choice of the bus topology. A per-node sketch of this arbitration is given below.
  • the shared resources may be connected to double 64-bit wide busses as shown schematically in Fig. 10.
  • the processing units P1 to P8 are arranged on one bus and can access shared resources SR1 and SR2, the processing units P9 to P16 are arranged on a second bus and can only access SR2, the processing units P17, P19, P21 and P23 are arranged on a third bus and can only access SR3, and the processing units P18, P20, P22 and P24 are arranged on a fourth bus and can only access SR3.
  • Processing nodes communicate with the shared resources by sending messages to each other on the shared bus. Each node on the bus has a unique address. Each node can insert packets on the bus whenever the bus is idle. The destination node of a packet removes the packet from the bus. A contention scheme is provided on the bus to prevent collisions. Each request traveling down the bus is selected by the relevant shared resource, processed and the response is placed on the bus again.
  • the buses may be in the form of a ring and a response travels around the ring until the relevant processing unit/shared resource is reached at which point it is received by that processing unit/shared resource.
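The per-node behaviour described above can be sketched in C as follows. The word format, the address width and the absence of explicit back-pressure are all simplifying assumptions of this sketch.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* One word on the shared bus; format and widths are assumed. */
    typedef struct {
        uint64_t data;
        bool     valid;             /* false marks an idle slot         */
        bool     end_of_packet;
        uint8_t  dst_addr;          /* destination removes the message  */
    } bus_word_t;

    /* Every node latches the bus: wires stay short and the clock fast,
     * at the price of one cycle of latency per hop. */
    typedef struct {
        uint8_t    addr;            /* unique node address              */
        bus_word_t latch;
    } bus_node_t;

    /* One clock tick of one node; all nodes run the same logic, so
     * arbitration is distributed and every node has equal priority. */
    bus_word_t bus_node_tick(bus_node_t *n, bus_word_t in,
                             const bus_word_t *tx, bool *consumed_tx,
                             bus_word_t *rx, bool *rx_valid)
    {
        bus_word_t out = n->latch;          /* forward the latched word */

        /* The destination node removes the message from the bus. */
        *rx_valid = in.valid && in.dst_addr == n->addr;
        if (*rx_valid) {
            *rx = in;
            in.valid = false;               /* slot is now free         */
        }

        /* Insert a pending local message into a free slot. A fuller
         * model would also break in right after an end_of_packet word
         * and stall the upstream traffic meanwhile; that back-pressure
         * is omitted here for brevity. */
        *consumed_tx = (tx != NULL) && !in.valid;
        n->latch = *consumed_tx ? *tx : in;
        return out;
    }
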
  • a packet entering a processing pipeline 4-6 triggers a chain of actions which are executed on that processing pipeline for that packet.
  • An action is defined as a trace of program code (be it in hardware or in software) that is executed on a processing element during some number of clock cycles without interaction with any of the shared resources and without communication with the next processing element in the pipeline.
  • An action ends on either a request to a shared resource, or by handing over the packet to the next stage.
  • This sequence of actions, shared resource requests and explicit packet hand-overs to the next stage is shown schematically in Fig. 6 in the form of a flow diagram.
  • a packet head is first delivered from the dispatch unit.
  • the processing element of the processing unit of the first stage of the pipeline performs an action on this head.
  • a request is then made to a shared resource SR1.
  • the head remains in the associated FIFO memory.
  • a second action is carried out by the same processing element. Accordingly, within one processing element, several of these actions on the same packet can be performed.
  • the modified head is transferred to the next stage where further actions are performed on it.
  • each buffer may be in one of the following states:
  • empty no packet head stored
  • R4P ready for processing
  • R4T ready for transfer
  • R4PwSRPending ready for processing after transmission of SR requests
  • R4TwSRPending ready for transfer after transmission of SR requests
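Written as a C enumeration (the names follow the abbreviations above; the 'empty' state is taken from the receive flow described next, and the encoding values are arbitrary):

    /* Buffer states kept in the buffer state register. */
    typedef enum {
        BUF_EMPTY,             /* no packet head stored                */
        BUF_R4P,               /* ready for processing                 */
        BUF_R4T,               /* ready for transfer                   */
        BUF_R4P_W_SR_PENDING,  /* becomes R4P once all transmitted SR
                                  requests have been answered          */
        BUF_R4T_W_SR_PENDING,  /* becomes R4T once all transmitted SR
                                  requests have been answered          */
    } buffer_state_t;
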
  • when a new packet head is presented at the receive port of a communications engine, the status of free buffers is accessed via the buffer manager. If a free (empty) buffer location exists in the memory, the packet head is received: the head data is sent in step 102 to the memory and stored in step 104 in the appropriate buffer, i.e. at the appropriate memory location.
  • the buffer state in the buffer state register is updated by the communication engine from empty to R4P if the head is to be processed (or R4T for packet heads that do not require processing, e.g. dropped and tunneled packet heads). As older packet heads in the buffers are processed and sent further down the pipeline, after some time, the current R4P packet head is ready to be selected.
  • in step 108 the processing element finishes processing of a previous head and requests a next packet head from the communications engine.
  • the next packet selection is decided in step 110 on the basis of the buffer states contained in the buffer state register. If no R4P packet heads are available, idle is returned by the communications engine to the processing element. The processing element will repeat the request until a non-idle answer is given.
  • in step 114 the communications engine accesses the next packet register and sends the next packet head location and the associated task pointer to the processing element.
  • in order for the processing element to get started right away, the answer provides not only the next packet head location but also the associated task pointer. This data is part of the HAF of the next packet head to be processed and hence requires the cycle(s) of a read to memory. Therefore the communication engine continuously updates in step 112 the next packet register with a packet head location + task pointer tuple so as to have this HAF read take place outside the cycle budget of the processing element.
  • in step 116 the processing element processes the packet head and updates the HAF fields 'Transfer' and 'SRRequest'.
  • the communications engine monitors the data bus between the processing element and the memory and, on the basis of this bus monitoring, the buffer state manager is informed to update the buffer state in step 118. For instance, a head can become R4P or R4T if no SR requests are to be sent, or R4PwSRPending or R4TwSRPending if SR requests are to be sent.
  • in step 120 the pending SR requests trigger the SR transmit machine, after the processing phase, to assemble and transmit the SR requests that are listed at the end of the buffer, i.e. the RequestID list (a sketch of this bookkeeping follows this step sequence).
  • in step 122 the RequestIDs are processed in sequence.
  • the indirect-type requests require reads from memory.
  • in step 124, for every request that expects an answer back, as opposed to a command, the WaitingLevel counter is incremented.
  • in step 126, upon receipt of an SR answer, the SR receive machine processes the result and in step 128 writes it to the memory, more specifically to the buffer location associated with the appropriate packet head.
  • in step 130 the WaitingLevel counter is decremented.
  • once no further answers are outstanding, the packet head is set to R4P or R4T in step 132.
  • a first-in-first-out approach is taken for the packet head stream in the buffers.
  • in step 134, when the oldest packet head present becomes 'R4T', the transmit machine outputs this packet head to the transmit port.
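The WaitingLevel bookkeeping of steps 120-132 can be sketched in C as below, reusing the buffer_state_t enumeration sketched earlier. The helper callbacks (send_sr_request, expects_answer, write_reply) are assumptions standing in for the SR transmit and receive machines.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdbool.h>

    /* Per-buffer control state; layout is assumed for this sketch. */
    typedef struct {
        buffer_state_t state;      /* see the enumeration above        */
        int  waiting_level;        /* outstanding SR answers           */
        bool wants_transfer;       /* HAF 'Transfer' bit: R4T vs. R4P  */
    } buffer_ctl_t;

    /* Steps 120-124: walk the RequestID list at the end of the buffer,
     * send each request, and count those that expect an answer back
     * (as opposed to mere commands). */
    void sr_transmit(buffer_ctl_t *b, const uint64_t *request_ids, size_t n,
                     bool (*expects_answer)(uint64_t),
                     void (*send_sr_request)(uint64_t))
    {
        for (size_t i = 0; i < n; i++) {
            send_sr_request(request_ids[i]);
            if (expects_answer(request_ids[i]))
                b->waiting_level++;          /* step 124 */
        }
    }

    /* Steps 126-132: on each SR answer, write the result into the
     * packet's buffer; when no answers remain outstanding, mark the
     * head ready for processing or transfer. */
    void sr_receive(buffer_ctl_t *b, uint64_t result,
                    void (*write_reply)(uint64_t))
    {
        write_reply(result);                 /* steps 126 and 128 */
        if (--b->waiting_level == 0)         /* step 130          */
            b->state = b->wants_transfer ? BUF_R4T : BUF_R4P;  /* 132 */
    }
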
  • processing pipelines in accordance with the present invention meet the following requirements:
  • a processing unit is able to read, strip and modify the heads; items in which a processing unit is not interested are transferred to the next stage without any intervention of the processing unit. Thus, parts of the payload carried in the header are not corrupted but simply forwarded (a minimal pass-through sketch follows this list).
  • processing units are synchronized.
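As a final illustration of the pass-through requirement, a minimal C sketch: a stage edits only the header items it knows and copies everything else through untouched. The item tagging scheme (a type and a length per item) is an assumption of this sketch, as is in-place modification; alignment and endianness handling are omitted.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical per-item head layout: a type tag and payload length. */
    typedef struct {
        uint16_t type;
        uint16_t len;              /* payload bytes following the item */
    } head_item_t;

    typedef void (*item_handler_fn)(uint8_t *payload, uint16_t len);

    /* Walk the head; dispatch known item types to their handler, leave
     * unknown items untouched so their payload is simply forwarded. */
    void process_head(uint8_t *head, size_t head_len,
                      item_handler_fn handlers[], size_t n_types)
    {
        size_t off = 0;
        while (off + sizeof(head_item_t) <= head_len) {
            head_item_t *it = (head_item_t *)(head + off);
            size_t next = off + sizeof(head_item_t) + it->len;
            if (next > head_len)
                break;             /* malformed item: stop parsing      */
            if (it->type < n_types && handlers[it->type] != NULL)
                handlers[it->type](head + off + sizeof(head_item_t),
                                   it->len);
            off = next;            /* unknown types fall through intact */
        }
    }
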

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention concerns a packet processing device for use in a packet-switched network, comprising means for receiving a packet, means for adding administrative information to a first data portion of the packet, the administrative information containing at least an indication of at least one operation to be carried out on the first data portion, and a plurality of parallel pipelines each comprising at least one processing unit for carrying out the processing of the first data portion indicated by the administrative information and for modifying the first data portion. According to a method, the tasks carried out by each processing unit are organized as a plurality of functions such that there are only function calls and no inter-function calls, and such that, at the end of each function called by the function call for a processing unit, the only context is a first data portion.
EP03726675A 2002-04-26 2003-04-25 Efficient packet processing pipelining device and method Withdrawn EP1523829A4 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
GB0209670 2002-04-26
GBGB0209670.9A GB0209670D0 (en) 2002-04-26 2002-04-26 Efficient packet processing pipelining device and method
PCT/US2003/014259 WO2003091857A2 (fr) 2002-04-26 2003-04-25 Efficient packet processing pipelining device and method

Publications (2)

Publication Number Publication Date
EP1523829A2 true EP1523829A2 (fr) 2005-04-20
EP1523829A4 EP1523829A4 (fr) 2007-12-19

Family

ID=9935632

Family Applications (1)

Application Number Title Priority Date Filing Date
EP03726675A Withdrawn EP1523829A4 (fr) 2002-04-26 2003-04-25 Dispositif et procede efficaces de traitement de paquets en pipeline

Country Status (5)

Country Link
EP (1) EP1523829A4 (fr)
CN (1) CN100450050C (fr)
AU (1) AU2003228900A1 (fr)
GB (1) GB0209670D0 (fr)
WO (1) WO2003091857A2 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103026679B (zh) * 2010-07-26 2016-03-02 Hewlett-Packard Development Company, L.P. Mitigation of patterns detected in a network device
WO2018188738A1 (fr) * 2017-04-11 2018-10-18 NEC Laboratories Europe GmbH Method and apparatus for packet handling for network service functions
CN113364685B (zh) * 2021-05-17 2023-03-14 National University of Defense Technology Distributed MAC table entry processing apparatus and method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4755986A (en) * 1985-09-13 1988-07-05 Nec Corporation Packet switching system
US4899333A (en) * 1988-03-31 1990-02-06 American Telephone And Telegraph Company At&T Bell Laboratories Architecture of the control of a high performance packet switching distribution network
US5341369A (en) * 1992-02-11 1994-08-23 Vitesse Semiconductor Corp. Multichannel self-routing packet switching network architecture
WO1999031580A1 (fr) * 1997-12-16 1999-06-24 Intel Corporation Processeur presentant de multiples registres d'adresse d'instruction et de tampons d'analyse a l'exterieur d'une pipeline d'execution
US6286027B1 (en) * 1998-11-30 2001-09-04 Lucent Technologies Inc. Two step thread creation with register renaming
WO2000068780A2 (fr) * 1999-05-11 2000-11-16 Sun Microsystems, Inc. Logique de commutation dans un processeur a unites d'execution multiples
WO2001048606A2 (fr) * 1999-12-28 2001-07-05 Intel Corporation Signalisation par filieres dans des processeurs reseau mulitfilieres
WO2002005499A1 (fr) * 2000-01-30 2002-01-17 Celox Networks, Inc. Dispositif et procede de mise en forme des paquets
JP2002116908A (ja) * 2000-10-05 2002-04-19 Arm Ltd Inter-calling between native and non-native instruction sets

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of WO03091857A2 *

Also Published As

Publication number Publication date
WO2003091857A3 (fr) 2003-12-11
AU2003228900A1 (en) 2003-11-10
EP1523829A4 (fr) 2007-12-19
WO2003091857A2 (fr) 2003-11-06
CN1663188A (zh) 2005-08-31
WO2003091857A9 (fr) 2007-12-13
GB0209670D0 (en) 2002-06-05
AU2003228900A8 (en) 2003-11-10
CN100450050C (zh) 2009-01-07

Similar Documents

Publication Publication Date Title
US20050232303A1 (en) Efficient packet processing pipeline device and method
US6804815B1 (en) Sequence control mechanism for enabling out of order context processing
US6032190A (en) System and method for processing data packets
US8856379B2 (en) Intelligent network interface system and method for protocol processing
US7110400B2 (en) Random access memory architecture and serial interface with continuous packet handling capability
US6731631B1 (en) System, method and article of manufacture for updating a switching table in a switch fabric chipset system
US7546399B2 (en) Store and forward device utilizing cache to store status information for active queues
US6804731B1 (en) System, method and article of manufacture for storing an incoming datagram in switch matrix in a switch fabric chipset system
JP4068166B2 (ja) Search engine architecture for a high-performance multi-layer switch element
US7328277B2 (en) High-speed data processing using internal processor memory space
US5418781A (en) Architecture for maintaining the sequence of packet cells transmitted over a multicast, cell-switched network
US7936758B2 (en) Logical separation and accessing of descriptor memories
US9450894B2 (en) Integrated circuit device and method of performing cut-through forwarding of packet data
US20030115347A1 (en) Control mechanisms for enqueue and dequeue operations in a pipelined network processor
US6754744B2 (en) Balanced linked lists for high performance data buffers in a network device
JP2002538726A (ja) Method and apparatus for dynamic packet batching with a high-performance network interface
JP2002538723A (ja) Method and apparatus for data reassembly with a high-performance network interface
US8099515B2 (en) Context switched route look up key engine
US5935235A (en) Method for branching to an instruction in a computer program at a memory address pointed to by a key in a data structure
US6724759B1 (en) System, method and article of manufacture for transferring a packet from a port controller to a switch fabric in a switch fabric chipset system
US7536692B2 (en) Thread-based engine cache partitioning
US20050102474A1 (en) Dynamically caching engine instructions
EP1631906B1 (fr) Maintaining entity order with gate managers
US20040246956A1 (en) Parallel packet receiving, routing and forwarding
EP1523829A2 (fr) Efficient packet processing pipelining device and method

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20041124

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

RIC1 Information provided on ipc code assigned before grant

Ipc: H04L 12/56 20060101ALI20070907BHEP

Ipc: H04L 12/28 20060101AFI20041206BHEP

A4 Supplementary search report drawn up and despatched

Effective date: 20071119

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 9/38 20060101ALI20071113BHEP

Ipc: H04L 12/56 20060101ALI20071113BHEP

Ipc: H04L 12/28 20060101AFI20041206BHEP

R17D Deferred search report published (corrected)

Effective date: 20071213

17Q First examination report despatched

Effective date: 20080327

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20081007