HK1069046B - A scalable network processor and apparatus and method for operating the same
Description
The present application is a divisional of patent application No. 00811558.3, entitled "Network Switch and Method Using a Network Processor", filed on August 24, 2000.
Technical Field
The present invention relates to communication network apparatus such as is used to link together information handling systems or computers of various types and capabilities. More particularly, the invention relates to a scalable switch apparatus and to components useful in assembling such an apparatus. The invention further relates to an improved multifunction interface device and to combinations of such a device with other components to form a media-speed network switch. The invention also relates to methods of operating such apparatus which improve the data flow handling capability of a network switch.
Background
In the following description we assume that the reader has some knowledge of the data traffic, switches and routers used in communications networks. In particular, we assume familiarity with the ISO model of network architecture, which divides network operation into layers. A typical architecture based on the ISO model extends from Layer 1 (sometimes referred to as "L1"), which is the physical pathway or medium over which signals pass, up through Layers 2, 3, 4 and so on to Layer 7, the application layer, where application programs run on computer systems linked to the network. In this document, the respective layers of the network architecture are denoted L1, L2, and so forth. The present disclosure also assumes a basic understanding of the bit strings known as packets and frames in network communication.
Bandwidth is a critical resource in today's networked world. Growing network traffic, driven by the expansion of the internet and the continued emergence of new applications, is placing tremendous pressure on the capacity of the network infrastructure. To keep pace, many organizations are seeking better techniques and methods to support and manage traffic growth and the convergence of voice with data.
The current dramatic increase in network traffic is attributable mainly to the popularity of the internet, the growing frequency of remote information access, and the steady emergence of new applications. The internet alone, with the explosive growth of electronic commerce, threatens to overwhelm network backbones. Data traffic is growing and has for the first time surpassed voice traffic, with the internet playing a decisive role. At the same time, rising demand for remote access applications, including e-mail, database access, and file transfer, is placing further strain on networks.
The convergence of voice and data will play a leading role in future network environments. Transmitting data over Internet Protocol (IP) networks is currently inexpensive, and because voice communication naturally seeks the lowest-cost path, voice will inevitably be packetized and carried alongside data. In this changing market, the price/performance of technologies such as voice over IP (VoIP), voice over ATM (VoATM), and voice over frame relay (VoFR) gives each of them a niche. To make these techniques practical, however, the industry must also guarantee voice quality of service (QoS) and determine how voice transmissions over data lines will be billed. The Telecommunications Act of 1996 further complicates this environment; the legislation sets the stage for a convergence between the protocol of choice for voice (ATM) and the protocol of choice for data (IP).
With the continuous emergence of new products and functions, integrating new equipment into existing systems has become a major concern for many organizations. To protect their investments in legacy equipment and software, many organizations demand solutions that allow new technologies to be phased in without disrupting the operation of installed equipment.
Eliminating network bottlenecks remains an ongoing challenge for many service providers, and the source of these bottlenecks is often the router. Ordinary network congestion, however, is frequently misdiagnosed as a bandwidth problem, and additional bandwidth is sought as the remedy. Manufacturers are becoming increasingly aware of this and are turning to network processor technology in an effort to manage bandwidth resources more efficiently at wire speed and to provide the advanced data services typically found in routers and network application servers, including load balancing, QoS, gateways, firewalls, security, and Web caching.
For remote access applications, performance, required bandwidth, security and authentication rank among the most important considerations. The need to combine QoS and CoS, integrated voice handling, and more sophisticated security solutions will also shape the design of future remote access network switches. In addition, remote access must accommodate an ever-growing variety of physical media, such as ISDN, T1, E1, OC-3 through OC-48, cable, and xDSL modems.
Industry consultants have defined a network processor (also referred to herein as an "NP") as a programmable communications integrated circuit capable of performing one or more of the following functions (illustrated in the sketch following the list):
Packet classification: identifying a packet based on known characteristics, such as address or protocol
Packet modification: modifying the packet to conform to IP, ATM, or other protocols (for example, updating the time-to-live field in an IP header)
Queue/policy management: reflecting the design strategy for packet queuing, dequeuing, and scheduling for particular applications
Packet forwarding: transmitting and receiving data through the switch fabric and forwarding or routing the packet to the appropriate address
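By way of illustration only, the following C sketch models the four functions just listed in software. All names (np_packet, np_classify, and so on) are invented for this sketch; an actual NP performs these steps in specialized hardware and picocode, not in C.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t *data;       /* raw frame bytes             */
    size_t   len;
    uint16_t ethertype;  /* filled in by classification */
    uint8_t  ttl;        /* IP time-to-live, if IPv4    */
    int      out_port;   /* chosen by forwarding        */
} np_packet;

enum np_class { NP_IPV4, NP_IPX, NP_OTHER };

/* Packet classification: identify the packet by known characteristics. */
static enum np_class np_classify(const np_packet *p)
{
    switch (p->ethertype) {
    case 0x0800: return NP_IPV4;
    case 0x8137: return NP_IPX;
    default:     return NP_OTHER;
    }
}

/* Packet modification: e.g. update the time-to-live field. */
static void np_modify(np_packet *p, enum np_class c)
{
    if (c == NP_IPV4 && p->ttl > 0)
        p->ttl--;
}

/* Queue management and forwarding: enqueue toward the switch fabric;
 * the queuing policy here is a plain FIFO for simplicity. */
static void np_forward(np_packet *p, int port, np_packet **queue, size_t *n)
{
    p->out_port = port;
    queue[(*n)++] = p;
}
```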
While this definition accurately describes the basic features of early NPs, the full capabilities and benefits of NPs have yet to be realized. Because tasks that were once handled in software can now be performed in hardware, network processors can increase bandwidth and solve problems across a wide range of applications. In addition, NPs improve speed through architectures such as parallel distributed processing and pipelined processing designs. These capabilities can improve the efficiency of search engines, increase throughput, and allow complex tasks to be processed quickly.
Network processors are expected to become the fundamental network building block, much as the CPU is for a PC. Typical functions offered by an NP include real-time processing, security, store-and-forward, switch fabric interfacing, IP packet handling, and learning. NPs target ISO Layers 2 through 5 and are designed primarily to optimize network-specific tasks.
Processor-based NPs combine multiple general-purpose processors with specialized logic. Vendors are turning to such designs because they provide scalable, flexible solutions that can accommodate change in a timely and cost-effective way. A processor-based NP permits distributed processing at lower levels of integration, offering higher throughput, flexibility and control. Programmability also allows easy migration to new protocols and technologies without requiring new application-specific integrated circuit designs. With processor-based NPs, network equipment vendors benefit in two ways: non-recurring engineering costs are reduced, and time to market is shortened.
Disclosure of Invention
It is an object of the present invention to provide a scalable switch architecture for data communications networks which increases the speed at which transmitted data is handled while providing robust support for a variety of potential needs. To this end, the invention provides components, and combinations of components, which significantly reduce the data handling burden placed on any single processing unit relative to prior designs.
It is a further object of the present invention to provide an interface device, or network processor (the terms are used interchangeably herein), comprising a number of sub-components integrated on a single substrate which cooperate to switch frames at media speed among Layers 2, 3, 4 and 5. The interface device may provide basic functionality as a standalone solution in a workgroup switch, advanced functionality in a workgroup switch as an interconnect solution, or advanced functionality in cooperation with a switch fabric device.
To this end, the invention provides a device comprising: self-routing switch fabric means for directing data entering the device from an identifiable address to flow from the device to an identified address, said switch fabric means having input ports and output ports through which the data flows; a control point processor; and a plurality of interface devices coupled to said switch fabric means and to said control point processor, each of said interface devices having: a semiconductor substrate; a plurality of interface processors formed on said substrate, said processors being at least five in number; internal instruction memory formed on said substrate and storing instructions accessible to said interface processors; data memory formed on said substrate and storing data passing through said interface device and accessible to said interface processors; and a plurality of input/output ports formed on said substrate, at least one of the input/output ports connecting the data memory with external data memory, and at least two other input/output ports exchanging data with external networks under the direction of the interface processors; said control point processor cooperating with said interface devices by loading into said instruction memory instructions to be executed by said interface processors to control the exchange of data between the data-exchange input/output ports and the flow of data through said data memory; and a plurality of said interface devices being connected to the input ports and output ports of said switch fabric means.
Brief description of the drawings
Having thus described some of the purposes of the present invention, other objects of the invention will now be described in conjunction with the accompanying drawings, in which:
Fig. 1 is a block diagram of the interface device of the present invention.
Fig. 1A is a block diagram of a MAC.
Figs. 2A to 2D show the interface device interconnected with other components in different system configurations.
Fig. 3 shows the flow and processing of an encapsulated guided frame.
Fig. 4 shows the flow and processing of an internal guided frame.
Fig. 5 shows a generalized format of a guided cell.
Fig. 6 shows the format of the frame control information.
Fig. 7 shows the format of a correlator.
Fig. 8 shows the format of the command control information.
Fig. 9 shows the format of the addressing information.
Fig. 10 shows a general format for structure addressing.
Fig. 11 shows the addressing and island encoding scheme.
Fig. 12A is a block diagram of the embedded processor complex.
Fig. 12B is a diagram of the embedded processors.
Fig. 12C shows the structure of a GxH processor.
Fig. 13 is a block diagram of the memory complex.
Fig. 14 is a flow chart of the fixed match (FM) search algorithm.
Fig. 15 is a data structure flow diagram for the cases with and without a direct table.
Fig. 16 is a block diagram of a switching system exemplified by Prizma.
Fig. 17 is a block diagram of a CP.
Fig. 18 is a block diagram of a single-chip network processor, emphasizing its EDS-UP, EDS-DOWN, and EPC functions.
Detailed Description of the Invention
The present invention will now be described more fully with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. Persons skilled in the art will understand that changes may be made to these embodiments without departing from the scope of the invention. Accordingly, the following description is to be understood as a broad, enabling disclosure directed to persons skilled in the applicable arts, and not as limiting the invention.
The apparatus of the present disclosure is scalable: it can interconnect desktop or workgroup switches, aggregate such switches into a network backbone, and provide backbone switching services. The apparatus supports Layer 2, Layer 3 and Layer 4+ forwarding in hardware. Some versions of the apparatus are designed for desktop or workgroup switch aggregation, while others are designed as core backbone switches.
The architecture of the apparatus is based on an interface device or network processor hardware subsystem and a software library running at a control point, as explained more fully later in this document. The interface device or network processor subsystem is a high-performance frame-forwarding engine that parses and translates L2, L3 and L4+ protocol headers, allowing protocols to be switched at higher speed in hardware. The interface device or network processor subsystem provides the fast path through the box, while the software library and control point processor provide the management and route discovery functions needed to maintain the fast path. The control point processor and the software library running on it together define the Control Point (CP) of the system. Both bridging and routing protocols (e.g., transparent bridging and OSPF) run on the CP, which can also serve as the slow path of the system.
In addition to supporting multilayer forwarding in hardware, the apparatus of the present disclosure can also operate purely as a Layer 2 switch, and this is its default mode of operation in the embodiment briefly disclosed in this patent. Each port belongs to a single domain, so that any device can communicate with any other device. At Layer 2 the apparatus is configurable, allowing a system administrator to partition ports into separate domains or trunks, configure virtual LAN (VLAN) segments, or install filters to control broadcast and multicast traffic.
This scalable apparatus has several advantages. First, it enables the system administrator to configure L3 forwarding and routing of IP and IPX traffic using the same hardware, and at the same speed, as L2. Second, it removes the need for external routers to interconnect campus buildings while improving performance. Third, it simplifies and consolidates the management of L2/L3 services, concentrating them at a single control point. Finally, it provides value-added L4+ functions that let a system administrator assign traffic to classes of service in support of mission-critical applications, and allow a network dispatcher to balance load dynamically among servers.
The apparatus can be deployed as modular units built around the following components: an interface device or network processor, a Control Point (CP), and an optional switch fabric device. The interface device preferably provides L2/L3/L4+ fast-path forwarding services, while the CP provides the management and route discovery functions needed to maintain the fast path. The optional switch fabric device is used when two or more interface device subsystems are interconnected. For details of a suitable switch fabric device, see U.S. Patent No. 5,008,878, issued April 16, 1991, entitled "High Speed Modular Switching Apparatus for Circuit and Packet Switched Traffic".
The apparatus is built up from pluggable printed circuit board units, referred to herein as "blades". The blades carry circuit components and plug into connectors in the housing of the apparatus; similar units are sometimes known as option cards. For this apparatus we assume a chassis or housing with appropriate connectors and backplane power connections into which blades can be interchangeably inserted. The basic building block of the various blades is the carrier subsystem, from whose perspective three blade types can be assembled. The first is a CP-only blade, consisting of a carrier subsystem and a CP subsystem; such blades are used mainly in products where redundancy is the primary concern. The second is a CP+media blade, consisting of a carrier subsystem, a CP subsystem, and one to three media subsystems; such blades are used mainly in products where port density matters more than redundancy. The third is a media blade, consisting of a carrier subsystem and one to four media subsystems; media blades can be used with any chassis, and the media subsystems can be of any supported configuration.
Blade management includes fault detection, power management, detection of newly installed devices, initialization, and configuration. Management is carried out using various registers, I/O signals, and a guided frame interface for communication between the CP and the carrier subsystem. Unlike the chassis, all blades carry programmable devices and memory; the degree of programmability depends on the blade type. When a CP subsystem is present on a blade, both the CP and carrier subsystems are programmable. The media subsystems are also programmable, but only indirectly through the carrier subsystem.
Some higher-end products also include a switch blade containing the switch fabric device subsystem. Management of these blades likewise includes fault detection, power management, detection of newly installed devices, initialization, and configuration, and is accomplished using various registers and I/O signals mapped into the CP subsystem.
In its simplest form, the switching apparatus contemplated by the invention comprises a control point processor and an interface device operatively connected to it. In the present disclosure the interface device, also known as a network processor, is preferably a single very large scale integrated (VLSI) circuit device or chip comprising: a semiconductor substrate; a plurality of interface processors formed on the substrate; internal instruction memory formed on the substrate, storing instructions accessible to the interface processors; internal data memory formed on the substrate, storing data passing through the device and accessible to the interface processors; and a plurality of input/output ports. The interface processors are also referred to herein as picoprocessors or processing units. Of the ports provided, at least one connects the internal data memory with external data memory, and at least two others exchange data with external networks under the direction of the interface processors. The control point cooperates with the interface device by loading instructions into the instruction memory; these instructions are executed by the interface processors to control the exchange of data between the input and output ports and the flow of data through the data memory.
The network processor of the present disclosure is regarded as an invention distinct from the assembled combination of components that make up a switch. Further, the network processor is deemed to contain within it a number of other inventions that are not discussed here.
Fig. 1 shows a block diagram of the interface device chip, which includes a substrate 10 and a plurality of sub-components integrated on the substrate. The sub-components are arranged into an upside (UP) configuration and a downside (DOWN) configuration, where "UP" refers to data flowing inbound from a network into the device and "DOWN" refers to data flowing outbound from the device toward a network. Data flow follows the respective configurations; thus there is an UP data flow and a DOWN data flow. The sub-components on the upside include: enqueue-dequeue-scheduling UP (EDS-UP) logic 16, multiplexed MACs-UP (PMM-UP) 14, switch data mover-UP (SDM-UP) 18, system interface (SIF) 20, data align serial link A (DASL-A) 22, and data align serial link B (DASL-B) 24. Data align serial links are described in detail in U.S. patent application Serial No. 09/330,968, entitled "High Speed Parallel/Serial Link for Data Communication", filed June 1999. While the preferred form of the apparatus of this invention uses DASL links, the invention contemplates that other forms of links may be employed to achieve relatively high data flow rates, particularly where the data flows are restricted to the VLSI structure.
The sub-components on the downside include: DASL-A 26, DASL-B 28, SIF 30, SDM-DN 32, EDS-DN 34, and PMM-DN 36. The chip also includes a plurality of internal S-RAMs, a traffic management scheduler 40, and an embedded processor complex (EPC) 12. An interface device 38 is coupled to the PMMs 14 and 36 by respective DMU buses. The interface 38 may be any suitable L1 circuitry, such as Ethernet physical devices (ENET PHY) or ATM framers; the type of interface is dictated in part by the network medium to which the chip is attached. A plurality of external D-RAMs and S-RAMs are available for use by the chip.
While the preferred embodiment contemplates networks in which the relevant switches and routers move data over electrical conductors, such as wires and cables installed in a building, the present disclosure contemplates that the network switches and their components may also be used in a wireless environment. For example, the media access control (MAC) units described herein could be replaced with suitable radio frequency units, using known silicon germanium (SiGe) technology, which would link the chip directly to a wireless network. Where such technology is appropriately employed, the radio frequency units can be integrated into the VLSI structure of this disclosure by persons skilled in the art. Alternatively, radio frequency or wireless (e.g., infrared) responders could be mounted on a blade together with the other elements disclosed here to form a switch apparatus useful in a wireless network system.
The arrows show the general flow of data within the interface device. Frames received from an Ethernet MAC are placed by the EDS-UP in internal data store buffers. These frames are identified as either normal data frames or system control guided frames and are enqueued to the EPC (Fig. 1). The EPC contains N protocol processors capable of working on up to N frames in parallel (N > 1). In an embodiment with ten protocol processors (Fig. 12B), two of them are specialized: one for handling guided frames (the GCH) and one for building lookup data in control memory (the GTH). As shown in Fig. 12A, the EPC also contains: a dispatcher that matches new frames with idle processors; a completion unit that maintains frame sequence; a common instruction memory shared by all ten processors; a classifier hardware assist that determines the type of each frame, together with a coprocessor that helps determine the starting instruction address for processing the frame; ingress and egress data store interfaces that control frame buffer read and write operations; a control memory arbiter that allows the ten processors to share control memory; a Web control, arbiter and interface that permits debug access to internal data structures of the interface device; and other hardware constructs. A sketch of the dispatcher's role follows.
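The minimal C sketch below pictures how a dispatcher might match frames to idle processors. The role names, and the existence of a general data-handling role, are assumptions made for this illustration; only the GCH and GTH specializations are named in the text above.

```c
#define N_PROC 10

enum proc_role { ROLE_GCH, ROLE_GTH, ROLE_DATA };

typedef struct {
    enum proc_role role;
    int            busy;
} proto_proc;

/* Dispatch a frame to an idle processor: guided frames go to the GCH,
 * ordinary data frames to any idle general-purpose protocol processor. */
static int dispatch(proto_proc procs[N_PROC], int is_guided)
{
    for (int i = 0; i < N_PROC; i++) {
        if (procs[i].busy)
            continue;
        if ((is_guided && procs[i].role == ROLE_GCH) ||
            (!is_guided && procs[i].role == ROLE_DATA)) {
            procs[i].busy = 1;
            return i;           /* index of the processor assigned */
        }
    }
    return -1;                  /* none idle: frame remains queued */
}
```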
Guided frames are sent by the dispatcher to the GCH processor as it becomes available. The GCH processor then carries out the operations encoded in the guided frame, such as register writes, counter reads, and Ethernet MAC configuration changes. Lookup table alterations, such as the addition of MAC or IP entries, are passed to the lookup data processor (GTH) for control memory operations such as memory reads and writes. Certain commands, such as MIB counter reads, require that a response frame be built and forwarded to the appropriate port on the appropriate interface device. In some cases a guided frame is encoded for the egress side of an interface device; such frames are forwarded to the egress side of the interface device being queried, where the encoded operations are performed and an appropriate response frame is built.
Data frames are dispatched to the next available protocol processor, where frame lookups are performed. Frame data are presented to the protocol processor together with results from the classifier hardware assist (CHA) engine. The CHA parses IP or IPX, and its results determine the tree search algorithm to be used and the starting common instruction address (CIA). The supported tree search algorithms include fixed match trees (fixed-size patterns requiring an exact match, such as Layer 2 Ethernet MAC tables), longest prefix match trees (variable-length patterns requiring variable-length matches, such as subnet IP forwarding), and software managed trees (two patterns defining either a range or a bit mask set, as used for example in filter rules).
Lookups are performed with the aid of the tree search engine (TSE) coprocessor, which is part of each protocol processor. The TSE coprocessor performs control memory accesses, freeing the protocol processor to continue execution. Control memory stores all tables, counters and other data needed by the picocode. Control memory operations are managed by the control memory arbiter, which arbitrates memory access among the ten processors.
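As a rough software analogue of the longest prefix match search named above, the sketch below scans a small route table; the TSE instead walks a tree in control memory, but the matching rule, longest matching prefix wins, is the same. The structure and names are invented for this sketch.

```c
#include <stdint.h>

typedef struct {
    uint32_t prefix;    /* IPv4 prefix, host byte order  */
    uint8_t  len;       /* prefix length in bits, 0..32  */
    int      next_hop;
} route;

/* Linear longest-prefix-match over a table of routes. */
static int lpm_lookup(const route *tbl, int n, uint32_t dst)
{
    int best = -1, best_len = -1;
    for (int i = 0; i < n; i++) {
        uint32_t mask = tbl[i].len ? ~0u << (32 - tbl[i].len) : 0;
        if ((dst & mask) == (tbl[i].prefix & mask) && tbl[i].len > best_len) {
            best     = tbl[i].next_hop;
            best_len = tbl[i].len;
        }
    }
    return best;    /* -1 if no route matches */
}
```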
Frame data are accessed through the data store coprocessor, which contains a primary data buffer (holding up to eight 16-byte segments of frame data), a scratch pad data buffer (also holding up to eight 16-byte segments), and control registers for data store operations. Once a match is found, ingress frame alterations such as VLAN header insertion or overlay can be applied. Such alterations are not performed by the interface device processor complex: hardware flags are set, and the ingress switch interface hardware performs the actual changes. Other frame alterations can be accomplished by the picocode and the data store coprocessor by modifying the frame contents held in the ingress data store.
Control data are gathered and used to build the switch headers and frame headers before the frame is sent to the switch fabric device.
When processing is complete, the enqueue coprocessor builds the formats necessary to enqueue the frame to the switch fabric and sends them to the completion unit. The completion unit guarantees frame order from the ten protocol processors into the switch fabric queues. Frames from the switch fabric queues are segmented into 64-byte cells, and frame header bytes and switch header bytes are inserted as the cells are transmitted to the Prizma-E switch.
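The segmentation into 64-byte cells can be sketched as follows. The header sizes used here are placeholders, not the actual Prizma cell layout, which is shown in later figures.

```c
#include <stdint.h>
#include <string.h>

#define CELL_SIZE 64
#define CELL_HDR   4     /* placeholder sizes for illustration only */
#define FRAME_HDR 10

/* Segment a frame into 64-byte switch cells; every cell gets a cell
 * header, and the first cell also reserves room for the frame header. */
static int segment_frame(const uint8_t *frame, size_t flen,
                         uint8_t cells[][CELL_SIZE], int max_cells)
{
    size_t off = 0;
    int n = 0;
    while (off < flen && n < max_cells) {
        size_t skip = CELL_HDR + (n == 0 ? FRAME_HDR : 0);
        size_t room = CELL_SIZE - skip;
        size_t take = (flen - off < room) ? flen - off : room;

        memset(cells[n], 0, CELL_SIZE);   /* header bytes would go here  */
        cells[n][0] = (uint8_t)n;         /* e.g. a cell sequence number */
        memcpy(cells[n] + skip, frame + off, take);
        off += take;
        n++;
    }
    return off == flen ? n : -1;          /* -1: frame did not fit */
}
```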
Frames received from the switch fabric are placed in egress data store (egress DS) buffers and enqueued to the EPC by the egress EDS 34. A portion of each frame is sent by the dispatcher to an idle protocol processor for frame lookups. The frame data are presented to the protocol processor together with data from the classifier hardware assist, which uses frame control data created by the ingress interface device to help determine the starting common instruction address (CIA).
Egress tree searches support the same algorithms as ingress searches. Lookups are performed with the TSE coprocessor, leaving the protocol processor free to continue execution. All control memory operations are managed by the control memory arbiter, which allocates memory access among the ten processors.
Egress frame data are accessed through the data store coprocessor, which contains a primary data buffer (holding up to eight 16-byte segments of frame data), a scratch pad data buffer (also holding up to eight 16-byte segments), and control registers for data store operations. The result of a successful lookup contains forwarding information and, in some cases, frame alteration information. Frame alterations can include VLAN header deletion, time-to-live increment (IPX) or decrement (IP), IP header checksum recalculation, Ethernet frame CRC overlay or insertion, and MAC DA/SA overlay or insertion. IP header checksums are prepared by the checksum coprocessor. These alterations are not performed by the interface device processor complex: hardware flags are set, and PMM egress hardware performs the actual changes. When processing is complete, the enqueue coprocessor builds the formats necessary to enqueue the frame in the EDS egress queues and sends them to the completion unit. The completion unit guarantees frame order from the ten protocol processors to the EDS egress queues feeding the egress Ethernet MACs 36.
The completed frames are finally sent by the PMM egress hardware to the Ethernet MACs and out the Ethernet ports.
An internal bus, referred to as the Web, allows access to internal registers, counters and memory. The Web also includes an external interface for controlling instruction stepping and interrupts during debugging and diagnostics.
The tree search engine coprocessor provides memory range checking and illegal memory access notification, and executes tree search instructions (such as memory read, write, or read-add-write) in parallel with protocol processor execution.
The common instruction memory consists of one 1024×128 RAM and two sets of dual 512×128 RAMs. Each set of dual RAMs provides two copies of the same picocode, so that processors can access instructions in the same address range independently. Each 128-bit word holds four 32-bit instructions, so the memory provides 8192 instructions in total (1024 words plus 2×512 words, at four instructions per word).
The dispatcher controls the passing of frames to the ten protocol processors and also manages interrupts and timers.
The completion unit guarantees frame order from the processor complex to the switch fabric queues and the target port queues. A rich instruction set includes conditional execution, packing (for input hash keys), conditional branching, signed and unsigned operations, counting of leading zeros, and more.
The classifier hardware assist engine parses each frame's Layer 2 and Layer 3 protocol headers and provides this information with the frame as it is dispatched to a protocol processor.
The control memory arbiter controls processor access to both internal and external memory.
External control memory options include five to seven DDR DRAM subsystems, each supporting a pair of 2M×16-bit×4-bank or a pair of 4M×16-bit×4-bank DDR DRAMs. The DDR DRAM interface runs at a 133 MHz clock rate with a 266 MHz data strobe and supports configurable CAS latency and drive strength. An optional 133 MHz ZBT SRAM may be added, configurable as 128K×36, 2×256K×18, or 2×512K×18.
Egress frames may be stored in one external data buffer (e.g., DS0) or in two external data buffers (DS0 and DS1). Each buffer may consist of a pair of 2M×16-bit×4-bank DDR DRAMs (storing up to 256K 64-byte frames) or a pair of 4M×16-bit×4-bank DDR DRAMs (storing up to 512K 64-byte frames). A single external data buffer (e.g., DS0) may be chosen for 2.28 Gbps operation, and a second buffer (e.g., DS1) added to support Layer 2 and Layer 3 switching at 4.57 Gbps. Adding the second buffer improves performance but does not increase frame capacity. The external data buffer interface runs at a 133 MHz clock rate with a 266 MHz data strobe and supports configurable CAS latency and drive strength.
The internal control memory includes two 512 × 128-bit RAMs, two 1024 × 36-bit RAMs, and one 1024 × 64-bit RAM.
The internal data store provides buffering for up to 2048 64-byte frames in the ingress (UP) direction.
Fixed frame alterations include VLAN tag insertion or deletion in the ingress direction, and time-to-live increment/decrement (IP, IPX), Ethernet CRC overlay/insertion, and MAC DA/SA overlay/insertion in the egress (DOWN) direction.
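For the IP time-to-live decrement, the checksum need not be recomputed over the whole header; it can be updated incrementally (the technique standardized in RFC 1141/1624). A hedged sketch, assuming an unfragmented IPv4 header laid out in network byte order:

```c
#include <stdint.h>

/* Decrement IPv4 TTL (header byte 8) and incrementally update the
 * header checksum (bytes 10-11) per RFC 1624: HC' = ~(~HC + ~m + m'). */
static void ip_ttl_decrement(uint8_t *ip_hdr)
{
    uint16_t old16 = (uint16_t)((ip_hdr[8] << 8) | ip_hdr[9]); /* TTL|proto */
    ip_hdr[8]--;
    uint16_t new16 = (uint16_t)((ip_hdr[8] << 8) | ip_hdr[9]);

    uint32_t sum = (uint16_t)~((ip_hdr[10] << 8) | ip_hdr[11]);
    sum += (uint16_t)~old16;
    sum += new16;
    sum  = (sum & 0xffff) + (sum >> 16);   /* fold carries twice */
    sum  = (sum & 0xffff) + (sum >> 16);
    sum  = ~sum & 0xffff;

    ip_hdr[10] = (uint8_t)(sum >> 8);
    ip_hdr[11] = (uint8_t)sum;
}
```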
A port mirroring function allows one receive port and one transmit port to be copied to a system-designated observation port without using protocol processor resources. Mirrored interface device ports can carry added frame and switch control data, and a separate data path allows frames to be enqueued directly to the ingress switch interface queue.
Four Ethernet macros are integrated into the interface device. Each macro can be individually configured to operate in either 1 Gigabit or 10/100 Fast Ethernet mode. Each Ethernet macro supports up to ten 10/100 Mbps MACs, or alternatively each of the four macros can support one 1000 Mbps MAC.
Fig. 1A is a block diagram of the MAC core. Each macro includes three Ethernet core designs: a multiport 10/100 Mbps MAC core (FEnet), a 1000 Mbps MAC core (GEnet), and a 1000 Mbps physical coding sublayer core (PCS).
Functions of the multiport Ethernet 10/100 MAC core:
Supports ten serial media independent interfaces to the physical layer
Handles ten ports at media speed in any combination of 10 Mbps and 100 Mbps rates
A single MAC services all ten ports through one time-division-multiplexed interface
Supports full and half duplex operation at media speed on all ports
Supports IEEE 802.3 binary exponential backoff
Functions of the 1000 Mbps Ethernet MAC core:
Supports a gigabit media independent interface (GMII) to the physical PCS layer or directly to the physical layer
Supports the full TBI (8b/10b) solution through the PCS core
Supports full duplex point-to-point operation at media speed
Supports the IBM PCS core valid byte signal
Functions of the 1000 Mbps Ethernet physical coding sublayer core:
Performs 8b/10b encoding and decoding
Supports the PMA (10-bit) service interface defined in IEEE 802.3z, which can attach to any PMA conforming to IEEE 802.3z
Synchronizes data from the PMA (two-phase clock) to the MAC (single-phase) clock
Supports auto-negotiation, including next pages
Converts the two-phase clock scheme defined in the standard to a single-phase clock
Signals the MAC to indicate which clock cycles contain new data
Checks received code groups (10 bits) for COMMA and establishes word synchronization
Computes and checks the running disparity of the 8b/10b code
Figs. 2A to 2D illustrate different configurations of the interface device chip. These configurations are made possible by the DASL links and the connection to the switch fabric device. Each DASL includes two channels: a transmit channel and a receive channel.
Fig. 2A shows a wrap configuration for a single interface device, in which the transmit channel is wrapped back to the receive channel.
Fig. 2B shows two interface device chips connected together. Each chip carries at least two DASLs. In this configuration, the channels of one DASL on each chip operate with the channels of the matching DASL on the other chip, while the remaining DASL on each chip is wrapped back on itself.
Fig. 2C shows the configuration in which multiple interface devices are connected through a switch fabric; the double-headed arrows indicate transmission in both directions.
Fig. 2D shows the configuration in which a primary switch and a backup switch are connected to a plurality of interface devices. If the primary switch fails, the backup switch can be activated.
A Control Point (CP) containing a system processor is connected to each of these configurations. The system processor of the CP provides, among other things, initialization and configuration services for the chip. The CP may reside in any of three locations: on the interface device chip itself, on the blade carrying the chip, or outside the blade. When outside the blade, the CP may be remote, housed elsewhere and communicating through networks to which the interface device is attached. The elements of a CP are shown in Fig. 17 and include several memory elements (cache, flash, and SDRAM), a memory controller, a PCI bus, and connectors for the backplane and for L1 network media.
Fig. 18 is a block diagram of a single-chip network processor, emphasizing its EDS-UP, traffic management (MGT) scheduler, and EDS-DOWN (DN) functions. U-shaped icons represent queues, and rectangular icons represent the control blocks (CBs) that track the contents of those queues.
The elements, their respective functions and interactions will be explained below.
PMM: the part of the network processor containing the MACs (FEnet, POS, GEnet) that connect to the external PHY devices.
UP-PMM: this logic extracts bytes from the PHY, converts them to FISH (16 bytes) format, and then sends them to the UP-EDS. There are 4 DMUs in the PMM, each capable of using a 1GEnet or 10FEnet device.
UP-EDS: the logic circuit fetches FISH from the UP-PMM and stores it into the UP-data memory (internal random access memory). It can process 40 frames simultaneously and can queue the frames into the EPC queue after receiving the appropriate pulse code modulated bytes. When the EPC has finished processing a frame, the UP-EDS queues the frame into the appropriate destination port queue and begins sending the frame to the UP-SDM. The UP-EDS is responsible for the management of all buffers and frames and for sending buffers/frames back to the idle pool after the transfer to the UP-SDM is complete.
EPC: the logic circuit includes a micro-microprocessor and may include an embedded PowerPC. The logic can decide how to process the frame (forward, modify, filter, etc.) by looking at the frame header. The EPC has access to several lookup tables whose hardware helps the micro-microprocessor keep up with the network processor's requirements for high-speed bandwidth.
UP-SDM: the logic may extract the frame and convert it to the format of the PRIZMA unit for transmission to the switch fabric. The logic is also capable of inserting a VLAN header into the frame.
UP-SIF: the logic circuit contains UP-DASL macros, which are connected with the external switch I/O.
DN-SIF: the logic includes a DN-DASL macroinstruction that receives the PRIZMA unit from the external I/O.
DN-SDM: the logic circuit receives and pre-processes the PRIZMA cells to assist in reassembling the frame.
DN-EDS: the logic circuit extracts each cell and restores its combination to a frame. The unit is stored in an external data storage, in association with a buffer to construct a frame. After receiving the entire frame, the frame is queued into the EPC queue. When the EPC has finished processing the frame, it enqueues the frame into the scheduler (if present) or into the destination port queue. The DN-EDS then sends the frame to the appropriate port, via a frame sending routine, and all change information and some control information is sent to the DN-PMM.
DN-PMM: takes the information from the DN-EDS, formats the frame into Ethernet, POS, etc., and sends it to the external PHY.
SPM: this logic allows the network processor to attach external devices (PHYs, LEDs, FLASH, etc.) while requiring only three I/Os. The network processor communicates with the SPM over a serial interface, and the SPM performs the functions necessary to manage these external devices.
Uplink (UP) flow (modeled in the sketch after step 12):
1) A frame arrives at the PHY
2) The UP-PMM receives the bytes
3) The UP-PMM sends FISH to the UP-EDS (a FISH is a 16-byte portion of a frame)
4) The UP-EDS stores the FISH in the UP-DS
5) The UP-EDS sends the header to the EPC
6) The EPC processes the header and sends enqueueing information back to the UP-EDS
7) The UP-EDS continues receiving the remainder of the frame from the UP-PMM
8) When the appropriate data can be sent to the switch, the UP-EDS passes the information to the UP-SDM
9) The UP-SDM reads the frame data and formats it into PRIZMA cells
10) The UP-SDM sends the cells to the UP-SIF
11) The UP-SIF transmits the cells to PRIZMA over the DASL serial links
12) The UP-EDS frees the buffers/frame once all the data has been taken
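The twelve steps above can be modeled as the following toy program. Every function name is invented, and each stage merely logs what the corresponding hardware stage does.

```c
#include <stdio.h>

typedef struct { int bytes; int cells; } frame_t;

static void up_pmm(frame_t *f)       { f->bytes = 256; puts("PHY -> UP-PMM -> FISH"); }
static void up_eds_store(frame_t *f) { (void)f; puts("FISH stored in UP-DS"); }
static void epc(frame_t *f)          { (void)f; puts("EPC: header lookup, enqueue info"); }
static void up_sdm(frame_t *f)       { f->cells = (f->bytes + 63) / 64;
                                       printf("UP-SDM: %d PRIZMA cells\n", f->cells); }
static void up_sif(frame_t *f)       { (void)f; puts("UP-SIF: cells onto DASL links"); }
static void up_eds_free(frame_t *f)  { (void)f; puts("buffers returned to free pool"); }

int main(void)
{
    frame_t f = {0, 0};
    up_pmm(&f); up_eds_store(&f); epc(&f);    /* steps 1-7  */
    up_sdm(&f); up_sif(&f); up_eds_free(&f);  /* steps 8-12 */
    return 0;
}
```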
Downlink (DOWN) flow (a reassembly sketch follows step 11):
1) The DN-SIF receives PRIZMA cells
2) The DN-SDM stores the cells and preprocesses them for reassembly information
3) The DN-EDS receives the cell data and the reassembly information and links the cells into a new frame on the DOWN side
4) The DN-EDS stores the cells in the DN-DS
5) When all the data has been received, the DN-EDS enqueues the frame to the EPC
6) The EPC processes the header and sends enqueueing information back to the DN-EDS
7) The DN-EDS enqueues the frame to the scheduler queue (if present) or to a target port queue
8) The DN-EDS services the queues and sends the frame information to a PCB
9) The DN-EDS uses the PCB to "fan out" the frame, reads the appropriate data, and sends the data to the DN-PMM
10) The DN-PMM formats the data (applying any changes required) and sends the frame to the external PHY
11) When the buffers are no longer needed, the DN-PMM informs the DN-EDS, and the resources are freed
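Reassembly on the DOWN side amounts to chaining cell buffers until the final cell arrives, at which point the frame is enqueued to the EPC. A minimal sketch, with invented structure names and malloc standing in for the hardware free pool:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct buf {
    uint8_t     data[64];
    struct buf *next;
} buf_t;

typedef struct {
    buf_t *head, *tail;
    int    complete;       /* set when the last cell has arrived */
} reasm_t;

/* Link one received 64-byte cell into the frame being rebuilt. */
static void dn_eds_add_cell(reasm_t *r, const uint8_t *cell, int last)
{
    buf_t *b = malloc(sizeof *b);   /* hardware draws from a free pool */
    if (!b)
        return;
    memcpy(b->data, cell, 64);
    b->next = NULL;
    if (r->tail)
        r->tail->next = b;
    else
        r->head = b;
    r->tail = b;
    if (last)
        r->complete = 1;            /* frame now queued to the EPC */
}
```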
Frame control flow:
1) The header is sent from the UP-DS or DN-DS to the EPC
2) The EPC looks up the header data in the lookup tables and obtains frame enqueueing information
3) The EPC sends the enqueueing information back to the EDS, and the frame is enqueued to the appropriate queue
4) Cell headers and frame headers are sent along with the frame data to aid reassembly and forwarding
CP control flow (sketched in code after step 6):
1) The control point formats a guided frame and sends it to the network processor
2) The network processor enqueues the guided frame to the GCH picoprocessor queue
3) The GCH processes the guided frame, reading or writing the requested areas of Rainier
4) The GCH passes any table update requests to the GTH
5) The GTH updates the appropriate tables using the information in the guided frame
6) An acknowledgement guided frame is sent back to the CP
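The division of labor between the GCH and GTH in this flow can be sketched as below. The command codes and structure layout are invented for illustration; the actual guided command formats are given in Figs. 6 through 9.

```c
#include <stdint.h>

enum gcmd_op { GCMD_REG_WRITE, GCMD_REG_READ, GCMD_TABLE_UPDATE };

typedef struct {
    enum gcmd_op op;
    uint32_t     addr, value;
    uint8_t      completion;   /* completion code, filled in as commands run */
} guided_cmd;

/* Table work is handed off to the GTH (step 4). */
static void gth_update_table(guided_cmd *c) { c->completion = 0; }

/* The GCH walks the guided commands of a guided frame (step 3). */
static void gch_process(guided_cmd *cmds, int n)
{
    for (int i = 0; i < n; i++) {
        switch (cmds[i].op) {
        case GCMD_REG_WRITE:        /* GCH reads/writes requested areas */
        case GCMD_REG_READ:
            cmds[i].completion = 0;
            break;
        case GCMD_TABLE_UPDATE:
            gth_update_table(&cmds[i]);
            break;
        }
    }
    /* step 6: an acknowledgement guided frame would now go back to the CP */
}
```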
Network processor control flow:
1) A picoprocessor can build a guided frame to send information to another Rainier or to the control point
2) The guided frame is sent to the appropriate location for processing
The interface device provides media-speed switching for up to 40 Fast Ethernet ports (Fig. 2A). When IBM's data align serial link (DASL) is used to connect two interface devices, 80 Fast Ethernet ports can be supported. Each DASL differential pair carries 440 Mbps of data, and two sets of eight pairs provide a 3.5 Gbps duplex connection (8 × 440 Mbps = 3.52 Gbps in each direction). As shown in Figs. 2C and 2D, larger systems can be built by interconnecting multiple interface devices to a switch, such as IBM's Prizma-E switch. The interface device provides two 3.5 Gbps duplex DASL connections, one primary and one secondary; the secondary can provide a wrap-back path for local frame traffic (Fig. 2B, where two interface devices are directly connected) or a connection to a redundant switch fabric (Fig. 2D, backup switch). The network processor chip is thus scalable: the same chip can serve low-end systems (with relatively low port density, e.g., 40 ports) and high-end systems (with relatively high port density, e.g., 80 to N ports).
One interface device in the system is connected to the system processor through up to ten 10/100 Mbps Fast Ethernet ports or one 1000 Mbps Ethernet port. The Ethernet configuration for reaching the system processor is held in an EEPROM attached to the interface device and is loaded during initialization. The system processor communicates with all the interface devices in the system by building special guided frames encapsulated as Ethernet frames. The encapsulated guided frames are forwarded across the DASL links to the other devices, allowing all the interface devices in the system to be controlled from a single point.
Guided frames are used for communication between the Control Point (CP) and the embedded processor complex, as well as within an interface device itself. An earlier disclosure of guided cells helps clarify the discussion in this patent; see U.S. Patent 5,724,348, entitled "Efficient Hardware/Software Interface for Data Switch", issued March 1998.
For guided traffic originating at the CP, the CP builds the guided frame in a data buffer in its local memory. The CP device driver sends the guided frame to the media interface of the network processor. The guided frame is received by the media access control (MAC) hardware and stored in the internal data store (U_DS). The guided frame is forwarded to the appropriate blade, processed, and, as required, routed back to the CP. Guided frames passed between the external CP and the interface device are encapsulated to suit the external network protocol: if the external network is Ethernet, the guided frame is encapsulated in an Ethernet frame, and so on.
Ethernet encapsulation provides the means of transporting guided traffic between the CP and the interface device. The Ethernet MAC (Enet MAC) of the interface device does not parse the destination address (DA) or source address (SA) when receiving frames; that parsing is performed by EPC picocode. Guided traffic must work before the interface device has been configured, when the EPC picocode cannot yet resolve the DA and SA, so guided frames must in effect be self-routing. The Enet MAC does, however, parse the Ethernet type field to distinguish guided traffic from data traffic: the Ethernet type value of a guided frame must match the value loaded into the E_Type_C register. This register is loaded from FLASH memory by the boot picocode of the interface device. A sketch of this check follows.
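The only parsing the Enet MAC performs on receive is the Ethernet type comparison. A minimal sketch, assuming untagged Ethernet II framing (DA in bytes 0-5, SA in bytes 6-11, type in bytes 12-13):

```c
#include <stdint.h>

static uint16_t e_type_c_reg;   /* loaded from FLASH by the boot picocode */

/* Return nonzero if the frame carries guided traffic. */
static int is_guided_frame(const uint8_t *frame)
{
    uint16_t etype = (uint16_t)((frame[12] << 8) | frame[13]);
    return etype == e_type_c_reg;
}
```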
The CP builds the guided frame in a data buffer in its local memory. The contents of the CP processor's 32-bit registers are stored in local memory in big-endian format (as shown in Fig. 3). Once the frame has been built, the CP's device driver sends an Ethernet frame containing: a DA associated with the particular GCH, an SA equal to the global MAC address of the CP or the MAC address of the particular interface, the special Ethernet type field indicating a guided frame, and the guided frame data. All Ethernet frames arriving at the port are received and parsed by the Enet MAC. For frames whose Ethernet type value matches the contents of the E_Type_C register, the Enet MAC strips off the DA, SA, and Ethernet type fields and stores the guided frame data in the U_DS memory. The Enet MAC collects the bytes and forms them into 16-byte blocks called Fish. The bytes are stored in big-endian format, with the first byte of the guided frame stored in the most significant byte position (byte 0) of the Fish. Subsequent bytes are stored in consecutive byte positions of the Fish (byte 1, byte 2, ..., byte 15). These 16-byte blocks are in turn stored in a buffer in the U_DS, starting at Fish 0; subsequent Fish are stored at consecutive Fish locations in the buffer (Fish 1, Fish 2, Fish 3, and so on). Additional buffers are obtained from a free pool as needed to store the remainder of the guided frame.
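The Fish packing described above can be modeled as follows; because both the frame and the Fish are byte arrays stored most significant byte first, the big-endian rule reduces to a straight sequential copy.

```c
#include <stdint.h>
#include <string.h>

#define FISH_SIZE 16

/* Pack guided frame bytes into consecutive 16-byte Fish: the first
 * byte lands in byte 0 (the most significant position) of Fish 0,
 * the next in byte 1, and so on through Fish 1, Fish 2, ... */
static int pack_fish(const uint8_t *gframe, size_t len,
                     uint8_t fish[][FISH_SIZE], int max_fish)
{
    int n = (int)((len + FISH_SIZE - 1) / FISH_SIZE);
    if (n > max_fish)
        return -1;      /* further buffers come from the free pool */
    for (int i = 0; i < n; i++) {
        size_t off  = (size_t)i * FISH_SIZE;
        size_t take = (len - off < FISH_SIZE) ? len - off : FISH_SIZE;
        memset(fish[i], 0, FISH_SIZE);
        memcpy(fish[i], gframe + off, take);
    }
    return n;           /* number of Fish produced */
}
```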
The flow of internal guided traffic in the interface device 10 is shown in Fig. 4. The Enet MAC function of the interface device examines the frame header information to determine that the frame is a guided frame. The Enet MAC strips the frame header from the guided frame and buffers the remainder in the interface device's internal U_DS memory. The Enet MAC indicates that the frame is to be enqueued to the general control (GC) queue for processing by the GCH. When the end of the guided frame is reached, the enqueue, dequeue and schedule (EDS) logic enqueues the frame to the GC queue.
The GCH picocode on the blade attached to the local CP examines the frame control information (see Fig. 6) to determine whether the guided frame is intended for other blades in the system and whether it is to be executed on the DOWN side of the interface device. If the frame is intended for a blade other than the local blade, the GCH picocode updates the TB value in the frame control block (FCB) with the TB value from the frame control information of the guided frame and instructs the EDS to enqueue the frame in the multicast target blade start-of-frame (TB_SOF) queue. For performance reasons, all guided traffic is enqueued in the multicast TB_SOF queue regardless of the number of target blades.
If the frame is intended only for the local blade, the GCH picocode examines the UP/DOWN field of the frame control information to determine whether the guided frame is to be executed on the UP side or the DOWN side of the interface device (see Fig. 6). If the guided frame is to be executed on the DOWN side, the GCH picocode updates the TB value in the frame control block (FCB) with the TB value from the frame control information of the guided frame and instructs the EDS to enqueue the frame in the multicast TB_SOF queue. If the frame control information indicates execution on the UP side, the GCH picocode parses the guided frame and performs the operations indicated by its guided commands.
Before processing the guided commands, the picocode examines the ack/noack field of the frame control information. If this value is '0'b, the guided frame is discarded after processing; guided read commands are not performed on frames of this type.
If the ack/noack field is '1'b and the early/late field is '1'b, then before processing any of the guided commands the picocode constructs an early acknowledgement (Early_Ack) guided frame whose frame control information has its TB field set to the value of the My_TB register. The picocode routes this early acknowledgement guided frame back to the CP by updating the TB value in the frame's FCB with the value contained in the TB field of the LAN control point address (LAN_CP_Addr) register and instructing the EDS to enqueue the frame in the multicast TB_SOF queue. The picocode then processes the guided commands of the guided frame and discards the frame. Guided read commands do not fall into this category.
Conversely, if the ack/noack field is '1'b and the early/late field is '0'b, the picocode changes the resp/req field of the frame control information to '1'b, indicating a guided frame response, replaces the TB field with the contents of the My_TB register, and processes each guided command in the guided frame. As each guided command is processed, the picocode updates the completion code field of the following guided command with the completion status of the current command. The picocode routes the response back to the source by updating the TB value in the FCB with the TB value of the CP's source blade (from the LAN_CP_Addr register) and instructing the EDS to enqueue the frame in the multicast TB_SOF queue.
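The acknowledgement logic of the last three paragraphs reduces to the decision sketch below. Field widths and register representations are assumptions for this sketch; the actual frame control layout is shown in Fig. 6.

```c
#include <stdint.h>

typedef struct {
    unsigned ack   : 1;   /* ack/noack  */
    unsigned early : 1;   /* early/late */
    unsigned resp  : 1;   /* resp/req   */
    uint16_t tb;
} frame_ctl;

static uint16_t my_tb_reg;        /* My_TB register (assumed width)   */
static uint16_t lan_cp_addr_tb;   /* TB field of LAN_CP_Addr register */

static void handle_guided(frame_ctl *fc)
{
    if (!fc->ack) {
        /* noack: process the commands, then discard the frame
         * (guided reads are not allowed in such frames)        */
        return;
    }
    if (fc->early) {
        /* early ack: first build an Early_Ack frame with TB = My_TB
         * and route it to the CP via the multicast TB_SOF queue,
         * then process the commands and discard this frame          */
        frame_ctl early_ack = { 1, 1, 1, my_tb_reg };
        (void)early_ack; (void)lan_cp_addr_tb;
        return;
    }
    /* late ack: process the commands, then turn this very frame
     * into the response and route it back toward the CP          */
    fc->resp = 1;
    fc->tb   = my_tb_reg;
}
```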
The EDS schedules frames residing in the TB_SOF queue for transmission. The switch data mover (SDM) builds the switch fabric cell headers and the interface device frame header from the information contained in the FCB. The cells travel through the switch fabric device to the target blade, where they are reassembled into a frame in the blade's D_DS memory. The DOWN-side SDM recognizes that the frame is a guided frame and instructs the EDS to enqueue it in the GC queue.
Pressure from the GC (or GT) queue causes the picocode to fetch and parse the guided frame. All guided frames arriving on the DOWN side are initially enqueued in the GC queue. The GCH picocode examines the gth/gch value of the frame control information of these frames. If the gth/gch value is '0'b, the guided frame is enqueued in the GT queue. Otherwise the GCH picocode examines the resp/req field of the frame control information to determine whether the guided frame has already been executed. A resp/req value of '1'b indicates that the guided frame has been executed and is to be routed to the CP. The target port value corresponding to the CP connection is maintained by EPC picocode. Frames from that target port queue are transmitted from the interface device back to the CP.
If the resp/req field is '0'b, the blade may be either local or remote with respect to the CP. This is resolved by comparing the TB field of the LAN_CP_Addr register with the contents of the My target blade (My_TB) register: if the two match, the blade is the CP's local blade; otherwise it is remote from the CP. In either case, the picocode examines the UP/DOWN value of the frame control information. If UP/DOWN is '1'b, the frame is enqueued in the wrap TP queue for transfer to the U_DS and processing by the GCH on the UP side. Otherwise the picocode (GCH or GTH) performs the operations contained in the guided commands of the guided frame. Before executing the guided commands, the picocode examines the ack/noack field of the frame control information. If this value is '0'b, the guided frame is discarded after processing. Guided read commands do not fall into this category.
If the ack/noack field is '1'b and the early/late field is also '1'b, then before processing any of the guided commands the picocode constructs an early acknowledgement guided frame whose frame control information has its TB field set to the contents of the My_TB register. If the blade is remote from the CP, the picocode routes the early acknowledgement guided frame to the wrap port; if the blade is local to the CP, the frame is routed to the port queue corresponding to the CP. While the wrap port moves the early acknowledgement guided frame from the D_DS to the U_DS and enqueues it in the UP-side GC queue, or the frame is returned to the CP from the port queue, the picocode processes the guided commands. When a wrapped frame reaches the U_DS, the GCH picocode encounters it again, but now with the resp/req field set to '1'b. The GCH picocode routes the frame back to the CP by updating the TB field of the FCB with the value contained in the TB field of the LAN_CP_Addr register and instructing the EDS to enqueue the frame in the multicast TB_SOF queue. The EDS schedules frames residing in the TB_SOF queue for transmission. The SDM builds the Prizma cell headers and the interface device frame header from the information contained in the FCB. The cells pass through Prizma and are reassembled into a frame on the CP's local blade. The DOWN-side SDM recognizes that the frame is a guided frame and instructs the EDS to enqueue it in the GC queue. This time, when the GCH picocode parses the frame, the resp/req field is '1'b. Because this blade is the CP's local blade, the guided frame is routed to the port queue corresponding to the CP, and frames from that queue are transmitted from the interface device back to the CP.
In contrast, if the value of the ack/noack field is '1'b and the value of the early/late field is '0'b, the picocode changes the value of the resp/req field to '1'b, thereby marking the guided frame as a response, replaces the TB field with the contents of the My_TB register, and then processes each guided command within the guided frame. While processing a guided command, the picocode writes the completion status code of the current guided command into the completion code field of the next guided command. If the blade is remote from the CP, the picocode routes the guided frame to the wrap port; conversely, if the blade is local to the CP, the frame is routed to the port queue corresponding to the CP. The wrap port moves the guided frame from the D_DS to the U_DS and queues it in the upstream GC queue, or the frame is sent back from the port queue to the CP. A frame returned to the U_DS through the wrap port is seen again by the GCH picocode, with the resp/req field now set to '1'b. The GCH picocode routes the frame back to the CP by updating the TB field of the FCB to the value contained in the TB field of the LAN_CP_Addr register and then instructing the EDS to queue the frame in the multicast TB_SOF queue. The EDS schedules frames residing in the TB_SOF queue for forwarding. The SDM builds a Prizma cell header and an interface device frame header from the information contained in the FCB. The individual cells of the frame are switched by Prizma and reassembled into a frame on the downstream side of the CP's local blade. The SDM on the downstream side recognizes the frame as a guided frame and instructs the EDS to queue it in the GC queue. When the GCH picocode parses the frame from the D_DS this time, the value of the resp/req field has changed to '1'b. This means the blade is the CP's local blade, and the guided frame is routed to the port queue corresponding to the CP. Frames from that queue are sent from the interface device back to the CP.
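By way of illustration, the two acknowledgement paths just described differ only in when the response is generated relative to command processing. The following C sketch summarizes that decision; all identifiers are illustrative assumptions and do not appear in the specification.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the hardware paths described in the text. */
static void process_guided_commands(void) { puts("process guided commands"); }
static void send_response_to_cp(void)     { puts("route response toward CP"); }

/* One guided frame's acknowledgement flow. */
static void acknowledge_guided_frame(bool ack, bool early)
{
    if (!ack) {                      /* ack/noack = '0'b: no response,      */
        process_guided_commands();   /* frame is discarded after processing */
        return;
    }
    if (early) {                     /* early/late = '1'b: ack frame built  */
        send_response_to_cp();       /* first, carrying My_TB ...           */
        process_guided_commands();   /* ... commands processed meanwhile    */
    } else {                         /* early/late = '0'b: resp/req set to  */
        process_guided_commands();   /* '1'b and the frame itself returns   */
        send_response_to_cp();       /* as the response after processing    */
    }
}

int main(void)
{
    acknowledge_guided_frame(true, false);   /* late-acknowledged request */
    return 0;
}
```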
If for some reason the GCH picocode encounters a guided frame whose TB field in the frame control information equals '0000'h, the GCH picocode treats the frame as intended for this blade only and acts accordingly. This behavior is necessary at initialization, because at that time the My_TB register of every blade contains '0000'h. The CP initializes the My_TB register of its local blade by sending a write guided command in a guided frame whose frame control information carries a TB value of '0000'h.
Any picoprocessor in the EPC can generate a guided frame. The frame may be an unsolicited guided frame or any other form of guided frame. Internally generated frames of this kind are built as unacknowledged (i.e., ack/noack = '0'b). These frames may be sent to one of the two handlers (GCH or GTH) in the same EPC, or to the GCH or GTH of another blade.
An unsolicited guided frame may also be sent to the CP. Guided frames destined for the same EPC are built in data buffers in the D_DS. These frames are queued in a GC or GT queue for processing, processed, and then discarded in the usual manner. Unsolicited guided frames destined for the local CP are likewise built in data buffers in the D_DS, but are marked as already executed by the EPC (i.e., resp/req = '1'b, TB = My_TB). These frames are queued in the port queue corresponding to the CP, from which they are sent back to the CP.
A guided frame destined for another blade may be built in a data buffer in either the D_DS or the U_DS. Unsolicited guided frames destined for the CP are also marked as already executed by the EPC (i.e., resp/req = '1'b, TB = My_TB). Frames built in D_DS buffers are queued in the wrap port queue; they are moved to the U_DS and enqueued in the upstream GC queue. Unsolicited guided frames with a resp/req value of '1'b are routed to the CP using the TB value in the LAN_CP_Addr register; otherwise, the GCH picocode routes the frames using the TB value in the frame control information of the guided frame. On the receiving blade, the frame is queued in the downstream GC queue. The GCH of that blade executes and discards the frame (resp/req = '0'b, gth/gch = '0'b), places the frame in the GT queue (resp/req = '0'b, gth/gch = '1'b), or places the frame in the corresponding CP port queue (resp/req = '1'b). Frames built in U_DS data buffers are enqueued directly in the upstream GC queue. From that point on, these frames follow the same path and are handled in the same way as frames built in D_DS data buffers. Fig. 5 shows the general form of a guided frame.
The format shown in the figure is a logical representation, with the most significant bytes on the left and the least significant bytes on the right. The 4-byte words are numbered sequentially down the page, starting from word 0 at the top.
Because guided frames must be routed and processed before the CP has configured the interface device, these frames must be self-routing. Information of the kind normally produced by lookup and classification is therefore carried in the frame control information field of the guided frame, enabling the integrated circuit block to update the FCB with this information without performing a lookup operation. The target blade information contained in the guided frame is used by the guided-frame handling picocode to prepare the FCBPage fields in the FCB. The CP provides the target blade information, while the GCH picocode fills in the other fields of the FCB. The SDM uses this FCB information to prepare cell and frame headers. The format of the frame control information field of a guided frame is shown in fig. 6.
The fields of the frame control information shown in fig. 6 are defined as follows (a sketch of the layout as a C structure follows the list):
resp/req: response or request indicator value. This field distinguishes request (not yet processed) guided frames from response guided frames.
0 request
1 response
ack/noack: acknowledgement control value. This field controls whether the GCH picocode does (ack) or does not (noack) acknowledge the guided frame. A guided frame that does not require acknowledgement must not contain any guided command that performs a read operation.
0 no acknowledgement
1 acknowledgement
early/late: early or late acknowledgement control value. This field controls whether an acknowledgement (ack/noack = '1'b) is to be made before (early) or after (late) the guided frame is processed. When ack/noack is '0'b, this field is ignored.
0 acknowledge after the guided frame is processed
1 acknowledge before the guided frame is processed
neg/all: negative-only or full acknowledgement control value. When the value of the ack/noack field is '0'b, this field is ignored except in the case where a guided command does not complete successfully.
0 acknowledge the entire guided frame when ack/noack equals '1'b; whether the acknowledgement is early or late is determined by the value of early/late.
1 acknowledge only guided commands that do not complete successfully. This acknowledgement is independent of the values of ack/noack and early/late, and is by its nature a late acknowledgement.
up/down: upstream or downstream control value. This value controls whether the frame is processed on the upstream or the downstream side. When resp/req is '1'b, this field is ignored. All multicast guided frames should have an up/down value of '0'b. Furthermore, guided commands that require the GTH hardware-assist instructions should have an up/down value of '0'b.
0 downstream processing
1 upstream processing
gth/gch: General Tree Handler or Guided Cell Handler control value. This value steers the guided frame to the appropriate picoprocessor.
0 GCH picoprocessor
1 GTH picoprocessor
TB: target blade value. When resp/req is '0'b, this field contains routing information usable by Prizma: each bit position corresponds to one target blade. If the value is '0000'h, the guided frame is treated as addressed to the present blade and acted upon accordingly. If one or more bit positions in the TB field have the value '1'b, the frame is routed to the corresponding target blades. When resp/req is '1'b, this field contains the My_TB value of the responding blade.
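Taken together, the fields above can be pictured as the following C bit-field layout. The single-bit widths and the 16-bit TB mask (implied by the '0000'h notation) follow from the text; the field ordering and reserved padding are assumptions made for illustration, since fig. 6 is not reproduced here.

```c
#include <stdint.h>

/* Illustrative layout of the guided-frame control information of fig. 6. */
typedef struct {
    uint32_t resp_req   : 1;  /* 0 = request, 1 = response                 */
    uint32_t ack_noack  : 1;  /* 0 = no acknowledgement, 1 = acknowledge   */
    uint32_t early_late : 1;  /* 1 = acknowledge before processing         */
    uint32_t neg_all    : 1;  /* 1 = acknowledge only failed commands      */
    uint32_t up_down    : 1;  /* 0 = downstream, 1 = upstream processing   */
    uint32_t gth_gch    : 1;  /* 0 = GCH, 1 = GTH picoprocessor            */
    uint32_t reserved   : 10; /* assumed padding                           */
    uint32_t tb         : 16; /* target blade bit mask; My_TB in responses */
} guided_frame_ctl_t;
```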
Word 1 of the guided frame contains a correlator value (fig. 7). This value is assigned by the CP and serves to associate guided frame responses with their corresponding requests. The correlator comprises a number of bits with particular functions.
Each guided command begins with a command control information field. The command control carries information that assists the GCH picocode in processing the guided frame. The format of this information is shown in fig. 8.
Length value: this value gives the total number of 32-bit words contained in the guided command, comprising the command control information (Cmd Word 0), the address information (Cmd Word 1), and the operands (Cmd Words 2+).
Completion code value: this field is initialized by the CP and modified by the GCH picocode as it processes the guided commands. The GCH picocode uses this field to indicate the completion status of the guided commands executed from the command table. Because every guided command table is terminated by an End_Delimiter guided command, the completion status of the last command is contained in the completion code field of the End_Delimiter.
Guided command type values (symbolic names):

| Symbolic name | Type value | Description |
|---|---|---|
| End_Delimiter | 0000 | Marks the end of the guided command sequence |
| Build_TSE_Free_List | 0001 | Builds a free list |
| Software_Action | 0010 | Executes a software action |
| Unsolicited | 0011 | Frame initiated by EPC picocode |
| Block_Write | 0100 | Writes a block of data to sequential addresses |
| Duplicate_Write | 0101 | Writes duplicated data to registers or memory |
| Read | 0110 | Request and response for reading stored data |
| | 0111 | Reserved |
| Insert_Leaf | 1000 | Inserts a leaf into a search tree |
| Update_Leaf | 1001 | Updates a leaf of a search tree |
| Read_Leaf | 1010 | Request and response for reading leaf data |
| | 1011 | Reserved |
| Delete_Leaf | 1100 | Deletes a leaf of a search tree |
| | 1101- | Reserved |
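These type values translate directly into an enumeration. The following transcription of the table is a sketch; the identifier spellings are assumptions.

```c
/* Guided command type values, transcribed from the table above. */
typedef enum {
    GC_END_DELIMITER       = 0x0, /* end of the guided command sequence */
    GC_BUILD_TSE_FREE_LIST = 0x1,
    GC_SOFTWARE_ACTION     = 0x2,
    GC_UNSOLICITED         = 0x3,
    GC_BLOCK_WRITE         = 0x4,
    GC_DUPLICATE_WRITE     = 0x5,
    GC_READ                = 0x6,
    /* 0x7 reserved */
    GC_INSERT_LEAF         = 0x8,
    GC_UPDATE_LEAF         = 0x9,
    GC_READ_LEAF           = 0xA,
    /* 0xB reserved */
    GC_DELETE_LEAF         = 0xC
    /* 0xD and above reserved */
} guided_cmd_type_t;
```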
The address information contained in a guided frame identifies an element according to the addressing scheme internal to the network processor. The general format of the address information field is shown in fig. 9.
The interface device uses a 32-bit addressing scheme that assigns an address value to every accessible structure in the interface device. These structures may be internal to the processor or attached to an interface controlled by the processor. Some structures are accessed by the Embedded Processor Complex (EPC) through an internal interface known as the Web interface; the remaining structures are accessed through a memory controller interface. In all cases, the general format of the address is as shown in fig. 10.
The network processor is subdivided into islands of major integrated circuit blocks. Each island has a unique island ID value. These 5-bit island ID values form the 5 most significant bits of the address of any structure controlled by the island. The mapping between encoded island ID values and island names is shown in fig. 11. The second part of the Web address consists of the next most significant 23 bits. These bits are divided into a structure address portion and an element address portion; the number of bits in each portion varies from island to island, since some islands contain only a few large structures while others contain many small ones. For this reason, the address fields have no fixed size. The structure address portion addresses an array within the island, while the element address portion addresses an element within that array. The remainder of the address accommodates the 32-bit data-bus limitation of the Web interface: this 4-bit word address selects a 32-bit field of the addressed element, which is necessary for moving structure elements wider than 32 bits across the network processor's Web data bus. A word address value of '0'h refers to the 32 most significant bits of the structure element, and successive word address values correspond to the successively less significant segments of the structure element. For structures that are not accessed through the Web interface, the word address portion is not needed; thus the upstream data store, the control memory, and the downstream data store all use all 27 least significant bits of the address to access structure elements. Another exception to this format is the address used for the SPM interface, where all 27 bits of the address are used and no element is wider than 32 bits.
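A minimal sketch of composing and decomposing such an address, assuming the 5-bit island ID / 23-bit structure-and-element / 4-bit word split described above (the split between structure and element bits varies per island and is therefore left combined; all names are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: bits [31:27] island ID, [26:4] structure + element,
 * [3:0] word address selecting a 32-bit field of the element. */
static inline uint32_t web_addr(uint32_t island, uint32_t struct_elem,
                                uint32_t word)
{
    return (island & 0x1Fu) << 27
         | (struct_elem & 0x7FFFFFu) << 4
         | (word & 0xFu);
}

static inline uint32_t web_island(uint32_t addr) { return addr >> 27; }
static inline uint32_t web_word(uint32_t addr)   { return addr & 0xFu; }

int main(void)
{
    uint32_t a = web_addr(3, 0x0102, 1);   /* hypothetical island 3 */
    printf("addr=0x%08X island=%u word=%u\n",
           (unsigned)a, (unsigned)web_island(a), (unsigned)web_word(a));
    return 0;
}
```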
The Embedded Processor Complex (EPC) provides and controls the programmability of the interface device integrated circuit block. It includes the following components (see fig. 12A):
N processing units, called GxH, which can simultaneously execute picocode stored in a common instruction memory. Each GxH consists of a processing unit core called a CLP, 16 GPRs, and an ALU; the CLP has a 3-stage pipeline. Each GxH also contains several coprocessors, such as a tree search engine.
An instruction memory, loaded with picocode during initialization, which holds the code used to forward frames and manage the system.
A dispatcher, which dequeues frame addresses from the upstream and downstream dispatch queues. After dequeuing, the dispatcher prefetches part of the frame (the header) from the upstream or downstream Data Store (DS) and places it in internal memory. As soon as a GxH becomes idle, the dispatcher passes the frame header to that GxH together with the appropriate control information, such as the Code Instruction Address (CIA). The dispatcher is also responsible for timers and interrupt signals.
A Tree Search Memory (TSM) arbiter. The GxHs share several internal and external memory units; since the memories are shared, an arbiter is needed to control access to them. The TSM can be accessed directly by the picocode, for example to store aging tables in the TSM. The TSM is also accessed by the TSE during tree searches.
A Completion Unit (CU). The completion unit performs two functions. First, it interfaces the N processing units to the upstream and downstream EDSs (the enqueue, dequeue, and dispatch islands). The EDS performs the enqueue action: a frame address, together with appropriate parameters called the FCBPage, is queued in a transmit queue, a discard queue, or a dispatch queue. Second, the completion unit guarantees frame order. Since multiple GxHs may be processing frames belonging to the same flow, precautions must be taken so that frames are placed in the upstream and downstream transmit queues in the correct order. The completion unit uses a tag generated by the hardware classifier when a frame is dispatched.
A hardware classifier. For upstream frames, the hardware classifier provides classification of well-known frame formats. The classification result is passed to the GxH at frame dispatch, in the form of the CIA and the contents of one or more registers. For downstream frames, the hardware classifier determines the CIA from the frame header. For both upstream and downstream frame dispatches, the hardware classifier generates the tag used by the completion unit to maintain frame order.
Upstream and downstream data store interfaces and arbiters. Each GxH can access both the upstream and the downstream data store: read access is provided for reading fish from the data store, and write access for writing the contents of the FishPool back to the data store. Since there are N processing units and only one may access the upstream data store at a time, and only one may access the downstream data store at a time, one arbiter is required for each data store.
A Web arbiter and WebWatch connection. The Web arbiter arbitrates among the GxHs for access to the Web. All GxHs can access the Web, and hence all memory and register functions in the interface device; any GxH can therefore modify or read any configuration area. The Web can be viewed as the memory map of the interface device. The WebWatch connection provides access to the entire Web from outside the integrated circuit block, using 3 of the block's I/O pins.
Debug, interrupt, and single-step control. The Web allows the GCH or WebWatch to control each GxH on the integrated circuit block when necessary; for example, the GCH or WebWatch can use the Web to single-step instructions on a GxH.
An embedded general purpose processor such as a PowerPC.
There are 4 types of GxH (fig. 12B):
GDH (General Data Handler). There are 8 GDHs. Each GDH has a full CLP with the 5 coprocessors (described further in the following section). GDHs are mainly used for forwarding frames.
GCH (Guided Cell Handler). The GCH hardware is identical to that of the GDH; however, guided frames can be processed only by the GCH. Programming via the Web (the CLP_Ena register) can enable the GCH to process data frames as well (in which case it functions as a GDH). The GCH also has additional hardware compared with the GDH: hardware that assists in performing tree insertions and deletions. The GCH executes the picocode associated with guided cells, as well as chip- and tree-management picocode such as aging and the exchange of control information with the CP and/or other GCHs. When it has no such tasks to perform, the GCH executes frame-forwarding picocode and in that case functions exactly as a GDH.
GTH (General Tree Handler). The GTH has additional hardware to assist in performing tree insertions, tree deletions, and rope management. The GTH processes data frames when there are no frames (containing tree-management commands) in the GPQ.
GPH (General PowerPC Handler). The GPH has additional hardware compared with the GDH and GTH; it is coupled to the general-purpose processor through a mailbox interface (i/f).
The number of GxHs (10) is an educated guess. Performance evaluation will determine how many GxHs are actually needed. The architecture is fully scalable and can be extended to more GxHs, the only limitation being silicon area (more GxHs also require a larger arbiter and a larger instruction memory).
The structure of each GxH is shown in fig. 12C. In addition to the CLP with its General Purpose Registers (GPRs) and Arithmetic Logic Unit (ALU), each GxH contains the following 5 coprocessors:
A Data Store (DS) coprocessor interface, which interfaces to the dispatcher and to the sub-islands that provide read and write access to the upstream and downstream data stores. The DS interface contains the so-called FishPool.
A Tree Search Engine coprocessor (TSE). The TSE performs searches in trees and also interfaces to the Tree Search Memory (TSM).
An enqueue coprocessor, which interfaces to the completion unit and contains the FCBPage. This coprocessor includes a 256-bit register, with additional hardware assist, that the picocode must use to build the FCBPage containing the enqueue parameters. Once the FCBPage has been built, the picoprocessor executes an enqueue instruction, which causes the coprocessor to forward the FCBPage to the completion unit.
A Web interface coprocessor, which provides a connection to the Web arbiter and allows reading from and writing to the interface device Web.
A checksum coprocessor, which generates and verifies checksums on frames held in the FishPool (the FishPool is described further below).
These processing units are shared between ingress processing and egress processing. The proportion of processing bandwidth reserved for ingress and for egress is programmable. The current device offers two modes: 50/50 (ingress and egress receive equal bandwidth) and 66/34 (ingress receives twice the bandwidth of egress).
Operation of the processing units is event driven; that is, the arrival of a frame is treated as an event, just like the expiration of a timer or an interrupt signal. The dispatcher handles the different kinds of events in the same manner, although with a priority order (interrupt signals first, then timer events, and finally frame-arrival events). When an event is dispatched to a processing unit, the corresponding information is passed along with it: a frame-arrival event carries part of the frame header and information from the hardware classifier, while timer and interrupt events carry the code entry point and other information associated with the event.
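As a sketch under these assumptions (all names invented for illustration), one scheduling decision for an idle GxH might look as follows:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical event sources, in the priority order given in the text. */
static bool interrupt_pending(void) { return false; }
static bool timer_expired(void)     { return false; }
static bool frame_arrived(void)     { return true;  }

/* One dispatch decision: interrupt signals first, then timer events,
 * then frame-arrival events. */
static void dispatch_next_event(void)
{
    if (interrupt_pending())   puts("dispatch interrupt handler");
    else if (timer_expired())  puts("dispatch timer handler");
    else if (frame_arrived())  puts("dispatch frame header + classifier info");
}

int main(void) { dispatch_next_event(); return 0; }
```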
When a frame arrives at an ingress port and the number of bytes received for that frame exceeds a programmable limit, the address of its frame control block is written to a GQ.
When reassembly of a frame on the egress side completes, the frame address is written to a GQ. There are four types of GQ (each type exists in an ingress version and an egress version; see fig. 12B):
GCQ: contains frames that must be processed by the GCH.
GTQ: contains frames that must be processed by the GTH.
GPQ: contains frames that must be processed by the GPH.
GDQ: contains frames that can be processed by any GDH (or by the GCH/GTH when they are enabled to process data frames). The GDQ has multiple priorities: frames queued in a higher-priority GDQ are processed before frames queued at lower priority.
Some processing units have specialized roles. In the current device there are four types of processing unit (GxH); see fig. 12B:
GDH (General Data Handler). GDHs are mainly used for forwarding frames.
GCH (Guided Cell Handler). The GCH hardware is identical to that of the GDH; however, guided frames can be processed only by the GCH. The GCH can also be enabled to process data frames (functioning as a GDH) by programming via the Web (the CLP_Ena register).
GTH (General Tree Handler). Compared with the GDH/GCH, the GTH has additional hardware: hardware that assists in performing tree insertions, tree deletions, and rope management. The GTH processes data frames when there are no frames (containing tree-management commands) in the GPQ.
GPH (General PowerPC Handler). The GPH has additional hardware compared with the GDH/GTH; it is connected to the embedded PowerPC through a mailbox interface.
In a real device, the roles of GCH, GTH, and GPH may be combined in one processing unit; for example, a device may use a single processing unit as both GCH and GPH. The same applies to the GCQ, GTQ, and GPQ.
The data store coprocessor serves the following purposes:
It interfaces to the upstream data store, which contains frames received from the medium, and to the downstream data store, which contains the reassembled frames received from the Prizma Atlantic switch fabric.
The data store coprocessor also receives configuration information when a timer event or interrupt signal is dispatched.
The data store coprocessor can also calculate and verify checksums on frames.
The data store coprocessor contains a FishPool (holding 8 fish), a scratch memory (holding 8 fish), and several control registers, which are used for reading data store contents into the FishPool and writing FishPool contents into the data store. The FishPool can be thought of as a working area for the data store: rather than reading from or writing to the data store directly, larger quantities of frame data are read from the data store into the FishPool, or written from the FishPool into the data store. The unit of transfer is the fish, which is 16 bytes.
The FishPool can thus be regarded as a memory holding 8 fish, i.e., 8 128-bit words. In the CLP processor architecture, the FishPool is a 128-byte register array: each byte in the FishPool has a 7-bit byte address (0..127), and accesses are made on a 16-bit or 32-bit basis. Like all register arrays, the FishPool uses a circular addressing scheme: a word (4-byte) access starting at byte location 126 of the FishPool returns bytes 126, 127, 0, and 1. In addition, from the viewpoint of the data store coprocessor, each fish location in the FishPool has a 3-bit fish address.
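A minimal sketch of this circular addressing, assuming a 128-byte array (the function name is illustrative):

```c
#include <stdint.h>
#include <stdio.h>

#define FISHPOOL_BYTES 128   /* 8 fish x 16 bytes */

static uint8_t fishpool[FISHPOOL_BYTES];

/* Read a 4-byte word starting at any byte address; addressing wraps,
 * so a read at 126 returns bytes 126, 127, 0 and 1. */
static uint32_t fishpool_read_word(uint8_t addr)
{
    uint32_t w = 0;
    for (int i = 0; i < 4; i++)
        w = (w << 8) | fishpool[(addr + i) % FISHPOOL_BYTES];
    return w;
}

int main(void)
{
    for (int i = 0; i < FISHPOOL_BYTES; i++) fishpool[i] = (uint8_t)i;
    printf("word at 126 = 0x%08X\n", (unsigned)fishpool_read_word(126));
    return 0;   /* prints 0x7E7F0001 */
}
```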
When a frame is dispatched, the first N fish of the frame are automatically copied into the FishPool by the dispatcher. The value of N is programmable in the PortConfigMemory. Typically, N equals 4 for ingress frame dispatches, 2 for egress unicast frame dispatches, 4 for egress multicast frame dispatches, and 0 for interrupt signals and timers.
The picocode can read further data from the frame; the data store coprocessor automatically reads the frame data into the FishPool at the next fish address, wrapping back to 0 when the FishPool boundary is reached. The picocode can also read from or write to the upstream/downstream data stores at absolute addresses.
The Web coprocessor interfaces to the EPC Web arbiter. The EPC Web arbiter arbitrates among the ten GxHs and the WebWatch for mastership of the interface device Web interface. This allows all GxHs to read and write on the Web.
The interface device memory complex provides storage facilities for the Embedded Processor Complex (EPC); see fig. 12A. The memory complex includes a Tree Search Memory (TSM) arbiter and a number of on-chip and off-chip memories. The memories store tree structures, counters, and anything else to which the picocode requires store access. In addition, the memories hold data structures used by the hardware, such as free lists and queue control blocks. By default, any memory location not allocated to trees or used by the hardware is available to the picocode, for example for counters and aging lists.
FIG. 13 is a more detailed block diagram of the memory complex. The Tree Search Memory (TSM) arbiter provides the communication between the embedded processors (GxH) and the memories. The memories comprise 5 on-chip SRAMs, 1 off-chip SRAM, and 7 off-chip DRAMs. The TSM arbiter contains ten request control units (one per embedded processor GxH) and 13 memory arbiter units, one for each memory. A bus structure connects the request control units with the arbiter units, so that each control unit, and the GxH attached to it, has access to all memories.
Each control unit contains the hardware necessary to steer data between its embedded processor (GxH) and the arbiters.
The SRAM arbiter units manage the data flow between the embedded processors (GxH) and the on-chip and off-chip SRAMs.
The DRAM arbiter units manage the data flow between the embedded processors (GxH) and the off-chip DRAM devices.
Each memory arbiter contains a "back door" access port, generally used by other parts of the integrated circuit block, which has the highest access priority.
DRAM memory can operate in two modes:
TDM mode. Memory accesses to the 4 banks of a DDRAM alternate between a read window and a write window. In a read window, accesses to any of the 4 banks are read-only; in a write window, accesses to any of the 4 banks are write-only. Using TDM mode for several DDRAMs allows certain control signals to be shared among them, saving some of the integrated circuit block's I/Os (a very scarce resource).
Non-TDM mode. Memory accesses to the 4 banks of a DDRAM may combine reads and writes, subject to certain rules; for example, it is possible to read in bank A while writing in bank C within the same access window.
The TSM arbiter allows N requestors to access M memories simultaneously. When several requestors contend for the same memory, round-robin arbitration is applied.
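A minimal sketch of such per-memory round-robin arbitration, modeled as a grant function over a request bit mask (all names are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

#define N_REQUESTORS 10

/* Grant one pending requestor for a single memory, scanning from the
 * position after the last grant so no requestor starves. */
static int rr_grant(uint16_t request_mask, int last_grant)
{
    for (int i = 1; i <= N_REQUESTORS; i++) {
        int r = (last_grant + i) % N_REQUESTORS;
        if (request_mask & (1u << r))
            return r;
    }
    return -1;   /* no pending request */
}

int main(void)
{
    uint16_t pending = 0x025;   /* requestors 0, 2 and 5 are requesting */
    int g = -1;
    for (int cycle = 0; cycle < 4; cycle++) {
        g = rr_grant(pending, g);
        printf("cycle %d: grant requestor %d\n", cycle, g);
    }
    return 0;   /* grants cycle through 0, 2, 5, 0, ... */
}
```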
The M memories may have different attributes. The current device has three memory types: internal SRAM, external SRAM, and external DDRAM.
The M memories are treated alike with respect to the N requestors: any requestor can access any memory.
Some memories are logically divided into several sub-memories (for example, the 4 banks of a DDRAM) that can be accessed logically in parallel.
Some of the M memories contain internally used data structures and serve as control memory; such memory has a higher access priority than the picoprocessors. Since the picoprocessors can read the contents of the control memory, the integrated circuit block can also be debugged.
The arbiter supports read access, write access, and read-add-write. With read-add-write, an N-byte integer is added to the contents of a memory location as one atomic operation.
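Read-add-write behaves like a fetch-and-add. The following sketch models the guarantee in C, using the C11 atomics library as a stand-in for the arbiter's atomicity:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Software model of read-add-write: the load, add and store happen as
 * one indivisible operation, so concurrent requestors updating the
 * same counter cannot lose increments. */
static _Atomic uint32_t counter;

static uint32_t read_add_write(_Atomic uint32_t *loc, uint32_t addend)
{
    return atomic_fetch_add(loc, addend);   /* returns the old value */
}

int main(void)
{
    read_add_write(&counter, 5);
    read_add_write(&counter, 7);
    printf("counter = %u\n", (unsigned)atomic_load(&counter));   /* 12 */
    return 0;
}
```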
The addressing scheme used to access the M memories makes the physical location of objects in the memories transparent to the requestor.
The tree concept in the tree search engine concerns the storing and retrieval of information. Retrieval, that is, tree searches as well as insert and delete operations, is performed on the basis of a key, which is a bit pattern such as a MAC source address or the concatenation of an IP source address and an IP destination address. The information is stored in a control block called a leaf, which contains at least the key (as discussed below, the stored bit pattern is actually the hashed key). The leaf also contains additional information, such as aging information or user information; the user information may be forwarding information such as the target blade and target port number.
There are several tree types (FM, LPM, and SMT), and correspondingly several kinds of tree search: fixed match, longest prefix match, and software-managed trees. An additional optional criterion for leaf selection is the VectorMask. Ropes, aging, and locking may also be employed to improve search performance.
The search algorithm for the FM tree is shown in fig. 14. The search algorithm operates on input parameters that include the key: it first hashes the key, then accesses the Direct Table (DT), walks the tree through Pattern Search Control Blocks (PSCBs), and finally ends at a leaf (see fig. 14). There are three tree types, each with its own search algorithm, so the tree walk follows different rules for each. For example, for a Fixed Match (FM) tree, the data structure is a Patricia tree: when a leaf is found, that leaf is the only candidate that can match the input key. In a software-managed tree, multiple leaves may be linked in a chain; in that case, all leaves in the chain are checked against the input key until a match is found or the chain is exhausted. A final "compare at the end" operation compares the input key with the pattern stored in the leaf, verifying that the leaf really matches the input key. The result of the search is "OK" if a leaf was found and the compare matched, and "KO" in all other cases.
The input to a search operation consists of the following parameters:
Key (128 bits). The key must be built with special picocode instructions before the search (or insert/delete) is started. There is only one key register; however, once a tree search has started, the key register can be used by the picocode to build the key for the next search, concurrently with the TSE performing the current one. This is possible because the TSE hashes the key and stores the result in an internal HashedKey register (so that, in effect, there are two key registers).
KeyLength (7 bits). This register contains the length of the key in bits. It is automatically updated by the hardware as the key is built.
LUDefIndex (8 bits). This is an index into the LUDefTable, which contains the definitions of all trees and determines where the search is performed. Details of the LUDefTable are given below.
TSRNr (1 bit). The search result can be stored either in Tree Search Result area 0 (TSR0) or in TSR1, as selected by TSRNr. While the TSE is performing the search, the picocode can access the other TSR to analyze the result of a previous search.
VectorIndex (6 bits). For trees that have the VectorMask enabled (as determined in the LUDefTable), the VectorIndex denotes a bit in the VectorMask. At the end of the search, the value of this bit is returned for use by the picocode.
The input key is hashed into a HashedKey, as shown in fig. 14. There are 6 fixed hash algorithms available (one "algorithm" performs no hash at all); which algorithm is used is specified in the LUDefTable. A programmable hash function may also be used for additional flexibility.
The output of the hash function is always a 128-bit number having the property of a one-to-one correspondence between the original input key and the hash output. As explained below, this property minimizes the depth of the tree that begins after the direct table.
If colors are enabled for the tree, as in the example of fig. 14, a 16-bit color register value is inserted into the 128-bit hash output. The insertion occurs directly at the direct-table index: if the direct table contains 2^N entries, the 16-bit color value is inserted at bit position N, as shown. The output of the hash function, together with the inserted color value (when enabled), is stored in the HashedKey register.
The hash function is defined such that most of the entropy of its output resides in the most significant bits. The N most significant bits of the HashedKey register are used as an index into the Direct Table (DT).
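A sketch of the HashedKey construction and DT indexing under these assumptions; to keep the arithmetic readable, the model uses a 48-bit hash with a 16-bit color giving a 64-bit HashedKey, whereas the real datapath is 128 bits wide (all names are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

/* Insert a 16-bit color value at bit position n (counted from the MSB)
 * of a 48-bit hash output, yielding a 64-bit HashedKey; n must be > 0. */
static uint64_t insert_color(uint64_t hash48, uint16_t color, unsigned n)
{
    uint64_t head = hash48 >> (48 - n);                 /* top n bits    */
    uint64_t tail = hash48 & ((1ull << (48 - n)) - 1);  /* low 48-n bits */
    return (head << (64 - n)) | ((uint64_t)color << (48 - n)) | tail;
}

/* Index into a 2^n-entry Direct Table: the n most significant bits. */
static uint32_t dt_index(uint64_t hashed_key, unsigned n)
{
    return (uint32_t)(hashed_key >> (64 - n));
}

int main(void)
{
    uint64_t hashed = insert_color(0xABCDEF012345ull, 0xFFFF, 8);
    printf("DT index = 0x%02X\n", (unsigned)dt_index(hashed, 8));
    return 0;   /* prints 0xAB: the color does not disturb the DT index */
}
```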
The search starts with an access to the direct table: a DTEntry is read from the direct table. The address used to read the DTEntry is computed from the N most significant bits of the HashedKey together with tree properties defined in the LUDefTable, as described in detail below. The DTEntry can be viewed as the root of the tree. The particular tree data structure used depends on the tree type; for present purposes it suffices to note that a Patricia tree data structure is used for FM trees, and that extensions of the Patricia tree are used for LPM and SMT trees.
An example of the use of an 8-entry DT is shown in fig. 15. It can be seen that the DT reduces the search time (i.e., the number of PSCBs that must be accessed). Thus, by increasing the size of the DT, a trade-off can be made between memory usage and search performance.
As can be seen from fig. 15, a DTEntry may contain the following information:
and (4) is empty. The DTentry is not connected to any leaf.
A pointer to a leaf. There is one leaf connected to the DTEntry.
A pointer to a PSCB. More than one leaf is connected to this DTEntry; the DTEntry then defines the root of a tree.
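These three cases drive the first step of the search. A sketch of the fixed-match case, with invented type and function names:

```c
#include <stddef.h>

/* Illustrative DTEntry shapes from fig. 15; all names are invented. */
typedef struct leaf leaf_t;
typedef struct pscb pscb_t;

typedef enum { DT_EMPTY, DT_LEAF, DT_PSCB } dt_kind_t;

typedef struct {
    dt_kind_t kind;
    union { leaf_t *leaf; pscb_t *pscb; } u;
} dt_entry_t;

leaf_t *walk_pscbs(pscb_t *root, const void *hashed_key);
int     compare_at_end(const leaf_t *leaf, const void *hashed_key);

/* First step of an FM search after the DT read: resolve the entry to a
 * single candidate leaf, then verify it with the compare at the end. */
leaf_t *fm_search(dt_entry_t *e, const void *hashed_key)
{
    leaf_t *candidate = NULL;
    switch (e->kind) {
    case DT_EMPTY: return NULL;                               /* result KO */
    case DT_LEAF:  candidate = e->u.leaf; break;              /* one leaf  */
    case DT_PSCB:  candidate = walk_pscbs(e->u.pscb, hashed_key); break;
    }
    if (candidate && compare_at_end(candidate, hashed_key))
        return candidate;                                     /* result OK */
    return NULL;                                              /* result KO */
}
```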
The search algorithm for software-managed trees, and the algorithm for building such trees, are disclosed in U.S. patent application No. 09/312,148.
There is an algorithm, known as the select bits algorithm, which uses a metric to build a binary search tree from items called "rules" chosen from one or more rule sets. The examples here are described in terms of Internet Protocol (IP) headers, but any fixed-format header could be used instead.
In IP, each rule is matched against a specific key, which may be assembled from the following fields: Source Address (SA), Destination Address (DA), Source Port (SP), Destination Port (DP), and Protocol (P). These fields are 32, 32, 16, 16, and 8 bits long, respectively, so the key to be tested contains 104 bits. The select bits algorithm finds those of the 104 bits that are especially informative: by testing a few well-chosen bits, most of the potentially applicable rules can be eliminated, leaving only one or a few candidate rules. For some rules, inequalities can also be tested by simple comparisons. The bit tests and comparisons are organized logically in a binary tree. The tree is mapped into a hardware-supported structure in which the bits are tested at high speed. The tests yield one rule, or a small chain of rules (called a leaf chain), that may match the key. In the former case, the key is then tested in full against the rule; in the latter case, the key is tested against the chain in a lattice manner, using comparisons and full rule tests.
Each rule in each rule set is accompanied by an action, which is executed when the rule matches the key and has the highest priority. Rules may intersect (one key may fit two or more rules). In that case the rules are given priorities 1, 2, 3, and so on, such that any two intersecting rules have different priorities (the administrator must declare which rule dominates when a key matches two or more rules). Thus, if more than one rule remains to be tested after the bit tests and comparisons, the rules are tested in priority order; a smaller priority number denotes a higher-priority rule.
If no match is found, a default action must be provided.
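A sketch of the resulting decision tree and its traversal (the node layout and names are assumptions; the leaf chain is then tested in priority order as described above):

```c
#include <stdint.h>

struct rule;                       /* opaque: full rule test not shown */

/* Node of the select-bits decision tree: each internal node tests one
 * of the 104 key bits; a leaf holds a priority-ordered chain of
 * candidate rules to be tested in full. */
typedef struct sb_node {
    int bit;                       /* key bit to test; -1 marks a leaf */
    struct sb_node *zero, *one;    /* subtrees for bit = 0 / bit = 1   */
    const struct rule *chain;      /* candidate rules at a leaf        */
} sb_node_t;

static int key_bit(const uint8_t key[13], int bit)   /* 104-bit key */
{
    return (key[bit / 8] >> (7 - bit % 8)) & 1;
}

/* Walk the decision tree to a leaf chain of candidate rules. */
const struct rule *sb_classify(const sb_node_t *n, const uint8_t key[13])
{
    while (n->bit >= 0)
        n = key_bit(key, n->bit) ? n->one : n->zero;
    return n->chain;   /* then apply full rule tests in priority order */
}
```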
A search algorithm related to the longest prefix match method is disclosed in U.S. Patent No. 5,787,430. The method involves entering the tree database at one node (the root node); determining a search path from one node to the next through the tree database by sequentially processing the segments of the search argument that identify the entries leading to the next (child) node, following the downward link information until either all segments have been processed or a node (a leaf) without downward link information is reached; comparing the search argument with the entry stored at the node at the end of the search path; if no complete match between the search argument and the entry is found at that node, backtracking along the search path by processing the upward link information of the current node; and repeating the two preceding steps until at least a partial match is found or the root node is reached.
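A sketch of this search in the style of a binary trie with parent links; it omits the pattern-compare step that a compressed (Patricia-style) trie would need, and all names are illustrative:

```c
#include <stdint.h>
#include <stddef.h>

/* Binary-trie node with upward links, in the spirit of the method
 * described above. */
typedef struct lpm_node {
    struct lpm_node *child[2];   /* downward (child) links             */
    struct lpm_node *parent;     /* upward link, used for backtracking */
    const void *entry;           /* stored prefix entry, or NULL       */
} lpm_node_t;

static int addr_bit(uint32_t addr, int i) { return (addr >> (31 - i)) & 1; }

/* Walk down as far as the address allows, then backtrack upward to the
 * nearest node holding an entry: the longest matching prefix. */
const void *lpm_lookup(lpm_node_t *root, uint32_t addr)
{
    lpm_node_t *n = root;
    for (int i = 0; i < 32; i++) {            /* follow downward links */
        lpm_node_t *next = n->child[addr_bit(addr, i)];
        if (next == NULL)
            break;
        n = next;
    }
    while (n != NULL && n->entry == NULL)     /* backtrack upward */
        n = n->parent;
    return n ? n->entry : NULL;
}
```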
Figure 16 shows one embodiment of a main switch fabric device. Preferably, each interface device integrated circuit block has two or more serializer/deserializer ports integrated into it, which receive parallel data and convert it into a high-speed serial data stream that is forwarded to the switch fabric device via a serial link. Data received from the switch fabric device over the high-speed serial link is converted back to parallel form by another such port. Here we describe an example of a serializer/deserializer called a data-aligned serial link (DASL).
At least one DASL connects the switch fabric device to the serial links. Data from a serial link is converted to parallel data and then transferred to the switch fabric device; similarly, parallel data from the switch fabric device is converted to serial data and forwarded onto a serial link. Serial links may also be aggregated to increase throughput.
Still referring to fig. 16, the switching system includes a switch fabric 11, input switch adapters 13 (13-1 ... 13-k) coupled to switch fabric input ports 15 (15-1 ... 15-k), and output switch adapters 17 (17-1 ... 17-p) coupled to output ports 19 (19-1 ... 19-p) of the switch fabric.
Input and output transmission links 21 (21-1 ... 21-q) and 23 (23-1 ... 23-r) are connected to the switching system by line (link) adapters 25 (25-1 ... 25-q) and 27 (27-1 ... 27-r), respectively. The transmission links carry circuit-switched or packet-switched traffic to and from attached units such as workstations and receivers (links designated WS), to and from local area networks (links designated LAN), to and from ISDN integrated services digital network facilities (links designated ISDN), or to and from any other communication system. Processors may furthermore be connected to the switch adapters 13 and 17. The Line Adapters (LA) and Switch Adapters (SA) share the same interface.
At the input-side switch adapters, the various services from the packet-switched and circuit-switched interfaces are collected and converted into uniform minipackets (having one of several fixed lengths), whose header contains routing information designating the output port (and outgoing link) of the switch. The format of the minipackets, their generation in the input-side switch adapters, and their unpacking in the output-side switch adapters are described in detail in the next section.
The switch fabric routes minipackets from any input to any output through a fast self-routing interconnection network. The structure of the self-routing network allows minipackets to be routed internally, simultaneously and without collisions.
The central part of the switching system is the switch fabric. Two different implementations are considered here and described separately. In the first implementation, each input port of the switch fabric has a self-routing binary tree connecting that input port to all output ports; the switch fabric combines k such trees (one per input port, if k input ports are provided). In the other implementation, each output port has a section with output-port RAM connecting all input ports to that output port; the switch fabric combines p such sections (one per output port, if p output ports are provided).
DASL is described in detail in application Ser. No. 09/330,968, filed June 11, 1999. The DASL interface receives data from a parallel interface, such as a CMOS ASIC, and partitions the bits of the parallel interface into a smaller number of parallel bit streams. These parallel bit streams are converted into high-speed serial streams and transmitted via a transmission medium to a receiver in another module. A differential driver with controlled impedance drives the serial bit stream of the data onto the transmission medium.
DASL partitions the data stream as follows: an N-bit parallel data stream is divided into several portions of n bits each, where n is a fraction of N; each n-bit portion of the data stream is serialized; each serialized portion is transmitted over one of a plurality of parallel channels; and the transmitted portions of each data stream are converted from serial back to parallel form to restore the parallel N-bit data stream.
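A toy model of this partitioning, with N = 16 bits split into four 4-bit portions over four channels (purely illustrative; real DASL widths differ):

```c
#include <stdint.h>
#include <stdio.h>

#define N_LINKS 4   /* n = 16 / N_LINKS = 4 bits per portion */

/* Split a 16-bit parallel word into four 4-bit portions, one per
 * serial channel. */
static void dasl_split(uint16_t word, uint8_t portion[N_LINKS])
{
    for (int i = 0; i < N_LINKS; i++)
        portion[i] = (word >> (4 * i)) & 0xF;   /* portion i -> channel i */
}

/* Reassemble the portions at the receiver into the parallel word. */
static uint16_t dasl_merge(const uint8_t portion[N_LINKS])
{
    uint16_t word = 0;
    for (int i = 0; i < N_LINKS; i++)
        word |= (uint16_t)(portion[i] & 0xF) << (4 * i);
    return word;
}

int main(void)
{
    uint8_t ch[N_LINKS];
    dasl_split(0xBEEF, ch);                              /* transmit side */
    printf("restored = 0x%04X\n", dasl_merge(ch));       /* 0xBEEF        */
    return 0;
}
```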
In the drawings and specification there have been set forth preferred embodiments of the invention and, although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation.
Claims (7)
1. A scalable network processor, comprising:
a processing complex comprising N processors, wherein N > 1;
a plurality of input/output ports;
at least one upper media access control, MAC, module for processing incoming data received from the input ports;
an upper enqueue/dequeue scheduler located in the upstream path and operatively connected to the processing complex and the upper MAC;
at least one lower MAC located in the downstream path and connected to the output ports, the at least one lower MAC processing the egress data and forwarding the processed data to the output ports;
a lower enqueue/dequeue scheduler located in the downstream path and operatively connected to the processing complex and to the at least one lower MAC; and
a connector system operatively connected to the upper enqueue/dequeue scheduler and the lower enqueue/dequeue scheduler such that the input/output ports of a single network processor, or of M network processors where M ≥ 2, are operatively interconnected, data being exchanged between the input/output ports of the single network processor or the M network processors.
2. The scalable network processor of claim 1, wherein the connector system comprises:
a first conversion module;
a second conversion module,
the first and second conversion modules having parallel inputs and serial outputs and being connected at the upper end;
a third conversion module; and
a fourth conversion module,
the third and fourth conversion modules having serial inputs and parallel outputs and being connected at the lower end.
3. The scalable network processor of claim 2, further comprising a transmission line interconnecting at least one serial output of the first or second conversion module with at least one serial input of the third or fourth conversion module when interconnection of the input/output ports of a single network processor is desired.
4. An apparatus for data communication, comprising:
a first network processor having an upstream data path and a downstream data path;
a second network processor having an upstream data path and a downstream data path; and
a connector system interconnecting the first network processor and the second network processor, the connector system comprising first and second modules having serial outputs, mounted in the upstream paths of the first network processor and the second network processor, respectively, and third and fourth modules having serial inputs, mounted in the downstream paths of the first network processor and the second network processor, respectively;
a first conductor connecting a serial output on one of the modules at the upper end of the first network processor with a serial input on one of the modules at the lower end of the first network processor;
a second conductor connecting a serial output on one of the modules at the upper end of the second network processor with a serial input on one of the modules at the lower end of the second network processor;
a third conductor connecting the serial output of the other module at the upper end of the first network processor with the serial input of the other module at the lower end of the second network processor.
5. The apparatus for data communication of claim 4, further comprising:
a fourth conductor connecting the other serial output in the upper-end module of the second network processor with the serial input on the lower-end module of the first network processor.
6. A method for data communication, comprising the steps of:
receiving an incoming data stream through an input port of an interface device;
transmitting the incoming data stream through a plurality of interface processors embedded within the interface device;
dividing the incoming data stream into a first portion to be redirected by a switch fabric device and a second portion to be stored temporarily apart from the switch fabric device;
directing the first portion to the switch fabric device and the second portion to a memory unit;
receiving the first portion at an interface processor as the data stream flows from the switch fabric device;
recombining the first portion with the second portion; and
flowing the redirected, recombined data stream out through an output port as a result of execution of instructions by the interface processor.
7. The method of claim 6, wherein the step of transmitting the data stream through a plurality of interface processors comprises parsing the data stream into a plurality of portions and distributing the parsed portions among the plurality of interface processors for parallel processing.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US09/384,747 US6404752B1 (en) | 1999-08-27 | 1999-08-27 | Network switch using network processor and methods |
| US09/384,747 | 1999-08-27 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1069046A1 (en) | 2005-05-06 |
| HK1069046B (en) | 2007-04-04 |