CA2330984A1 - System and method for interfacing network processors with shared resources - Google Patents

System and method for interfacing network processors with shared resources

Info

Publication number
CA2330984A1
CA2330984A1 CA2330984A CA 2330984
Authority
CA
Canada
Prior art keywords
shared resource
shared
commands
network processor
results
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA 2330984
Other languages
French (fr)
Inventor
Matthew D. Clark
Robert D. Oden, Jr.
Eric L. Raymond
Joseph M. Rash
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nortel Networks Ltd
Original Assignee
Nortel Networks Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nortel Networks Ltd filed Critical Nortel Networks Ltd
Publication of CA2330984A1 publication Critical patent/CA2330984A1/en
Abandoned legal-status Critical Current

Abstract

An interface logic IC (ILIC) 30 for translating data between a network processor 20 and a shared resource 10. A data bus between the network processor 20 and the ILIC 30 conveys network processor 20 commands and returns shared resource 10 results. A data bus between the ILIC 30 and the shared resource 10 conveys network processor 20 commands that have been translated to be compatible with the shared resource's 10 interface and returns the results of the commands. The ILIC 30 translates data from the network processor 20 interface into the shared resource 10 interface and vice versa.

Description

DOCKET NO. 12685RNUS01U
TITLE OF THE INVENTION
System and Method for Interfacing Network Processors With Shared Resources
FIELD OF THE INVENTION
The present invention relates generally to a mechanism for interfacing network processors (service requestors) with shared resources (service providers), and more specifically to interfacing multiple network processors with a single shared resource.
BACKGROUND OF THE INVENTION
The network processing industry uses Content Addressable Memory (CAM) systems to provide a variety of services including, but not limited to, data packet classification, Quality of Service (QOS), policy enforcement, and packet forwarding.
Some network processors are aimed at the OC-192c classification and policing marketplace. OC-192c refers to Optical Carrier level 192 of the Synchronous Optical Network (SONET) data rate spectrum. OC-192c provides for a 9.953 Gbps data line rate, which is commonly rounded and referred to as 10 Gb/s. This means that given an extended burst of 40 byte packets, a 10 Gb/s stream transmits 35 million packets per second in which every data packet generates a reverse address lookup for Quality of Service (QOS), billing, or routing reasons.
Therefore, a CAM must support at least 35 million lookups per second. Even more importantly, a network processor system containing one or more processing elements must be able to categorize 35 million packets per second.
The figure of merit of most concern, however, is the number of user instructions required to process a packet of data. Consider a network processor containing four processor elements, each element running at 200 MHz. One instruction per clock cycle translates to 800 million instructions per second (max available). Divide that by 20 million packets per second and the result is 40 clock cycles available per packet. This clock per packet value is probably sufficient for simple IP forwarding purposes.
Unfortunately, it is likely insufficient for many more advanced applications.
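By way of illustration only, the cycle-budget arithmetic above can be sketched as follows (Python, using the figures from this example; the function name is ours, not part of the disclosure):

```python
# Cycle-budget arithmetic from the example above (values are illustrative).
def cycles_per_packet(elements, clock_hz, packets_per_sec):
    """Clock cycles available to process each packet."""
    instructions_per_sec = elements * clock_hz  # one instruction per clock cycle
    return instructions_per_sec / packets_per_sec

# Four 200 MHz elements against a 20 million packet/s stream: 40 cycles/packet.
print(cycles_per_packet(4, 200e6, 20e6))   # 40.0
# Doubling the clock rate or the element count doubles the budget.
print(cycles_per_packet(4, 400e6, 20e6))   # 80.0
```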
Several suggestions have been promoted in order to increase the number of clock cycles per packet. One suggestion is to run the processing elements faster (e.g., 400 MHz). Another suggestion is to add more network processing elements to a chip so as to increase the number of threads. Excess heat and routing issues are major concerns with respect to running the processors faster. Area constraints and verification issues are a concern with respect to increasing the number of network processors per chip.
Another industry problem is the lack of a standard CAM interface. Currently there is no defined standard interface for the variety of CAMs that are on the market.
Most traditional memory devices do have defined standard interfaces that application writers can adhere to while writing computer applications that utilize such memory devices. The lack of a defined standard CAM interface presents several concerns.
One concern is a lack of confidence that by the time a product gets to market, its particular CAM interface will still be in use. Another cause for concern is that each CAM vendor potentially uses a different CAM interface. Thus, a different interface needs to be designed and implemented for each different CAM.
Another issue with non-standard CAM interfaces is that they are single resource devices. There is no way to connect a CAM to multiple network processors (service requestors). One solution to this problem involves designing a serial bus into a network processor to send CAM access requests to the "front" network processor in a chain of network processors. The front network processor would then submit the access request to the CAM (shared resource). This method, however, introduces a latency into the chain of network processors where the latency depends on a requesting network processor's position in the chain. Moreover, this method introduces duplicate unneeded pins into each network processor since each network processor would have to accommodate both serial and CAM interface pins. Since CAM pins typically number in excess of 125, there is significant waste. Additionally, this method does not address network processor scalability.
A second suggested method is to create a star topology bus with respect to the network processors.
This would solve the latency issues but adds an enormous number of unused pins into each network processor.
Moreover, this method limits scalability to a pre-defined number of network processors.
What is needed is a means for increasing the number of clock cycles per packet that can be handled by a network processor to CAM interface. Moreover, the solution needs to resolve non-standard interface issues between network processors and CAMs, or any other shared resource for that matter.
A significant beneficial side effect of solving the problems identified above is that the solution provides a scalable atmosphere for network processor migration such as, for instance, OC-192 (10 Gb/s data rate) to OC-768 (40 Gb/s data rate).
SUMMARY OF THE INVENTION
The present invention proposes keeping current network processor chip configurations unchanged, but to run multiple network processor chips in parallel. This increases the number of network processors at a macro level (more chips), as opposed to a micro level (more processors per chip). Thus, if an application requires 80 clock cycles per packet, the data stream can be divided between two 40 clock cycle per packet network processors where each network processor receives half of the packets. Similarly, if an application requires 160 clock cycles per packet, the data stream can be divided among four 40 clock cycle per packet network processors where each network processor receives one quarter of the packets.
This solves the number of clock cycles per packet issue and creates a scalable architecture since faster speeds can be achieved by adding more network processor chips. However, this solution requires a mechanism for interfacing multiple network processors to a single CAM.
It is possible to maintain multiple CAMs concurrent across multiple network processors. However, a single OC-192c stream uses just 60% of a CAM's resources. Adding additional CAMs would lead to further underutilization of the CAMs while increasing the area, cost, and power consumption of the final solution.
The present invention takes a different approach by introducing another IC into the system, one that can interface between multiple network processors and a single CAM. The additional IC interface approach provides a more scalable and predictable solution for the OC-192c market, as well as allowing network processors the ability to scale to OC-768 without requiring any changes to the network processors.
The network processor to CAM interface includes a packet processing application running in the network processor to manage (e.g., add, modify, delete) the contents of the CAM. The interface exists in an additional IC referred to as an Interface Logic IC (ILIC). The ILIC can be built into a Field Programmable Gate Array (FPGA) and placed into a network processor chip set.
Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.
BRIEF DESCRIPTION OF THE FIGURES
FIGURE 1 illustrates a general block diagram of multiple network processors sharing a single shared resource.
FIGURE 2 illustrates a general block diagram of a single network processor utilizing multiple shared resources.
FIGURE 3 illustrates a general block diagram of multiple network processors utilizing multiple shared resources.
FIGURE 4 illustrates a detailed block diagram of multiple network processors sharing a single shared resource.
FIGURE 5 illustrates the flow of logic among the various system components for the case of multiple network processors utilizing a single shared resource.
FIGURE 6 illustrates a detailed block diagram of a single network processor utilizing multiple shared resources.
FIGURE 7 illustrates the flow of logic among the various system components for the case of a single network processor utilizing multiple shared resources.
FIGURE 8 illustrates a detailed block diagram of multiple network processors utilizing multiple shared resources.
FIGURE 9 illustrates the flow of logic among the various system components for the case of multiple network processors utilizing multiple shared resources.
In the following description like numbers will be used to indicate like elements in the figures.
DETAILED DISCLOSURE OF THE INVENTION
The present invention describes a means for interfacing network processors and shared resources. A
shared resource is basically any component that can serve more than one network processor. Examples of shared resources include, but are not limited to, Content Addressable Memory (CAM) systems, random access memory (RAM), compression and encryption algorithms, and line interfaces.
The interfacing means of the present invention is designed to allow multiple network processors the ability to utilize a single shared resource (N:1 configuration) in the case of a shared resource that can meet and exceed the requests of a single network processor. Moreover, the present invention can also be configured to allow a single network processor to utilize multiple shared resources (1:M configuration) in the case of a network processor that is capable of exhausting a single shared resource. In addition, the present invention can be configured to allow multiple network processors to utilize multiple shared resources (N:M configuration) without unnecessarily wasting pins. The interface logic is responsible for resolving non-standard interface issues between the network processor(s) and the shared resource(s).
FIGURE 1 illustrates a general block diagram in which multiple network processors share a single shared resource. The N:1 configuration allows one shared resource 10 to be utilized by N network processors 20 provided that the shared resource 10 has the excess capacity to service more than one network processor 20.
In this embodiment, N network processors 20 (service requestors) all have access to a single shared resource 10 (service provider). The shared resource 10 has sufficient bandwidth to handle all service requests but has only one interface port. N interface ports are provided on an Interface Logic IC (ILIC) 30. ILIC 30 separates network processors 20 and shared resource 10.
The N network processors 20 interface to N interface ports of ILIC 30 on a 1:1 basis. Thus, each network processor 20 is able to connect with shared resource 10 on a 1:1 basis. ILIC 30 provides the logic necessary to order and control multiple service requests from multiple network processors 20 simultaneously so that shared resource 10 is much more efficiently utilized.
FIGURE 2 illustrates a 1:M system configuration that allows M shared resources 10 to be utilized by a single network processor 20. The 1:M configuration is different from the N:1 configuration described in FIGURE 1 in that the interface logic IC 30 also has a communications port.
The communications port allows multiple interface logic ICs 30 to be chained together. Chaining multiple interface logic ICs 30 allows coherency issues to be managed.

One of the difficulties of the N:M and 1:M
configurations is keeping the data in the M shared resources identical across all M devices such that accessing one shared resource is the same as accessing any other. This is referred to as coherency. For example, in a system where M = 4, the same result is returned regardless of which shared resource ILIC 30 queries for the information. ILIC 30 essentially creates a virtual shared resource that is larger and faster than any single component of the system.
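As a sketch of the coherency requirement just described (one possible policy, assumed for illustration only; the disclosure does not prescribe this exact mechanism, and the `execute` method on each resource is assumed), table updates can be replicated to all M shared resources while searches are dispatched to any one of them:

```python
# Illustrative coherency policy for M mirrored shared resources.
class VirtualSharedResource:
    def __init__(self, resources):
        self.resources = resources      # M identical shared resources
        self.next = 0                   # round-robin pointer for searches

    def update(self, command):
        # Writes (add/modify/delete) go to every copy so they stay identical.
        return [r.execute(command) for r in self.resources]

    def search(self, command):
        # Reads may go to any copy, since all copies hold the same data.
        r = self.resources[self.next]
        self.next = (self.next + 1) % len(self.resources)
        return r.execute(command)
```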
A 1:M configuration is useful when one shared resource 10 does not have the capacity to service a single network processor 20. In this architecture, the network processor 20 sees a virtual single shared resource 10 because the logic interface IC 30 handles the differentiation of multiple shared resources 10. Just as in the N:1 case, the shared resources 10 are connected in a 1:1 (physical) configuration with, but are logically aggregated to, the network processor 20.
FIGURE 3 illustrates a general block diagram in which multiple network processors utilize multiple shared resources in an integrated fashion. FIGURE 3 illustrates an N:M configuration that allows M shared resources 10 to be utilized by N network processors 20. Situations when an N:M configuration is useful include when a shared resource 10 has the excess capacity to service more than one network processor 20, but not all of the network processors 20.
The data in the shared resource 10 must be uniform for all network processors 20. Thus, adding shared resources 10 to create a 1:1 configuration will not work because of the difficulty of maintaining coherency among multiple shared resources 10.
In this case, N network processors 20 (service requestors) have access to M shared resources 10 (service providers). Each shared resource has the bandwidth to handle a portion of the total service requests from all of the network processors 20; however, each shared resource 10 contains only one interface port. By providing N interface ports on the Interface Logic ICs 30 for the N network processors 20 to interface to, network processors 20 and shared resources 30 are physically connected on a one-to-one basis. The Interface Logic ICs 30 provide the logic necessary to order and control multiple simultaneous service requests so that the shared resources 10 are much more efficiently utilized.
To further illustrate the present invention an example detailing the interaction between two network processors and a TCAM is described.
Consider a network processor that utilizes a Ternary Content Addressable Memory (TCAM). The network processor interface is different from the TCAM interface, making communications and data exchanges between the two components problematic. The only way to resolve the problem is to provide a translation means between the two interfaces.
The current state-of-the-art is to design an interface translation for a specific network processor to interface with a specific TCAM. This requires that a third IC acting as a translation interface between the network processor and TCAM be developed. This third IC
interface, however, must be custom built and wired/programmed with specific knowledge of the network processor interface and the TCAM interface making it specific to a particular network processor and TCAM
combination. In other words, the third IC interface cannot be augmented to work with other network processor and TCAM combinations having different interface requirements. Given that there are a variety of non-standard TCAM interfaces, this solution is undesirable when certain market factors are considered. First, it is expensive to custom design a specific interface IC for each particular application. Second, and more important, it is risky to utilize an interface IC that is tied to one specific TCAM interface. What if that TCAM vendor goes out of business, or if they migrate to a new interface, or if the TCAM industry subsequently adopts a standard interface? Any of these events could render the specific interface IC useless which, in turn, would render the entire system that it is a part of useless.
The proposed solution is to create a generic translation interface IC, previously identified as an Interface Logic IC (ILIC), that is able to communicate and exchange data between a variety of network processors and a variety of TCAMs regardless of the specific interface(s). Such a generic interface eliminates the risks described above in that changing a TCAM interface can be handled by the end user via some relatively minor programming techniques. Thus, if a TCAM interface within a system is updated or otherwise changed, the ILIC can be re-programmed on the fly. The ILIC becomes a re-usable, re-programmable component that can be adapted to interface between a variety of network processors and TCAMs.
One way to provide a generic platform for development of a translation interface is to use a Field Programmable Gate Array (FPGA). An FPGA is a specialized microprocessor having no physical connections among its own logic gates at the time of manufacture. It does contain, however, a large number of potential connections that can be specifically implemented at a later date, i.e., in the "field". For the present invention, an FPGA can be programmed to interface a generic network processor bus with a specific shared resource interface.
Although the network processor to shared resource interface is referred to as a "bus", it is actually two single direction data buses, each with its own handshake interface.
Network processor(s) communicate with the ILIC via a standard point to point low voltage differential signaling (LVDS) data bus. Both data buses are configurable with respect to speed and width. This is because the FPGA needs to support an OC-192 data stream that is potentially spread across up to eight network processors. The data search bus and data results bus can run at different speeds. Typically, the ILIC receives an LVDS clock signal from the network processor over the search data bus, and uses it to drive the data results bus back to the network processor. Each bus has a data enable signal that the transmitter asserts when data is valid on the next clock edge. For the case of a single network processor running OC-192 speeds, the data search bus will have a bandwidth requirement of 200 MHz and 32 bits wide, which gives the data search bus a 6.4 Gb/s bandwidth. Because LVDS is a differential signal, each bit requires two pins on the network processor IC and the ILIC.
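The bus sizing above follows from simple arithmetic; the sketch below (Python, using only the numbers given in this example) shows the bandwidth and LVDS pin-count calculation.

```python
# Search bus sizing from the example above (illustrative arithmetic only).
clock_hz = 200e6      # LVDS clock supplied by the network processor
width_bits = 32       # data search bus width

bandwidth_bps = clock_hz * width_bits
print(bandwidth_bps / 1e9)        # 6.4 Gb/s

# LVDS is differential, so each data bit needs a pin pair on both devices.
pins_per_device = width_bits * 2
print(pins_per_device)            # 64 pins for the data bits alone
```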
FIGURE 4 illustrates a block diagram of two network processors sharing a single shared resource which in this case is a TCAM. The following table details a pin list for a network processor to ILIC interface.
Data Bus | Pin Name | Description
Search | TCAM Search Clk Out | This LVDS signal provides the clock to the ILIC for serial communication on the data search bus.
Search | TCAM Data Valid Out | This LVDS signal is asserted low while data is valid on the data search bus.
Search | TCAM Search Data Out | This LVDS signal bus transfers the requested TCAM operation to the ILIC. Search Data Out contains the MSbit of the TCAM request during the first data transfer, while Search Data(0) contains the LSbit of the TCAM request during the last data transfer.
Results | TCAM Result Clk In | This LVDS signal provides the clock to the network processor for serial communication on the data results bus.
Results | TCAM Result Valid In | This LVDS signal is asserted low while data is valid on the data results bus.
Results | TCAM Result In | This LVDS signal bus transfers the results of the requested TCAM operation back to the network processor. Result Data contains the MSbit of the TCAM result during the first data transfer, while Result Data(0) contains the LSbit of the TCAM result during the last data transfer.
The term "LVDS signal" is used to refer to the pair of LVDS pins on the network processor IC that together produce the signal of the pin name in the table above.
Because of the variances of TCAM interfaces (e.g., 36 bit key to 288 bit key), much of the detail of the bus interfaces is left for the end user to define. However, certain minimum features have been defined.
A user defined application running on a network processor 20 will generate a TCAM command. The TCAM command will define what operation (e.g., data search, table management, TCAM configuration) the TCAM 10 is to perform, as well as any data required by TCAM 10 to perform the operation. The specifics of the TCAM command are user definable since the user has control over the software that generates the TCAM command as well as the ILIC design, which must understand the format of the TCAM command. Once the TCAM command has been created, it is sent from network processor 20 over an LVDS data bus to ILIC 30.
Moving the data from the software running on a network processor to the pins of network processor 20 via an ILIC interface 25 can be accomplished in a wide variety of methods. For this example, there is an internal bus that the network processor application software can access for sending and receiving data. One of the devices on this bus includes custom logic that converts the TCAM command data into segments that are the proper width for the LVDS bus. In addition, the custom logic device drives the clock and enable signals as well as inserts any desired error checking/correction into the LVDS transmission.
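A minimal sketch of that segmentation step follows, assuming a 32 bit bus word and a trivial parity check word; both assumptions are ours, since the disclosure leaves the exact framing and error checking to the designer.

```python
# Split a TCAM command into LVDS-bus-width segments, MSbit-first,
# with a simple parity word appended (illustrative framing only).
def segment_command(command: int, command_bits: int, bus_width: int = 32):
    segments = []
    for shift in range(command_bits - bus_width, -1, -bus_width):
        segments.append((command >> shift) & ((1 << bus_width) - 1))
    parity = 0
    for s in segments:
        parity ^= s                      # simple error-check word
    return segments + [parity]

# A 64 bit command becomes two 32 bit data words plus one check word.
print([hex(w) for w in segment_command(0x1122334455667788, 64)])
```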
ILIC 30 is constantly receiving TCAM commands from network processors 1 & 2 20. The TCAM commands are forwarded to an internal ILIC arbitrator 34 that is responsible for prioritizing and serializing the TCAM commands. Arbitrator 34 then forwards the prioritized TCAM commands to TCAM specific logic 36 for translation.
The translated requests are forwarded to TCAM 10 for operation. TCAM 10 performs the requested operation and returns the TCAM command results back to the TCAM specific logic 36 of ILIC 30. The TCAM specific logic 36 sends the TCAM command results to a router 38 component that determines which network processor (1 or 2) 20 made the request and should receive the TCAM command results. Router 38 forwards the TCAM command results to a protocol engine 39 that packages the TCAM command result data for the requesting network processor 20. Protocol engine 39 then forwards the TCAM command results to the proper network processor 20.

The data in the payload (e.g., command data or command result data) is passed transparently by the ILIC interface 25. The only point in the link that understands the data payload is the user designed portion of the FPGA. This allows user software and the FPGA to be configurable for a specific application. Since the TCAM returns a result for all requests, be it a search or a table update, network processors are assured of receiving results in the same order they were requested. Thus, ILIC 30 can be viewed as a FIFO, where TCAM operations are inserted in one end of a FIFO queue, and, after some pipeline delay, TCAM results fall out of the opposite end of the FIFO queue. ILIC 30 provides thread/processor tracking and insures that TCAM results are returned to the correct processor/thread.
The data flow corresponding to the architecture in FIGURE 4 is shown in FIGURE 5. The following operations occur in a round trip between the network processor and the TCAM.
Application software running in network processors 1 & 2 periodically identifies a need for a TCAM request 502. Since each network processor is running independent of the other, it is possible that they will make simultaneous TCAM requests. The network processors create TCAM commands, which (in this example) are larger than the internal bus between the network processor application software and the NP/ILIC interface. As a result, transfers of multiple segments (two in this example) are needed across the internal bus. The network processor application software builds 504 and sends 506 the first segment of the TCAM command. Immediately thereafter, the network processor application software builds 508 and sends 510 the second segment of the TCAM command.

The NP/ILIC interface receives the TCAM command segments from the internal bus, identifies the sender (thread id, processor id, etc.), and places the incoming TCAM command into temporary storage, until it determines that a complete TCAM command has been received from the network processor. The NP/ILIC interface then packages 512 the complete TCAM command. At this point, the NP/ILIC interfaces in each of the network processors have a complete TCAM command to send to the ILIC. Prior to sending, however, the NP/ILICs perform any error correction enhancements required on the data word, and then begin transmitting 514 the packaged data to the ILIC
via the LVDS data bus.
The ILIC receives 516 the two data streams from the NP/ILIC interfaces of each network processor at approximately the same time. Because the TCAM is a serial access resource, the ILIC must arbitrate 518 between the data streams from each network processor to decide the order that the TCAM commands are sent to the TCAM. The arbitration criteria could be a simple scheduling algorithm, or could be based on which order is going to cause the least number of stalls in the TCAM
pipeline. For this example, we will schedule the TCAM
command from network processor 2 to go before the TCAM
command from network processor 1.
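As one possible arbitration policy (a sketch only; the disclosure deliberately leaves the criteria open), a round-robin scheduler that serializes commands from the two network processor streams might look like this:

```python
# Round-robin arbitration between per-processor command queues (illustrative).
from collections import deque

def arbitrate(queues):
    """Yield commands one at a time, alternating between non-empty queues."""
    while any(queues):
        for q in queues:
            if q:
                yield q.popleft()

np1 = deque(["NP1-cmd-A", "NP1-cmd-B"])
np2 = deque(["NP2-cmd-A"])
print(list(arbitrate([np2, np1])))   # NP2's command scheduled ahead of NP1's, as in the text
```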
The network processor 2 TCAM command is sent to TCAM specific logic within the ILIC. Any data correction that may have been added is checked and removed. If the data correction determines that an invalid packet was received, a No Operation command is inserted in the pipeline going to the TCAM, along with sideband information so that the order of TCAM commands is preserved. If there is no transmission error, the command is separated into opcode and data portions, and any processing that needs to occur to convert the opcode into something that the TCAM understands occurs. For example, the software may only require 3 TCAM operations, allowing the command field of the LVDS transmission to be coded into 2 binary bits. The TCAM, however, may have an opcode field definition of 12 bits. The ILIC must translate the coded 2 binary bits into the correct 12 bit opcode such that the TCAM understands the command.
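A minimal sketch of that opcode translation follows; the 2 bit code assignments and 12 bit opcode values below are invented for illustration, since the real values depend on the specific TCAM.

```python
# Map the 2 bit command codes on the LVDS bus to 12 bit TCAM opcodes.
# The code assignments and opcode values are hypothetical examples.
OPCODE_TABLE = {
    0b00: 0x101,   # data search
    0b01: 0x204,   # table update
    0b10: 0x300,   # TCAM configuration
}

def translate_opcode(coded_bits: int) -> int:
    """Return the 12 bit opcode the TCAM expects for a 2 bit coded command."""
    opcode = OPCODE_TABLE.get(coded_bits)
    if opcode is None:
        raise ValueError(f"unknown command code {coded_bits:#04b}")
    return opcode

print(f"{translate_opcode(0b01):012b}")   # 001000000100
```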
The opcode and its associated data for network processor 2 are sent 520 to the TCAM. When that transmission is finished, the converted TCAM command from network processor 1 is sent 522. The TCAM receives 524 the TCAM commands via the pipeline and unpacks 526 them as it receives them. Some delay after the network processor 2 TCAM command is finished, the TCAM will generate a result and return 528 it to the ILIC. The ILIC knows how much delay a given TCAM command requires. The ILIC packages the results returned from the TCAM, along with any other status information the system designer desires. The packaged results are then routed 530 back to the appropriate network processor. The NP/ILIC interface within the network processor has been keeping track of which thread ID and processor ID is due a result. When the NP/ILIC interface receives a result from the ILIC, it breaks the result into internal bus sized packets, and sends the data back to the requester.
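The requester tracking described above can be pictured as a FIFO of outstanding requests, since results come back in the order commands were issued; the sketch below rests on that assumption and uses hypothetical names.

```python
# Track which (processor, thread) issued each outstanding TCAM command so the
# matching result can be routed back (illustrative only).
from collections import deque

class ResultRouter:
    def __init__(self):
        self.outstanding = deque()          # order of commands sent to the TCAM

    def command_sent(self, processor_id, thread_id):
        self.outstanding.append((processor_id, thread_id))

    def result_received(self, result):
        # Results return in the same order commands were issued (FIFO).
        processor_id, thread_id = self.outstanding.popleft()
        return processor_id, thread_id, result

router = ResultRouter()
router.command_sent(2, 0)                   # network processor 2's command went first
router.command_sent(1, 3)
print(router.result_received("match@0x40")) # routed to processor 2, thread 0
```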
To further illustrate the present invention an example detailing the interaction between a single network processor and two TCAMs is described.
A user defined application running on a network processor 20 generates TCAM commands. The TCAM commands, however, are too frequent for a single TCAM to handle.

Once a TCAM command has been created, it is sent from network processor 20 over an LVDS data bus to ILIC 30.
The TCAM command is forwarded to TCAM specific logic 36 for translation. Upon translation the TCAM command is sent to an internal router 38 that is responsible for selecting a TCAM 10 to handle the TCAM command. Router 38 forwards the TCAM command to one of the TCAMs 10. The TCAM 10 executes the TCAM command and returns the results to an internal ILIC arbitrator 34 that is responsible for prioritizing and serializing the TCAM command results. Arbitrator 34 then forwards the prioritized TCAM command results to the TCAM specific logic 36. The TCAM command results are translated and sent to a protocol engine 39 that packages the result data for the requesting network processor 20. Protocol engine 39 then forwards the TCAM command results to the network processor 20 in the proper order.
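One simple way to realize the routing step above (a sketch under our own assumptions; the selection policy is left open by the disclosure) is to send each translated command to whichever TCAM currently has the fewest commands in flight:

```python
# Route translated commands to whichever TCAM is least busy (illustrative).
def route_command(command, tcams):
    """tcams: objects with an `in_flight` count and a `submit` method (assumed)."""
    target = min(tcams, key=lambda t: t.in_flight)
    target.in_flight += 1
    target.submit(command)
    return target

class FakeTcam:
    def __init__(self, name):
        self.name, self.in_flight = name, 0
    def submit(self, command):
        print(f"{self.name} <- {command}")

tcams = [FakeTcam("TCAM 1"), FakeTcam("TCAM 2")]
route_command("search 0xDEAD", tcams)   # goes to TCAM 1
route_command("search 0xBEEF", tcams)   # goes to TCAM 2
```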
The data flow corresponding to the architecture in FIGURE 6 is shown in FIGURE 7. The following operations occur in a round trip between the network processor and the TCAMs.
Application software running in the network processor periodically identifies a need for a TCAM request 702. The network processor creates TCAM commands, which (in this example) are larger than the internal bus between the network processor application software and the NP/ILIC interface. As a result, transfers of multiple segments (two in this example) are needed across the internal bus. The network processor application software builds 704 and sends 706 the first segment of the TCAM command. Immediately thereafter, the network processor application software builds 708 and sends 710 the second segment of the TCAM command.

The NP/ILIC interface receives the TCAM command segments from the internal bus, identifies the sender (thread id, processor id, etc.), and places the incoming TCAM command into temporary storage, until it determines that a complete TCAM command has been received from the network processor. The NP/ILIC interface then packages 712 the complete TCAM command. At this point, the NP/ILIC interface in the network processor has a complete TCAM command to send to the ILIC. Prior to sending the TCAM command, however, the NP/ILIC performs any error correction enhancements required on the data word, and then begins transmitting 714 the packaged data to the ILIC via the LVDS data bus.
The ILIC receives 716 the data stream from the NP/ILIC interface of the network processor. The network processor TCAM command is sent to TCAM
specific logic 718 within the ILIC. Any data correction that may have been added is checked and removed. If there is no transmission error, the TCAM command is separated into opcode and data portions, and any processing that needs to occur to convert the opcode 720 into a format that the TCAM understands occurs. In this example, TCAM commands can be sent out by the network processor at a faster rate than a single TCAM can handle.
Thus, multiple (two) TCAMs are used to process all of the TCAM commands. An internal ILIC router is used to route 722 the translated TCAM commands to TCAM 1 or TCAM 2 depending on availability.
Either TCAM 1 or TCAM 2 receives 724 a TCAM command via the pipeline and unpacks 726 it as it receives it.
Some delay after the network processor TCAM command is finished executing, the TCAM will generate a result and return 728 it to the ILIC. Since two TCAMs are being employed independently, it is possible for both of them to send results back to the ILIC simultaneously. Thus, an arbitrator is used 730 to receive and order the returned TCAM command results. The ILIC packages the results returned from the TCAM, along with any other status information the system designer desires. The packaged results are then sent 730 back to the network processor. The NP/ILIC interface within the network processor has been keeping track of which thread ID and processor ID is due a result. When the NP/ILIC interface receives a result from the ILIC, it breaks the result into internal bus sized packets, and sends the data back to the requester.
To further illustrate the present invention an example detailing the interaction between two network processors and two TCAMs is described.
User defined applications running on network processors 20 generate TCAM commands. The TCAM commands, however, are too frequent for a single TCAM to handle.
Moreover, since each network processor is running independent of the other, it is possible that they will make simultaneous TCAM requests. Once a TCAM command has been created, it is sent from its network processor 20 over an LVDS data bus to ILIC 30.
ILIC 30 is constantly receiving TCAM commands from network processors 1 & 2 20. The TCAM commands are forwarded to a first internal ILIC arbitrator 34a that is responsible for prioritizing and serializing the TCAM
commands. Arbitrator 34a then forwards the prioritized TCAM commands to TCAM specific logic 36 for translation.
Upon translation the TCAM command is sent to a first router 38a that is responsible for selecting a TCAM 10 to handle the TCAM command. Router 38a forwards the TCAM command to one of the TCAMs 10. The TCAM 10 executes the TCAM command and returns the results to a second internal ILIC arbitrator 34b that is responsible for prioritizing and serializing the TCAM command results. Arbitrator 34b then forwards the prioritized TCAM command results to the TCAM specific logic 36. The TCAM specific logic 36 sends the TCAM command results to a second router 38b that determines which network processor (1 or 2) 20 made the request and should receive the TCAM command results.
Router 38b forwards the TCAM command results to a protocol engine 39 that packages the TCAM command result data for the requesting network processor 20. Protocol engine 39 then forwards the TCAM command results to the proper network processor 20.
The data flow corresponding to the architecture in FIGURE 8 is shown in FIGURE 9. The following operations occur in a round trip between the network processor and the TCAMs.
Application software running in network processors 1 & 2 periodically identifies a need for a TCAM request 902. Since each network processor is running independent of the other, it is possible that they will make simultaneous TCAM requests. The network processors create TCAM commands, which (in this example) are larger than the internal bus between the network processor application software and the NP/ILIC interface. As a result, transfers of multiple segments (two in this example) are needed across the internal bus. The network processor application software builds 904 and sends 906 the first segment of the TCAM command. Immediately thereafter, the network processor application software builds 908 and sends 910 the second segment of the TCAM command.

The NP/ILIC interface receives the TCAM command segments from the internal bus, identifies the sender (thread id, processor id, etc.), and places the incoming TCAM command into temporary storage, until it determines that a complete TCAM command has been received from the network processor. The NP/ILIC interface then packages 912 the complete TCAM command. At this point, the NP/ILIC interfaces in each of the network processors have a complete TCAM command to send to the ILIC. Prior to sending, however, the NP/ILICs perform any error correction enhancements required on the data word, and then begin transmitting 914 the packaged data to the ILIC
via the LVDS data bus.
The ILIC receives 916 the two data streams from the NP/ILIC interfaces of each network processor at approximately the same time. Because the TCAM is a serial access resource, the ILIC must arbitrate 918 between the data streams from each network processor to decide the order that the TCAM commands are sent to the TCAMs.
The network processor TCAM command is sent to TCAM specific logic within the ILIC. Any data correction that may have been added is checked and removed. If there is no transmission error, the TCAM command is separated into opcode and data portions, and any processing that needs to occur to convert the opcode into a format that the TCAM understands occurs. In this example, TCAM commands can be sent out by the network processor at a faster rate than a single TCAM can handle. Thus, multiple (two) TCAMs are used to process all of the TCAM commands. An internal ILIC router is used to route 920, 922 the translated TCAM commands to TCAM 1 or TCAM 2 depending on availability.

Either TCAM 1 or TCAM 2 receives 924 a TCAM command via the pipeline and unpacks 926 it as it receives it.
Some delay after the network processor TCAM command is finished executing, the TCAM will generate a result and return 928 it to the ILIC. Since two TCAMs are being employed independently, it is possible for both of them to send results back to the ILIC simultaneously. Thus, an arbitrator is used 930 to receive and order the returned TCAM command results. The ILIC packages the results returned from the TCAM, along with any other status information the system designer desires. The packaged results are then sent 932 back to the appropriate network processor. The NP/ILIC interface within the network processor has been keeping track of which thread ID and processor ID is due a result. When the NP/ILIC interface receives a result from the ILIC, it breaks the result into internal bus sized packets, and sends the data back to the requester.
In the foregoing disclosure the term CAM was used to describe a shared resource that was a Content Addressable Memory. It is to be understood that one of ordinary skill in the art could substitute a Ternary Content Addressable Memory (TCAM) for a CAM. TCAM refers to a three level Content Addressable Memory system. The "T" in TCAM may also sometimes be referred to as Tertiary or Ternary. Thus, when referring to Content Addressable Memory (CAM), the present disclosure includes three (3) level content addressable memory systems.
It is to be understood that the present invention illustrated herein is readily implementable by those of ordinary skill in the art as a computer program product having a medium with a computer program embodied thereon.
The computer program product is capable of being loaded and executed on the appropriate computer processing device(s) in order to carry out the method or process steps described. Appropriate computer program code in combination with hardware implements many of the elements of the present invention. This computer code is often stored on storage media. This media can be a diskette, hard disk, CD-ROM, optical storage media, or tape. The media can also be a memory storage device or collection of memory storage devices such as read-only memory (ROM) or random access memory (RAM). Additionally, the computer program code can be transferred to the appropriate hardware over some type of data network.
The present invention has been described, in part, with reference to flowchart illustration(s). It will be understood that each block of the flowchart illustration(s), and combinations of blocks in the flowchart illustration(s), can be implemented by computer program instructions.
These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create means for implementing the functions specified in the flowchart block(s).
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart block(s). The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of
operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block(s).
Accordingly, block(s) of flowchart illustration(s) or message diagram(s) support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions.
It will also be understood that each block of flowchart illustration(s), and combinations of blocks in flowchart illustration(s), can be implemented by special purpose hardware-based computer systems that perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.
In the following claims, any means-plus-function clauses are intended to cover the structures described herein as performing the recited function and not only structural equivalents but also equivalent structures.
Therefore, it is to be understood that the foregoing is illustrative of the present invention and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims.
The invention is defined by the following claims, with equivalents of the claims to be included therein.

Claims (24)

1. An interface logic IC (ILIC) comprising:
an arbitrator for receiving and organizing shared resource commands issued by multiple network processors;
shared resource logic for:
receiving said shared resource commands from said arbitrator;
translating said shared resource commands from a network processor format to a shared resource format;
receiving shared resource results from a shared resource;
translating said shared resource results from said shared resource format to said network processor format; and a router for:
receiving said translated shared resource results from said shared resource logic; and routing said shared resource results to said multiple network processors.
2. The interface logic IC (ILIC) of claim 1 wherein said shared resource is a content addressable memory (CAM).
3. The interface logic IC (ILIC) of claim 1 wherein said shared resource is a ternary content addressable memory (TCAM).
4. An interface logic IC (ILIC) comprising:
shared resource logic for:
receiving shared resource commands from a network processor;

translating said shared resource commands from a network processor format to a shared resource format;
receiving shared resource results from an arbitrator; and translating said shared resource results from said shared resource format to said network processor format; and a router for receiving said translated shared resource commands from said shared resource logic and routing said translated shared resource commands to multiple shared resources; and an arbitrator for receiving and organizing said shared resource results from said multiple shared resources.
5. The interface logic IC (ILIC) of claim 4 wherein said multiple shared resources are content addressable memories (CAMs).
6. The interface logic IC (ILIC) of claim 4 wherein said shared resources are ternary content addressable memories (TCAMs).
7. An interface logic IC (ILIC) comprising:
a first arbitrator for receiving and organizing shared resource commands from multiple network processors;
a second arbitrator for receiving and organizing shared resource results from multiple shared resources;
shared resource logic for:
receiving said shared resource commands from said first arbitrator;

translating said shared resource commands from a network processor format to a shared resource format;
receiving shared resource results from said second arbitrator;
translating said shared resource results from said shared resource format to said network processor format;
a first router for receiving said translated shared resource commands from said shared resource logic and routing said translated shared resource commands to multiple shared resources; and a second router for receiving said translated shared resource results from said shared resource logic and routing said translated shared resource results to multiple network processors.
8. The interface logic IC (ILIC) of claim 7 wherein said multiple shared resources are content addressable memories (CAMs).
9. The interface logic IC (ILIC) of claim 7 wherein said shared resources are ternary content addressable memories (TCAMs).
10. An interface logic IC (ILIC) comprising:
means for receiving and organizing shared resource commands issued by multiple network processors;
means for translating said shared resource commands from a network processor format to a shared resource format; means for sending said translated shared resource commands to a shared resource;
means for receiving shared resource results from said shared resource;

means for translating said shared resource results from said shared resource format to said network processor format; and means for routing said shared resource results to said multiple network processors.
11. An interface logic IC (ILIC) comprising:
means for receiving shared resource commands from a network processor;
means for translating shared resource commands from a network processor format to a shared resource format;
means for routing said translated shared resource commands to multiple shared resources;
means for receiving and organizing shared resource results from multiple shared resources; and means for translating said shared resource results from said shared resource format to said network processor format;
means for sending said translated shared resource results to a network processor.
12. An interface logic IC (ILIC) comprising:
means for receiving and organizing shared resource commands issued by multiple network processors;
means for translating said shared resource commands from a network processor format to a shared resource format; means for routing said translated shared resource commands to multiple shared resources;
means for receiving and organizing shared resource results from said multiple shared resources;

means for translating said shared resource results from said shared resource format to said network processor format; and means for routing said shared resource results to said multiple network processors.
13. A computer program product for linking multiple network processors with a single shared resource, the computer program product having a medium with a computer program embodied thereon, the computer program product comprising:

computer program code for receiving and organizing shared resource commands issued by multiple network processors;

computer program code for translating said shared resource commands from a network processor format to a shared resource format;

computer program code for sending said translated shared resource commands to a shared resource;

computer program code for receiving shared resource results from said shared resource;

computer program code for translating said shared resource results from said shared resource format to said network processor format; and computer program code for routing said shared resource results to said multiple network processors.
14. A computer program product for linking a network processor with multiple shared resources, the computer program product having a medium with a computer program embodied thereon, the computer program product comprising:

computer program code for receiving shared resource commands from a network processor;

computer program code for translating shared resource commands from a network processor format to a shared resource format;

computer program code for routing said translated shared resource commands to multiple shared resources;

computer program code for receiving and organizing shared resource results from multiple shared resources;

and computer program code for translating said shared resource results from said shared resource format to said network processor format;

computer program code for sending said translated shared resource results to a network processor.
15. A computer program product for linking multiple network processors with multiple shared resources, the computer program product having a medium with a computer program embodied thereon, the computer program product comprising:

computer program code for receiving and organizing shared resource commands issued by multiple network processors;

computer program code for translating said shared resource commands from a network processor format to a shared resource format;

computer program code for routing said translated shared resource commands to multiple shared resources;
computer program code for receiving and organizing shared resource results from said multiple shared resources;

computer program code for translating said shared resource results from said shared resource format to said network processor format; and computer program code for routing said shared resource results to said multiple network processors.
16. A system for permitting multiple network processors to utilize a single shared resource comprising:

multiple network processors, each network processor generating and sending shared resource commands over an LVDS data bus;

an interface logic IC (ILIC) for:
receiving said shared resource commands over said LVDS data bus as generated by said multiple network processors;

arbitrating said shared resource commands such that the shared resource commands are serialized;

translating said shared resource commands from a network processor interface to a shared resource interface;

sending said translated shared resource commands to said shared resource;
receiving shared resource results from said shared resource;
translating said shared resource results from a shared resource interface to a network processor interface;

routing said translated shared resource results to said network processors; and a shared resource for:

receiving said translated shared resource commands;

executing said translated shared resource commands; and returning shared resource results to said ILIC.
17. The system of claim 16 wherein said shared resource is a content addressable memory (CAM).
18. The system of claim 16 wherein said shared resource is a ternary content addressable memory (TCAM).
19. A system for permitting a network processor to utilize multiple shared resources comprising:

a network processor for generating and sending shared resource commands over an LVDS data bus;

an interface logic IC (ILIC) for:

receiving said shared resource commands over said LVDS data bus as generated by said network processor;

translating said shared resource commands from a network processor interface to a shared resource interface;

routing said translated shared resource commands to said shared resources;

receiving shared resource results from said shared resources;

arbitrating said shared resource results such that the shared resource results are serialized;

translating said shared resource results from a shared resource interface to a network processor interface;

sending said translated shared resource results to said network processor; and multiple shared resources for:

receiving said translated shared resource commands;

executing said translated shared resource commands; and returning shared resource results to said ILIC.
20. The system of claim 19 wherein said shared resources are content addressable memories (CAMs).
21. The system of claim 19 wherein said shared resources are ternary content addressable memories (TCAMs).
22. A system for permitting multiple network processors to utilize multiple shared resources comprising:

multiple network processors, each network processor generating and sending shared resource commands over an LVDS data bus;

an interface logic IC (ILIC) for:

receiving said shared resource commands over said LVDS data bus as generated by said multiple network processors;

arbitrating said shared resource commands such that the shared resource commands are serialized;

translating said shared resource commands from a network processor interface to a shared resource interface;

routing said translated shared resource commands to multiple shared resources;

receiving shared resource results from said multiple shared resources;

translating said shared resource results from a shared resource interface to a network processor interface;

routing said translated shared resource results to said network processors; and multiple shared resources for:

receiving said translated shared resource commands;
executing said translated shared resource commands; and returning shared resource results to said ILIC.
23. The system of claim 22 wherein said shared resources are content addressable memories (CAMs).
24. The system of claim 22 wherein said shared resources are ternary content addressable memories (TCAMs).
CA 2330984 2000-10-10 2001-01-15 System and method for interfacing network processors with shared resources Abandoned CA2330984A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US68511300A 2000-10-10 2000-10-10
US09/685,113 2000-10-10

Publications (1)

Publication Number Publication Date
CA2330984A1 true CA2330984A1 (en) 2002-04-10

Family

ID=24750831

Family Applications (1)

Application Number Title Priority Date Filing Date
CA 2330984 Abandoned CA2330984A1 (en) 2000-10-10 2001-01-15 System and method for interfacing network processors with shared resources

Country Status (1)

Country Link
CA (1) CA2330984A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009156788A1 (en) * 2008-06-24 2009-12-30 Tpack A system and method for creating a scalable monolithic packet processing engine

Similar Documents

Publication Publication Date Title
US10425359B2 (en) Packet data traffic management apparatus
US10649924B2 (en) Network overlay systems and methods using offload processors
CN108809854B (en) Reconfigurable chip architecture for large-flow network processing
US6021132A (en) Shared memory management in a switched network element
US9286472B2 (en) Efficient packet handling, redirection, and inspection using offload processors
EP0991999B1 (en) Method and apparatus for arbitrating access to a shared memory by network ports operating at different data rates
US8208470B2 (en) Connectionless packet data transport over a connection-based point-to-point link
JP4368371B2 (en) NoC router to which AXI is applied, NI, NoC system, and its interleaving method
US6226267B1 (en) System and process for application-level flow connection of data processing networks
US7529224B2 (en) Scheduler, network processor, and methods for weighted best effort scheduling
US7403525B2 (en) Efficient routing of packet data in a scalable processing resource
US9537776B2 (en) Ethernet traffic management apparatus
US20040151170A1 (en) Management of received data within host device using linked lists
EP2388707B1 (en) Interconnection method and device, for example for systems-on-chip
US20130042038A1 (en) Non-blocking processor bus bridge for network processors or the like
JPH04233352A (en) Network adaptor controlling flow of data arranged in packet from system memory to network and control method of data flow
US11343205B2 (en) Real-time, time aware, dynamic, context aware and reconfigurable ethernet packet classification
JP7350192B2 (en) Methods of processing data in multi-core system-on-chip processing architectures, multi-core system-on-chip devices and storage media
EP1680743A2 (en) Dynamically caching engine instructions for on demand program execution
US7107381B2 (en) Flexible data transfer to and from external device of system-on-chip
CA2330984A1 (en) System and method for interfacing network processors with shared resources
US7525962B2 (en) Reducing memory access bandwidth consumption in a hierarchical packet scheduler
US20040017813A1 (en) Transmitting data from a plurality of virtual channels via a multiple processor device
US20230224261A1 (en) Network interface device
WO2022160307A1 (en) Router and system on chip

Legal Events

Date Code Title Description
EEER Examination request
FZDE Dead