CA2565596A1 - Semantic processor storage server architecture - Google Patents

Semantic processor storage server architecture

Info

Publication number
CA2565596A1
Authority
CA
Canada
Prior art keywords
data
parser
dxp
buffer
spu
Legal status
Abandoned
Application number
CA002565596A
Other languages
French (fr)
Inventor
Somsubhra Sikdar
Kevin Jerome Rowett
Jonathan Sweedler
Rajesh Nair
Komal Rathi
Hoai V. Tran
Caveh Jalali
Current Assignee
MISTLETOE TECHNOLOGIES Inc
Original Assignee
Mistletoe Technologies, Inc.
Priority claimed from U.S. application No. 10/843,727 (issued as US 7,251,722 B2)
Application filed by Mistletoe Technologies, Inc., Somsubhra Sikdar, Kevin Jerome Rowett, Jonathan Sweedler, Rajesh Nair, Komal Rathi, Hoai V. Tran, and Caveh Jalali
Publication of CA2565596A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/1827 Management specifically adapted to NAS
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/16 Protection against loss of memory contents
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing


Abstract

A storage server uses a semantic processor to parse and respond to client requests. A direct execution parser in the semantic processor parses an input stream, comprising client storage server requests, according to a defined grammar. A semantic processor execution engine capable of manipulating data (e.g., data movement, mathematical, and logical operations) executes microcode segments in response to requests from the direct execution parser in order to perform the client-requested operations. The resulting operational efficiency allows an entire storage server to be collapsed in some embodiments into a few relatively small integrated circuits that can be placed on a media device's printed circuit board, with the semantic processor itself drawing perhaps a few Watts of power.

Description

SEMANTIC PROCESSOR STORAGE SERVER ARCHITECTURE
REFERENCE TO RELATED APPLICATIONS

This application claims priority of U.S. utility patent application No. 10/843,727, filed May 11, 2004, U.S. provisional patent application No. 60/591,978, filed July 28, 2004, U.S. provisional patent application No. 60/591,663, filed July 27, 2004, U.S. provisional patent application No. 60/590,738, filed July 22, 2004, and U.S. provisional patent application No. 60/592,000, filed July 28, 2004.

Copending U.S. Patent Application 10/351,030, titled "Reconfigurable Semantic Processor," filed by Somsubhra Sikdar on January 24, 2003, is incorporated herein by reference.

FIELD OF THE INVENTION

This invention relates generally to storage servers; and more specifically to storage server implementations using digital semantic processors.

BACKGROUND OF THE INVENTION

Traditionally, a storage server is a networked computer that provides media and file access to other computers and computing devices, referred to herein as "clients." The storage server may be accessible on a Wide-Area Network (WAN), a Local-Area Network (LAN), and/or share a dedicated point-to-point connection with another computing device.

Storage servers can be of several types, including a Network Attached Storage (NAS) server, a Storage Area Network (SAN) server, and an application-style server.

A NAS server comprehends directory structures. Requests to a NAS server generally specify a file path/filename and action such as create, read, write, append, etc. A NAS server converts the requests to one or more disk sector/block disk access transactions, accesses the disk (or other storage media), and performs the requested file-level transaction. Such a server maintains and utilizes a directory structure to locate the file starting locations, and a File Allocation Table to reconstruct segmented files and locate free sectors on a disk or other writeable media.

A SAN server generally receives direct requests for disk sector/block-based transactions, and may have no knowledge of a file system or relationships between data blocks. Although block requests can relate to physical block locations on a media device, more typically a SAN server supports logical blocks-thus allowing the server to make one physical disk drive appear as several logical drives, or several physical disk drives appear as one logical drive, as in a RAID (Redundant Array of Independent Disks) scheme.
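
As a rough illustration of this kind of block-level virtualization, the sketch below maps a client-visible logical block address onto one of several physical drives using simple striping. The stripe policy, drive count, and function name are illustrative assumptions, not details taken from this disclosure.

    # Minimal sketch of logical-block virtualization, assuming round-robin
    # striping across identical drives; names and policy are illustrative only.
    def logical_to_physical(lba, num_drives, blocks_per_drive):
        """Map a client-visible logical block address to (drive, physical block)."""
        if lba >= num_drives * blocks_per_drive:
            raise ValueError("logical block address out of range")
        drive = lba % num_drives            # stripe blocks across the drives
        physical_block = lba // num_drives
        return drive, physical_block

    # Four physical drives presented to clients as one logical drive:
    print(logical_to_physical(lba=10, num_drives=4, blocks_per_drive=1_000_000))
    # -> (2, 2)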

An application-style server generally provides access to media in a media-specific format. For instance, an application-style server could provide access to streaming audio or video without the explicit knowledge of a filepath/filename by the client (who may not know the filename or have the authority to access a file directly), while reading the streaming media from files and encapsulating it in a streaming packet format.

Architecturally, most storage servers differ little from general-purpose computers, other than in some cases including more, or faster, disk drives and disk bus controllers than usual. Figure 1 shows a block diagram for a typical storage server 20.
A Central Processing Unit (CPU) 22, typically one or more microprocessors, operates the server according to stored program instructions. This type of processor is often called a von Neumann (VN) processor or machine, after the innovator who proposed its style of executing sequential instructions.

CPU 22 connects through a Frontside Bus (FSB) 26 to a Memory Controller/Hub (MCH) 24, which is responsible for interfacing CPU 22 to other system components that store microprocessor program instructions and store, supply, and/or consume data. MCH 24 manages system memory 30 through memory bus 32. MCH 24 also communicates with PCI
(Peripheral Component Interconnect) bridge 40 across hub bus 42 to move data between PCI-connected devices and either memory 30 or CPU 22.

Many different types of data source/sink devices can connect to server 20 through PCI bridge 40 and PCI bus 44. For storage server purposes, the two necessary devices are a network interface card (NIC) 50 and a media controller, such as ATA (AT Attachment) controller 60.

NIC 50 allows the server to attach directly to a Local Area Network (LAN) or other network, a Fibre Channel switch fabric, or other point-to-point connection to a client or another device that provides network visibility for the server. NIC 50 thereby provides a communication path for the server to receive data requests from clients and respond to those requests. The actual network physical communication protocol can be varied by inserting a different NIC, e.g., for whatever wired or wireless communication protocol is to be used, and loading a software device driver for that NIC onto CPU 22.

ATA controller 60 provides at least one ATA bus 62 for attaching media devices such as hard disks, optical disks, and/or tape drives. Each ATA bus 62 allows either one or two media devices 64 to attach to storage server 20. Some servers may employ other controllers, such as a SATA (serial ATA) controller or a SCSI (Small Computer System Interface) host adapter.

In order to operate as a storage server, CPU 22 must execute a variety of software programs. With NAS, the two common formats are NFS (Network File System) and CIFS (Common Internet File System); the former protocol is found primarily in UNIX environments, and the latter in Microsoft operating system environments. For an NFS server, the following software processes help implement NFS: a network device driver for NIC 50; TCP/IP drivers; RPC (Remote Procedure Call) and XDR (External Data Representation) drivers to present data from TCP/IP to the NFS server software; the NFS server software itself; a local file system; a VFS (Virtual File System) driver to interface the NFS
server software with the local file system; data buffering software; and device drivers for the storage controller/host adapter. For each NFS data transaction, CPU 22 must execute a process for each of these software entities, switching context between them as required. To provide adequate performance, storage server 20 operates at relatively high speed and power, requiring forced air cooling and a substantial power supply to operate the microprocessor, chipset, network interface, memory, other peripherals, and cooling fans. The current state of the art, with a 1 Gbps (Gigabit per second) full duplex network connection, requires about 300 watts of power, occupies a volume of 800 cubic inches, and costs about $1500 without storage media.

Some vendors have attempted to build custom integrated circuit hardware explicitly for processing NAS requests. Design costs for such circuits generally cannot be recaptured in the fluid world of telecommunication protocols, where a specific complex circuit built to handle a narrow combination of specific protocols and storage interfaces may become rapidly obsolete.

SUMMARY OF THE INVENTION

It has now been recognized that in many storage server applications a microprocessor/sequential program approach is inefficient and bulky. Most of the storage server's existence is spent executing storage server functions in response to client requests received at the NIC. The requests themselves are styled in a protocol-specific format with a limited number of requested functions and options. In response to these requests, well-defined storage system commands are supplied to a media device, and the results of those commands are used to return protocol-specific datagrams to the client.

Storage server programmability and code commonality are highly desirable traits.
Accordingly, high-volume processors and chipsets (such as those built by Intel Corporation and Advanced Micro Devices) have been used to allow programmability and the use of standard operating systems, and thereby reduce costs. Unfortunately, the flexibility and power of a general-purpose microprocessor-based system go largely unused, or are used inefficiently, in such an operational configuration.

The present invention provides a different architecture for a storage server, using in exemplary embodiments what is referred to generally as a semantic processor.
Such a device is configurable, and preferably reconfigurable like a VN machine, as its processing depends on its "programming." As used herein, a "semantic processor" contains at least two components: a direct execution parser to parse an input stream according to a defined grammar; and an execution engine capable of manipulating data (e.g., data movement, mathematical, and logical operations) in response to requests from the direct execution parser.

The programming of a semantic processor differs from the conventional machine code used by a VN machine. In a VN machine the protocol input parameters of a data packet are sequentially compared to all possible inputs using sequential machine instructions, with branching, looping, and thread-switching inefficiencies. In contrast, the semantic processor responds directly to the semantics of an input stream. In other words, the "code" segments that the semantic processor executes are driven directly by the input data.
For instance, of the 75 or so possible CIFS commands that could appear in a packet, a single parsing cycle of an embodiment described below is all that is necessary to allow the semantic processor to load grammar and microinstructions pertinent to the actual CIFS command sent in the packet.
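
The contrast can be sketched in software. The first function below tests opcodes one at a time, as a sequential instruction stream would; the second lets the opcode itself select the handler, loosely analogous to the parser table keying grammar and microcode on the input. Handler names are hypothetical, and only two of the roughly 75 CIFS opcodes are shown.

    # Hypothetical contrast between sequential opcode testing and
    # input-driven dispatch; only two opcodes are shown for brevity.
    def handle_negotiate(frame):
        return "NEGOTIATE handled"

    def handle_read_andx(frame):
        return "READ_ANDX handled"

    def vn_style_dispatch(opcode, frame):
        # VN style: compare the opcode against every possibility in turn.
        if opcode == 0x72:          # NEGOTIATE
            return handle_negotiate(frame)
        elif opcode == 0x2E:        # READ_ANDX
            return handle_read_andx(frame)
        # ... dozens more comparisons, with branching on every packet
        return None

    # Data-driven style: the input value directly selects the code segment.
    DISPATCH = {0x72: handle_negotiate, 0x2E: handle_read_andx}

    def data_driven_dispatch(opcode, frame):
        return DISPATCH[opcode](frame)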

In a storage server application as described in the embodiments below, a semantic processor can perform many of the functions of the prior art VN processor, chipset, and attached devices. The semantic processor receives datagrams from a network port or datagram interface. Some of these datagrams will include client requests for data operations.
The semantic processor parses elements of the received datagrams using a parser table designed for the server protocol grammar(s) supported by the server. Based on what is parsed, the semantic processor transacts responsive data operations with a data storage device or devices. Generally, the data operations are performed by launching microinstruction code segments on a simple execution unit or units. As the data operations are performed, the semantic processor generates response datagrams to send back to the client.
The resulting operational efficiency allows an entire storage server to be collapsed in some embodiments into a few relatively small integrated circuits that can be placed on a media device's printed circuit board, with the semantic processor itself drawing perhaps a few Watts of power.

BRIEF DESCRIPTION OF THE DRAWING

The invention may be best understood by reading the disclosure with reference to the drawing, wherein:

Figure 1 contains a block diagram for a typical von Neumann machine storage server;
Figure 2 contains a block diagram for a generic storage server embodiment of the present invention;

Figure 3 illustrates, in block form, a more-detailed storage server embodiment of the invention;

Figure 4 shows the general structure of an Ethernet/IP/TCP/CIFS storage server client request frame;

Figure 5 shows details of a Server Message Block portion of the CIFS frame structure of Figure 4;

Figure 6 illustrates, in block form, one semantic processor implementation useful with embodiments of the present invention;

Figure 7 shows a block diagram for a storage server embodiment with several physical data storage devices;
Figure 8 illustrates a block diagram for an embodiment of the invention that can be implemented on the printed circuit board of a data storage device;

Figure 9 shows a block diagram for an implementation with a semantic processor that communicates directly with the drive electronics of a disk;

Figure 10 shows a translator-storage server block diagram according to an embodiment of the invention;

Figure 11 illustrates a system embodiment wherein multiple physical data storage devices are accessed by the semantic processor over an external storage interface;

Figure 12 illustrates a system embodiment wherein multiple physical data storage devices are coupled to the semantic processor through a port extender;

Figure 13 illustrates yet another semantic processor implementation useful with embodiments of the present invention;

Figure 14 illustrates, in block form, a semantic processor useful with embodiments of the present invention;

Figure 15A shows one possible parser table construct useful with embodiments of the invention;

Figure 15B shows one possible production rule table organization useful with embodiments of the invention;

Figure 16 illustrates, in block form, one implementation for an input buffer useful with embodiments of the present invention;

Figure 17 illustrates, in block form, one implementation for a direct execution parser (DXP) useful with embodiments of the present invention;

Figure 18 contains a flow chart example for processing data input in the semantic processor in Figure 14;

Figure 19 illustrates yet another semantic processor implementation useful with embodiments of the present invention.

Figure 20 illustrates, in block form, one implementation for a port input buffer (PIB) useful with embodiments of the present invention;

Figure 21 illustrates, in block form, another implementation for a direct execution parser (DXP) useful with embodiments of the present invention;

Figure 22 contains a flow chart example for processing data input in the semantic processor in Figure 19.

Figure 23 illustrates, in block form, a semantic processor useful with embodiments of the present invention;

Figure 24 contains a flow chart for the processing of received packets in the semantic processor with the recirculation buffer in Figure 23;

Figure 25 illustrates another more detailed semantic processor implementation useful with embodiments of the present invention;

Figure 26 contains a flow chart of received IP fragmented packets in the semantic processor in Figure 25;

Figure 27 contains a flow chart of received encrypted and/or unauthenticated packets in the semantic processor in Figure 25; and Figure 28 illustrates yet another semantic processor implementation useful with embodiments of the present invention;

Figure 29 contains a flow chart of received iSCSI packets through a TCP
connection in the semantic processor in Figure 28;

Figures 30-43 show the memory subsystem in more detail.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Figure 2 illustrates a high-level diagram representative of many embodiments of the present invention. A storage server comprises a semantic processor 100 with a datagram interface 90 and a storage interface 110. The datagram interface 90 provides client connectivity to the server, e.g., over network 80 as shown or via a point-to-point connection to the client. The storage interface 110 provides a path for the semantic processor to initiate data transactions with data storage device 120 in response to client requests.
The data storage device may be local, or it may alternately be network-connected to the illustrated server, e.g., if the semantic processor 100 emulates a NAS server to clients, while at the same time using remote SAN servers for physical storage.

Figure 3 contains a more-detailed block diagram of a storage server 200, including a semantic processor 100. The datagram interface 90 connects buffer 130 of semantic processor 100 to a physical interface device (PHY) 92, e.g., an optical, electrical, or radio frequency driver/receiver pair for an Ethernet, Fibre Channel, 802.11x, Universal Serial Bus, Firewire, or other physical layer interface. The datagram interface supplies an input digital data stream to buffer 130, and receives an output digital data stream from buffer 130.

In Figure 3, the storage interface 110 is implemented as a block I/O bus between a data storage device 120 and the semantic processor 100. In various embodiments, this bus can be a cabled connection (iSCSI, Fibre Channel, SATA) or circuit board bus (SATA), depending on the configured capability of the semantic processor and the data storage device.
Figure 3 shows major elements of a hard disk type of media device, e.g., a disk controller 122, drive electronics 124, and the disk platter(s), motor, and read/write heads 126.

One configuration for a semantic processor 100 is illustrated in the storage server of Figure 3. Semantic processor 100 contains a direct execution parser (DXP) 140 that controls the processing of input packets or frames received at buffer 130 (e.g., the input "stream").
DXP 140 maintains an internal parser stack of terminal and non-terminal symbols, based on parsing of the current frame up to the current symbol. When the symbol at the top of the parser stack is a terminal symbol, DXP 140 compares data at the head of the input stream to the terminal symbol and expects a match in order to continue. When the symbol at the top of the parser stack is a non-terminal symbol, DXP 140 uses the non-terminal symbol and current input data to expand the grammar production on the stack. As parsing continues, DXP 140 instructs semantic code execution engine (SEE) 150 to process segments of the input, or perform other operations.

This structure, with a sophisticated grammar parser that assigns tasks to an execution engine as the data requires, is both flexible and powerful for highly structured input such as datagram protocols. In preferred embodiments, the semantic processor is reconfigurable by modifying its tables, and thus has the flexibility appeal of a VN machine.
Because the semantic processor responds to the input it is given, it can generally operate efficiently with a smaller instruction set than a VN machine. The instruction set also benefits because the semantic processor allows data processing in a machine context, as will be explained below.

Semantic processor 100 uses at least three tables to perform a given function.
Codes for retrieving production rules are stored in a parser table (PT) 170.
Grammatical production rules are stored in a production rule table (PRT) 180. Code segments for SEE
150 are stored in semantic code table (SCT) 190.

The codes in parser table 170 point to production rules in table 180. Parser table codes are stored, e.g., in a row-column format or a content-addressable format. In a row-column format, the rows of the table are indexed by a non-terminal code on the internal parser stack, and the columns of the table are indexed by an input data value at the head of the input (e.g., the symbol currently on the Si-Bus). In a content-addressable format, a concatenation of the non-terminal code and the input data value can provide the input to the table.

The production rule table 180 is indexed by the codes in parser table 170. The tables can be linked as shown in Figure 3, such that a query to the parser table will directly return the production rule applicable to the non-terminal code and input data value.
The direct execution parser replaces the non-terminal code at the top of its stack with the production rule returned from the PRT, and continues to parse its input.
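
A small software model of this parse loop is sketched below. Terminal symbols must match the input; non-terminal symbols are expanded through a parser table keyed on (non-terminal, input). The two-rule grammar is a toy stand-in, not the actual CIFS grammar or table encoding.

    # Toy model of the direct-execution parse loop described above.
    PARSER_TABLE = {                      # (non-terminal, lookahead byte) -> production
        ("FRAME", 0xFF): [0xFF, "BODY"],
        ("BODY", 0x72): [0x72, "NEGOTIATE_PARAMS"],
    }

    def parse(stream):
        stack = ["FRAME"]                 # parser stack, top of stack at the end
        pos = 0
        while stack and pos < len(stream):
            symbol = stack.pop()
            if isinstance(symbol, int):   # terminal: must match the input byte
                if stream[pos] != symbol:
                    raise SyntaxError("unexpected byte at offset %d" % pos)
                pos += 1
            elif (symbol, stream[pos]) in PARSER_TABLE:
                production = PARSER_TABLE[(symbol, stream[pos])]
                stack.extend(reversed(production))    # leftmost symbol ends up on top
            else:
                # No production rule: in the real device this is where a SEE
                # code segment would be dispatched to consume the field.
                break
        return pos

    print(parse(bytes([0xFF, 0x72, 0x00])))   # -> 2 bytes parsed before hand-off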

Practically, codes for many different grammars can exist at the same time in a production rule table. For instance, one set of codes can pertain to MAC
(Media Access Control) packet header format parsing, and other sets of codes can pertain to Address Resolution Protocol (ARP) packet processing, Internet Protocol (IP) packet processing, Transmission Control Protocol (TCP) packet processing, Real-time Transport Protocol (RTP) packet processing, etc. Grammars for CIFS, NFS, and/or other storage server protocols are added to the production rule code memory in order to add storage server capability. Non-terminal codes need not be assigned in any particular order in production rule code memory, nor in blocks pertaining to a particular protocol.

The semantic code table 190 can be indexed by parser table codes, and/or from production rule table 180. Generally, parsing results allow DXP 140 to detect whether, for a given production rule, a code segment from semantic code table 190 should be loaded and executed by SEE 150.

The semantic code execution engine 150 has an access path to machine context 160, which is a structured memory interface, addressable by contextual symbols.
Machine context 160, parser table 170, production rule table 180, and semantic code table 190 may use on-chip memory, external memory devices such as synchronous DRAMs and CAMs, or a combination of such resources. Each table or context may merely provide a contextual interface to a shared physical memory space with one or more of the other tables or contexts.

Detailed design optimizations for the functional blocks of semantic processor 100 are not within the scope of the present invention. For some examples of the detailed architecture of applicable semantic processor functional blocks, the reader is referred to copending application 10/351,030, which has been incorporated herein by reference.

The function of semantic processor 100 in a storage server context can be better understood with a specific example. In this example, CIFS commands and data structures are used with an Ethernet/IP/TCP supporting structure. Those skilled in the art will recognize that the concepts illustrated readily apply to other server communication protocols as well.
Figure 4 shows pertinent header/data blocks of an Ethernet/IP/TCP/CIFS frame (ignoring trailers and checksums). The MAC header will contain, among other things, the MAC address of server 200 for any frame intended for server 200. Server 200 can support several network and transport protocols, but this example uses an Internet Protocol (IP) network header and Transmission Control Protocol (TCP) transport protocol header.
Following the TCP header are a CIFS Server Message Block (SMB) header HEADER1 and an SMB data buffer BUFFER1 containing data related to HEADER1. Many CIFS opcodes can be combined with other CIFS opcodes in the same SMB if desired, as long as the maximum frame length is not exceeded. The additional headers for the second, third, etc. opcodes contain only the last few fields of the first header, with all other fields implied from the first header. As shown, the last SMB header contains HEADERN and BUFFERN.
Figure 5 shows additional detail for the first SMB header and buffer of a frame. A full SMB header first indicates its protocol, i.e., a character 0xFF indicates that this is an SMB header. A command character follows the protocol character, and indicates an operation code (opcode) for an operation that is either being requested or responded to. The status and flags fields determine how other fields of the SMB should be interpreted, and/or whether an error occurred. The MAC signature (in this instance MAC stands for Message Authentication Code), when valid, can be used to authenticate messages.

The next four fields in the SMB are the TID, PID, UID, and MID fields. A TID (Tree Identifier) is assigned by a server to a client when the client successfully connects to a server resource. The client uses the assigned TID to refer to that resource in later requests. A PID (Process Identifier) is assigned by a client; although the server notes it and uses it to reply to the client, its use is primarily up to the client. A UID (User Identifier) is assigned by a server to identify the user that authenticated the connection. A MID (Multiplex Identifier) allows a client to use the same PID for multiple outstanding transactions, by using a different MID for each transaction.

The Parameters field is of variable length and structure, with its first character generally indicating the length of following parameters, followed by the parameter words appropriate for the SMB opcode. If the opcode is one that indicates a second SMB header and buffer follows, some of the parameter words indicate the follow-on opcode and offset to the follow-on shortened header.

The SMB ends with a byte count and buffer of byte count length. Such a buffer is used, e.g., to send a filename, transfer write data to a server, or transfer read data to a client.
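
The sketch below pulls these fields out of a byte string in the order just described. It follows the simplified field list above rather than the exact CIFS wire layout, so the offsets and the little-endian struct formats are assumptions for illustration only.

    import struct

    # Field extraction following the simplified SMB description above;
    # offsets and formats are illustrative, not the actual wire format.
    def parse_smb(buf):
        if buf[0] != 0xFF:                        # protocol marker
            raise ValueError("not an SMB header")
        command = buf[1]
        status, flags = struct.unpack_from("<IB", buf, 2)
        signature = buf[7:15]                     # MAC signature, 8 octets
        tid, pid, uid, mid = struct.unpack_from("<4H", buf, 15)
        word_count = buf[23]
        params = buf[24:24 + 2 * word_count]      # parameter words
        off = 24 + 2 * word_count
        byte_count, = struct.unpack_from("<H", buf, off)
        data = buf[off + 2:off + 2 + byte_count]  # e.g., a filename or write data
        return {"command": command, "status": status, "flags": flags,
                "tid": tid, "pid": pid, "uid": uid, "mid": mid,
                "params": params, "data": data}
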
With this background, an explanation of basic CIFS functionality for semantic processor 100 (Figure 3) can proceed.

Referring to Figures 3 and 4, each new frame received at buffer 130 starts with a MAC header. The MAC header contains a destination address, a source address, a payload type field, a payload, and a checksum. DXP 140 parses the destination address for a match to the address(es) assigned to storage server 200, and to a broadcast address. If the destination address fails to parse, the DXP launches a SEE code segment on SEE 150 to flush the frame.
Otherwise, the source address is consumed by a SEE and saved for use in reply, and the payload type field is parsed.

In this example, the MAC payload type parses to IP. The DXP consequently loads production rules for IP header parsing onto its parser stack, and works through the IP header.
This may include, for instance, parsing the destination IP address to ensure that the packet was intended for the storage server, dispatching a SEE to check the IP header checksum, update addressing tables, save the destination and source addresses for return packet and other processing, and/or reassemble IP fragmented packets. The protocol indicated in the IP
header is parsed to load production rules for the next header (in this case a TCP header) onto the DXP parser stack.

The TCP header is then parsed. A SEE is dispatched to load the TCP connection context associated with the destination IP address, source IP address, destination port, and source port (if a valid connection exists, a condition that is assumed for this example). The SEE then consumes the sequence number, acknowledgment number, window, checksum, and other fields necessary to update the connection context. The connection context is saved for use in reply.
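
The connection-context handling just described can be modeled as a lookup keyed by the four-tuple, with the consumed TCP fields folded into the stored context for use when the reply segment is built. The dictionary and field names below are assumptions for illustration.

    # Illustrative model of the machine-context connection lookup and update.
    connections = {}   # (dst_ip, src_ip, dst_port, src_port) -> context dict

    def lookup_connection(dst_ip, src_ip, dst_port, src_port):
        ctx = connections.get((dst_ip, src_ip, dst_port, src_port))
        if ctx is None:
            raise LookupError("no established connection for this segment")
        return ctx

    def update_connection(ctx, seq, ack, window):
        # Consume the sequence/acknowledgment/window fields into the context,
        # as the SEE does, and keep them for building the reply segment.
        ctx.update(rcv_seq=seq, snd_ack=ack, peer_window=window)

    connections[("10.0.0.5", "10.0.0.9", 445, 51500)] = {}
    update_connection(lookup_connection("10.0.0.5", "10.0.0.9", 445, 51500),
                      seq=1000, ack=2000, window=65535)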

Assuming that the MAC, IP, and TCP headers have parsed to a valid connection to the storage server, the next symbol on the input stream will indicate that the data contains an SMB request. DXP 140 parses this symbol and loads CIFS grammar production rules onto its stack.

The next input symbol will be matched with a non-terminal symbol for a CIFS
command (CIFS opcode). For instance, the parser table can contain an entry for each possible combination of CIFS opcode and this non-terminal symbol. When the CIFS
command is parsed, grammar pertinent to the CIFS opcode in the command field is loaded onto the parser stack.

The status and flags fields of the CIFS header can be parsed, but preferably are consumed by SEE 150 and saved in machine context 160 for use as necessary in interpreting packet contents.

The MAC signature, TID, PID, UID, and MID fields are also directed to a SEE, which saves the fields for use in constructing a return frame, and performs lookups in machine context 160 to authenticate the packet as originating from a valid source and directed to a proper resource.

The Parameters field format varies with CIFS opcode. Depending on the opcode, it may be preferable to parse elements of the Parameters field, or instruct SEE 150 to consume the Parameters using a segment of microcode. Several examples for common CIFS commands will be given.

A CIFS NEGOTIATE client request is used by a new client to identify the CIFS dialects that the client can understand. The Parameters field for NEGOTIATE includes a ByteCount, followed by ByteCount symbols that list the acceptable dialects as null-terminated strings. The production rule for NEGOTIATE prompts DXP 140 to cause SEE 150 to save the ByteCount, and then DXP 140 parses the first input string up until a NULL character. If the string parses to a known dialect, DXP 140 causes SEE 150 to save a code for that dialect to the current frame context. SEE 150 also determines whether ByteCount symbols have been parsed, indicating that all dialects have been parsed. If not, SEE 150 pushes a non-terminal symbol onto the top of the parser stack, causing DXP 140 to parse the remaining input for another dialect. This process continues until ByteCount symbols have been parsed. At that time, SEE 150 pushes a symbol onto the head of the input stream to cause DXP 140 to stop parsing for dialects. DXP 140 then finishes parsing the packet and instructs SEE 150 to select a dialect (e.g., according to a pre-programmed hierarchical preference) and send a response packet back to the client with parameters pertaining to a new session. SEE 150 also sets up a new session context within machine context 160.
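
The overall effect of this NEGOTIATE handling, with the parser/SEE interplay collapsed into one routine, is sketched below. The dialect strings, preference order, and the convention of returning the chosen dialect's index are illustrative assumptions.

    # Condensed sketch of NEGOTIATE handling: read ByteCount octets of
    # null-terminated dialect strings, then pick one by a preference order.
    PREFERENCE = ["NT LM 0.12", "LANMAN2.1", "PC NETWORK PROGRAM 1.0"]

    def negotiate(payload):
        byte_count = payload[0]
        dialects, pos = [], 1
        while pos < 1 + byte_count:
            end = payload.index(0, pos)            # parse up to the NULL character
            dialects.append(payload[pos:end].decode("ascii"))
            pos = end + 1
        for choice in PREFERENCE:                  # pre-programmed preference
            if choice in dialects:
                return dialects.index(choice)      # index reported back to the client
        return 0xFFFF                              # no dialect understood

    print(negotiate(bytes([11]) + b"NT LM 0.12\x00"))   # -> 0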

When a client receives a NEGOTIATE reply, normally the client will send an SMB for SESSION_SETUP_ANDX to complete the session. DXP 140, upon receiving this SMB, can parse the first Parameter (WordCount) and verify that it is correct for the opcode. The second Parameter, AndXCommand, indicates whether (and which) secondary command X also appears in this frame, following this command (CIFS uses "AndX" to identify opcodes that can be concatenated with a following opcode to form a multi-opcode SMB). If AndXCommand indicates that no second opcode is present (0xFF), DXP 140 can parse this and continue. If a separate opcode is present, processing is more complicated.

The second opcode can be handled in several ways. One way is to write a separate grammar variant for each possible combination of first and second opcodes.
This is feasible, but potentially inefficient, depending on parser table and production rule table constraints.
Another way is to use a multi-level grammar, with a higher level grammar parsing opcodes and a lower level grammar further processing each parsed opcode. A third method is to use a pushback mechanism from SEE 150. In this method, for instance, the AndXCommand Parameter loads a production rule that causes SEE 150 to save the AndXCommand from the input stream. When the first opcode has been completely parsed, SEE 150 is prompted to run a microcode segment that pushes the AndXCommand Parameter back onto the head of the input stream. DXP 140 then parses the new opcode and continues from that point, loading Parameter field grammar for the new opcode if a second opcode is present.
Additional AndX
commands in the SMB can be handled in the same manner.
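
The pushback method can be modeled with the input stream as a queue whose head the SEE can prepend to, as sketched below. The handler convention, with each handler consuming its own parameters and returning the saved AndXCommand, is an assumption for illustration.

    from collections import deque

    # Sketch of the SEE pushback technique for AndX chains: the saved
    # AndXCommand opcode is pushed back onto the head of the input stream,
    # so the parser simply sees it as the next opcode to parse.
    def parse_andx_chain(stream, handlers):
        """stream: deque of input bytes; handlers: opcode -> callable."""
        while stream:
            opcode = stream.popleft()
            if opcode == 0xFF:                    # 0xFF means "no further opcode"
                break
            follow_on = handlers[opcode](stream)  # handler consumes its parameters
            if follow_on is None:                 # and returns the saved AndXCommand
                break
            stream.appendleft(follow_on)          # pushback: parse it next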

The other Parameters of SESSION_SETUP_ANDX likely will be consumed by SEE
150 without parsing, as most contain parameters to be saved in the session context, or verified, in the case of a password. Null-terminated string parameters can be parsed to locate the null termination symbols, followed by an instruction to SEE 150 to save the symbols whose length is now determined.

Once the SESSION_SETUP_ANDX command has been parsed, SEE 150 can be commanded to build a response packet. If a second opcode was included, however, the packet will not be finalized until each opcode has been processed.

A LOGOFF_ANDX command is used to end a session. The primary function performed by the semantic processor in response to this opcode is to cause SEE
150 to remove the session context for that session from machine context 160.

A TREE_CONNECT_ANDX command is used to connect the session with a shared server resource indicated by a parameter string Path. The Path string included with this command could be parsed only for length, and/or parsed for a correct server name. The remainder of the Path name could be parsed, although since valid paths can be created and destroyed frequently on a writeable resource, keeping the production rule codes correct for each directory may be challenging. Accordingly, the path would typically be passed to SEE
150 for opening. Alternately, DXP 140 could parse for "/" characters and pass the path to SEE 150 one level at a time.

SEE 150 traverses the specified path by reading directories stored on data storage device 120 across block I/O bus 110, starting with the root directory, and verifying that the requested path exists and is not restricted from the client. Assuming the path is valid and available, SEE 150 builds a response packet and saves the path in the session context.

The NT_CREATE_ANDX command creates or opens a file or directory. As with the tree connect command, DXP 140 may hand off the bulk of this command to SEE
150 for block I/O transactions with data storage device 120. SEE 150 will open and/or create the file, if possible, by modifying the appropriate directory, assigning a file identifier (FID) to the open file, and creating a file context for the open file in machine context 160. SEE 150 then formats the appropriate response frame indicating the results of the file create/open request.
The READ_ANDX and WRITE_ANDX commands are used by a client to read and write data from/to an open FID. When the DXP parses down to the FID parameter, it signals SEE 150 to take FID off of the Si-Bus and locate the corresponding file context in machine context 160. SEE 150 then performs the appropriate block I/O transactions to read or write data on data storage device 120, and constructs a return packet to the client.
It is noted that a write operation return frame could optionally be generated and sent before all block I/O
transactions with the data storage device are completed.
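
A condensed view of the READ_ANDX data path is sketched below: the FID selects a file context held in machine context, which yields the block addresses handed to block I/O, and the payload for the reply frame is cut from the returned blocks. The structures, the contiguous-extent assumption, and the storage object's read_block method are all illustrative.

    BLOCK_SIZE = 512

    # Illustrative READ_ANDX data path, assuming the file occupies one
    # contiguous extent starting at fctx["start_block"].
    def read_andx(machine_context, storage, fid, offset, length):
        fctx = machine_context["open_files"][fid]            # file context by FID
        first = fctx["start_block"] + offset // BLOCK_SIZE
        last = fctx["start_block"] + (offset + length - 1) // BLOCK_SIZE
        data = b"".join(storage.read_block(b) for b in range(first, last + 1))
        skip = offset % BLOCK_SIZE
        return data[skip:skip + length]                      # payload for the reply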

The commands presented above are a subset of possible CIFS commands. Those skilled in the art can appreciate, from the examples above, how a semantic processor can implement full CIFS functionality. Further, the concepts exemplified by the semantic processor performing these CIFS commands are applicable in implementing other storage server protocols on a semantic processor.

Figure 6 shows another semantic processor embodiment 300. Semantic processor 300 contains four semantic code execution engines 152, 154, 156, and 158. Semantic code execution engine 158 communicates with a block I/O circuit 112. Machine context 160 contains two functional units: a variable machine context data memory (VMCD) 162 and an array machine context data memory (AMCD) 164. Each SEE can transact with VMCD 162 across the V-Bus and with AMCD 164 across the A-Bus.

In semantic processor 300, when DXP 140 determines that a SEE task is to be launched at a specific point in its parsing, DXP 140 signals one of the SEEs to load microinstructions from the SCT. The handle for some instruction segments may indicate to DXP 140 that it may choose any available SEE, or the handle may indicate that a specific SEE (for instance SEE 158, which has sole access to block I/O 112) should receive that code segment. The availability of multiple SEEs allows many tasks to proceed in parallel, without some slow tasks (like block I/O) blocking all processing.

Although not strictly necessary, specific kinds of tasks also can be assigned to specific SEEs. For instance, SEE 152 can be a designated input protocol SEE, responsible for handling the input side of IP, TCP, and other protocols, and updating client, session, and file contexts with data from incoming CIFS frames. SEE 154 can be designated to perform file system operations, such as comprehending and updating directories, file allocation tables, and user/password lists, authenticating users and requests, etc. SEE 156 can be designated to handle the output side of protocols, e.g., building response frames. And SEE
158 can be designated to handle transactions with the data storage device(s). With such partitioning, one SEE can launch microcode on another SEE without having to go through DXP 140, which may have moved on to a different parsing task. For instance, SEE 154 can launch tasks on block I/O SEE 158 to retrieve or update directories. As another example, output protocol SEE 156 can have a semaphore that can be set by another SEE when data is ready for a response packet.

Each SEE contains pipeline registers to allow machine-context access to data.
As opposed to a standard CPU, the preferred SEE embodiments have no notion of the physical data storage structure used for the data that they operate on. Instead, accesses to data take a machine-context transactional form. Variable (e.g., scalar) data is accessed on the V-bus;
array data is accessed on the A-bus; and input stream data is accessed on the Si-bus. For instance, to read a scalar data element of length m octets located at a given location offset within a data context ct, a SEE uses an instruction decoder to prompt the V-bus interface to issue a bus request {read, ct, offset, m}. The context mct refers to the master context of the semantic processor; other sub-contexts will usually be created and destroyed as the RSP (reconfigurable semantic processor) processes input data, such as a sub-context for each active CIFS session, each open file, each open transaction, etc.

Once a SEE pipeline register has been issued a command, it handles the data transfer process. If multiple bus transfers are required to read or write m octets, the pipeline register tracks the transaction to completion. As an example, a six-octet field can be transferred from the stream input to a machine-context variable using two microinstructions: a first instruction reads six octets from the Si-bus to a pipeline register; a second instruction then writes the six octets from the register to the machine-context variable across the V-bus. The register interfaces perform however many bus data cycles are required to effect the transfer.
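
The flavor of this machine-context access style can be captured with a small software stand-in for the V-bus/VMCD pair, shown below. The class and the context identifiers are illustrative; they model the bounded {read, ct, offset, m} transactions, not the hardware itself.

    # Software stand-in for the V-bus/VMCD pair: a SEE never computes physical
    # addresses, it issues {read/write, context, offset, length} requests.
    class VBus:
        def __init__(self):
            self.contexts = {}                    # context id -> bytearray

        def allocate(self, ct, size):
            self.contexts[ct] = bytearray(size)

        def read(self, ct, offset, m):
            data = self.contexts[ct]
            assert offset + m <= len(data), "request outside context bounds"
            return bytes(data[offset:offset + m])

        def write(self, ct, offset, payload):
            data = self.contexts[ct]
            assert offset + len(payload) <= len(data), "request outside context bounds"
            data[offset:offset + len(payload)] = payload

    # Two "microinstructions": pull six octets from the input stream into a
    # pipeline register, then write them to a machine-context variable.
    vbus = VBus()
    vbus.allocate("session-1", 64)
    pipeline_register = bytes.fromhex("00d0b7a1b2c3")    # six octets from the Si-bus
    vbus.write("session-1", 8, pipeline_register)        # written across the V-bus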

VMCD 162 serves the requests initiated on the V-bus. VMCD 162 has the capability to translate machine-context variable data requests to physical memory transactions. Thus VMCD 162 preferably maintains a translation table referencing machine context identifiers to physical memory starting addresses, contains a mechanism for allocating and deallocating contexts, allows contexts to be locked by a given SEE, and ensures that requested transactions do not fall outside of the requested context's boundaries. The actual storage mechanism employed can vary based on application: the memory could be completely internal, completely external, a mix of the two, a cache with a large external memory, etc.

In a given implementation, the external memory used for the VMCD can be shared with the external memory used for other memory sections, such as the AMCD, input and output buffers, parser table, production rule table, and semantic code table.

The A-bus interface and AMCD 164 operate similarly, but with an array machine context organization. Preferably, different types of arrays and tables can be allocated, resized, deallocated, written to, read from, searched, and possibly even hashed or sorted using simple bus requests. The actual underlying physical memory can differ for different types of arrays and tables, including for example fast onboard RAM, external RAM or ROM, content-addressable memory, etc.

The storage server organization shown in Figure 3 is one of many possible functional partitions of a server according to an embodiment of the present invention.
Some other possible configurations are shown in Figures 7-10, as described below.

Figure 7 illustrates a storage server 500 with interfaces for one or more conventional SATA drives. A network connection 502 (such as a standard electrical or optical connector port or antenna) allows clients to communicate with a PHY 510 supporting a desired protocol, such as Fibre Channel, Ethernet, etc. For Ethernet, a commercially available PHY such as a Broadcom BCM5421 or a Marvell 88E1011 can be used.

PHY 510 supplies input frames to, and drives output frames from, an RSP 520. RSP 520 can be configured in one of the configurations described above, one of the configurations described in copending application 10/351,030, or any other functionally similar semantic processing configuration.

RAM 530 provides physical storage, e.g., for machine context and buffers, to RSP
520. RAM 530 may comprise several types of memory (DRAM, Flash, CAM, etc.), or a single type such as Synchronous DRAM. A boot ROM or boot Flash memory can be used to initialize RSP 520 when parser, production rule, and semantic code tables are stored in volatile memory during operation. Part of the non-volatile table storage could also exist on the data storage device(s), as long as enough code is available in boot memory to allow RSP
520 to communicate with the data storage device(s).

A SATA controller 540 connects to a block I/O port of RSP 520 to serve the disk access requests of RSP 520. SATA controller 540 can be, e.g., a commercially available SATA controller such as a SiI 3114 or a Marvell 88SX5080.

Serial ATA controller 540 connects to one or more SATA data storage devices via SATA bus(es). As shown, storage server 500 supports drives 550-0 through 550-N, respectively, through SATA buses BUS0 through BUSN.

PHY 510, RSP 520, RAM 530, and SATA controller 540 are preferably interconnected on a common printed circuit board (not shown). The circuit board can be arranged with the drives in a common enclosure, with SATA cabling providing BUS0
through BUSN. Alternately, SATA bus signaling can be routed on the printed circuit board to connectors for each drive, or through a connector to a backplane.

Another storage server implementation 600 is shown in Figure 8. Preferably, in this implementation the entire server is implemented on the printed circuit board of a disk drive, with a network connection 602 for providing a connection to storage server clients. Although storage server architecture 500 could possibly be packaged in the same way, storage server 600 achieves space and cost savings by integrating a PHY 610 and an SATA
controller 640 within an RSP 620. SATA controller 640 preferably is at least partially implemented using a SEE.

The SATA controller section of RSP 620 interfaces through an SATA cable or circuit board bus to drive electronics 660, which include an SATA interface, disk cache, and drive control electronics.

With an entire storage server implemented on a common circuit board within a media device, it is possible to remove the SATA interface altogether, as shown in storage server 700 of Figure 9. In Figure 9, disk controller functionality is implemented within an RSP 740.

The control devices (such as a Marvell 88C7500), motors, and servos shown as block 770 interface directly with RSP 740.

Figure 10 illustrates a storage server 800 that is essentially a "translation" gateway. Storage server 800 comprises a first network physical interface PHY1 (block 810) to connect to a network 802, and consequently to storage server clients. Storage server 800 also contains a second network physical interface PHY2 (block 840) to connect, across a network or point-to-point connection, to one or more physical servers, such as the illustrated SAN server 850. An RSP 820, with attached memory 830, attaches to PHY1 and PHY2.

PHY1 can be, e.g., an Ethernet PHY, and PHY2 can be, e.g., a Fibre Channel PHY.

In operation, RSP 820 responds to client requests, such as NAS-style or application-style requests, by initiating requests on the remotely-attached server 850.
Thus server 800 appears as a server to clients on network 802, and as a client to server 850.
Although server 850 is illustrated as a SAN server, server 850 could be a NAS or even an application-style server. If PHY1 and PHY2 support the same server protocol, storage server 800 can still serve a useful function as an aggregation point for a scalable server farm, an encryption point, and/or a firewall.

When both PHY1 and PHY2 are supplying datagrams to RSP 820, RSP 820 can provide parsing for both input streams. For instance, both PHY1 and PHY2 input streams can be saved to a common input buffer, and a DXP (not shown) in RSP 820 can alternately parse datagrams in both input streams, thus coordinating two-way translation tasks. RSP 820 could also be configured with two DXPs, one to serve each physical port, sharing a common bank of SEEs and other RSP resources.

The preceding embodiments can be adapted to other system architectures that provide access to multiple physical data storage devices, such as those shown in Figures 11 and 12.
In Figure 11, a storage server 900 contains a storage interface connecting RSP
100 to multiple physical data storage devices, e.g., four devices 120-1, 120-2, 120-3, and 120-4 as shown. These data storage devices can be accessed in some embodiments as a RAID
(Redundant Array of Independent Disks) array, with either the RSP acting as a RAID controller, or with a separate RAID controller (not shown) implemented as part of storage interface 110. The four data storage devices can alternately be configured as a JBOD (Just a Bunch Of Disks) array.

Generally, RSP 100 can perform other forms of storage server virtualization for its clients. In one alternate form, disk sector addresses used by a client application are mapped by the RSP into C-H-S (Cylinder-Head-Sector) addresses of a specific device from one of many physical devices. In such a configuration, data storage devices 120-1 through 120-4 (and potentially many more such devices) need not be geographically co-located, allowing sophisticated control of physical resources and disk allocation for clients.
The RSP can also function within a virtualization configuration where a separate intermediate device provides all or part of the virtualization functionality for clients.
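
The sector-address mapping mentioned above can be illustrated with a short sketch that translates a flat client block address into a cylinder/head/sector tuple on one of several physical devices. The geometry, the equal-share device-selection policy, and the data structures are assumptions for illustration.

    HEADS, SECTORS_PER_TRACK = 16, 63          # assumed drive geometry

    def lba_to_chs(lba):
        cylinder = lba // (HEADS * SECTORS_PER_TRACK)
        head = (lba // SECTORS_PER_TRACK) % HEADS
        sector = lba % SECTORS_PER_TRACK + 1   # CHS sectors are 1-based
        return cylinder, head, sector

    def virtualize(client_lba, devices):
        # Simple policy: each device owns an equal, contiguous share of the
        # client-visible address space.
        per_device = devices[0]["blocks"]
        device = devices[client_lba // per_device]
        return device["name"], lba_to_chs(client_lba % per_device)

    drives = [{"name": "disk%d" % i, "blocks": 1_000_000} for i in range(4)]
    print(virtualize(2_500_000, drives))       # -> ('disk2', (cylinder, head, sector))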

Figure 12 illustrates the use of an RSP 100 in a storage server system 1000 that incorporates a port extender 950. Port extender 950 connects to a single SATA
controller port associated with RSP 100 (either internal or external), but communicates in turn with multiple physical data storage devices (e.g., devices 120-1 to 120-4 as shown). Such a configuration can increase overall system bandwidth while using only one port on the RSP
for communication with data storage devices.

Figure 13 shows yet another semantic processor implementation 1100 useful with embodiments of the present invention. Semantic processor 1100 communicates with a Port 0 PHY 93, a Port 1 PHY 94, and a PCI-X interface 95, each of which can be integrated with processor 1100 or connected externally to processor 1100, as desired. Buffer 130 of Figure 3 is replaced with a Port Input Buffer (PIB) 132 and a Port Output Buffer (POB) 134, which preferably, but not necessarily, reside in a common memory section integrated as part of processor 1100. PIB 132 connects to and maintains at least one input queue for each of Port 0, Port 1, and the PCI-X interface, and may contain other queues as well. POB 134 connects to and maintains at least one output queue for each of Port 0, Port 1, and the PCI-X interface, and may contain other queues as well. Port 0 and Port 1 typically represent datagram interfaces such as Gigabit Ethernet, while the PCI-X interface couples to a familiar PCI-X bus. Depending on the storage server design, storage resources can couple to any of these ports, and clients can couple to Port 0, Port 1, or both.

Unlike Figure 3, in Figure 13 direct execution parser 140 can receive input from each of the multiple input queues in PIB 132, through a parser source selector 136.
For instance, direct execution parser 140 can maintain separate parser stacks for each input source, and can signal parser source selector 136 to switch input sources each time a packet has finished parsing, or when parsing of one packet is stalled, e.g., while a SEE performs some calculation on a packet field.
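
The selector's behavior can be sketched as one parser stack per input queue plus a round-robin switch, as below. The queue names and the round-robin policy are illustrative; the hardware switches on end-of-packet or on a stall, as described in the text.

    # Sketch of per-source parser stacks with a simple round-robin selector.
    class ParserSourceSelector:
        def __init__(self, sources):
            self.sources = list(sources)                  # e.g. ["port0", "port1", "pci-x"]
            self.stacks = {s: [] for s in self.sources}   # one parser stack per source
            self.current = 0

        def stack(self):
            """Parser stack of the currently selected input queue."""
            return self.stacks[self.sources[self.current]]

        def switch(self):
            """Move to the next input queue; the old stack is kept intact so
            parsing of a stalled packet can resume later."""
            self.current = (self.current + 1) % len(self.sources)
            return self.sources[self.current]

    selector = ParserSourceSelector(["port0", "port1", "pci-x"])
    selector.stack().append("FRAME")     # parsing state for port0 is preserved
    selector.switch()                    # DXP moves on to port1 while port0 waits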

SEE cluster 152 comprises N semantic code execution engines 150-1 to 150-N.
The SEE cluster preferably includes a single dispatcher (not shown) to communicate with DXP
140 over the Sx-Bus, and to distribute tasks to the individual SEEs.

A PIB read controller 133 and a POB write controller block 135 provide access, respectively, to PIB 132 and POB 134 for SEE cluster 152. As shown, parser source selector 136 and PIB read controller 133 allow DXP 140 to work with data from one input while the SEE cluster accesses data from another input. PIB read controller 133 and POB
write controller 135 preferably allow non-blocking access to the input and output buffers.

A machine context port controller 154 provides access to machine context 160 for SEE cluster 152. Like the port buffer controllers, machine context port controller 154 provides non-blocking access to machine context 160 for SEE cluster 152.

Machine context 160 prioritizes and executes memory tasks for SEE cluster 152 and for a management microprocessor 195, and typically contains one or more specialized caches that each depend on the attributes of the target data access. Machine context 160 can also contain one or more encryption and/or authentication engines that can be used to perform inline encryption/authentication. One or more traditional memory bus interfaces connect machine context 160 with physical memory 165, which can consist, e.g., of DRAM (Dynamic Random Access Memory), CAM (Content Addressable Memory), and/or any other type of desired storage. Physical memory 165 can be located on processor 1100, external to processor 1100, or split between these two locations.

Management microprocessor 195 performs any desired functions for semantic processor 1100 that can reasonably be accomplished with traditional software.
For instance, microprocessor 195 can interface with its own instruction space and data space in physical memory 165, through machine context 160, and execute traditional software to boot the processor, load or modify the parser table, production rule table, and semantic code table, gather statistics, perform logging, manage client access, perform error recovery, etc.
Preferably, microprocessor 195 also has the capability to communicate with the dispatcher in SEE cluster 152 in order to request that a SEE perform tasks on the microprocessor's behalf.
The management microprocessor is preferably integrated into semantic processor 1100.

Within the preceding embodiments, the semantic units can encrypt the blocks of data as the blocks are written to disk and/or decrypt the blocks of data as the blocks are read from disk. This provides security to the data "at rest" on the disk drive or drives. For instance, as the semantic units receive data payloads for writing to disk, one operation in preparing the data payloads for writing can include encrypting the packets. The reverse of this process can be used for reading encrypted data from the disk.
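
Where encryption sits in that write/read path can be sketched as below. The XOR keystream derived from SHA-256 is a toy stand-in for the inline encryption/authentication engine and is not a secure cipher; it only shows that the payload is transformed on the way to the block store and inverted on the way back.

    import hashlib

    # Toy at-rest encryption sketch: NOT a secure cipher, only a placeholder
    # showing where encryption and decryption sit in the block write/read path.
    def _keystream(key, block_no, length):
        out, counter = b"", 0
        while len(out) < length:
            out += hashlib.sha256(key + block_no.to_bytes(8, "big") +
                                  counter.to_bytes(4, "big")).digest()
            counter += 1
        return out[:length]

    def write_block(storage, key, block_no, plaintext):
        ks = _keystream(key, block_no, len(plaintext))
        storage[block_no] = bytes(a ^ b for a, b in zip(plaintext, ks))

    def read_block(storage, key, block_no):
        ciphertext = storage[block_no]
        ks = _keystream(key, block_no, len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, ks))

    disk = {}
    write_block(disk, b"secret-key", 7, b"client payload")
    assert read_block(disk, b"secret-key", 7) == b"client payload"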

Although special-purpose execution engines have been shown and described, alternative implementations can use the parser as a front-end datagram processor for a general-purpose processor.

As exemplified by the preceding embodiments, many different semantic-processor based storage servers fall within the scope of the present invention. At the low-functionality end, a SAN-style server, e.g., connected by Fibre Channel to an enterprise network, is possible. NAS-style servers include more sophisticated features, such as those in the detailed CIFS embodiment described above. Translation-gateway servers provide clients with access to a potentially large and changing underlying array of physical servers without allowing those clients visibility to the physical servers.

Both translation-gateway servers and physical servers can be structured as application-style servers. In an application-style server, data is served in an application-style format.
For instance, a video-on-demand server can store videos for usage by multiple clients, and stream different parts of a video to different clients, allowing each to have independent navigation. A music-on-demand server can operate similarly.

Application-style servers can also supply storage space for applications, such as a wired or wireless server that stores pictures offloaded from a digital camera, allowing the camera to operate with a relatively small internal buffer and/or flash card.
Such a server could be relatively small, battery-powered, and portable for field use.

Semantic processor storage server implementations can also implement wireless protocols, e.g., for use in a home computer data and/or audio/video network.
Although the detailed embodiments have focused on traditional "storage" devices, printers, scanners, multi-function printers, and other data translation devices can also be attached to a semantic processor storage server.

One of ordinary skill in the art will recognize that the concepts taught herein can be tailored to a particular application in many other advantageous ways. It is readily apparent that a semantic processor storage server can be made to serve different client types by changing the protocols that can be parsed by the server. If desirable, the storage server could even parse multiple storage server protocols simultaneously to allow access by different classes of clients (e.g., SAN access by trusted clients and NAS access by others).
Figure 14 shows a block diagram of a semantic processor 2100 according to an embodiment of the invention. The semantic processor 2100 contains an input buffer 2300 for buffering a data stream (e.g., the input "stream") received through the input port 2110, a direct execution parser (DXP) 2400 that controls the processing of packets in the input buffer 2300, a semantic processing unit 2140 for processing segments of the packets or for performing other operations, and a memory subsystem 2130 for storing or augmenting segments of the packets.

The DXP 2400 maintains an internal parser stack 2430 of non-terminal (and possibly also terminal) symbols, based on parsing of the current input frame or packet up to the current input symbol. When the symbol (or symbols) at the top of the parser stack 2430 is a terminal symbol, DXP 2400 compares data DI at the head of the input stream to the terminal symbol and expects a match in order to continue. When the symbol at the top of the parser stack 2430 is a non-terminal (NT) symbol, DXP 2400 uses the non-terminal symbol NT and current input data DI to expand the grammar production on the stack 2430. As parsing continues, DXP 2400 instructs SPU 2140 to process segments of the input, or perform other operations.
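
For readers more comfortable with software, the parsing discipline just described can be modeled in a few lines of C. The sketch below is illustrative only: the stack layout, the stub expand_nt() routine, and the tiny two-terminal grammar are assumptions standing in for parser table 2200, PRT 2250, and the SPU dispatch path.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One stack symbol: a 1-bit T/NT flag plus a code value. */
typedef struct { bool is_terminal; uint16_t code; } symbol_t;

typedef struct { symbol_t sym[64]; int top; } parser_stack_t;

/* Stub for the parser-table/production-rule lookup: expands NT code 1
 * into the terminal string "hi".  A real device would consult parser
 * table 2200 and PRT 2250 here.                                        */
static int expand_nt(uint16_t nt, const uint8_t *di, symbol_t *out)
{
    (void)di;
    if (nt == 1) {
        out[0] = (symbol_t){ true, 'h' };
        out[1] = (symbol_t){ true, 'i' };
        return 2;
    }
    return 0;   /* unknown NT: no production found */
}

/* One parsing cycle: pop the top symbol; a terminal must match the
 * input head, a non-terminal is replaced by its production.           */
static bool parse_step(parser_stack_t *st, const uint8_t **di)
{
    symbol_t top = st->sym[st->top--];
    if (top.is_terminal) {
        if ((uint8_t)top.code != **di) return false;
        (*di)++;                         /* consume one input symbol */
        return true;
    }
    symbol_t rule[8];
    int n = expand_nt(top.code, *di, rule);
    if (n == 0) return false;            /* would fall back to a default rule */
    for (int i = n - 1; i >= 0; i--)     /* push rightmost symbol first */
        st->sym[++st->top] = rule[i];
    return true;
}

int main(void)
{
    const uint8_t *input = (const uint8_t *)"hi";
    parser_stack_t st = { .top = 0 };            /* index 0 is bottom-of-stack */
    st.sym[++st.top] = (symbol_t){ false, 1 };   /* push the starting NT */

    while (st.top > 0)
        if (!parse_step(&st, &input)) { puts("no match"); return 1; }
    puts("parse complete");
    return 0;
}
```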

Semantic processor 2100 uses at least three tables. Code segments for SPU 2140 are stored in semantic code table 2150. Complex grammatical production rules are stored in a production rule table (PRT) 2250. Production rule (PR) codes for retrieving those production rules are stored in a parser table (PT) 2200. The PR codes in parser table 2200 also allow DXP 2400 to detect whether, for a given production rule, a code segment from semantic code table 2150 should be loaded and executed by SPU 2140.

The production rule (PR) codes in parser table 2200 point to production rules in production rule table 2250. PR codes are stored, e.g., in a row-column format or a content-addressable format. In a row-column format, the rows of the table are indexed by a non-terminal symbol NT on the top of the internal parser stack 2430, and the columns of the table are indexed by an input data value (or values) DI at the head of the input. In a content-addressable format, a concatenation of the non-terminal symbol NT and the input data value (or values) DI can provide the input to the table. Preferably, semantic processor 2100 implements a content-addressable format, where DXP 2400 concatenates the non-terminal symbol NT with 8 bytes of current input data DI to provide the input to the parser table.
Optionally, parser table 2200 concatenates the non-terminal symbol NT and 8 bytes of current input data DI received from DXP 2400.
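
A minimal sketch of that lookup-key formation follows, assuming a 16-bit NT code and an 8-byte DI field packed in that order; the actual key width and layout are implementation choices of the parser table.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Width of the data-input match field, as in the embodiment (8 bytes). */
#define DI_WIDTH 8

/* Lookup key: the NT code followed by DI[8].  The field order and the
 * 16-bit NT width are assumptions made for illustration.               */
typedef struct {
    uint16_t nt;
    uint8_t  di[DI_WIDTH];
} pt_key_t;

static pt_key_t make_key(uint16_t nt, const uint8_t *stream_head)
{
    pt_key_t k;
    k.nt = nt;
    memcpy(k.di, stream_head, DI_WIDTH);   /* copy 8 bytes from the head of the input */
    return k;
}

int main(void)
{
    const uint8_t head[DI_WIDTH] = { 0x45, 0x00, 0x00, 0x3c, 0x1c, 0x46, 0x40, 0x00 };
    pt_key_t key = make_key(0x0102 /* hypothetical NT code */, head);

    printf("key: NT=%04x DI=", key.nt);
    for (int i = 0; i < DI_WIDTH; i++) printf("%02x", key.di[i]);
    printf("\n");
    return 0;
}
```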

Some embodiments of the present invention contain more elements than those shown in Figure 14. For purposes of understanding the operation of the present invention, however, those elements are peripheral and are omitted from this disclosure.

General parser operation for some embodiments of the invention will first be explained with reference to Figures 14, 15A, 15B, 16, and 17. Figure 15A
illustrates one possible implementation for a parser table 2200. Parser table 2200 is comprised of a production rule (PR) code memory 2220. PR code memory 2220 contains a plurality of PR
codes that are used to access a corresponding production rule stored in the production rule table (PRT) 2250. Practically, codes for many different grammars can exist at the same time in production rule code memory 2220. Unless required by a particular lookup implementation, the input values (e.g., a non-terminal (NT) symbol concatenated with current input values DI[n], where n is a selected match width in bytes) need not be assigned in any particular order in PR code memory 2220.

In one embodiment, parser table 2200 also includes an addressor 2210 that receives an NT symbol and data values DI[n] from DXP 2400. Addressor 2210 concatenates an NT
symbol with the data values DI[n], and applies the concatenated value to PR
code memory 2220. Optionally, DXP 2400 concatenates the NT symbol and data values DI[n]
prior to transmitting them to parser table 2200.

Although conceptually it is often useful to view the structure of production rule code memory 2220 as a matrix with one PR code for each unique combination of NT
code and data values, the present invention is not so limited. Different types of memory and memory organization may be appropriate for different applications.

For example, in an embodiment of the invention, the parser table 2200 is implemented as a Content Addressable Memory (CAM), where addressor 2210 uses an NT code and input data values DI[n] as a key for the CAM to look up the PR code corresponding to a production rule in the PRT 2250. Preferably, the CAM is a Ternary CAM (TCAM) populated with TCAM entries. Each TCAM entry comprises an NT code and a DI[n] match value.
Each NT code can have multiple TCAM entries. Each bit of the DI[n] match value can be set to "0", "1", or "X" (representing "Don't Care"). This capability allows PR codes to require that only certain bits/bytes of DI[n] match a coded pattern in order for parser table 2200 to find a match. For instance, one row of the TCAM can contain an NT code NT_IP for an IP destination address field, followed by four bytes representing an IP destination address corresponding to a device incorporating the semantic processor. The remaining four bytes of the TCAM row are set to "don't care." Thus when NT_IP and eight bytes DI[8] are submitted to parser table 2200, where the first four bytes of DI[8] contain the correct IP address, a match will occur no matter what the last four bytes of DI[8] contain.

Since the TCAM employs the "Don't Care" capability and there can be multiple TCAM entries for a single NT, the TCAM can find multiple matching TCAM entries for a given NT code and DI[n] match value. The TCAM prioritizes these matches through its hardware and only outputs the match of the highest priority. Further, when an NT code and a DI[n] match value are submitted to the TCAM, the TCAM attempts to match every TCAM entry with the received NT code and DI[n] match code in parallel. Thus, the TCAM has the ability to determine whether a match was found in parser table 2200 in a single clock cycle of semantic processor 2100.
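
The "Don't Care" mechanism can be made concrete by emulating the TCAM in software. In the sketch below, each row carries a match value and a care mask, row order stands in for the hardware priority, and the lookup loop plays the role of the parallel match; the NT code and IP destination address reused from the example above are made up.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DI_WIDTH 8

/* Software emulation of one TCAM row: an NT code, a DI match value, and
 * a care mask (0xFF = "must match", 0x00 = "don't care").               */
typedef struct {
    uint16_t nt;
    uint8_t  value[DI_WIDTH];
    uint8_t  care[DI_WIDTH];
    int      pr_code;          /* row index doubles as the PR code */
} tcam_row_t;

static bool row_matches(const tcam_row_t *row, uint16_t nt, const uint8_t *di)
{
    if (row->nt != nt) return false;
    for (int i = 0; i < DI_WIDTH; i++)
        if ((di[i] & row->care[i]) != (row->value[i] & row->care[i]))
            return false;
    return true;
}

/* Return the PR code of the highest-priority (lowest-index) matching row,
 * or -1 when no entry matches and a default rule must be used.           */
static int tcam_lookup(const tcam_row_t *rows, int n, uint16_t nt, const uint8_t *di)
{
    for (int i = 0; i < n; i++)
        if (row_matches(&rows[i], nt, di)) return rows[i].pr_code;
    return -1;
}

int main(void)
{
    enum { NT_IP = 7 };                              /* hypothetical NT code */
    tcam_row_t rows[] = {
        /* match destination 192.168.1.10 in the first 4 bytes of DI[8],
           ignore the remaining 4 bytes                                   */
        { NT_IP, { 192, 168, 1, 10, 0, 0, 0, 0 },
                 { 0xFF, 0xFF, 0xFF, 0xFF, 0, 0, 0, 0 }, 0 },
    };

    const uint8_t di[DI_WIDTH] = { 192, 168, 1, 10, 0xde, 0xad, 0xbe, 0xef };
    int pr = tcam_lookup(rows, (int)(sizeof rows / sizeof rows[0]), NT_IP, di);
    printf("PR code: %d\n", pr);   /* prints 0: matched despite the trailing bytes */
    return 0;
}
```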

Another way of viewing this architecture is as a "variable look-ahead" parser.
Although a fixed data input segment, such as eight bytes, is applied to the TCAM, the TCAM
coding allows a next production rule to be based on any portion of the current eight bytes of input. If only one bit, or byte, anywhere within the current eight bytes at the head of the input stream, is of interest for the current rule, the TCAM entry can be coded such that the rest are ignored during the match. Essentially, the current "symbol" can be defined for a given production rule as any combination of the 64 bits at the head of the input stream. By intelligent coding, the number of parsing cycles, NT codes, and table entries can generally be reduced for a given parsing task.

The TCAM in parser table 2200 produces a PR code corresponding to the TCAM
entry 2230 matching NT and DI[n], as explained above. The PR code can be sent back to DXP 2400, directly to PR table 2250, or both. In one embodiment, the PR code is the row index of the TCAM entry producing a match.

When no TCAM entry 2230 matched NT and DI[n], several options exist. In one embodiment, the PR code is accompanied by a "valid" bit, which remains unset if no TCAM
entry matched the current input. In another embodiment, parser table 2200 constructs a default PR code corresponding to the NT supplied to the parser table. The use of a valid bit or default PR code will next be explained in conjunction with Figure 15B.

Parser table 2200 can be located on or off-chip or both, when DXP 2400 and SPU
2140 are integrated together in a circuit. For instance, static RAM (SRAM) or TCAM
located on-chip can serve as parser table 2200. Alternately, off-chip DRAM or TCAM
storage can store parser table 2200, with addressor 2210 serving as or communicating with a memory controller for the off-chip memory. In other embodiments, the parser table 2200 can be located in off-chip memory, with an on-chip cache capable of holding a section of the parser table 2200.

Figure 15B illustrates one possible implementation for production rule table 2250.
PR table 2250 comprises a production rule memory 2270, a Match All Parser entries Table (MAPT) memory 2280, and an addressor 2260.

In one embodiment, addressor 2260 receives PR codes from either DXP 2400 or parser table 2200, and receives NT symbols from DXP 2400. Preferably, the received NT
symbol is the same NT symbol that is sent to parser table 2200, where it was used to locate the received PR code. Addressor 2260 uses these received PR codes and NT
symbols to access corresponding production rules and default production rules, respectively. In a preferred embodiment of the invention, the received PR codes address production rules in production rule memory 2270 and the received NT codes address default production rules in MAPT 2280. Addressor 2260 may not be necessary in some implementations, but when used, can be part of DXP 2400, part of PRT 2250, or an intermediate functional block. An addressor may not be needed, for instance, if parser table 2200 or DXP 2400 constructs addresses directly.

Production rule memory 2270 stores the production rules 2262, each containing three data segments. These data segments include: a symbol segment, a SPU entry point (SEP) segment, and a skip bytes segment. These segments can either be fixed length segments or variable length segments that are, preferably, null-terminated. The symbol segment contains terminal and/or non-terminal symbols to be pushed onto the DXP's parser stack 2430 (Figure 17). The SEP segment contains SPU entry points (SEPs) used by the SPU 2140 in processing segments of data. The skip bytes segment contains skip bytes data used by the input buffer 2300 to increment its buffer pointer and advance the processing of the input stream. Other information useful in processing production rules can also be stored as part of production rule 2262.
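
A fixed-format C rendering of such a production rule record is sketched below; the segment sizes and the zero-terminator convention are illustrative assumptions rather than the device's actual layout.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_SYMS 8
#define MAX_SEPS 4

/* One production rule record with the three segments described above. */
typedef struct {
    uint16_t symbols[MAX_SYMS];   /* T/NT symbols to push; 0 terminates           */
    uint16_t seps[MAX_SEPS];      /* SPU entry points (SCT addresses); 0 terminates */
    uint8_t  skip_bytes;          /* how far to advance the input buffer          */
} production_rule_t;

int main(void)
{
    /* Example rule: push two symbols, launch one SEP, consume 4 bytes. */
    production_rule_t pr = {
        .symbols    = { 0x8001, 0x0045, 0 },   /* an NT, then terminal 0x45   */
        .seps       = { 0x0120, 0 },           /* hypothetical SCT address    */
        .skip_bytes = 4,
    };

    for (int i = 0; i < MAX_SYMS && pr.symbols[i]; i++)
        printf("push symbol %04x\n", pr.symbols[i]);
    for (int i = 0; i < MAX_SEPS && pr.seps[i]; i++)
        printf("dispatch SEP  %04x\n", pr.seps[i]);
    printf("advance input by %u bytes\n", (unsigned)pr.skip_bytes);
    return 0;
}
```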

MAPT 2280 stores default production rules 2264, which in this embodiment have the same structure as the PRs in production rule memory 2270, and are accessed when a PR code cannot be located during the parser table lookup.

Although production rule memory 2270 and MAPT 2280 are shown as two separate memory blocks, the present invention is not so limited. In a preferred embodiment of the invention, production rule memory 2270 and MAPT 2280 are implemented as on-chip SRAM, where each production rule and default production rule contains multiple null-terminated segments.

As production rules and default production rules can have various lengths, it is preferable to take an approach that allows easy indexing into their respective memories 2270 and 2280. In one approach, each PR has a fixed length that can accommodate a fixed maximum number of symbols, SEPs, and auxiliary data such as the skip bytes field. When a given PR does not need the maximum number of symbols or SEPs allowed for, the sequence can be terminated with a NULL symbol or SEP. When a given PR would require more than the maximum number, it can be split into two PRs, accessed, e.g., by having the first issue a skip bytes value of zero and pushing an NT onto the stack that causes the second to be accessed on the following parsing cycle. In this approach, a one-to-one correspondence between TCAM entries and PR table entries can be maintained, such that the row address obtained from the TCAM is also the row address of the corresponding production rule in PR
table 2250.

The MAPT 2280 section of PRT 2250 can be similarly indexed, but using NT codes instead of PR codes. For instance, when the valid bit on the PR code is unset, addressor 2260 can select as a PR table address the row corresponding to the current NT. For example, if 256 NTs are allowed, MAPT 2280 could contain 256 entries, each indexed to one of the NTs. When parser table 2200 has no entry corresponding to a current NT and data input DI[n], the corresponding default production rule from MAPT 2280 is accessed.

Taking the IP destination address again as an example, the parser table can be configured, e.g., to respond to one of two expected destination addresses during the appropriate parsing cycle. For all other destination addresses, no parser table entry would be found. Addressor 2260 would then look up the default rule for the current NT, which would direct the DXP 2400 and/or SPU 2140 to flush the current packet as a packet of no interest.

Although the above production rule table indexing approach provides relatively straightforward and rapid rule access, other indexing schemes are possible.
For variable-length PR table entries, the PR code could be arithmetically manipulated to determine a production rule's physical memory starting address (this would be possible, for instance, if the production rules were sorted by expanded length, and then PR codes were assigned according to a rule's sorted position). In another approach, an intermediate pointer table can be used to determine the address of the production rule in PRT 2250 from the PR code or the default production rule in MAPT 2280 from the NT symbol.
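
The intermediate-pointer-table option can be illustrated with a small C sketch, assuming null-terminated variable-length rules packed back to back and a per-PR-code offset table; all offsets and rule contents here are made up.

```c
#include <stdint.h>
#include <stdio.h>

/* Variable-length production rules stored back to back, reached through
 * an intermediate pointer table indexed by PR code.                     */
static const uint8_t rule_memory[] = {
    /* rule 0 (3 bytes) */ 0x01, 0x02, 0x00,
    /* rule 1 (5 bytes) */ 0x0A, 0x0B, 0x0C, 0x0D, 0x00,
    /* rule 2 (2 bytes) */ 0x7F, 0x00,
};

static const uint16_t rule_offset[] = { 0, 3, 8 };   /* PR code -> start offset */

static const uint8_t *lookup_rule(int pr_code)
{
    return &rule_memory[rule_offset[pr_code]];
}

int main(void)
{
    const uint8_t *r = lookup_rule(1);
    for (int i = 0; r[i]; i++)                        /* rules are NULL-terminated */
        printf("%02x ", (unsigned)r[i]);
    printf("\n");
    return 0;
}
```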

The use of the symbols, SEPs, and skip bytes values from a production rule 2262 or 2264 will be explained further below, after one additional functional unit, the input buffer 2300, is explained in further detail.

Figure 16 illustrates one possible implementation for input buffer 2300 useful with embodiments of the invention. Input buffer 2300 is comprised of: a buffer 2310 that receives data through input port 2110; a control block 2330 for controlling the data in buffer 2310; an error check (EC) block 2320 for checking the received data for transmission errors; a FIFO
block 2340 to allow DXP 2400 FIFO access to data in buffer 2310; and a random access (RA) block 2350 to allow SPU 2140 random access to the data in buffer 2310.
Preferably, EC block 2320 determines if a received data frame or packet contains errors by checking for inter-packet gap (IPG) violations and Ethernet header errors, and by computing the Cyclic Redundancy Codes (CRC).

When a packet, frame, or other new data segment is received at buffer 2310 through input port 2110, input buffer 2300 transmits a Port ID to DXP 2400, alerting DXP 2400 that new data has arrived. EC block 2320 checks the new data for errors and sets status bits that are sent to DXP 2400 in a Status signal. When DXP 2400 decides to parse through the headers of a received data segment, it sends a Control_DXP signal to input buffer 2300 asking for a certain amount of data from buffer 2310, or requesting that buffer 2310 increment its data head pointer without sending data to DXP 2400. Upon receipt of a Control_DXP signal, control block 2330 transmits a Data_DXP signal, containing data from buffer 2310 (if requested), to DXP 2400 through FIFO block 2340. In an embodiment of the invention, the control block 2330 and FIFO block 2340 add control characters into the data segment as it is sent to DXP 2400 in the Data_DXP signal. Preferably, the control characters include 1-bit status flags that are added at the beginning of each byte of data transferred and denote whether the subsequent byte of data is a terminal or non-terminal symbol. The control characters can also include special non-terminal symbols, e.g., start-of-packet, end-of-packet, port_ID, etc.
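
One way to picture that tagging is to pack each transferred byte together with its 1-bit status flag, as in the sketch below; the 16-bit packing and the particular control-symbol codes are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Pack a data byte with its 1-bit status flag (terminal data byte vs.
 * special non-terminal control symbol) into bit 8 of a 16-bit word.    */
#define FLAG_NONTERMINAL 0x0100u

static uint16_t tag_byte(uint8_t value, bool is_nonterminal)
{
    return (uint16_t)((uint16_t)value | (is_nonterminal ? FLAG_NONTERMINAL : 0));
}

int main(void)
{
    enum { SYM_START_OF_PACKET = 0xF0, SYM_PORT_ID = 0xF1 };  /* hypothetical codes */

    uint16_t stream[4];
    stream[0] = tag_byte(SYM_PORT_ID, true);          /* control symbol     */
    stream[1] = tag_byte(SYM_START_OF_PACKET, true);  /* control symbol     */
    stream[2] = tag_byte(0x45, false);                /* ordinary data byte */
    stream[3] = tag_byte(0x00, false);

    for (int i = 0; i < 4; i++)
        printf("%s byte %02x\n",
               (stream[i] & FLAG_NONTERMINAL) ? "NT " : "T  ",
               (unsigned)(stream[i] & 0xFF));
    return 0;
}
```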

When SPU 2140 receives a SPU entry point (SEP) from DXP 2400 that requires SPU 2140 to access data within the input buffer, the SPU 2140 sends a Control_SPU signal to input buffer 2300 requesting the data at a certain location in buffer 2310. Upon receipt of the Control_SPU signal, control block 2330 transmits a Sideband signal to SPU 2140 and subsequently transmits a Data_SPU signal, containing data from buffer 2310, to SPU 2140 through RA block 2350. The Sideband signal, preferably, indicates how many bytes of the data being sent are valid and whether there is an error in the data stream. In an embodiment of the invention, the control block 2330 and RA block 2350 add control characters into the data stream as it is sent to SPU 2140. Preferably, the control characters include a computed CRC value and error flag appended, when necessary, to the end of a packet or frame in the data stream.

Figure 17 shows one possible block implementation for DXP 2400. Parser control finite state machine (FSM) 2410 controls and sequences overall DXP 2400 operation, based on inputs from the other logical blocks in Figure 17. Parser stack 2430 stores the symbols to be executed by DXP 2400. Input stream sequence control 2420 retrieves input data values from input buffer 2300, to be processed by DXP 2400. SPU interface 2440 dispatches tasks to SPU 2140 on behalf of DXP 2400. The particular functions of these blocks will be further described below.

The basic operation of the blocks in Figures 14-17 will now be described with reference to the flowchart for data stream parsing in Figure 18. The flowchart 2500 is used for illustrating a method embodiment of the invention.

According to a block 2510, semantic processor 2100 waits for a packet to be received at input buffer 2300 through input port 2110.

The next decision block 2512 determines whether a packet was received in block 2510. If a packet has not yet been received, processing returns to block 2510 where semantic processor 2100 waits for a packet to be received. If a packet has been received at input buffer 2300, according to a next block 2520, input buffer 2300 sends a Port ID signal to DXP 2400, where it is pushed onto parser stack 2430 as an NT symbol. The Port ID signal alerts DXP 2400 that a packet has arrived at input buffer 2300. In a preferred embodiment of the invention, the Port ID signal is received by the input stream sequence control 2420 and transferred to FSM 2410, where it is pushed onto parser stack 2430.
Preferably, a 1-bit status flag, preceding or sent in parallel with the Port ID, denotes the Port ID as an NT symbol.

According to a next block 2530, DXP 2400, after determining that the symbol on the top of parser stack 2430 is not the bottom-of-stack symbol and that the DXP is not waiting for further input, requests and receives N bytes of input stream data from input buffer 2300.
DXP 2400 requests and receives the data through a DATA/CONTROL signal coupled between the input stream sequence control 2420 and input buffer 2300.

The next decision block 2532 determines whether the symbol on the parser stack is a terminal (T) or a NT symbol. This determination is preferably performed by FSM 2410 reading the status flag of the symbol on parser stack 2430.

When the symbol is determined to be a terminal symbol, according to a next block 2540, DXP 2400 checks for a match between the T symbol and the next byte of data from the received N bytes. FSM 2410 checks for a match by comparing the next byte of data received by input stream sequence control 2420 to the T symbol on parser stack 2430.
After the check is completed, FSM 2410 pops the T symbol off of the parser stack 2430, preferably by decrementing the stack pointer.

The next decision block 2542 determines whether there was a match between the T symbol and the next byte of data. If a match is made, execution returns to block 2530, where DXP 2400, after determining that the symbol on the parser stack 2430 is not the bottom-of-stack symbol and that it is not waiting for further input, requests and receives additional input stream data from input buffer 2300. In a preferred embodiment of the invention, DXP 2400 would only request and receive one byte of input stream data after a T symbol match was made, to refill the DI buffer since one input symbol was consumed.

When a match was not made, the remainder of the current data segment may be assumed in some circumstances to be unparseable. According to a next block 2550, DXP
2400 resets parser stack 2430 and launches a SEP to remove the remainder of the current packet from the input buffer 2300. In an embodiment of the invention, FSM 2410 resets parser stack 2430 by popping off the remaining symbols, or preferably by setting the top-of-stack pointer to point to the bottom-of-stack symbol. DXP 2400 launches a SEP
by sending a command to SPU 2140 through SPU interface 2440. This command requires that SPU
2140 load microinstructions from SCT 2150 that, when executed, enable SPU 2140 to remove the remainder of the unparseable data segment from the input buffer 2300. Execution then returns to block 2510.

It is noted that not every instance of unparseable input in the data stream may result in abandoning parsing of the current data segment. For instance, the parser may be configured to handle ordinary header options directly with grammar. Other, less common or difficult header options could be dealt with using a default grammar rule that passes the header options to a SPU for parsing.

When the symbol in decision block 2532 is determined to be an NT symbol, according to a next block 2560, DXP 2400 sends the NT symbol from parser stack 2430 and the received N bytes DI[N] in input stream sequence control 2420 to parser table 2200, where parser table 2200 checks for a match, e.g., as previously described. In the illustrated embodiment, parser table 2200 concatenates the NT symbol and the received N
bytes.
Optionally, the NT symbol and the received N bytes can be concatenated prior to being sent to parser table 2200. Preferably, the received N bytes are concurrently sent to both SPU
interface 2440 and parser table 2200, and the NT symbol is concurrently sent to both the parser table 2200 and the PRT 2250. After the check is completed, FSM 2410 pops the NT symbol off of the parser stack 2430, preferably by decrementing the stack pointer.

The next decision block 2562 determines whether there was a match in the parser table 2200 to the NT symbol concatenated with the N bytes of data. If a match is made, according to a next block 2570, the parser table 2200 returns a PR code corresponding to the match to PRT 2250, where the PR code addresses a production rule within PRT 2250.

Optionally, the PR code is sent from parser table 2200 to PRT 2250, through DXP 2400.
Execution then continues at block 2590.

When a match is not made, according to a next block 2580, DXP 2400 uses the received NT symbol to look up a default production rule in the PRT 2250. In a preferred embodiment, the default production rule is looked up in the MAPT 2280 memory located within PRT 2250. Optionally, MAPT 2280 memory can be located in a memory block other than PRT 2250.

In a preferred embodiment of the invention, when PRT 2250 receives a PR code, it only returns a PR to DXP 2400, corresponding either to a found production rule or a default production rule. Optionally, a PR and a default PR can both be returned to DXP
2400, with DXP 2400 determining which will be used.

According to a next block 2590, DXP 2400 processes the rule received from PRT
2250. The rule received by DXP 2400 can either be a production rule or a default production rule. In an embodiment of the invention, FSM 2410 divides the rule into three segments: a symbol segment, a SEP segment, and a skip bytes segment. Preferably, each segment of the rule is fixed length or null-terminated to enable easy and accurate division.

In the illustrated embodiment, FSM 2410 pushes T and/or NT symbols, contained in the symbol segment of the production rule, onto parser stack 2430. FSM 2410 sends the SEPs contained in the SEP segment of the production rule to SPU interface 2440. Each SEP
contains an address to microinstructions located in SCT 2150. Upon receipt of the SEPs, SPU interface 2440 allocates SPU 2140 to fetch and execute the microinstructions pointed to by the SEP. SPU interface 2440 also sends the current DI[N] value to SPU 2140, as in many situations the task to be completed by the SPU will need no further input data. Optionally, SPU interface 2440 fetches the microinstructions to be executed by SPU 2140, and sends them to SPU 2140 concurrent with its allocation. FSM 2410 sends the skip bytes segment of the production rule to input buffer 2300 through input stream sequence control 2420. Input buffer 2300 uses the skip bytes data to increment its buffer pointer, pointing to a location in the input stream. Each parsing cycle can accordingly consume any number of input symbols between 0 and 8.
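
The rule-processing step of block 2590 can be summarized in C as follows; the structure layout, the dispatch_sep() stub, and the symbol push order are illustrative assumptions rather than the device's actual behavior.

```c
#include <stdint.h>
#include <stdio.h>

/* Minimal model of block 2590: take one production rule and (1) push its
 * symbols, (2) hand its SEPs to the SPU path together with the current
 * DI[N], and (3) advance the input buffer pointer by skip bytes.         */
typedef struct {
    uint16_t symbols[8];     /* 0-terminated */
    uint16_t seps[4];        /* 0-terminated */
    uint8_t  skip_bytes;     /* 0..8 input symbols consumed this cycle   */
} rule_t;

typedef struct { uint16_t sym[64]; int top; } sym_stack_t;

static void dispatch_sep(uint16_t sep, const uint8_t *di_n)
{
    /* Stand-in for SPU interface 2440: would allocate an SPU and pass
     * the SCT address plus the current input bytes.                     */
    printf("SEP %04x dispatched with DI[0]=%02x\n", sep, (unsigned)di_n[0]);
}

static void process_rule(const rule_t *r, sym_stack_t *st,
                         const uint8_t *di_n, unsigned *buf_ptr)
{
    int n = 0;
    while (n < 8 && r->symbols[n]) n++;
    for (int i = n - 1; i >= 0; i--)                 /* leftmost symbol ends on top */
        st->sym[++st->top] = r->symbols[i];          /* symbol segment      */
    for (int i = 0; i < 4 && r->seps[i]; i++)
        dispatch_sep(r->seps[i], di_n);              /* SEP segment         */
    *buf_ptr += r->skip_bytes;                       /* skip-bytes segment  */
}

int main(void)
{
    rule_t r = { { 0x8002, 0 }, { 0x0200, 0 }, 4 };  /* hypothetical rule   */
    sym_stack_t st = { .top = 0 };
    uint8_t di[8] = { 0x45, 0, 0, 0x3c, 0, 0, 0, 0 };
    unsigned buf_ptr = 0;

    process_rule(&r, &st, di, &buf_ptr);
    printf("stack depth %d, buffer pointer now %u\n", st.top, buf_ptr);
    return 0;
}
```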

After DXP 2400 processes the rule received from PRT 2250, the next decision block 2592 determines whether the next symbol on the parser stack 2430 is a bottom-of-stack symbol. If the next symbol is a bottom-of-stack symbol, execution returns to block 2510, where semantic processor 2100 waits for a new packet to be received at input buffer 2300 through input port 2110.

When the next symbol is not a bottom-of-stack symbol, the next decision block 2594 determines whether DXP 2400 is waiting for further input before it begins processing the next symbol on parser stack 2430. In the illustrated embodiment, DXP 2400 could wait for SPU 2140 to begin processing segments of the input stream, for SPU 2140 to return processing result data, etc.

When DXP 2400 is not waiting for further input, execution returns to block 2530, where DXP 2400 requests and receives input stream data from input buffer 2300.
When DXP 2400 is waiting for further input, execution returns to block 2594 until the input is received.

Figure 19 shows yet another semantic processor embodiment. Semantic processor 2600 contains a semantic processing unit (SPU) cluster 2640 containing a plurality of semantic processing units (SPUs) 2140-1 to 2140-N. Preferably, each of the SPUs 2140-1 to 2140-N is identical and has the same functionality. SPU cluster 2640 is coupled to the memory subsystem 2130, a SPU entry point (SEP) dispatcher 2650, the SCT 2150, a port input buffer (PIB) 2700, a port output buffer (POB) 2620, and a machine central processing unit (MCPU) 2660.

When DXP 2800 determines that a SPU task is to be launched at a specific point in parsing, DXP 2800 signals SEP dispatcher 2650 to load microinstructions from semantic code table (SCT) 2150 and allocate a SPU from the plurality of SPUs 2140-1 to 2140-N within the SPU cluster 2640 to perform the task. The loaded microinstructions indicate the task to be performed and are sent to the allocated SPU. The allocated SPU then executes the microinstructions and the data in the input stream is processed accordingly.
The SPU can optionally load microinstructions from the SCT 2150 directly when instructed by the SEP
dispatcher 2650.

Referring to Figure 20 for further detail, PIB 2700 contains at least one network interface input buffer 2300 (2300-0 and 2300-1 are shown), a recirculation buffer 2710, and a Peripheral Component Interconnect (PCI-X) input buffer 2300_2. POB 2620 contains (not shown) at least one network interface output buffer and a PCI-X output buffer.
The port block 2610 contains one or more ports, each comprising a physical interface, e.g., an optical, electrical, or radio frequency driver/receiver pair for an Ethernet, Fibre Channel, 802.11x, Universal Serial Bus, Firewire, SONET, or other physical layer interface.
Preferably, the number of ports within port block 2610 corresponds to the number of network interface input buffers within PIB 2700 and the number of output buffers within POB 2620.

Referring back to Figure 19, PCI-X interface 2630 is coupled to the PCI-X
input buffer within PIB 2700, the PCI-X output buffer within POB 2620, and an external PCI
bus 2670. The PCI bus 2670 can connect to other PCI-capable components, such as disk drives, interfaces for additional network ports, etc.

The MCPU 2660 is coupled with the SPU cluster 2640 and memory subsystem 2130.
MCPU 2660 performs any desired functions for semantic processor 2600 that can reasonably be accomplished with traditional software. These functions are usually infrequent, non-time-critical functions that do not warrant inclusion in SCT 2150 due to code complexity.

Preferably, MCPU 2660 also has the capability to communicate with SEP
dispatcher 2650 in order to request that a SPU perform tasks on the MCPU's behalf.

Figure 20 illustrates one possible implementation for port input buffer (PIB) 2700 useful with embodiments of the invention. The PIB 2700 contains two network interface input buffers 2300_0 and 2300_1, a recirculation buffer 2710, and a PCI-X input buffer 2300_2. Input buffers 2300_0 and 2300_1 and PCI-X input buffer 2300_2 are functionally the same as input buffer 2300, but they receive input data from different inputs of port block 2610 and from PCI-X interface 2630, respectively.

Recirculation buffer 2710 is comprised of a buffer 2712 that receives recirculation data from SPU Cluster 2640, a control block 2714 for controlling the recirculation data in buffer 2712, a FIFO block 2716 to allow DXP 2800 (Figure 21) FIFO access to the recirculation data in buffer 2712, and a random access (RA) block 2718 to allow a SPU
within SPU Cluster 2640 random access to the recirculation data in buffer 2712. When the recirculation data is received at buffer 2712 from SPU Cluster 2640, recirculation buffer 2710 transmits a Port ID to DXP 2800, alerting DXP 2800 that new data has arrived.
Preferably, the Port ID that is transmitted is the first symbol within buffer 2712.

When DXP 2800 decides to parse through the recirculation data, it sends a Control_DXP signal to recirculation buffer 2710 asking for a certain amount of data from buffer 2712, or requesting that buffer 2712 increment its data pointer. Upon receipt of a Control_DXP
signal, control block 2714 transmits a Data DXP signal, containing data from buffer 2712, to DXP 2800 through FIFO block 2716. In an embodiment of the invention, the control block 2714 and FIFO block 2716 add control characters into the recirculation data that is sent to DXP 2800 using the Data DXP signal. Preferably, the control characters are 1-bit status flags that are added at the beginning of each byte of data transferred and denote whether the byte of data is a terminal or non-terminal symbol.

When a SPU 2140 within SPU cluster 2640 receives a SPU entry point (SEP) from DXP 2800 that requires it to access data within the recirculation stream, the SPU 2140 sends a Control_SPU signal to recirculation buffer 2710 requesting the data at a certain location from buffer 2712. Upon receipt of a Control_SPU signal, control block 2714 transmits a Data_SPU signal, containing data from buffer 2712, to SPU 2140 through RA
block 2718.

Figure 21 shows one possible block implementation for DXP 2800. Parser control finite state machine (FSM) 2410 controls and sequences overall DXP 2800 operation, based on inputs from the other logical blocks in Figure 21, in similar fashion to that described for DXP 2400 illustrated in Figure 17. Differences exist, however, due to the existence of multiple parsing inputs in PIB 2700. These differences largely lie within the parser control FSM 2410, the stack handler 2830, and the input stream sequence control 2420.

Additionally, parser stack 2430 of Figure 17 has been replaced with a parser stack block 2860 capable of maintaining a plurality of parser stacks 2430_1 to 2430_M. Finally, a parser data register bank 2810 has been added.

Stack handler 2830 controls the plurality of parser stacks 2430_1 to 2430_M, by storing and sequencing the symbols to be executed by DXP 2800. In an embodiment of the invention, parser stacks 2430_1 to 2430_M are located in a single memory, where each parser stack is allocated a fixed portion of that memory. Alternately, the number of parser stacks 2430_1 to 2430_M within a parser stack block 2860 and the size of each parser stack can be dynamically determined and altered by stack handler 2830 as dictated by the number of active input data ports and the grammar.

DXP 2800 receives inputs through a plurality of interface blocks, including:
parser table interface 2840, production rule table (PRT) interface 2850, input stream sequence control 2420, and SPU interface 2440. Generally, these interfaces function as previously described, with the exception of input stream sequence control 2420.

Input stream sequence control 2420 and data register bank 2810 retrieve and hold input stream data from PIB 2700. Data register bank 2810 is comprised of a plurality of registers that can store received input stream data. Preferably, the number of registers is equal to the maximum number of parser stacks 2430_1 to 2430_M that can exist within parser stack block 2860, each register capable of holding N input symbols.

Parser control FSM 2410 controls input stream sequence control 2420, data register bank 2810, and stack handler 2830 to switch parsing contexts between the different input buffers. For instance, parser control FSM 2410 maintains a context state that indicates whether it is currently working with data from input buffer 2300_0, input buffer 2300_1, PCI-X input buffer 2300_2, or recirculation buffer 2710. This context state is communicated to input stream sequence control 2420, causing it to respond to data input or skip commands in the grammar with commands to the appropriate input or recirculation buffer.
The context state is also communicated to the data register bank 2810, causing loads and reads of that register to access a register corresponding to the current context state.
Finally, the context state is communicated to the stack handler 2830, causing pushes and pop commands to stack handler 2830 to access the correct one of the parser stacks 2430_1 to 2430_M.

Parser control FSM 2410 decides when to switch parsing contexts. For instance, when a bottom-of-stack symbol is reached on a particular parser stack, or when a particular parser context stalls due to a SPU operation, parser control FSM 2410 can examine the state of the next parsing context, and continue in round-robin fashion until a parsing context that is ready for parsing is reached.
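
A software model of that round-robin decision is sketched below, assuming a fixed set of four contexts (two network ports, PCI-X, and recirculation) and a simple three-state context descriptor; both are simplifications of the FSM behavior described above.

```c
#include <stdio.h>

/* Round-robin selection of the next runnable parsing context. */
enum ctx_state { CTX_IDLE, CTX_READY, CTX_STALLED };

#define NUM_CONTEXTS 4   /* e.g. two network ports, PCI-X, recirculation */

/* Return the index of the next context that is ready to parse, starting
 * the search after the current one; -1 if no context is ready.          */
static int next_context(const enum ctx_state state[NUM_CONTEXTS], int current)
{
    for (int i = 1; i <= NUM_CONTEXTS; i++) {
        int c = (current + i) % NUM_CONTEXTS;
        if (state[c] == CTX_READY) return c;
    }
    return -1;
}

int main(void)
{
    enum ctx_state state[NUM_CONTEXTS] = { CTX_STALLED, CTX_IDLE, CTX_READY, CTX_READY };
    int current = 0;

    current = next_context(state, current);
    printf("switch to context %d\n", current);       /* prints 2 */
    current = next_context(state, current);
    printf("switch to context %d\n", current);       /* prints 3 */
    return 0;
}
```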

The basic operation of the blocks in Figures 15A, 15B, and 19-21 will now be described with reference to the flowchart for data parsing in Figure 22. The flowchart 2900 is used for illustrating a method according to an embodiment of the invention.

According to a decision block 2905, DXP 2800 determines whether new data, other than data corresponding to a stalled parser stack, has been received at PIB
2700. In an embodiment of the invention, the four buffers within PIB 2700 each have a unique Port ID, which is sent to DXP 2800 when new data is received. Preferably, recirculation buffer 2710 contains its unique Port ID as the first byte in each recirculation data segment. Since the four buffers within PIB 2700 each have an independent input, DXP 2800 can receive multiple Port IDs simultaneously. When DXP 2800 receives multiple Port IDs, it preferably uses round robin arbitration to determine the sequence in which it will parse the new data present at the ports.

In one embodiment of the invention, parser stacks can be saved by DXP 2800 when parsing has to halt on a particular stream. A parser stack is saved when FSM
2410 sends a Control signal to stack handler 2830 commanding it to switch the selection of parser stacks.

When new data has not yet been received, processing returns to block 2905, where DXP 2800 waits for new data to be received by PIB 2700.

When new data has been received, according to a next block 2910, DXP 2800 pushes the Port ID of the selected buffer onto the selected parser stack as an NT
symbol, where the selected buffer is the buffer within PIB 2700 that DXP 2800 selected to parse, and the selected parser stack within DXP 2800 is the parser stack that DXP 2800 selected to store symbols to be executed. The grammar loaded for each port, or a portion of that grammar, can be different depending on the initial non-terminal symbol loaded for that port. For example, if one input port receives SONET frames and another input port receives Ethernet frames, the Port ID NT symbols for the respective ports can be used to automatically select the proper grammar for each port.

In an embodiment of the invention, input stream sequence control 2420 selects a buffer within PIB 2700 through round robin arbitration, and stack handler 2830 selects a parser stack within parser stack block 2860. In a preferred embodiment of the invention, FSM 2410 sends a signal to input stream sequence control 2420 to enable selection of a buffer within PIB 2700, and a Control Reg signal to data register bank 2810 to select a register. Also, FSM 2410 sends a Control signal to stack handler 2830 to enable selection of a buffer or to dynamically allocate a parser stack in parser stack block 2860.

For illustrative purposes, it is assumed that input buffer 2300_0 had its Port ID
selected by DXP 2800 and that parser stack 2430_1 is selected for storing the grammar symbols to be used by DXP 2800 in parsing data from input buffer 2300_0. In the illustrated embodiment of the invention, the Port ID is pushed onto parser stack 2430_1 by stack handler 2830, after stack handler 2830 receives the Port ID and a Push command from FSM 2410 in SYM Code and Control signals, respectively. A 1-bit status flag, preceding the Port ID, denotes the Port ID as an NT symbol.

According to a next block 2920, DXP 2800 requests and receives N bytes of data (or a portion thereof) from the stream within the selected buffer. In the illustrated embodiment, DXP 2800 requests and receives the N bytes of data through a DATA/CONTROL
signal coupled between the input stream sequence control 2420 and input buffer 2300_0 within PIB 2700. After the data is received by the input stream sequence control 2420, it is stored in a selected register within data register bank 2810, where the selected register is determined by the current parsing context.

According to a next block 2930, DXP 2800, after determining that it is not waiting for further input and that the symbol on the selected parser stack is not the bottom-of-stack symbol, processes the symbol on the top of the selected parser stack and the received N bytes (or a portion thereof). Block 2930 includes a determination of whether the top symbol is a terminal or a non-terminal symbol. This determination can be performed by stack handler 2830, preferably by reading the status flag of the symbol on the top of parser stack 2430_1, and sending that status to FSM 2410 as a prefix (P) code signal.

When the symbol is determined to be a terminal (T) symbol, at decision block 2935 DXP 2800 checks for a match between the T symbol and the next byte of data from the received N bytes.

In a preferred embodiment of the invention, a match signal M, used by DXP 2800 to check whether a T symbol match has been made, is sent to FSM 2410 by comparator 2820 when comparator 2820 receives the T symbol from stack handler 2830 and the next byte of data from the selected register within data register bank 2810. Stack handler 2830 sends the T symbol on parser stack 2430_1 to the input of comparator 2820, by popping the symbol off of parser stack 2430_1.

When the symbol on the top of the current parser stack is determined to be a non-terminal (NT) symbol, at block 2945 DXP 2800 sends the NT symbol from parser stack 2430_1 and the received N bytes in the selected register from bank 2810 to the parser table 2200. In the illustrated embodiment, the NT symbol and the received N bytes are sent to parser table interface 2840, where they are concatenated prior to being sent to parser table 2200. Optionally, the NT symbol and the received N bytes can be sent directly to parser table 2200. In some embodiments, the received N bytes in the selected register are concurrently sent to SPU 2140 and parser table 2200.

Preferably, the symbol on the parser stack 2430_1 is sent to comparator 2820, parser table interface 2840, and PRT interface 2850 concurrently.

Assuming that a valid block 2935 T-symbol match was attempted, when that match is successful, execution returns to block 2920, where DXP 2800 requests and receives up to N
bytes of additional data from the PIB 2700. In one embodiment of the invention, DXP 2800 would only request and receive one byte of stream data after a T symbol match was made.

When a block 2935 match is attempted and unsuccessful, according to a next block 2940, DXP 2800 may, when the grammar directs, clear the selected parser stack and launch a SEP to remove the remainder of the current data segment from the current input buffer.

DXP 2800 resets parser stack 2430_1 by sending a control signal to stack handler 2830 to pop the remaining symbols and set the stack pointer to the bottom-of-stack symbol. DXP
2800 launches a SEP by sending a command to SEP dispatcher 2650 through SPU interface 2440, where SEP dispatcher 2650 allocates a SPU 2140 to fetch microinstructions from SCT 2150. The microinstructions, when executed, remove the remainder of the current data segment from input buffer 2300_0. Execution then returns to block 2905, where DXP 2800 determines whether new data, for a data input other than one with a stalled parser context, has been received at PIB 2700.

Assuming that the top-of-stack symbol was a non-terminal symbol, a block 2945 match is attempted instead of a block 2935 match. When there was a match in the parser table 2200 to the NT symbol concatenated with the N bytes of data, execution proceeds to block 2950. The parser table 2200 returns a PR code corresponding to the match to DXP
2800, and DXP 2800 uses the PR code to look up a production rule in PRT 2250.
In one embodiment, the production rule is looked up in the PRT memory 2270 located within PRT
2250.

In the illustrated embodiment, the PR code is sent from parser table 2200 to PRT
2250, through intermediate parser table interface 2840 and PRT interface 2850.
Optionally, the PR code can be sent directly from parser table 2200 to PRT 2250.

When a match is unsuccessful in decision block 2945, according to a next block 2960, DXP 2800 uses the NT symbol from the selected parser stack to look up a default production rule in PRT 2250. In one embodiment, the default production rule is looked up in the MAPT
2280 memory located within PRT 2250. Optionally, MAPT 2280 memory can be located in a memory block other than PRT 2250.

In the illustrated embodiment, stack handler 2830 sends production rule interface 2850 and parser table interface 2840 the NT symbol at the same time.
Optionally, stack handler 2830 could send the NT symbol directly to parser table 2200 and PRT
2250. When PRT 2250 receives a PR code and an NT symbol, it sends both a production rule and a default production rule to PRT interface 2850, concurrently. PRT interface 2850 only returns the appropriate rule to FSM 2410. In another embodiment, both the production rule and the default production rule are sent to FSM 2410. In yet another embodiment, PRT 2250 only sends PRT interface 2850 one of the PR or the default PR, depending on whether a PR code was sent to PRT 2250.

Whether block 2950 or block 2960 was executed, both proceed to a next block 2970.
According to block 2970, DXP 2800 processes the received production rule from PRT 2250.
In an embodiment of the invention, FSM 2410 divides the production rule into three segments, a symbol segment, SEP segment, and a skip bytes segment. Preferably, each segment of the production rule is fixed length or null-terminated to enable easy and accurate division, as described previously.

Block 2970 of Figure 22 operates in similar fashion as block 2590 of Figure 18, with the following differences. First, the symbol segment of the production rule is pushed onto the correct parser stack for the current context. Second, the skip bytes section of the production rule is used to manipulate the proper register in the data register bank, and the proper input buffer, for the current context. And third, when SEPs are sent to the SEP
dispatcher, the instruction indicates the proper input buffer for execution of semantic code by a SPU.

According to a next decision block 2975, DXP 2800 determines whether the input data in the selected buffer is in need of further parsing. In an embodiment of the invention, the input data in input buffer 2300_0 is in need of further parsing when the stack pointer for parser stack 2430_1 is pointing to a symbol other than the bottom-of-stack symbol.

Preferably, FSM 2410 receives a stack empty signal SE from stack handler 2830 when the stack pointer for parser stack 2430_1 is pointing to the bottom-of-stack symbol.

When the input data in the selected buffer does not need to be parsed further, execution returns to block 2905, where DXP 2800 determines whether another input buffer, other than a buffer with a stalled parser stack, has new data waiting at PIB
2700.

When the input data in the selected buffer needs to be parsed further, according to a next decision block 2985, DXP 2800 determines whether it can continue parsing the input data in the selected buffer. In an embodiment of the invention, parsing can halt on input data from a given buffer, while still in need of parsing, for a number of reasons, such as dependency on a pending or executing SPU operation, a lack of input data, other input buffers having priority over parsing in DXP 2800, etc. In one embodiment, the other input buffers that have priority over the input data in input buffer 2300_0 can be input buffers that have previously had their parser stack saved, or have a higher priority as the grammar dictates. DXP 2800 is alerted to SPU processing delays by SEP dispatcher 2650 through a Status signal, and is alerted to priority parsing tasks by status values stored in FSM 2410.

When DXP 2800 can continue parsing in the current parsing context, execution returns to block 2920, where DXP 2800 requests and receives up to N bytes of data from the input data within the selected buffer.

When DXP 2800 cannot continue parsing, according to a next block 2990, DXP 2800 saves the selected parser stack and subsequently de-selects the selected parser stack, the selected register in data register bank 2810, and the selected input buffer.
After receiving a switch Control signal from FSM 2410, stack handler 2830 saves and de-selects parser stack 2430_1 by selecting another parser stack within parser stack block 2860.

Input stream sequence control 2420, after receiving a switch signal from FSM
2410, de-selects input buffer 2300_0 by selecting another buffer within PIB 2700 that has received input data, and data register bank 2810, after receiving a switch signal from FSM 2410, de-selects the selected register by selecting another register. Input buffer 2300_0, the selected register, and parser stack 2430_1 can remain active when there is not another buffer with new data waiting in PIB 2700 to be parsed by DXP 2800.

Execution then returns to block 2905, where DXP 2800 determines whether another input buffer, other than one with a stalled parser stack, has received new data at PIB 2700.

Figure 23 shows a block diagram of a semantic processor 3100 according to an embodiment of the invention. The semantic processor 3100 contains an input buffer 3140 for buffering a packet data stream received through the input port 3120, a direct execution parser (DXP) 3180 that controls the processing of packet data received at the input buffer 3140 and a recirculation buffer 3160, and a packet processor 3200 for processing packets. The input buffer 3140 and recirculation buffer 3160 are preferably first-in-first-out (FIFO) buffers. The packet processor 3200 is comprised of an execution engine 3220 for processing segments of the packets or for performing other operations, and a memory subsystem 3240 for storing and/or augmenting segments of the packets.

The DXP 3180 controls the processing of packets or frames within the input buffer 3140 (e.g., the input "stream") and the recirculation buffer 3160 (e.g., the recirculation "stream"). Since the DXP 3180 parses the input stream from input buffer 3140 and the recirculation stream from the recirculation buffer 3160 in a similar fashion, only the parsing of the input stream will be described below.

The DXP 3180 maintains an internal parser stack of terminal and non-terminal symbols, based on parsing of the current frame up to the current symbol. When the symbol (or symbols) at the top of the parser stack is a terminal symbol, DXP 3180 compares data at the head of the input stream to the terminal symbol and expects a match in order to continue.
When the symbol at the top of the parser stack is a non-terminal symbol, DXP
3180 uses the non-terminal symbol and current input data to expand the grammar production on the stack.

As parsing continues, DXP 3180 instructs execution engine 3220 to process segments of the input, or perform other operations.

Semantic processor 3100 uses at least two tables. Complex grammatical production rules are stored in a production rule table (PRT) 3190. Codes for retrieving those production rules are stored in a parser table (PT) 3170. The codes in parser table 3170 also allow DXP

3180 to determine, for a given production rule, the processing the packet processor 3200 should perform upon a segment of a packet.

Some embodiments of the present invention contain many more elements than those shown in Figure 23. A description of the packet flow within the semantic processor shown in Figure 23 will thus be given before more complex embodiments are addressed.

Figure 24 contains a flow chart 3300 for the processing of received packets through the semantic processor 3100 of Figure 23. The flowchart 3300 is used for illustrating a method of the invention.

According to a block 3310, a packet is received at the input buffer 3140 through the input port 3120. According to a next block 3320, the DXP 3180 begins to parse through the header of the packet within the input buffer 3140. In the case where the packet needs no additional manipulation or additional packets to enable the processing of the packet payload, the DXP 3180 will completely parse through the header. In the case where the packet needs additional manipulation or additional packets to enable the processing of the packet payload, the DXP 3180 will cease to parse the header.

According to a decision block 3330, it is inquired whether the DXP 3180 was able to completely parse through the header. If the DXP 3180 was able to completely parse through the header, then according to a next block 3370, the DXP 3180 calls a routine within the packet processor 3200 to process the packet payload, and the semantic processor 3100 waits for a next packet to be received at the input buffer 3140 through the input port 3120.

If the DXP 3180 had to cease parsing the header, then according to a next block 3340, the DXP 3180 calls a routine within the packet processor 3200 to manipulate the packet or wait for additional packets. Upon completion of the manipulation or the arrival of additional packets, the packet processor 3200 creates an adjusted packet.

According to a next block 3350, the packet processor 3200 writes the adjusted packet (or a portion thereof) to the recirculation buffer 3160. This can be accomplished by either enabling the recirculation buffer 3160 with direct memory access to the memory subsystem 3240 or by having the execution engine 3220 read the adjusted packet from the memory subsystem 3240 and then write the adjusted packet to the recirculation buffer 3160. Optionally, to save processing time within the packet processor 3200, a specialized header can be written to the recirculation buffer 3160 instead of the entire adjusted packet. This specialized header directs the packet processor 3200 to process the adjusted packet without having to transfer the entire packet out of the packet processor's memory subsystem 3240.

According to a next block 3360, the DXP 3180 begins to parse through the header of the data within the recirculation buffer 3160. Execution is then returned to block 3330, where it is inquired whether the DXP 3180 was able to completely parse through the header.
If the DXP 3180 was able to completely parse through the header, then according to a next block 3370, the DXP 3180 calls a routine within the packet processor 3200 to process the packet payload and the semantic processor 3100 waits for a next packet to be received at the input buffer 3140 through the input port 3120.

If the DXP 3180 had to cease parsing the header, execution returns to block 3340, where the DXP 3180 calls a routine within the packet processor 3200 to manipulate the packet or wait for additional packets, thus creating an adjusted packet. The packet processor 3200 then writes the adjusted packet to the recirculation buffer 3160, and the DXP 3180 begins to parse through the header of the packet within the recirculation buffer 3160.
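
The control flow of flowchart 3300 reduces to a simple loop, sketched below in C; the packet_t type and the helper routines are stand-ins for the DXP 3180 and packet processor 3200, and the passes_needed counter merely models how many recirculation passes a given packet might require.

```c
#include <stdbool.h>
#include <stdio.h>

/* Control-flow skeleton of flowchart 3300: parse a header; if parsing
 * completes, process the payload; otherwise adjust the packet, write it
 * (or a specialized header) to the recirculation buffer, and parse again. */
typedef struct { int passes_needed; } packet_t;

static bool parse_header(packet_t *p)             /* DXP 3180 stand-in */
{
    return p->passes_needed == 0;                 /* done when no more manipulation needed */
}

static void adjust_and_recirculate(packet_t *p)   /* packet processor 3200 stand-in */
{
    p->passes_needed--;
    printf("adjusted packet written to recirculation buffer\n");
}

static void process_payload(const packet_t *p)
{
    (void)p;
    printf("payload processed\n");
}

int main(void)
{
    packet_t pkt = { .passes_needed = 2 };        /* e.g. needs reassembly, then another pass */

    while (!parse_header(&pkt))                   /* blocks 3320/3360 and 3330 */
        adjust_and_recirculate(&pkt);             /* blocks 3340/3350          */
    process_payload(&pkt);                        /* block 3370                */
    return 0;
}
```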

Figure 25 shows another semantic processor embodiment 3400. Semantic processor 3400 contains an array machine-context data memory (AMCD) 3430 for accessing data in dynamic random access memory (DRAM) 3480 through a hashing function or content-addressable memory (CAM) lookup, a cryptography block 3440 for the encryption, decryption or authentication of data, a context control block cache 3450 for caching context control blocks to and from DRAM 3480, a general cache 3460 for caching data used in basic operations, and a streaming cache 3470 for caching data streams as they are being written to and read from DRAM 3480. The context control block cache 3450 is preferably a software-controlled cache, i.e. a process determines when a cache line is used and freed. Each of the five blocks is coupled with DRAM 3480 and the Semantic code Execution Engine (SEE), also referred to as Semantic Processing Unit (SPU) 3410. The SEE 3410, when signaled by the DXP 3180, processes segments of packets or performs other operations. When DXP
3180 determines that a SEE task is to be launched at a specific point in its parsing, DXP 3180 signals SEE 3410 to load microinstructions from semantic code table (SCT) 3420. The loaded microinstructions are then executed in the SEE 3410 and the segment of the packet is processed accordingly.

Figure 26 contains a flow chart 3500 for the processing of received Internet Protocol (IP)-fragmented packets through the semantic processor 3400 of Figure 25. The flowchart 3500 is used for illustrating one method according to an embodiment of the invention.

Once a packet is received at the input buffer 3140 through the input port 3120 and the DXP 3180 begins to parse through the headers of the packet within the input buffer 3140, according to a block 3510, the DXP 3180 ceases parsing through the headers of the received packet because the packet is determined to be an IP-fragmented packet.
Preferably, the DXP
3180 completely parses through the IP header, but ceases to parse through any headers belonging to subsequent layers (such as TCP, UDP, iSCSI, etc.).

According to a next block 3520, the DXP 3180 signals the SEE 3410 to load the appropriate microinstructions from the SCT 3420 and read the received packet from the input buffer 3140. According to a next block 3530, the SEE 3410 writes the received packet to DRAM 3480 through the streaming cache 3470. Although blocks 3520 and 3530 are shown as two separate steps, they can optionally be performed as one step with the SEE 3410 reading and writing the packet concurrently. This concurrent operation of reading and writing by the SEE 3410 is known as SEE pipelining, where the SPU 3410 acts as a conduit or pipeline for streaming data to be transferred between two blocks within the semantic processor 3400.

According to a next decision block 3540, the SPU 3410 determines if a Context Control Block (CCB) has been allocated for the collection and sequencing of the correct IP packet fragments. The CCB for collecting and sequencing the fragments corresponding to an IP-fragmented packet, preferably, is stored in DRAM 3480. The CCB contains pointers to the IP fragments in DRAM 3480, a bit mask for the IP fragment packets that have not arrived, and a timer value to force the semantic processor 3400 to cease waiting for additional IP fragment packets after an allotted period of time and to release the data stored in the CCB within DRAM 3480.
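For illustration only, the following C sketch shows the kind of bookkeeping such a reassembly CCB might hold. The structure name, field names, and fixed array size are assumptions made for the example; the text above does not define a concrete layout.

```c
#include <stdint.h>

#define MAX_FRAGMENTS 64   /* assumed limit for this sketch */

/* Hypothetical layout of a reassembly CCB kept in DRAM 3480. */
struct ip_frag_ccb {
    uint32_t frag_ptr[MAX_FRAGMENTS]; /* pointers to stored fragments in DRAM   */
    uint64_t arrived_mask;            /* bit i set once fragment i has arrived  */
    uint64_t expected_mask;           /* bits for all fragments of the packet   */
    uint32_t timer;                   /* countdown; on expiry the CCB is freed  */
    uint8_t  ip_header[60];           /* IP header saved from the first fragment */
};
```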

The SPU 3410 preferably determines if a CCB has been allocated by accessing the content-addressable memory (CAM) lookup function of the AMCD 3430, using the IP source address of the received IP fragmented packet combined with the identification and protocol from the header of the received IP packet fragment as a key. Optionally, the IP fragment keys are stored in a separate CCB table within DRAM 3480 and are accessed with the CAM by using the IP source address of the received IP fragmented packet combined with the identification and protocol from the header of the received IP packet fragment. This optional addressing of the IP fragment keys avoids key overlap and sizing problems.
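As a rough sketch of this keying scheme, the source address, identification, and protocol fields could be packed into a single lookup key as shown below. The field ordering and bit packing are assumptions for illustration, not the documented CAM key format.

```c
#include <stdint.h>

/* Build a hypothetical CAM key: 32-bit IP source address, 16-bit IP
 * identification, and 8-bit protocol packed into one 64-bit value. */
static uint64_t frag_ccb_key(uint32_t ip_src, uint16_t ip_id, uint8_t proto)
{
    return ((uint64_t)ip_src << 24) | ((uint64_t)ip_id << 8) | proto;
}
```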

If the SPU 3410 determines that a CCB has not been allocated for the collection and sequencing of fragments for a particular IP-fragmented packet, execution then proceeds to a block 3550 where the SPU 3410 allocates a CCB. The SPU 3410 preferably enters a key corresponding to the allocated CCB, the key comprising the IP source address of the received IP fragment and the identification and protocol from the header of the received IP fragmented packet, into an IP fragment CCB table within the AMCD 3430, and starts the timer located in the CCB. When the first fragment for a given fragmented packet is received, the IP header is also saved to the CCB for later recirculation. For further fragments, the IP header need not be saved.

Once a CCB has been allocated for the collection and sequencing of an IP-fragmented packet, according to a next block 3560, the SPU 3410 stores a pointer to the IP fragment (minus its IP header) in DRAM 3480 within the CCB. The pointers for the fragments can be arranged in the CCB as, e.g., a linked list. Preferably, the SPU 3410 also updates the bit mask in the newly allocated CCB by marking the portion of the mask corresponding to the received fragment as received.

According to a next decision block 3570, the SPU 3410 determines if all of the IP fragments from the packet have been received. Preferably, this determination is accomplished by using the bit mask in the CCB. A person of ordinary skill in the art can appreciate that there are multiple techniques readily available to implement the bit mask, or an equivalent tracking mechanism, for use with the present invention.
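One simple way to realize the bit-mask test is sketched below; the two 64-bit masks stand in for the arrived-fragment and expected-fragment bookkeeping described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* All fragments have arrived when every expected bit is also set in the
 * arrived mask kept in the CCB. */
static bool all_fragments_received(uint64_t arrived_mask, uint64_t expected_mask)
{
    return (arrived_mask & expected_mask) == expected_mask;
}
```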

If all of the IP fragments have not been received for the fragmented packet, then the semantic processor 3400 defers further processing on that fragmented packet until another fragment is received.

If all of the IP-fragments have been received, according to a next block 3580, the SPU
3410 resets the timer, reads the IP fragments from DRAM 3480 in the correct order and writes them to the recirculation buffer 3160 for additional parsing and processing.

Preferably, the SPU 3410 writes only a specialized header and the first part of the reassembled IP packet (with the fragmentation bit unset) to the recirculation buffer 3160.
The specialized header enables the DXP 3180 to direct the processing of the reassembled IP-fragmented packet stored in DRAM 3480 without having to transfer all of the IP fragmented packets to the recirculation buffer 3160. The specialized header can consist of a designated non-terminal symbol that loads parser grammar for IP and a pointer to the CCB. The parser can then parse the IP header normally, and proceed to parse higher-layer (e.g., TCP) headers.
In an embodiment of the invention, DXP 3180 decides to parse the data received at either the recirculation buffer 3160 or the input buffer 3140 through round robin arbitration.
A high level description of round robin arbitration will now be discussed with reference to a first and a second buffer for receiving packet data streams. After DXP 3180 completes the parsing of a packet within the first buffer, it looks to the second buffer to determine if data is available to be parsed. If so, the data from the second buffer is parsed. If not, then DXP
3180 looks back to the first buffer to determine if data is available to be parsed. DXP 3180 continues this round robin arbitration until data is available to be parsed in either the first buffer or second buffer.
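A minimal software model of this two-buffer round robin is sketched below. The structure and function names are invented for the example; in the semantic processor the arbitration is performed by the DXP 3180 itself.

```c
#include <stddef.h>

/* Illustrative stand-in for a parse buffer: a count of packets waiting. */
struct parse_buffer {
    int packets_waiting;
};

/* Pick the next buffer to parse, alternating between the two and skipping a
 * buffer that has nothing available.  Returns NULL if both are empty, which
 * models the DXP continuing to poll until data arrives in either buffer. */
static struct parse_buffer *round_robin_pick(struct parse_buffer *last_parsed,
                                             struct parse_buffer *first,
                                             struct parse_buffer *second)
{
    struct parse_buffer *next = (last_parsed == first) ? second : first;
    if (next->packets_waiting > 0)
        return next;
    next = (next == first) ? second : first;   /* look back at the other buffer */
    if (next->packets_waiting > 0)
        return next;
    return NULL;
}
```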

Figure 27 contains a flow chart 3600 for the processing of received packets in need of decryption and/or authentication through the semantic processor 3400 of Figure 25. The flowchart 3600 is used for illustrating another method according to an embodiment of the invention.

Once a packet is received at the input buffer 3140 or the recirculation buffer 3160 and the DXP 3180 begins to parse through the headers of the received packet, according to a block 3610, the DXP 3180 ceases parsing through the headers of the received packet because it is determined that the packet needs decryption and/or authentication. If DXP 3180 begins to parse through the packet headers from the recirculation buffer 3160, preferably the recirculation buffer 3160 will only contain the aforementioned specialized header and the first part of the reassembled IP packet.

According to a next block 3620, the DXP 3180 signals to the SPU 3410 to load the appropriate microinstructions from the SCT 3420 and read the received packet from input buffer 3140 or recirculation buffer 3160. Preferably, SPU 3410 will read the packet fragments from DRAM 3480 instead of the recirculation buffer 3160 for data that has not already been placed in the recirculation buffer.

According to a next block 3630, the SPU 3410 writes the received packet to the cryptography block 3440, where the packet is authenticated, decrypted, or both. In a preferred embodiment, decryption and authentication are performed in parallel within the cryptography block 3440. The cryptography block 3440 enables the authentication, encryption, or decryption of a packet through the use of Triple Data Encryption Standard (T-DES), Advanced Encryption Standard (AES), Message Digest 5 (MD-5), Secure Hash Algorithm 1 (SHA-1), Rivest Cipher 4 (RC-4) algorithms, etc. Although blocks 3620 and 3630 are shown as two separate steps, they can optionally be performed as one step with the SPU 3410 reading and writing the packet concurrently.

The decrypted and/or authenticated packet is then written to SPU 3410 and, according to a next block 3640, the SPU 3410 writes the packet to the recirculation buffer 3160 for further processing. In a preferred embodiment, the cryptography block 3440 contains a direct memory access engine that can read data from and write data to DRAM 3480. By writing the decrypted and/or authenticated packet back to DRAM 3480, SPU 3410 can then read just the headers of the decrypted and/or authenticated packet from DRAM 3480 and subsequently write them to the recirculation buffer 3160. Since the payload of the packet remains in DRAM 3480, semantic processor 3400 saves processing time. As with IP fragmentation, a specialized header can be written to the recirculation buffer to orient the parser and pass CCB information back to SPU 3410.

Multiple passes through the recirculation buffer 3160 may be necessary when IP
fragmentation and encryption/authentication are contained in a single packet received by the semantic processor 3400.

Figure 28 shows yet another semantic processor embodiment. Semantic processor 3700 contains a semantic execution engine (SPU) cluster 3710 containing a plurality of semantic execution engines (SPUs) 3410-1, 3410-2, to 3410-N. Preferably, each of the SPUs 3410-1 to 3410-N is identical and has the same functionality. The SPU
cluster 3710 is coupled to the memory subsystem 3240, a SPU entry point (SEP) dispatcher 3720, the SCT
3420, port input buffer (PIB) 3730, port output buffer (POB) 3750, and a machine central processing unit (MCPU) 3771.

When DXP 3180 determines that a SPU task is to be launched at a specific point in parsing, DXP 3180 signals SEP dispatcher 3720 to load microinstructions from semantic code table (SCT) 3420 and allocate a SPU from the plurality of SPUs 3410-1 to 3410-N within the SPU cluster 3710 to perform the task. The loaded microinstructions and the task to be performed are then sent to the allocated SPU. The allocated SPU then executes the microinstructions and the data packet is processed accordingly. The SPU can optionally load microinstructions from the SCT 3420 directly when instructed by the SEP dispatcher 3720.
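The dispatch step can be pictured roughly as in the following sketch. The cluster size, structure layout, and free-SPU bookkeeping are assumptions for illustration; the actual SEP dispatcher 3720 is a hardware block.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_SPUS 16   /* assumed cluster size for the sketch */

struct spu {
    int      busy;
    uint64_t microcode[8];   /* microinstructions copied from the SCT */
    uint32_t task;           /* task identifier passed by the dispatcher */
};

/* Hypothetical dispatcher step: find an idle SPU, copy the SCT entry for the
 * requested task into it, and hand it the task.  Returns NULL if none free. */
static struct spu *sep_dispatch(struct spu cluster[NUM_SPUS],
                                const uint64_t sct_entry[8], uint32_t task)
{
    for (size_t i = 0; i < NUM_SPUS; i++) {
        if (!cluster[i].busy) {
            cluster[i].busy = 1;
            for (size_t w = 0; w < 8; w++)
                cluster[i].microcode[w] = sct_entry[w];
            cluster[i].task = task;
            return &cluster[i];
        }
    }
    return NULL;   /* no SPU available; the caller retries later */
}
```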

The PIB 3730 contains at least one network interface input buffer, a recirculation buffer, and a Peripheral Component Interconnect (PCI-X) input buffer. The POB 3750 contains at least one network interface output buffer and a Peripheral Component Interconnect (PCI-X) output buffer. The port block 3740 contains one or more ports, each comprising a physical interface, e.g., an optical, electrical, or radio frequency driver/receiver pair for an Ethernet, Fibre Channel, 802.11x, Universal Serial Bus, Firewire, or other physical layer interface. Preferably, the number of ports within port block 3740 corresponds to the number of network interface input buffers within the PIB 3730 and the number of output buffers within the POB 3750.

The PCI-X interface 3760 is coupled to a PCI-X input buffer within the PIB
3730, a PCI-X output buffer within the POB 3750, and an external PCI bus 3780.
The PCI
bus 3780 can connect to other PCI-capable components, such as disk drives, interfaces for additional network ports, etc.

The MCPU 3771 is coupled with the SPU cluster 3710 and memory subsystem 3240.
MCPU 3771 performs any desired functions for semantic processor 3700 that can reasonably be accomplished with traditional software running on standard hardware. These functions are usually infrequent, non-time-critical functions that do not warrant inclusion in SCT 3420 due to complexity. Preferably, MCPU 3771 also has the capability to communicate with the SEP dispatcher 3720 in order to request that a SPU in the SPU cluster 3710 perform tasks on the MCPU's behalf.

In an embodiment of the invention, the memory subsystem 3240 further comprises a DRAM interface 3790 that couples the cryptography block 3440, context control block cache 3450, general cache 3460 and streaming cache 3470 to DRAM 3480 and external DRAM 3791. In this embodiment, the AMCD 3430 connects directly to an external TCAM
3793, which, in turn, is coupled to an external SRAM (Static Random Access Memory) 3795.

Figure 29 contains a flow chart 3800 for the processing of received Internet Small Computer Systems Interface (iSCSI) data through the semantic processor 3700 of Figure 28.
The flowchart 3800 is used for illustrating another method according to an embodiment of the invention.

According to a block 3810, an iSCSI connection having at least one Transmission Control Protocol (TCP) session is established between an initiator and the target semantic processor 3700 for the transmission of iSCSI data. The semantic processor 3700 contains the appropriate grammar in the PT 3160 and the PRT 3190 and microcode in SCT 3420 to establish a TCP session and then process the initial login and authentication of the iSCSI
connection through the MCPU 3771. In one embodiment, one or more SPUs within the SPU
cluster 3710 organize and maintain state for the TCP session, including allocating a CCB in DRAM 3480 for TCP reordering, window sizing constraints and a timer for ending the TCP session if no further TCP/iSCSI packets arrive from the initiator within the allotted time frame. The TCP CCB contains a field for associating that CCB with an iSCSI CCB
once an iSCSI connection is established by MCPU 3771.

After a TCP session is established with the initiator, according to a next block 3820, semantic processor 3700 waits for a TCP/iSCSI packet, corresponding to the TCP
session established in block 3810, to arrive in the input buffer 3140 of the PIB 3730.
Since semantic processor 3700 has a plurality of SPUs 3410-1 to 3410-N available for processing input data, semantic processor 3700 can receive and process multiple packets in parallel while waiting for the next TCP/iSCSI packet corresponding to the TCP session established in the block 3810.

A TCP/iSCSI packet is received at the input buffer 3140 of the PIB 3730 through the input port 3120 of port block 3740, and the DXP 3180 parses through the TCP
header of the packet within the input buffer 3140. According to a next block 3830, the DXP
3180 signals to the SEP dispatcher 3720 to load the appropriate microinstructions from the SCT 3420, allocate a SPU from the SPU cluster 3710, and send the allocated SPU
microinstructions that, when executed, require the allocated SPU to read the received packet from the input buffer 3140 and write the received packet to DRAM 3480 through the streaming cache 3470. The allocated SPU then uses the lookup function of the AMCD 3430 to locate the TCP CCB, stores the pointer to the location of the received packet in DRAM 3480 to the TCP
CCB, and restarts the timer in the TCP CCB. The allocated SPU is then released and can be allocated for other processing as the DXP 3180 determines.

According to a next block 3840, the received TCP/iSCSI packet is reordered, if necessary, to ensure correct sequencing of payload data. As is well known in the art, a TCP
packet is deemed to be in proper order if all of the preceding packets have arrived.

When the received packet is determined to be in the proper order, the responsible SPU
signals the SEP dispatcher 3720 to load microinstructions from the SCT 3420 for iSCSI recirculation. According to a next block 3850, the allocated SPU combines the iSCSI header, the TCP connection ID from the TCP header and an iSCSI non-terminal to create a specialized iSCSI header. The allocated SPU then writes the specialized iSCSI
header to the recirculation buffer 3160 within the PIB 3730. Optionally, the specialized iSCSI header can be sent to the recirculation buffer 3160 with its corresponding iSCSI payload.
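A possible shape for such a specialized header is sketched below. The field widths, ordering, and the 48-byte basic header segment size are assumptions for the example; the text does not specify the exact layout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical specialized header prepended to data sent through the
 * recirculation buffer: a grammar non-terminal that tells the DXP which
 * production to start with, plus enough context to find the session state. */
struct recirc_iscsi_header {
    uint16_t non_terminal;      /* iSCSI non-terminal symbol for the parser */
    uint16_t tcp_connection_id; /* taken from the TCP header / CCB          */
    uint8_t  iscsi_bhs[48];     /* copy of the iSCSI basic header segment   */
};

static void build_recirc_header(struct recirc_iscsi_header *out,
                                uint16_t non_terminal,
                                uint16_t tcp_connection_id,
                                const uint8_t iscsi_header[48])
{
    out->non_terminal = non_terminal;
    out->tcp_connection_id = tcp_connection_id;
    memcpy(out->iscsi_bhs, iscsi_header, sizeof out->iscsi_bhs);
}
```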

According to a next block 3860, the specialized iSCSI header is parsed and semantic processor 3700 processes the iSCSI payload.

According to a next decision block 3870, it is inquired whether there is another iSCSI
header in the received TCP/iSCSI packet. If YES, then execution returns to block 3850 where the second iSCSI header within the received TCP/iSCSI packet is used to process the second iSCSI payload. As is well known in the art, there can be multiple iSCSI
headers and payloads in a single TCP/iSCSI packet and thus there may be a plurality of packet segments sent through the recirculation buffer 3160 and DXP 3180 for any given iSCSI
packet.

If NO, block 3870 returns execution to the block 3820, where semantic processor 3700 waits for another TCP/iSCSI packet corresponding to the TCP session established in the block 3810. The allocated SPU is then released and can be allocated for other processing as the DXP 3180 determines.

As can be understood by a person skilled in the art, multiple segments of a packet may be passed through the recirculation buffer 3160 at different times when any combination of encryption, authentication, IP fragmentation and iSCSI data processing are contained in a single packet received by the semantic processor 3700.
MEMORY SUBSYSTEM

FIG. 30 shows the memory subsystem 3240 in more detail. The cluster of SPUs 3710 and an ARM 3814 are connected to the memory subsystem 3240. In an alternative embodiment, the ARM 3814 is coupled to the memory subsystem 3240 through the SPUs 3710. The memory subsystem 3240 includes multiple different cache regions 3460, 3450, 3470, 3430, 3440 and 3771 that are each adapted for different types of memory access.
The multiple cache regions 3460, 3450, 3470, 3430, 3440 and 3771 are referred to generally as cache regions 3825. The SPU cluster 3710 and the ARM 3814 communicate with any of the different cache regions 3825, which then communicate with an external Dynamic Random Access Memory (DRAM) 3791A through a main DRAM arbiter 3828. In one implementation, however, the CCB cache 3450 may communicate to a separate external CCB
DRAM 3791B through a CCB DRAM controller 3826.

The different cache regions 3825 improve DRAM data transfers for different data processing operations. The general cache 3460 operates as a conventional cache for general purpose memory accesses by the SPUs 3710. For example, the general cache 3460 may be used for the general purpose random memory accesses used for conducting general control and data access operations.

Cache line replacement in the CCB cache 3450 is controlled exclusively by software commands. This is contrary to conventional cache operation where hardware controls contents of the cache based on who occupied a cache line position last. Controlling the CCB cache region 3450 with software prevents the cache from prematurely reloading cache lines that may need some intermediary processing by one or more SPUs 3710 before being loaded or updated from external DRAM 3791B.

The streaming cache 3470 is primarily used for processing streaming packet data. The streaming cache 3470 prevents streaming packet transfers from replacing all the entries in, say, the general cache 3460. The streaming cache 3470 is implemented as a cache instead of a First In-First Out (FIFO) memory buffer since it is possible that one or more of the SPUs 3710 may need to read data while it is still located in the streaming cache 3470. If a FIFO were used, the streaming data could only be read after it had been loaded into the external DRAM 3791A. The streaming cache 3470 includes multiple buffers that each can contain different packet streams. This allows different SPUs 3710 to access different packet streams while they are located in the streaming cache 3470.

The MCPU 3771 is primarily used for instruction accesses from the ARM
3814. The MCPU 3771 improves the efficiency of burst mode accesses between the ARM 3814 and the external DRAM 3791A. The ARM 3814 includes an internal cache 3815 that in one embodiment is 32 bits wide. The MCPU 3771 is directed specifically to handle 32-bit burst transfers. The MCPU 3771 may buffer multiple 32-bit bursts from the ARM
3814 and then burst to the external DRAM 3791A when cache lines reach some threshold amount of data.

In one embodiment, each of the cache regions 3825 may map physically to different associated regions in the external DRAM 3791A and 3791B. This, together with the separate MCPU 3771, prevents the instruction transfers between the ARM 3814 and external DRAM
3791A from being polluted by data transfers conducted in other cache regions.
For example, the SPUs 3710 can load data through the cache regions 3460, 3450, and 3470 without polluting the instruction space used by the ARM 3814.
S-Code
FIG. 31 shows in more detail how memory accesses are initiated by the individual SPUs 3410 to the different cache regions 3825. For simplicity, only the general cache 3460, CCB cache 3450, and the streaming cache 3470 are shown in FIG. 31.

Microinstructions 3900, alternatively referred to as SPU codes (S-Codes), are sent from the direct execution parser 3180 (FIG. 1) to the SPU subsystem 3710.
An example of a microinstruction 3900 is shown in more detail in FIG. 32A. The microinstruction 3900 may include a target field 3914 that indicates to the individual SPUs 3410 which cache region 3825 to use for accessing data. For example, the cache region field 3914 in FIG. 32A directs the SPU 3410 to use the CCB cache 3450. The target field 3914 can also be used to direct the SPUs 3410 to access the MCPU interface 3771 (FIG. 30), recirculation buffer 3160 (FIG.
23), or output buffers 3750 (FIG. 28).
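In spirit, an S-code might carry fields like those in the following sketch. The enumeration values and bit widths are invented for illustration; only the idea of a target field steering the access comes from the text.

```c
#include <stdint.h>

/* Possible targets an S-code can name; the values are illustrative only. */
enum scode_target {
    TARGET_GENERAL_CACHE = 0,
    TARGET_CCB_CACHE     = 1,
    TARGET_STREAMING     = 2,
    TARGET_MCPU          = 3,
    TARGET_RECIRC_BUFFER = 4,
    TARGET_OUTPUT_BUFFER = 5
};

/* Hypothetical S-code layout: the target field steers the SPU's memory
 * access to the right cache region or interface. */
struct scode {
    uint8_t  target;     /* one of enum scode_target            */
    uint8_t  opcode;     /* operation the SPU should perform    */
    uint16_t length;     /* bytes to move, for load/store codes */
    uint32_t operand;    /* address or immediate value          */
};
```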

Referring back to FIG. 31, each cache region 3825 has an associated set of queues 3902 in the SPU subsystem 3710. The individual SPUs 3410 send data access requests to the queues 3902 that then provide orderly access to the different cache regions 3825. The queues 3902 also allow different SPUs 3410 to conduct or initiate memory accesses to the different cache regions 3825 at the same time.

FIG. 32B shows an example of a cache request 3904 sent between the SPUs 3410 and the cache regions 3825. The cache request 3904 includes the address and any associated data. In addition, the cache request 3904 includes a SPU tag 3906 that identifies what SPU 3410 is associated with the request 3904. The SPU tag 3906 tells the cache regions 3825 which SPU 3410 to send back any requested data.

Arbitration
Referring back to FIG. 30, of particular interest is the DRAM arbiter 3828 that in one embodiment uses round robin arbitration for determining when data from the different cache regions 3825 gains access to external DRAM 3791A. In the round robin arbitration scheme, the main DRAM arbiter 3828 goes around in a predetermined order checking if any of the cache regions 3825 has requested access to external DRAM 3791A. If a particular cache region 3825 makes a memory access request, it is granted access to the external DRAM 3791A during its associated round robin period. The arbiter 3828 then checks the next cache region 3825 in the round robin order for a memory access request. If the next cache region 3825 has no memory access request, the arbiter 3828 checks the next cache region 3825 in the round robin order. This process continues with each cache region 3825 being serviced in the round robin order.
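A software model of this round robin service loop is sketched below, assuming six requesting cache regions; the region count and the encoding are assumptions made for the example.

```c
#include <stdbool.h>

#define NUM_CACHE_REGIONS 6   /* general, CCB, streaming, AMCD, crypto, MCPU */

/* request_pending[i] is true when cache region i wants the DRAM bus.
 * Starting just after the last region served, scan the regions in a fixed
 * order and grant the first one found with a pending request. */
static int dram_arbiter_next(const bool request_pending[NUM_CACHE_REGIONS],
                             int last_served)
{
    for (int step = 1; step <= NUM_CACHE_REGIONS; step++) {
        int candidate = (last_served + step) % NUM_CACHE_REGIONS;
        if (request_pending[candidate])
            return candidate;   /* this region gets the next DRAM slot */
    }
    return -1;                  /* no region is requesting access */
}
```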

Accesses between the CCB cache 3450 and external DRAM 3791A can consume a large amount of bandwidth. A CCB DRAM controller 3826 can be used exclusively for CCB transfers between the CCB cache 3450 and a separate external CCB
DRAM 3791B. Two different busses 3834 and 3836 can be used for the accesses to the two different banks of DRAM 3791A and 3791B, respectively. The external memory accesses by the other cache regions 3460, 3470, 3430, 3440 and 3771 are then arbitrated separately by the main DRAM arbiter 3828 over bus 3834. If the CCB cache 3450 is not connected to external DRAM through a separate CCB controller 3826, then the main DRAM controller arbitrates all accesses to the external DRAM 3791A for all cache regions 3825.

In another embodiment, the accesses to the external DRAM 3791A and external CCB DRAM 3791B are interleaved. This means that the CCB cache 3450, and the other cache regions 3825, can conduct memory accesses to both the external DRAM 3791A and the external CCB DRAM 3791B. This allows the two memory banks 3791A and 3791B to be accessed at the same time. For example, the CCB cache 3450 can conduct a read operation from external memory 3791A and at the same time conduct a write operation to external memory 3791B.
General Cache
FIG. 33 shows in more detail one example of a general cache 3460. The general cache 3460 receives a physical address 3910 from one of the SPUs 3410 (FIG. 31).
The cache lines 3918 are accessed according to a low order address space (LOA) 3916 from the physical address 3910.

The cache lines 3918, in one example, may be relatively small or have a different size than the cache lines used in other cache regions 3825. For example, the cache lines 3918 may be much smaller than the size of the cache lines used in the streaming cache 3470 and the CCB cache 3450. This provides more customized memory accesses for the different types of data processed by the different cache regions 3825. For example, the cache lines 3918 may only be 16 bytes long for general control data processing. On the other hand, the cache lines for the streaming cache 3470 may be larger, such as 64 bytes, for transferring larger blocks of data.

Each cache line 3918 may have an associated valid flag 3920 that indicates whether or not the data in the cache line is valid. The cache lines 3918 also have an associated high order address (HOA) field 3922. The general cache 3460 receives the physical address 3910 and then checks HOA 3922 and valid flag 3920 for the cache line 3918 associated with the LOA 3916. If the valid flag 3920 indicates a valid cache entry and the HOA 3922 matches the HOA 3914 for the physical address 3910, the contents of the cache line 3918 are read out to the requesting SPU 3410. If flag field 3920 indicates an invalid entry, the contents of cache line 3918 are written over by a corresponding address in the external DRAM 3791A (FIG. 30).

If flag field 3920 indicates a valid cache entry, but the HOA 3922 does not match the HOA 3914 in the physical address 3910, the entry in the cache line 3918 is automatically written back to the external DRAM 3791A and the contents of the external DRAM 3791A associated with the physical address 3910 are loaded into the cache line associated with the LOA 3916.
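The hit test described above can be modeled roughly as follows, assuming 16-byte lines and an arbitrary line count chosen for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES  256   /* assumed number of cache lines for the sketch */
#define LINE_BYTES 16    /* 16-byte lines, as in the example above       */

struct cache_line {
    bool     valid;
    uint32_t hoa;                 /* high order address tag */
    uint8_t  data[LINE_BYTES];    /* cached bytes           */
};

/* Split the physical address into LOA and HOA, then test the indexed line:
 * a hit requires a valid entry whose stored HOA matches the request's HOA. */
static bool general_cache_hit(const struct cache_line lines[NUM_LINES],
                              uint32_t physical_address)
{
    uint32_t loa = (physical_address / LINE_BYTES) % NUM_LINES;
    uint32_t hoa = physical_address / (LINE_BYTES * NUM_LINES);
    return lines[loa].valid && lines[loa].hoa == hoa;
}
```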

Context Control Block (CCB) Cache
FIG. 34 shows in more detail the context control block (CCB) cache 3450. The CCB cache 3450 includes multiple buffers 3940 and associative tags 3942. As opposed to a conventional 4-way associative cache, the CCB cache 3450 operates essentially like a 32-way associative cache. The multiple CCB buffers 3940 and associative tags 3942 are controlled by a set of software commands 3944 sent through the SPUs 3410. The software commands 3944 include a set of Cache/DRAM instructions 3946 used for controlling the transfer of data between the CCB cache 3450 and the external DRAM 3791A or 3791B (FIG. 30). A set of SPU/cache commands 3948 are used for controlling data transfers between the SPUs 3410 and the CCB cache 3450. The software instructions 3946 include ALLOCATE, LOAD, COMMIT, and DROP operations. The software instructions 3948 include READ and WRITE operations.
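The command set can be summarized as in the sketch below; the enumeration encoding is an assumption, and only the operation names and their meanings come from the text.

```c
/* Software-visible CCB cache operations described in the text. */
enum ccb_cache_op {
    CCB_ALLOCATE,  /* bind a buffer to a CCB tag, without touching DRAM   */
    CCB_LOAD,      /* fill the buffer from the CCB's DRAM location        */
    CCB_COMMIT,    /* write the buffer back to DRAM and free the buffer   */
    CCB_DROP,      /* free the buffer without writing anything to DRAM    */
    CCB_READ,      /* move data from the allocated buffer to the SPU      */
    CCB_WRITE      /* move data from the SPU into the allocated buffer    */
};
```

As described in the following paragraphs, a typical sequence is ALLOCATE, optionally LOAD, some number of READ and WRITE transfers, and then COMMIT or DROP.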

FIG. 35 shows some examples of CCB commands 3954 sent between the SPUs 3410 and the CCB cache 3450. Any of these software commands 3944 can be issued by any SPU 3410 to the CCB cache 3450 at any time.

Referring to FIGS. 34 and 35, one of the SPUs 3410 sends the ALLOCATE command 3954A to the CCB cache 3450 to first allocate one of the CCB buffers 3940. The ALLOCATE command 3954A may include a particular memory address or CCB tag 3956 associated with a physical address in DRAM 3791 containing a CCB. The controller 3950 in the CCB cache 3450 conducts a parallel match of the received CCB address 3956 with the addresses or tags associated with each of the buffers 3940. The addresses associated with each buffer 3940 are contained in the associated tag fields 3942.

If the address/tag 3956 is not contained in any of the tag fields 3942, the controller 3950 allocates one of the unused buffers 3940 to the specified CCB
tag 3956. If the address already exists in one of the tag fields 3942, the controller 3950 uses the buffer 3940 already associated with the specified CCB tag 3956.

The controller 3950 sends back a reply 3954B to the requesting SPU 3410 that indicates whether or not a CCB buffer 3940 has been successfully allocated. If a buffer 3940 is successfully allocated, the controller 3950 maps all CCB commands 3944 from all SPUs 3410 that use the CCB tag 3956 to the newly allocated buffer 3940.

There are situations where the SPUs 3410 may not care about the data that is currently in the external DRAM 3791 for a particular memory address, for example, when the data in external DRAM 3791 is going to be overwritten. In conventional cache architectures, the contents of any specified address not currently contained in the cache are automatically loaded into the cache from main memory. However, the ALLOCATE command 3946 simply allocates one of the buffers 3940 without having to first read in data from the DRAM 3791. Thus, the buffers 3940 can also be used as scratch pads for intermediate data processing without ever reading or writing the data in buffers 3940 into or out of the external DRAM 3791.

The LOAD and COMMIT software commands 3946 are required to complete the transfer of data between one of the cache buffers 3940 and the external DRAM
3791. For example, a LOAD command 3956C is sent from a SPU 3410 to the controller 3950 to load a CCB associated with a particular CCB tag 3956 from external DRAM 3791 into the associated buffer 3940 in CCB cache 3450. The controller 3950 may convert the CCB tag 3956 into a physical DRAM address and then fetch a CCB from the DRAM 3791 associated with the physical DRAM address.

A COMMIT command 3956C is sent by a SPU 3410 to write the contents of a buffer 3940 into a physical address in DRAM 3791 associated with the CCB tag 3956. The COMMIT command 3956C also causes the controller 3950 to deallocate the buffer 3940, making it available for allocation to another CCB. However, another SPU 3410 can later request buffer allocation for the same CCB tag 3956. The controller 3950 uses the existing CCB currently located in the buffer 3940 if the CCB still exists in one of the buffers 3940.

The DROP command 3944 tells the controller 3950 to discard the contents of a particular buffer 3940 associated with a specified CCB tag 3956. The controller 3950 discards the CCB simply by deallocating the buffer 3940 in CCB cache 3450 without ever loading the buffer contents into external DRAM 3791.

The READ and WRITE instructions 3948 are used to transfer CCB data between the CCB cache 3450 and the SPUs 3410. The READ and WRITE instructions only allow a data transfer between the SPUs 3410 and the CCB cache 3450 when a buffer 3940 has previously been allocated.

If all the available buffers 3940 are currently in use, then one of the SPUs 3410 will have to COMMIT one of the currently used buffers 3940 before the current ALLOCATE command can be serviced by the CCB cache 3450. The controller 3950 keeps track of which buffers 3940 are assigned to different CCB addresses. The SPUs 3410 only need to keep a count of the number of currently allocated buffers 3940. If the count number reaches the total number of available buffers 3940, one of the SPUs 3410 may issue a COMMIT or DROP command to free up one of the buffers 3940. In one embodiment, there are at least twice as many buffers 3940 as SPUs 3410. This enables all SPUs 3410 to have two available buffers 3940 at the same time.

Because the operations in the CCB cache 3450 are under software control, the SPUs 3410 control when buffers 3940 are released and when data is transferred to the external memory 3791. In addition, the SPU 3410 that initially allocates a buffer 3940 for a CCB can be different from the SPU 3410 that issues the LOAD command, or from the SPU 3410 that eventually releases the buffer 3940 by issuing a COMMIT or DROP command.

The commands 3944 allow complete software control of data transfers between the CCB cache 3450 and the DRAM 3791. This has substantial advantages when packet data is being processed by one or more SPUs 3410 and when it is determined during packet processing that a particular CCB no longer needs to be loaded into or read from DRAM 3791.

For example, one of the SPUs 3410 may determine during packet processing that the packet has an incorrect checksum value. The packet can be DROPPED from the CCB buffer without ever loading the packet into DRAM 3791.

The buffers 3940 in one embodiment are implemented as cache lines. Therefore, only one cache line ever needs to be written back into external DRAM memory 3791.
In one embodiment, the cache lines are 512 bytes and the words are 64 bytes wide. The controller 3950 can recognize which cache lines have been modified and during a COMMIT
command only write back the cache lines that have been changed in buffers 3940.

FIG. 36 shows an example of how CCBs are used when processing TCP
sessions. The semantic processor 3100 (FIG. 23) can be used for processing any type of data;
however, the TCP packet 3960 is shown for explanation purposes. The packet 3960 in this example includes an Ethernet header 3962, an IP header 3964, IP source address 3966, IP
destination address 3968, TCP header 3970, TCP source port address 3972, TCP
destination port address 3974, and a payload 3976.

The direct execution parser 3180 directs one or more of the SPUs 3410 to obtain the source address 3966 and destination address 3968 from the IP header 3964 and obtain the TCP source port address 3972 and TCP destination port address 3974 from the TCP header 3970. These addresses may be located in the input buffer 3140 (FIG.
23).

The SPU 3410 sends the four address values 3966, 3968, 3972 and 3974 to a CCB
lookup table 3978 in the AMCD 3430. The lookup table 3978 includes arrays of IP source address fields 3980, IP destination address fields 3982, TCP source port address fields 3984, and TCP destination port address fields 3986. Each unique combination of addresses has an associated CCB tag 3979.
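A simplified software stand-in for this table lookup is sketched below; the table size and the linear search are assumptions made so the example is self-contained, whereas the AMCD 3430 performs the match in CAM hardware.

```c
#include <stdint.h>

#define LOOKUP_ENTRIES 1024   /* assumed table size for the sketch */

/* One row of the hypothetical CCB lookup table: the TCP/IP 4-tuple and the
 * CCB tag associated with that session. */
struct ccb_lookup_entry {
    uint32_t ip_src;
    uint32_t ip_dst;
    uint16_t tcp_src_port;
    uint16_t tcp_dst_port;
    uint32_t ccb_tag;
    int      in_use;
};

/* A linear search stands in for the CAM match.  Returns the CCB tag, or -1
 * if the 4-tuple is not yet in the table. */
static int64_t ccb_lookup(const struct ccb_lookup_entry table[LOOKUP_ENTRIES],
                          uint32_t ip_src, uint32_t ip_dst,
                          uint16_t sport, uint16_t dport)
{
    for (int i = 0; i < LOOKUP_ENTRIES; i++) {
        if (table[i].in_use &&
            table[i].ip_src == ip_src && table[i].ip_dst == ip_dst &&
            table[i].tcp_src_port == sport && table[i].tcp_dst_port == dport)
            return table[i].ccb_tag;
    }
    return -1;   /* caller allocates a new CCB tag and inserts the 4-tuple */
}
```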

The AMCD 3430 tries to match the four address values 3966, 3968, 3972 and 3974 with four entries in the CCB lookup table 3978. If there is no match, the SPU
3410 will allocate a new CCB tag 3979 for the TCP session associated with packet 3960 and the four address values are written into table 3978. If a match is found, then the AMCD
3430 returns the CCB tag 3979 for the matching combination of addresses.

If a CCB tag 3979 is returned, the SPU 3410 uses the returned CCB tag 3979 for subsequent processing of packet 3960. For example, the SPU 3410 may load particular header information from the packet 3960 into a CCB located in CCB cache 3450. In addition, the SPU 3410 may send payload data 3976 from packet 3960 to the streaming cache 3470 (FIG. 30).

FIG. 37 shows some of the control information that may be contained in a CCB
3990.
The CCB 3990 may contain the CCB tag 3992 along with a session ID 3994. The session ID
3994 may contain the source and destination addresses for the TCP session. The CCB 3990 may also include linked list pointers 3996 that identify locations in external memory 3791 that contain the packet payload data. The CCB 3990 can also contain a TCP sequence number 3998 and an acknowledge number 4000. The CCB 3990 can include any other parameters that may be needed to process the TCP session. For example, the CCB
3990 may include a receive window field 4002, send window field 4004, and a timer field 4006.
All of the TCP control fields are located in the same associated CCB 3990.
This allows the SPUs 3410 to quickly access all of the associated fields for the same TCP session from the same CCB buffer 3940 in the CCB cache 3450. Further, because the CCB
cache 3450 is controlled by software, the SPUs 3410 can maintain the CCB 3990 in the CCB cache 3450 until all required processing is completed by all the different SPUs 3410.
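For illustration, a C view of such a CCB might look like the following sketch. The field widths and ordering are assumptions; the field list follows the description above.

```c
#include <stdint.h>

/* Hypothetical C view of the TCP control fields kept together in one CCB. */
struct tcp_ccb {
    uint32_t ccb_tag;         /* CCB tag 3992                               */
    uint32_t ip_src, ip_dst;  /* session ID 3994: addresses of the session  */
    uint16_t sport, dport;
    uint64_t payload_head;    /* linked list pointers 3996 into DRAM        */
    uint32_t seq_num;         /* TCP sequence number 3998                   */
    uint32_t ack_num;         /* acknowledge number 4000                    */
    uint32_t recv_window;     /* receive window field 4002                  */
    uint32_t send_window;     /* send window field 4004                     */
    uint32_t timer;           /* timer field 4006                           */
};
```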

There could also be CCBs 3990 associated with different OSI layers. For example, there may be CCBs 3990 associated and allocated with SCSI sessions and other CCBs 3990 associated and allocated for TCP sessions within the SCSI sessions.

FIG. 38 shows how flags 4112 are used in the CCB cache 3450 to indicate when SPUs 3410 are finished processing the CCB contents in buffers 3940 and when the buffers 3940 are available to be released for access by another SPU.

An IP packet 4100 is received by the processing system 3100 (FIG. 23). The IP
packet 4100 has header sections including an IP header 4102, TCP header 4104 and iSCSI header 4106. The IP packet 4100 also includes a payload 4108 containing packet data. The parser 3180 (FIG. 23) may direct different SPUs 3410 to process the information in the different IP header 4102, TCP header 4104, iSCSI header 4106 and the data in the payload 4108. For example, a first SPU #1 processes the IP header information 4102, a SPU #2 processes the TCP header information 4104, and a SPU #3 processes the iSCSI header information 4106. Another SPU #N may be directed to load the packet payload 4108 into buffers 4114 in the streaming cache 3470. Of course, any combination of SPUs 3410 can process any of the header and payload information in the IP packet 4100.

All of the header information in the IP packet 4100 can be associated with a same CCB 4110. The SPUs #1-#3 store and access the CCB 4110 through the CCB cache 3450. The CCB 4110 also includes a completion bit mask 4112. The SPUs 3410 logically OR a bit into the completion mask 4112 when their task is completed. For example, the SPU #1 may set a first bit in the completion bit mask 4112 when processing of the IP header 4102 is completed in the CCB 4110. The SPU #2 may set a second bit in the completion bit mask 4112 when processing for the TCP header 4104 is complete. When all of the bits in the completion bit mask 4112 are set, this indicates that SPU processing is completed on the IP
packet 4100.
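A minimal sketch of the completion-mask update is shown below; the particular bit assignments are assumptions for the example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed bit assignments; each SPU sets its own bit when it finishes its
 * part of the packet. */
#define DONE_IP_HEADER    (1u << 0)
#define DONE_TCP_HEADER   (1u << 1)
#define DONE_ISCSI_HEADER (1u << 2)
#define DONE_PAYLOAD      (1u << 3)
#define DONE_ALL (DONE_IP_HEADER | DONE_TCP_HEADER | DONE_ISCSI_HEADER | DONE_PAYLOAD)

/* Called by a SPU when its task completes; returns true when every SPU
 * working on the packet has finished and the CCB can be committed. */
static bool mark_task_done(uint32_t *completion_mask, uint32_t done_bit)
{
    *completion_mask |= done_bit;          /* logical OR, as described above */
    return *completion_mask == DONE_ALL;
}
```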

Thus, when processing is completed for the payload 4108, SPU #N checks the completion mask 4112. If all of the bits in mask 4112 are set, SPU #N may for example send a COMMIT command to the CCB cache 3450 (see FIG. 34) that directs the CCB
cache 3450 to COMMIT the contents of the cache lines containing CCB 4110 into external DRAM memory 3791.
Streaming Cache
FIG. 39 shows the streaming cache 3470 in more detail. In one embodiment, the streaming cache 3470 includes multiple buffers 4200 used for transmitting or receiving data from the DRAM 3791. The buffers 4200 in one example are 256 bytes wide and each cache line includes a tag field 4202, a VSD field 4204, and a 64 byte portion of the buffer 4200.

Thus, four cache lines are associated with each buffer 4200. The streaming cache 3470 in one implementation includes two buffers 4200 for each SPU 3410.

The VSD field 4204 includes a Valid value that indicates a cache line as valid/invalid, a Status value that indicates a dirty or clean cache line, and a Direction value that indicates a read, write or no merge condition.

Of particular interest is a pre-fetch operation conducted by the cache controller 4206.
A physical address 4218 is sent to the controller 4206 from one of the SPUs 3410 requesting a read from the DRAM 3791. The controller 4206 associates the physical address with one of the cache lines, such as cache line 4210. The streaming cache controller 4206 then automatically conducts a pre-fetch 4217 for the three other 64 byte cache lines 4212, 4214 and 4216 associated with the same FIFO order of bytes in the buffer 4200.

One important aspect of the pre-fetch 4217 is the way that the tag fields 4202 are associated with the different buffers 4200. The tag fields 4202 are used by the controller 4206 to identify a particular buffer 4200. The portion of the physical address 4218 associated with the tag fields 4202 is selected by the controller 4206 to prevent the buffers 4200 from containing contiguous physical address locations. For example, the controller 4206 may use middle order bits 4220 of the physical address 4218 to associate with tag fields 4202. This prevents the pre-fetch 4217 of the three contiguous cache lines 4212, 4214, and 4216 from colliding with streaming data operations associated with cache line 4210.

For example, one of the SPUs 3410 may send a command to the streaming cache 3470 with an associated physical address 4218 that requires packet data to be loaded from the DRAM memory 3791 into the first cache line 4210 associated with a particular buffer 4200.
The buffer 4200 has a tag value 4202 associated with a portion of the physical address 4218. The controller 4206 may then try to conduct the pre-fetch operations 4217 to also load the cache lines 4212, 4214 and 4216 associated with the same buffer 4200.
However, the pre-fetch 4217 is stalled because the buffer 4200 is already being used by the SPU 3410. In addition, when the pre-fetch operations 4217 are allowed to complete, they could overwrite the cache lines in the buffer 4200 that were already loaded pursuant to other SPU commands.

By obtaining the tag values 4202 from middle order bits 4220 of the physical address 4218, each consecutive 256 byte physical address boundary will be located in a different memory buffer 4200, thus, avoiding collisions during the pre-fetch operations.
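The tag selection can be illustrated with the following sketch, which assumes 256-byte buffers, 64-byte cache lines, and an eight-bit tag taken from an assumed middle-order bit field; the exact bit positions of the middle order bits 4220 are not specified in the text.

```c
#include <stdint.h>

/* 256-byte buffers, 64-byte cache lines: bits [5:0] address a byte in a
 * line and bits [7:6] select the line within a buffer.  Taking the tag from
 * the assumed middle-order field, bits [15:8], maps each consecutive
 * 256-byte region to a different buffer, avoiding pre-fetch collisions. */
static uint32_t streaming_tag(uint32_t physical_address)
{
    return (physical_address >> 8) & 0xFF;
}

static uint32_t line_within_buffer(uint32_t physical_address)
{
    return (physical_address >> 6) & 0x3;   /* which of the four cache lines */
}
```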

AMCD
FIG. 40 illustrates a functional block diagram of an example embodiment of the AMCD 3430 of FIG. 28. The SPU cluster 4012 communicates directly to the AMCD 3430, while the ARM 4014 can communicate to the AMCD 3430 through the SPUs 3710 in the SPU cluster 4012. The AMCD 3430 provides a memory lookup facility for the SPUs 3410.
In one example, a SPU 3410 determines where in memory, e.g., within the external DRAM
3791 (FIG. 28), a previously stored entry is stored. The lookup facility in the AMCD 3430 can look up where data is stored anywhere in the network system, and is not limited to the external DRAM 3791.

When the system is in a non-learning mode, a SPU 3410 maintains its own table of memory mappings, and the SPU manages its table by adding, deleting, and modifying entries.
When the system is in a learning mode, a SPU 3410 maintains the table by performing commands that search the TCAM memory while also adding an entry, or that search the TCAM memory while also deleting an entry. Key values are used by the SPU 3410 in performing each of these different types of searches, in either mode.

The AMCD 3430 of FIG. 40 includes a set of lookup interfaces (LUIFs) 4062. In one embodiment, there are eight LUIFs 4062 in the AMCD 3430. Detail of an example LUIF is illustrated, which includes a set of 64-bit registers 4066. The registers 4066 provide storage for data and commands to implement a memory lookup, and the lookup results are also returned via the registers 4066. In one embodiment, there is a single 64 bit register for the lookup command, and up to seven 64 bit registers to store the data. Not all data registers need be used. In some embodiments of the invention, a communication interface between the SPU cluster 4012 and the LUIFs 4062 is 64 bits wide, which makes it convenient to include 64 bit registers in the LUIFs 4062. An example command structure is illustrated in FIG. 41, the contents of which will be described below.

Because there is a finite number of LUIFs 4062 in a designed system, and because LUIFs cannot be accessed by more than one SPU 3410 at a time, there is a mechanism to allocate free LUIFs to a SPU 3410. A free list 4050 manages the usage of the LUIFs 4062.
When a SPU 3410 desires to access a LUIF 4062, the SPU reads the free list 4050 to determine which LUIFs 4062 are in use. After reading the free list 4050, the address of the next available free LUIF 4062 is returned, along with a value that indicates the LUIF 4062 is able to be used. If the returned value about the LUIF 4062 is valid, the SPU
3410 can safely take control of that LUIF. An entry is then made in the free list 4050 indicating that the particular LUIF 4062 cannot be used by any other SPU 3410 until the first SPU releases the LUIF. After the first SPU 3410 finishes searching and gets the search results back, the SPU puts the identifier of the used LUIF back on the free list 4050, and the LUIF is again available for use by any SPU 3410. If there are no free LUIFs 4062 in the free list 4050, the requesting SPU
3410 will be informed that there are no free LUIFs, and the SPU will be forced to try again later to obtain a free LUIF 4062. The free list 4050 also provides a pipelining function that allows SPUs 3410 to start loading indexes while waiting for other SPU requests to be processed.
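A simple software model of the free list is sketched below; the busy-flag array and the retry convention are assumptions for the example.

```c
#include <stdbool.h>

#define NUM_LUIFS 8   /* eight LUIFs, as in the embodiment described above */

/* in_use[i] is true while LUIF i is owned by some SPU.  Reading the free
 * list returns the index of a free LUIF and marks it taken, or -1 when all
 * LUIFs are busy and the requesting SPU must retry later. */
static int luif_acquire(bool in_use[NUM_LUIFS])
{
    for (int i = 0; i < NUM_LUIFS; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Called by the SPU after it has read its search results back. */
static void luif_release(bool in_use[NUM_LUIFS], int luif)
{
    in_use[luif] = false;
}
```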

The selected LUIF sends the lookup command and data to an arbiter 4068, described below. The arbiter 4068 selects which particular LUIF 4062 accesses a particular TCAM controller. In this described embodiment, there is an external TCAM controller 4072 as well as an internal TCAM controller 4076. The external TCAM controller 4072 is coupled to an external TCAM 4082, which, in turn, is connected to an external SRAM 4092.
Similarly, the internal TCAM controller 4076 is coupled to an internal TCAM 4096, which, in turn, is coupled to an internal SRAM 4086.

Typically, only one TCAM, either the internal TCAM 4096 or the external TCAM 4082, would be active in the system at any one time. In other words, if the system includes the external TCAM and SRAM 4082, 4092, then the AMCD 3430 communicates with these external memories. Similarly, if the system does not include the external TCAM and SRAM memories 4082, 4092, then the AMCD 3430 communicates only with the internal TCAM 4096 and the internal SRAM 4086. It follows that only one TCAM controller 4076 or 4072 would be used, depending on whether the external memory is present. The particular controller 4072, 4076 that is not used by the AMCD 3430 would be "turned off" in a setup process. In one embodiment, a setup command is sent to the AMCD 3430 upon system initialization that indicates if an external TCAM 4082 is present. If the external TCAM 4082 is present, the internal TCAM controller 4076 is "turned off," and the external TCAM controller 4072 is used. In contrast, if the external TCAM 4082 is not present, then the external TCAM controller 4072 is "turned off," and the internal TCAM controller 4076 is used. Although it is preferable to use only one TCAM controller, either 4076 or 4072, for simplicity, the AMCD 3430 could be implemented to use both TCAM controllers 4076 and 4072.

In an example embodiment, the internal TCAM 4096 includes 512 entries, as does the internal SRAM 4086. In other example embodiments, the external TCAM 4082 includes 64k to 256k entries (an entry is 72-bits and multiple entries can be ganged together to create searches wider than 72-bits), with a matching number of entries in the external SRAM 4092.
The SRAMs 4086, 4092 are typically 20 bits wide, while the TCAMs 4082, 4096 are much wider. The internal TCAM 4096 could be, for example, 164 bits wide, while the external TCAM 4082 could be in the range of between 72 and 448 bits wide.

When a SPU 3410 performs a lookup, it builds a key from the packet data, as described above. The SPU 3410 reserves one of the LUIFs 4062 and then loads a command and data into the registers 4066 of the LUIF 4062. When the command and data are loaded, the search commences in one of the TCAMs 4096 or 4082. The command from the register 4066 is passed to the arbiter 4068, which in turn sends the data to the appropriate TCAM
4096, 4082. Assume, for example, that the external TCAM 4082 is present and is therefore in use. For the TCAM command, the data sent by the SPU 3410 is presented to the external TCAM controller 4072, which presents the data to the external TCAM 4082. When the external TCAM 4082 finds a match of the key data, corresponding data is retrieved from the external SRAM 4092. In some embodiments, the SRAM 4092 stores a pointer to the memory location that contains the desired data indexed by the key value stored in the TCAM.
The pointer from the SRAM 4092 is returned to the requesting SPU 3410, through the registers 4066 of the original LUIF 4062 used by the original requesting SPU
3410. After the SPU 3410 receives the pointer data, it releases the LUIF 4062 by placing its address back in the free list 4050, for use by another SPU 3410. The LUIFs 4062, in this manner, can be used for search, write, read or standard maintenance operations on the DRAM
3791, or other memory anywhere in the system.

Using these methods, the TCAM 4082 or 4096 is used for fast lookups in CCB
DRAM 3791B (FIG. 30). The TCAM 4082 or 4096 can also be used for applications where a large number of sessions need to be looked up for CCBs for IPv6 at the same time. The TCAM 4082 or 4096 can also be used for implementing a static route table that needs to lookup port addresses for different IP sessions.

A set of configuration register tables 4040 is used in conjunction with the key values sent by the SPU 3410 in performing the memory lookup. In one embodiment, there are 16 table entries, each of which can be indexed by a four-bit indicator, 0000-1111. For instance, data stored in the configuration table 4040 can include the size of the key in the requested lookup. Various sized keys can be used, such as 64, 72, 128, 144, 164, 192, 256, 288, 320, 384, and 448, etc. Particular key sizes and where the keyed data will be searched, as well as other various data, are stored in the configuration table 4040. With reference to FIG. 41, a table identifier number appears in the bit locations 19:16, which indicates which value in the configuration table 4040 will be used.

FIG. 42 illustrates an example arbiter 4068. The arbiter 4068 is coupled to each of the LUIFs 4062, and to a select MUX 4067 that is coupled to both the internal and external TCAM controllers 4076, 4072. As described above, in some embodiments of the invention, only one TCAM controller 4076 or 4072 is active at one time, which is controlled by the signal sent to the select MUX 4067 at startup. In this embodiment, the arbiter 4068 does not distinguish whether its output signal is sent to the internal or external TCAM
controller 4076, 4072. Instead, the arbiter 4068 simply sends the output signal to the select MUX 4067, and the MUX routes the lookup request to the appropriate TCAM controller 4076, 4072, based on the state of the setup value input to the MUX.

The function of the arbiter 4068 is to select which of the LUIFs 4062, labeled in FIG. 42 as LUIF-1 - LUIF-8, will be next serviced by the selected TCAM
controller 4076 or 4072. The arbiter 4068, in its most simple form, can be implemented as simply a round-robin arbiter, where each LUIF 4062 is selected in succession. In more intelligent systems, the arbiter 4068 uses a past history to assign a priority value describing which LUIF 4062 should next be selected, as described below.

In a more intelligent arbiter 4068, a priority system indicates which LUIF
4062 was most recently used, and factors this into the decision of which LUIF 4062 to select for the next lookup operation. FIG. 43 illustrates an example of arbitration in an example intelligent arbiter 4068. At Time A, each of the priority values has already been initialized to "0", and LUIF-3 and LUIF-7 both have operations pending. Because the arbiter 4068 selects only one LUIF 4062 at a time, LUIF-3 is arbitrarily chosen because all LUIFs having pending operations also have the same priority, in this case, "0." Once LUIF-3 is chosen, its priority is set to 1. In Time B, LUIF-3 has a new operation pending, while LUIF-7 still has an operation that has not been served. The arbiter 4068, in this case, selects LUIF-7, because it has a "higher" priority than LUIF-3. This ensures fair usage by each of the LUIFs 4062, and that no one LUIF monopolizes the lookup time.

In Time C, LUIF-1 and LUIF-3 have operations pending, and the arbiter 4068 selects LUIF-1 because it has a higher priority, even though the operation in LUIF-3 has been pending longer. Finally, in Time D, only LUIF-3 has an operation pending, and the arbiter 4068 selects LUIF-3, and moves its priority up to "2".

In this manner, the arbiter 4068 implements an intelligent round-robin arbitration. In other words, once a particular LUIF 4062 has been selected, it moves to the "end of the line,"
and all of the other LUIFs having pending operations will be serviced before the particular LUIF is again chosen. This equalizes the time each LUIF 4062 uses in its lookups, and ensures that no one particular LUIF monopolizes all of the lookup bandwidth.
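The priority scheme can be modeled as in the following sketch, where a lower counter value means a higher priority, matching the Time A through Time D example above; the counter representation itself is an assumption.

```c
#include <stdbool.h>

#define NUM_LUIFS 8

/* Pick the pending LUIF with the lowest priority counter (the one least
 * recently served), then bump its counter so it moves to the "end of the
 * line".  Ties are broken by index, matching the arbitrary choice in the
 * Time A example.  Returns -1 if no LUIF has an operation pending. */
static int luif_arbitrate(const bool pending[NUM_LUIFS],
                          unsigned priority[NUM_LUIFS])
{
    int chosen = -1;
    for (int i = 0; i < NUM_LUIFS; i++) {
        if (pending[i] && (chosen < 0 || priority[i] < priority[chosen]))
            chosen = i;
    }
    if (chosen >= 0)
        priority[chosen]++;
    return chosen;
}
```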

The system described above can use dedicated processor systems, micro controllers, programmable logic devices, or microprocessors that perform some or all of the operations.
Some of the operations described above may be implemented in software and other operations may be implemented in hardware.

Those skilled in the art recognize that other functional partitions are possible within the scope of the invention. Further, what functions are and are not implemented on a common integrated circuit can vary depending on application.

Finally, although the specification may refer to "an", "one", "another", or "some"
embodiment(s) in several locations, this does not necessarily mean that each such reference is to the same embodiment(s), or that the feature only applies to a single embodiment.

Claims (14)

1. A storage server comprising:

a datagram interface to receive client requests for data operations;
a storage interface to access at least one data storage device; and a semantic processor having the capability to parse received client requests, based on a stored grammar, in order to transact, through the storage interface, responsive data operations with the at least one data storage device.
2. The storage server of claim 1, wherein the semantic processor comprises a direct execution parser to parse symbols from datagrams received at the datagram interface, and a plurality of semantic code execution engines to perform data operations as directed by the direct execution parser.
3. The storage server of claim 1, wherein the semantic processor comprises a direct execution parser to parse symbols from datagrams received at the datagram interface, and a microprocessor to perform data operations on the parsed datagrams.
4. The storage server of claim 1, wherein the semantic processor comprises a direct execution parser having a parser stack, to parse symbols from datagrams received at the datagram interface according to stack symbols, and at least one semantic code execution engine to perform data operations as directed by the direct execution parser, the semantic code execution engine having the capability to alter the direct execution parser operation by modifying the contents of the parser stack.
5. The storage server of claim 1, wherein the at least one data storage device is a disk drive having an enclosure, and wherein the datagram interface, storage interface, and semantic processor are packaged within the disk drive enclosure.
6. The storage server of claim 1, wherein the storage interface is a secondary datagram interface and the at least one data storage device is accessed remotely over the secondary datagram interface, wherein the storage server can operate as a client of the at least one data storage device in order to serve client requests received by the storage server.
7. The storage server of claim 1, further comprising a second datagram interface to receive client requests for data operations, and a parser source selector to allow the semantic processor to switch between parsing datagram symbols from the two datagram interfaces.
8. The storage server of claim 1, further comprising a reconfigurable memory to hold at least a portion of the stored grammar, such that the storage server is reconfigurable to function as a storage server with at least two different storage server protocol sets.
9. A device comprising:

a direct execution parser configured to control the processing of digital data by semantically parsing data in a buffer;

a semantic processing unit configured to perform data operations when prompted by the direct execution parser; and a memory subsystem configured to process the digital data when directed by a semantic processing unit.
10. The device of claim 9 wherein the memory subsystem includes a plurality of memory caches coupled between a memory and the semantic processing unit.
11. The device of claim 9 wherein the memory subsystem includes a cryptography circuit to perform cryptography operation on digital data when directed by the semantic processing unit.
12. The device of claim 9 wherein the memory subsystem includes a search engine to perform look-up functions when directed by the semantic processing unit.
13. The device of claim 9 wherein the buffer receives the data to be parsed by the direct execution parser from an external network.
14. The device of claim 9 wherein the buffer receives the data to be parsed by the direct execution parser from the semantic processing unit.
CA002565596A 2004-05-11 2005-05-11 Semantic processor storage server architecture Abandoned CA2565596A1 (en)

Applications Claiming Priority (11)

Application Number Priority Date Filing Date Title
US10/843,727 2004-05-11
US10/843,727 US7251722B2 (en) 2004-05-11 2004-05-11 Semantic processor storage server architecture
US59073804P 2004-07-22 2004-07-22
US60/590,738 2004-07-22
US59166304P 2004-07-27 2004-07-27
US60/591,663 2004-07-27
US59197804P 2004-07-28 2004-07-28
US59200004P 2004-07-28 2004-07-28
US60/592,000 2004-07-28
US60/591,978 2004-07-28
PCT/US2005/016763 WO2005111813A2 (en) 2004-05-11 2005-05-11 Semantic processor storage server architecture

Publications (1)

Publication Number Publication Date
CA2565596A1 true CA2565596A1 (en) 2005-11-24

Family

ID=35394797

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002565596A Abandoned CA2565596A1 (en) 2004-05-11 2005-05-11 Semantic processor storage server architecture

Country Status (5)

Country Link
EP (1) EP1761852A2 (en)
JP (1) JP2007537550A (en)
KR (1) KR20070020289A (en)
CA (1) CA2565596A1 (en)
WO (1) WO2005111813A2 (en)


Also Published As

Publication number Publication date
EP1761852A2 (en) 2007-03-14
KR20070020289A (en) 2007-02-20
WO2005111813A2 (en) 2005-11-24
JP2007537550A (en) 2007-12-20
WO2005111813A3 (en) 2007-05-31

