JP2003526269A

JP2003526269A - High-speed data processing using internal processor memory area

Info

Publication number: JP2003526269A
Application number: JP2001564992A
Authority: JP
Inventors: テレンスハッセイ，; ドナルドダブリュー．モンロー，; アーノルドエヌ．ソダー，
Original assignee: テノーネットワークス，インコーポレイテッド
Priority date: 2000-03-03
Filing date: 2001-03-02
Publication date: 2003-09-02
Also published as: WO2001067237A2; AU2001249083A1; US20010049744A1; WO2001067237A3; CN1437724A; EP1261915A2; CA2402018A1; KR20030007447A; US7328277B2

Abstract

(57) [Abstract] The performance of a data processing system can be greatly improved by confining the operations of a processor to its internal register file, thereby reducing the number of instructions the processor executes. Using direct memory access that is independent of the processor, data small enough to fit within the internal register file can be transferred into the register file, and execution results can be removed from it. The processor thus avoids executing load and store instructions to manipulate externally stored data. Moreover, the data and execution results of the processing activity are accessed and manipulated by the processor entirely within the internal register file.

Description

Detailed Description of the Invention

[0001] (Related Application) This application claims priority to, and the benefit of, U.S. Provisional Patent Application No. 60/186,782, filed March 3, 2000, the entirety of which is incorporated herein by reference.

[0002] (Technical Field of the Invention) The present invention relates generally to information processing and, more specifically, to processing activities that take place within the internal components of a processor.

[0003] (Background of the Invention) Data processing typically involves retrieving data from memory, processing the retrieved data, and storing the results of this processing activity back in memory. The hardware architecture that supports this activity generally controls the flow of information and control among the individual hardware units of an information processing system. One such hardware unit is a processor or processing engine, which contains arithmetic and logic processing circuits, general-purpose and special-purpose registers, processor control or sequencing logic, and the data paths that interconnect these elements. In some implementations, the processor is configured as a stand-alone central processing unit (CPU), implemented either as a custom-designed integrated circuit or within an application-specific integrated circuit (ASIC). The processor has internal registers for use with the operations defined by a set of instructions. The instructions are typically stored in an instruction memory and specify the set of hardware functions available on the processor.

[0004] In performing these functions, the processor typically retrieves "transient" data from a memory external to the processor, loads portions of this data, sequentially or randomly, into its internal registers by executing "load" instructions, processes the data in accordance with the instructions, and then stores the processed data back in external memory using "store" instructions. In addition to loading transient data into the internal registers and removing results from them, load and store instructions are also used periodically during the actual processing of the transient data to access additional information needed to complete the processing activity (e.g., to access status and command registers). Periodic load/store accesses to external memory are generally inefficient because the execution capability of the processor is substantially faster than its external interface capability. As a result, the processor often sits idle while waiting for the accessed data to be loaded into its internal register file.

[0005] This inefficiency can be particularly limiting in devices that operate within communication systems, because the resulting net efficiency limits the overall data processing capability of the device and, unless some of the transmitted data is dropped, limits the maximum data rate of the network itself.

[0006] (Summary of the Invention) The present invention recognizes that periodic access to external memory is unnecessary when a data set is small enough to be contained within the local register file area of the processor assigned to process it. Accordingly, the present invention incorporates data access techniques that are performed, at least in part, independently of the processor and that avoid the execution of load and store instructions by the processor.

[0007] In one embodiment, an information processing system and method incorporating aspects of the invention confine the operations of an assigned processor to processing a data set within the processor's internal register file. The information processing system includes a processor, an ingress element, and an egress element. The ingress element receives unprocessed data from an interface to a data source (e.g., a network interface that receives data from a communication network). The ingress element transfers the unprocessed data, or a portion of it, into the internal register file area of the processor by accessing that area directly. A unit within the processor that manipulates data (e.g., an arithmetic logic unit) manipulates and processes the data in response to this transfer into the processor's register file, confining its operations entirely to the internal register file area. When this processing activity is complete, the egress element accesses the processed data directly and removes it from the internal register file area. Alternatively, an intermediate state machine accesses the processed data directly and forwards it to the egress element.

[0008] In one aspect of the invention, one or more state machines are included within the ingress and egress elements and govern their operation. The state machines directly access the internal register file area of the processor to transfer data into, or remove data from, that area. In one embodiment, the data transfer activity of the state machines is initiated in response to a) receipt of unprocessed data at the ingress element, b) a signal from processor logic indicating the transfer of unprocessed data into the processor's register file area, or c) a change in a value stored in a logic element such as a command register.

[0009] The benefits of the invention can be realized in many information processing systems, including focused processing systems such as those for image processing, signal processing, video processing, and network packet processing. As one example, the invention can be embodied in a communication device, such as a router, to perform network services such as route processing, path determination, and path switching. The route processing function determines the type of routing needed for a packet, whereas the path switching function allows the router to accept a packet on one interface and forward it on a second interface. The path determination function selects the most appropriate interface for forwarding the packet.

[0010] To support the transfer of packets among the multiple interfaces of a communication device, the path switching function of the device can be implemented within one or more forwarding engine ASICs that incorporate aspects of the invention. In this illustrative embodiment, packet data is received over a communication network by ingress logic associated with a particular input port of a network interface of the communication device. The ingress logic then selects a processor to process the packet from a pool of candidate processors associated with the receiving port.
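The selection of a processor from a per-port candidate pool can be sketched as follows. This is a minimal illustration only: representing the pool and the set of idle processors as bitmasks, and picking the lowest-numbered free candidate, are assumptions made for this example and are not stated in the text above; the function name `allocate_processor` is hypothetical.

```python
def allocate_processor(pool_mask: int, free_mask: int):
    """Pick a free processor from the candidate pool of the receiving port.

    pool_mask: bit i set means processor i is a candidate for this port (assumed encoding).
    free_mask: bit i set means processor i is currently idle (assumed encoding).
    Returns (processor_index or None, updated free_mask).
    """
    candidates = pool_mask & free_mask
    if candidates == 0:
        return None, free_mask            # no candidate is idle; caller must wait
    lowest = candidates & -candidates      # isolate the lowest set bit
    index = lowest.bit_length() - 1
    return index, free_mask & ~lowest      # mark the chosen processor as busy


# Processors 1 and 2 are candidates; processors 0 and 1 are idle -> processor 1 is chosen.
print(allocate_processor(0b110, 0b011))
```

A priority-encoder style choice such as this is only one possible policy; a round-robin selection would fit the description equally well.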

[0011] After a processor has been assigned, the packet is split into a header portion and a body portion. The packet header is written, using direct memory/register access, into a fixed location within a memory element, such as an internal register file associated with the assigned processor, by at least one state machine of the ingress logic that is configured to write the packet header without the processor invoking load or store instructions. The packet body is written to an output buffer. The processor then processes the packet header in accordance with locally stored instructions (again, without invoking load or store instructions) and transfers the processed packet header to the selected output buffer. In the output buffer, the packet header is merged with the packet body, and the packet is then transferred to a destination output port for transmission out of the communication device.

[0012] Before receiving a packet header, the assigned processor repeatedly executes, in an infinite loop, an instruction stored at a first known location/address (e.g., address 0) of the processor's instruction memory. Hardware within the processor detects that address 0 is a "special" address for which a hardwired instruction is returned, rather than an instruction from the instruction memory attached to the processor. When a packet header is being transferred from the ingress logic to the processor, a control signal indicates to the processor that a header transfer is in progress. While this signal is active, the processor hardware forces the processor's program counter to a non-special address (e.g., address 2), which terminates the infinite loop. Once the transfer of the packet header is complete, the processor begins executing the instructions that start at address 2 of its instruction memory. When the packet processing activity is complete, the processor is reset (e.g., by setting the program counter to address 0) so that it again repeatedly executes the instruction at the special address described above.
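The idle-loop behavior described above can be sketched as a small cycle-by-cycle model. The addresses 0 and 2 come from the text; everything else (the class name `IdleLoopProcessor`, the one-instruction-per-tick timing, and the choice to return to address 0 automatically after a fixed-length program) is an assumption made for illustration.

```python
IDLE_ADDR = 0    # "special" address: hardware returns a hardwired looping instruction
START_ADDR = 2   # non-special address where packet-processing code begins


class IdleLoopProcessor:
    """Toy model of the program counter; not a description of the actual hardware."""

    def __init__(self, program_length: int):
        self.pc = IDLE_ADDR
        self.program_end = START_ADDR + program_length
        self.trace = []

    def tick(self, header_transfer_active: bool) -> None:
        if header_transfer_active:
            # While the control signal is active, hardware pins the PC at START_ADDR.
            self.pc = START_ADDR
            self.trace.append("hold")
        elif self.pc == IDLE_ADDR:
            # Hardwired instruction at the special address loops back to itself.
            self.trace.append("idle")
        else:
            self.trace.append(f"exec@{self.pc}")
            self.pc += 1
            if self.pc == self.program_end:
                self.pc = IDLE_ADDR   # reset once packet processing completes


proc = IdleLoopProcessor(program_length=3)
for signal in [False, False, True, True, False, False, False, False]:
    proc.tick(signal)
print(proc.trace)
```

The point of the model is that the processor never polls or is interrupted: the arrival of a header is signaled purely by the hardware moving the program counter off the special address.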

[0013] In this manner, the packet header is written directly into the processor's register file without requiring any interaction with, or prior knowledge on the part of, the processor until the header is ready to be processed. Other information about the status or characteristics of the packet (e.g., its length) can likewise be stored locally in the register file using a similar procedure, so that the processor need not access an external source to obtain it.

[0014] To simplify the programming model for multiple processors, one processor can be assigned to each packet, with each processor configured to execute a common sequence of instructions held in its own instruction memory. Enough processors are allocated to ensure that packets can be processed at the wire/line rate of the communication network (i.e., the maximum bit rate of the network interface). The reduced instruction set that results from incorporating aspects of the invention into multiple processors within an ASIC reduces the die size of the ASIC. It therefore becomes possible to increase the density of processors within the ASIC without encountering the technical obstacles and adverse effects that arise in manufacturing such ASICs. The ASIC implementation of the invention can be scaled further by, for example, increasing the clock speed of the processors, adding more processors to the ASIC, and aggregating pools of processors (having a common instruction set) from multiple ASICs.

[0015] In one embodiment, the invention can be used within a fully symmetric multiprocessing (SMP) system, exhibiting a reduced instruction set computer (RISC) architecture, to process packets received over a communication network. The SMP system includes a plurality of identical processors with common software that operate as a pool, any of which is eligible to process a particular packet. Each incoming packet is assigned to an available processor in the pool, and the processors process packets in parallel using the common instruction set. The SMP system then reconstructs the stream of processed packets so that it exhibits the proper packet order.
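Because packets handled in parallel may finish out of order, the stream has to be put back into arrival order. The text does not say how this is done; the following is a minimal sketch of one common approach, assuming each packet is tagged with a sequence number on arrival. The class name `Resequencer` and the use of a heap are illustrative choices, not the patent's mechanism.

```python
import heapq


class Resequencer:
    """Releases processed packets strictly in arrival (sequence-number) order."""

    def __init__(self):
        self.next_seq = 0    # sequence number of the next packet allowed to leave
        self.pending = []    # min-heap of (seq, packet) that finished early

    def complete(self, seq: int, packet):
        """Record that packet `seq` finished; return packets now safe to emit."""
        heapq.heappush(self.pending, (seq, packet))
        released = []
        while self.pending and self.pending[0][0] == self.next_seq:
            released.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return released


r = Resequencer()
print(r.complete(1, "b"))   # packet 0 has not finished yet, nothing is released
print(r.complete(0, "a"))   # packets 0 and 1 are now released together, in order
```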

[0016] The foregoing discussion will be understood more readily from the following detailed description of the invention, when taken in conjunction with the accompanying drawings.

[0017] (Detailed Description of the Invention) A typical microprocessor executes load and store instructions to load, into the processor's local registers, an image of transient data representing a data structure stored in a memory element external to the processor, for subsequent execution. As used herein, the term "local register file" means the entirety of the registers within the internal structure of a processor that are available for use in manipulating data. A "register" refers to a distinct group of storage elements, such as D flip-flops. Depending on the processor design, the register file space may be composed of memory and flip-flops. In either case, register files are typically implemented using high-speed memory components that provide multiple, independently accessible read and write ports. During the execution of a software program, a typical processor executes a relatively large number of load/store instructions to move data from external memory into the local register file and to move execution results from the local register file to external memory. These frequent accesses to external memory are necessary because the data set to be processed is too large to fit within the execution space of the local register file.

[0018] The present invention recognizes that frequent accesses to external memory are unnecessary for processing data sets small enough (e.g., 128 to 512 8-bit data elements) to reside entirely within the local register file space. As described in detail below, the invention combines direct memory access (DMA) and direct register access (DRA) techniques to move data and execution results into and out of a processor's register file without the processor having to execute instructions, such as load and store instructions, to move the data. In this context, DMA refers to a method that uses one or more state machines to move a block of data into and out of internal or external memory independently of the processor. Similarly, DRA refers to a particular type of DMA, namely one that moves one or more blocks of data into and out of the processor's register file space independently of the processor. In one embodiment, to facilitate direct register file access, a region of the register file is allocated as a five-port register file space having two write ports and three read ports (as opposed to a three-port register file having one write port and two read ports). This approach avoids accesses to external memory, which are relatively slow compared with operations within the register file, avoids memory wait states, and reduces the size of the processor's instruction set. As a result, and in addition to significantly increasing the performance of the individual processor, the die size and power consumption of an application-specific integrated circuit (ASIC) containing such processors are reduced, and the total number of processors within the ASIC can be increased significantly without incurring prohibitive cost.
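To make the instruction-count argument concrete, here is a deliberately simple cost model. It assumes, purely for illustration, that a conventional processor needs one load and one store instruction per data element, and that with DRA those transfers cost the processor no instructions at all; neither figure comes from the text, and the function name `instruction_count` is hypothetical.

```python
def instruction_count(n_elements: int, ops_per_element: int, use_dra: bool) -> int:
    """Rough count of instructions the processor itself must execute.

    Assumption: without DRA, each element costs one load and one store
    in addition to its processing instructions.
    """
    transfer_instructions = 0 if use_dra else 2 * n_elements
    return transfer_instructions + n_elements * ops_per_element


# A 128-element data set with one processing instruction per element:
print(instruction_count(128, 1, use_dra=False))  # 384
print(instruction_count(128, 1, use_dra=True))   # 128
```

Under these assumptions the saving grows with how little work is done per element, which is why small, lightly processed data sets such as packet headers are the natural fit.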

[0019] Although the invention is described herein as embodied in a network interface card of a communication device for processing packets received over a network, this particular implementation is merely an illustrative embodiment, and those skilled in the art will recognize any number of other embodiments and applications that can benefit from the invention. For example, and without limitation, the invention can be used in information processing applications involving relatively small data sets, such as image processing, signal processing, and video processing. The invention can also be embodied in a wide range of network communication devices (such as switches and routers) and in other information processing environments.

[0020] Referring to FIG. 1, a communication device 150 receives information (e.g., in the form of packets/frames, cells, or TDM frames) from a communication network 110 via a communication link 112 and forwards the received information to a different communication network or branch, such as a local area network (LAN) 120, a metropolitan area network (MAN) 130, or a wide area network (WAN) 140, or to a locally attached end station (not shown). The communication device 150 can contain a number of network interface cards (NICs), such as NIC 160 and NIC 180. Each NIC has a series of input ports (e.g., 162, 164, and 166) and output ports (e.g., 168, 170, and 172). Input ports 162, 164, and 166 receive information from communication network 110 and pass it to a number of packet processing engines (not shown). The packet processing engines process the packets and prepare them for transmission at one of the output ports 168, 170, and 172, each of which corresponds to a communication network such as LAN 120, MAN 130, or WAN 140 containing the end station.

[0021] Referring to FIG. 2, a network interface card (NIC) 160 embodying aspects of the invention includes input ports 162, 164, and 166, a packet processing or forwarding engine 220, an address lookup engine (ALE) 210, a statistics module 230, a queuing/dequeuing module 240, and output ports 168, 170, and 172. NIC 160 receives packets from the packet-based communication network 110 (FIG. 1) at input ports 162, 164, and 166. The forwarding engine 220, together with the ALE 210, determines the destination output port of each packet by looking up the appropriate output port 168, 170, or 172 associated with the destination and by prepending a forwarding vector to the packet to aid in routing it to that output port.

[0022] The modified packets are delivered to the queuing/dequeuing module 240, where the forwarding vectors are used to organize the packets into queues associated with particular destination output ports 168, 170, and 172. The forwarding vector of each packet is then removed, and the packet is scheduled for transmission to the selected output port 168, 170, or 172. The packets are subsequently transmitted from the selected output port to a communication network such as LAN 120, MAN 130, or WAN 140. In one embodiment, the queuing/dequeuing module 240 of NIC 160 receives the modified packets via a full-mesh interconnect (not shown), so that it can funnel packets originally received at the input ports of any of the NICs 160 and 180 installed in the communication device 150 (including packets received by the input ports 162, 164, and 166 of its own NIC 160) to one or more of the output ports 168, 170, and 172 of its own NIC 160. In another embodiment, packets received at input ports 162, 164, and 166 are transferred directly to the queuing/dequeuing module 240 by the forwarding engine 220.
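The queue-by-forwarding-vector step can be sketched as below. The text does not define the format of the forwarding vector, so this example assumes a single prepended byte that names the destination output port; `VECTOR_LEN`, `prepend_vector`, and `QueuingModule` are hypothetical names used only for illustration.

```python
from collections import defaultdict, deque

VECTOR_LEN = 1  # assumed: the forwarding vector is one byte holding the port number


def prepend_vector(port: int, packet: bytes) -> bytes:
    """What the forwarding engine does: tag the packet with its destination port."""
    return bytes([port]) + packet


class QueuingModule:
    """Sorts modified packets into per-port queues, then strips the vector."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def enqueue(self, modified_packet: bytes) -> None:
        port = modified_packet[0]                 # read the forwarding vector
        self.queues[port].append(modified_packet)

    def dequeue(self, port: int):
        queue = self.queues[port]
        if not queue:
            return None
        return queue.popleft()[VECTOR_LEN:]       # remove the vector before transmission


q = QueuingModule()
q.enqueue(prepend_vector(170, b"abc"))
q.enqueue(prepend_vector(168, b"x"))
q.enqueue(prepend_vector(170, b"def"))
print(q.dequeue(170), q.dequeue(170), q.dequeue(168))
```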

[0023] Referring to FIGS. 3 and 4, an illustrative structure of the forwarding engine 220 includes ingress logic 310, an ALE interface 350, a statistics interface 360, egress logic 370, and one or more processors, shown as 320, 330, and 340. Operation proceeds as follows. Data corresponding to a packet is transmitted over communication network 110 and received at a particular input port 162, 164, or 166 of NIC 160 or 180 that is connected to communication network 110 (step 410). A processor 330 is then selected, from the pool of processors (shown as 320, 330, and 340) associated with the input port 162, 164, or 166, to process the packet (step 420). Once processor 330 has been assigned, the packet is split into a header and a body by ingress logic 310 (step 430). The packet header is written, using direct register access, into a particular location of the register file 710 (FIG. 7) associated with processor 330, and the packet body is written, using direct memory access, into an output buffer within egress logic 370 (step 440). Processor 330 then processes the packet header in accordance with locally stored instructions (step 450) and transfers the processed packet header to egress logic 370, where it is recombined with the packet body (step 460).
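Steps 430 through 460 can be sketched end to end as follows. This is a toy model under several assumptions that are not in the text: a fixed 4-byte header (`HEADER_LEN`), one byte per register, and a placeholder "processing" step that merely decrements the first header byte. The names `Processor`, `dra_write`, `dra_read`, and `forward` are hypothetical.

```python
HEADER_LEN = 4  # assumed fixed header length, for illustration only


class Processor:
    def __init__(self, num_regs: int = 16):
        self.regs = [0] * num_regs        # register file, one byte per register here

    def dra_write(self, base: int, data: bytes) -> None:
        """Models the ingress state machine writing registers directly (no load instructions)."""
        for offset, byte in enumerate(data):
            self.regs[base + offset] = byte

    def process_header(self, base: int) -> None:
        """Placeholder processing that touches only the register file."""
        self.regs[base] = (self.regs[base] - 1) % 256

    def dra_read(self, base: int, length: int) -> bytes:
        """Models the processed header being taken out of the register file directly."""
        return bytes(self.regs[base:base + length])


def forward(packet: bytes, processor: Processor, base: int = 0) -> bytes:
    header, body = packet[:HEADER_LEN], packet[HEADER_LEN:]   # step 430: split
    processor.dra_write(base, header)                         # step 440: header -> register file
    output_buffer = bytearray(body)                           # step 440: body -> output buffer
    processor.process_header(base)                            # step 450: process in registers
    new_header = processor.dra_read(base, len(header))
    return new_header + bytes(output_buffer)                  # step 460: recombine


print(forward(bytes([5, 1, 2, 3, 9, 9]), Processor()))
```

Note that the body never passes through the processor at all; only the header occupies register space, which is what keeps the data set small enough to fit.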

[0024] In processing the packet header, processor 330 can perform tasks such as the following: checking the integrity of the packet header; verifying its checksum; accessing the statistics module 230 via the statistics interface 360 to provide statistics that are used to report the processing activity involving this packet header to modules external to the forwarding engine 220; and communicating with the ALE 210 via the ALE interface 350 to obtain routing information for the one of the output ports 168, 170, and 172 that is associated with the destination of the packet. Additional network-specific (e.g., IP, ATM, Frame Relay, HDLC, TDM) packet processing can be performed at this time. At the conclusion of this processing activity, processor 330 modifies the packet header to include routing information that designates a particular output port 168, 170, or 172 of NIC 160 (e.g., by prepending a forwarding vector to the packet header). The modified packet header is then written to the egress logic 370 of the forwarding engine 220, from which it is subsequently routed to the queuing/dequeuing module 240 as described above.

【００２５】The ALE interface 350, the statistics interface 360, and the egress logic 370 are resources within the forwarding engine 220 that are shared among the processors 320, 330 and 340. An arbitration mechanism (not shown) is provided in the forwarding engine 220 to arbitrate between the processors 320, 330 and 340 for access to these resources 350, 360 and 370. In one embodiment, when the processor 330 is assigned to a packet, a processor identifier for the processor 330, such as its processor number, is communicated to each of the three shared resources 350, 360 and 370 described above. Each of these shared resources 350, 360 and 370 then writes the processor number into a FIFO, which preferably has a depth equal to the total number of processors in the forwarding engine 220. Logic within each of the shared resources 350, 360 and 370 accesses its corresponding FIFO to determine which of the processors 320, 330 and 340 should next be granted access to the resource. Once the granted processor completes its access to a particular resource 350, 360, 370, the accessed resource reads the next FIFO entry to determine the next processor to which a grant should be issued.
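
The FIFO-based arbitration described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class name `SharedResource` and its methods are hypothetical, and the sketch assumes that a grant is issued as soon as the resource is free, a detail the text does not spell out.

```python
from collections import deque


class SharedResource:
    """Toy model of one shared resource (e.g. the ALE interface) with a grant FIFO."""

    def __init__(self, name, num_processors):
        self.name = name
        self.fifo = deque()
        # FIFO depth equals the total number of processors, as the text prefers.
        self.depth = num_processors
        self.granted = None  # processor number currently holding the grant

    def register(self, processor_number):
        """Record a processor number when that processor is assigned to a packet."""
        if len(self.fifo) >= self.depth:
            raise OverflowError("grant FIFO is full")
        self.fifo.append(processor_number)
        if self.granted is None:
            self._grant_next()

    def release(self, processor_number):
        """Called when the granted processor has completed its access."""
        if processor_number != self.granted:
            raise PermissionError("processor does not hold the grant")
        self.granted = None
        self._grant_next()

    def _grant_next(self):
        # Read the next FIFO entry to decide which processor is granted next.
        if self.fifo:
            self.granted = self.fifo.popleft()


# Processors are assigned to packets in the order 1, 0, 2.
ale = SharedResource("ALE interface 350", num_processors=3)
for p in (1, 0, 2):
    ale.register(p)

order = []
while ale.granted is not None:
    order.append(ale.granted)
    ale.release(ale.granted)
```

In this model the grants follow the order in which the processors were assigned to packets, which is the property the egress logic relies on when it recombines headers with bodies.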

【００２６】Referring to FIGS. 5 and 6 in greater detail, the receipt, manipulation, and transmission of packet data within the forwarding engine 220 are handled primarily by a plurality of DMA and DRA state machines. In one embodiment, these state machines are contained within the ingress logic 310 and the processor 330. In operation of this embodiment, a packet is received from one of the input ports 162, 164 and 166 of the NIC 160 and stored in a Receive_Data FIFO (first-in/first-out buffer) 510 within the ingress logic 310 (step 610). A Receive_Status FIFO 512 records the particular input port 162, 164 or 166 at which the packet arrived and maintains an ordered list of the input port numbers of the packets received by the forwarding engine 220. The list is ordered by the time at which each packet was received.

【００２７】The Issue_DMA_Command state machine 514 detects when the Receive_Status FIFO 512 contains data and obtains from the Receive_Status FIFO 512 the input port number associated with the input port 162, 164 or 166 that received the packet (step 620). The Issue_DMA_Command state machine 514 then sends a processor allocation request containing the packet's port number to the Allocate_Processor state machine 516. The Allocate_Processor state machine 516 accesses the Allocation_Pool Register 518 associated with that port number to determine the set of processors 320, 330 and 340 that are candidates to operate on this packet (step 630). The Allocate_Processor state machine 516 then accesses the Processor_Free Register 520 to determine whether any of the candidate processors 320, 330 and 340 identified by the Allocation_Pool Register 518 are available. The Allocate_Processor state machine 516 subsequently allocates one of the available processors, here processor 330, from the set of candidate processors 320, 330 and 340 to process the packet (step 640), and sends an allocation grant and the processor number of the processor 330 to the Issue_DMA_Command state machine 514.
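
As a rough illustration of the allocation step, the following Python sketch treats the Allocation_Pool Register and the Processor_Free Register as bitmasks with one bit per processor. The bitmask encoding, the function name `allocate_processor`, and the lowest-numbered-first selection policy are assumptions made for this sketch; the text only states that a free candidate is chosen.

```python
def allocate_processor(port, allocation_pool, processor_free):
    """Pick a free candidate processor for a packet that arrived on `port`.

    allocation_pool: dict mapping port number -> bitmask of candidate processors
    processor_free:  bitmask with bit i set while processor i is idle

    Returns (processor_number, updated_free_mask), or (None, processor_free)
    when no candidate is currently free.
    """
    candidates = allocation_pool[port] & processor_free
    if candidates == 0:
        return None, processor_free
    # Isolate the lowest set bit: lowest-numbered free candidate (arbitrary policy).
    chosen = (candidates & -candidates).bit_length() - 1
    return chosen, processor_free & ~(1 << chosen)


# Port 0 may use processors 0-1, port 1 may use processors 2-3.
pools = {0: 0b0011, 1: 0b1100}
free = 0b1110  # processor 0 is already busy

first, free = allocate_processor(0, pools, free)    # processor 1 is the only free candidate
second, free = allocate_processor(0, pools, free)   # no free candidate remains for port 0
third, free = allocate_processor(1, pools, free)    # processor 2 is taken for port 1
```

A caller would set the processor's bit in the free mask again once the processed header has been handed to the egress logic.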

【００２８】Upon receiving the processor number associated with the allocated processor 330, the Issue_DMA_Command state machine 514 sends an execute signal/command containing the processor number to the DMA_Execute state machine 522. The DMA_Execute state machine 522 accesses the Header_DMA_Length Register 524 to obtain the amount of the received packet that is to be sent to the processor 330 (i.e., the length of the packet header) (step 650). The DMA_Execute state machine 522 then issues a DMA command that retrieves the header portion of the packet (corresponding to the packet header) from the Receive_Data FIFO 510 and transmits it over the DRA bus 526, where it is retrieved by the Processor_DRA state machine 530 contained in the processor 330 (step 660). The DMA_Execute state machine 522 also issues a DMA command that retrieves the packet body from the Receive_Data FIFO 510 and transmits it over a separate DMA bus 528 to be received by a buffer (not shown) in the egress logic 370 (step 660). The Processor_DRA state machine 530 then writes the packet header data received over the DRA bus 526 directly into the register file 710 (FIG. 7) within the processor 330, starting at a fixed address location (e.g., address 0) (step 670). The processor 330 then processes the packet header (step 680) and sends the processed header via the Transmit_DMA state machine 532 to the egress logic 370, where the processed header is recombined with the packet body (step 690).
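
The header/body split and the direct write into the register file can be modelled as below. This is a minimal sketch, not the hardware behaviour: the constants `HEADER_DMA_LENGTH` and `REGISTER_FILE_SIZE` and both function names are hypothetical, and bytes stand in for whatever word size the register file actually uses.

```python
HEADER_DMA_LENGTH = 8    # hypothetical value of the Header_DMA_Length Register, in bytes
REGISTER_FILE_SIZE = 32  # hypothetical register file size, in bytes


def split_packet(packet: bytes, header_length: int = HEADER_DMA_LENGTH):
    """Split a received packet into (header, body) at the configured header length."""
    return packet[:header_length], packet[header_length:]


def dra_write(register_file: bytearray, header: bytes, base: int = 0) -> None:
    """Write the header into the register file starting at a fixed address."""
    if base + len(header) > len(register_file):
        raise ValueError("header does not fit in the register file")
    register_file[base:base + len(header)] = header


packet = bytes(range(12))
header, body = split_packet(packet)

regs = bytearray(REGISTER_FILE_SIZE)
dra_write(regs, header)  # the header lands at address 0 without any load instruction

# After (here, trivial) processing, the header is recombined with the body.
recombined = bytes(regs[:HEADER_DMA_LENGTH]) + body
```

The point of the model is that the processing code only ever touches `regs`; the body travels separately and is rejoined at the egress side.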

【００２９】More specifically, and with reference to FIGS. 7 and 8, the processing of packet headers within the processor 330 is preferably such that the processor's instructions and activity are confined to manipulating data and execution results in an execution space formed within the processor's local register file 710. The structure of the processor 330 in one illustrative embodiment includes the Statistics_Interface state machine 704, the ALE_Interface state machine 706, the Processor_DRA state machine 530, the Transmit_DMA state machine 532, the register file 710, an arithmetic logic unit (ALU) 720, a processor control module 730, and an instruction memory 740. A computational unit 725 comprises the processor control module 730 and the ALU 720.

【００３０】During this illustrative operation, and while the processor 330 is waiting to receive a packet header, the computational unit 725 continuously executes (i.e., in an infinite loop) the instruction at a special address (e.g., address 0) in the instruction memory 740 (step 810). Hardware within the processor 330 detects that address 0 is a special address, for which the instruction is returned from a "hard-wired" instruction value etched into the silicon rather than from an instruction stored in the instruction memory 740. In one possible implementation, an instruction fetch at special address 0 returns "JMP 0" (i.e., jump to the instruction at address 0), causing the processor 330 to execute an infinite loop at that address.

【００３１】When a packet header is transferred from the ingress logic 310 into the processor's register file 710, a control signal from the Processor_DRA state machine 530 indicates to the processor control module 730 that a packet header transfer is in progress (step 820). While this signal is active, the processor control module 730 forces the processor program counter (not shown) to specify a non-special address (e.g., address 2) in the instruction memory 740, causing the computational unit 725 to break out of the infinite loop executing at special address 0 and to wait until the signal becomes inactive (step 830). In response to the signal becoming inactive, the computational unit 725 begins executing the instruction at address 2 (step 840). Address 2 of the instruction memory 740 can be configured to hold the first instruction used to process the packet header in the register file 710 (i.e., the instruction at address 2 corresponds to the start of the "real" software image previously downloaded to operate on packet headers). When the Processor_DRA state machine 530 finishes writing the packet header into the register file 710 starting at the fixed location (which is when the control signal becomes inactive), the computational unit 725 continues normal execution of the remaining instructions in the instruction memory 740 (i.e., beyond address 2). Particular instructions in the instruction memory 740 specify locations within the register file 710. When the processing activity for a particular packet header is complete, the executing software "jumps" to address 0 and thereby executes the instruction at address 0 in an infinite loop. This technique illustrates one particular implementation of how the processor 330 can be triggered to process a packet header stored in the register file 710 without using load and store instructions.
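
The idle loop and its trigger can be illustrated with a small cycle-by-cycle simulation of the program counter. The instruction encoding (`("JMP", target)` and `("OP", label)`), the function name `run`, and the one-instruction-per-cycle timing are all assumptions of this sketch and are not taken from the text.

```python
def run(program, transfer_active_by_cycle, max_cycles=50):
    """Simulate the program counter and return what happened in each cycle.

    program: dict address -> instruction, either ("JMP", target) or ("OP", label).
             Address 0 is never read from it, because a fetch at address 0 is
             hard-wired to return ("JMP", 0).
    transfer_active_by_cycle: per-cycle booleans for the "header transfer in
             progress" signal; the signal is inactive once the list is exhausted.
    """
    pc = 0
    trace = []
    for cycle in range(max_cycles):
        active = (cycle < len(transfer_active_by_cycle)
                  and transfer_active_by_cycle[cycle])
        if active:
            pc = 2                 # control module forces the PC to the non-special address
            trace.append("stall")  # and holds execution until the signal drops
            continue
        instr = ("JMP", 0) if pc == 0 else program[pc]
        trace.append(pc)
        if instr[0] == "JMP":
            pc = instr[1]
        else:
            pc += 1
    return trace


# Hypothetical three-instruction image: two operations, then a jump back to idle.
program = {2: ("OP", "parse"), 3: ("OP", "lookup"), 4: ("JMP", 0)}
signal = [False, False, True, True]  # the header transfer occupies cycles 2 and 3

trace = run(program, signal, max_cycles=9)
```

The trace shows the processor spinning at address 0, stalling while the header is written, running the image from address 2, and returning to the idle loop, all without a load or store instruction in the image.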

【００３２】In another embodiment, the allocated processor 330 remains idle (i.e., it neither accesses the instruction memory nor executes instructions) until it receives a signal from an external state machine indicating that a complete packet header is in the register file 710. The computational unit 725 then executes code from the instruction memory 740 to process the packet header. The triggering event may be, for example, the control signal becoming inactive. Alternatively, the allocated processor 330 may be triggered when the DRA transfer is initiated, completed, or in progress. Many other triggering events will be apparent to those skilled in the art.

【００３３】As previously discussed, the processor 330 accesses one or more shared resources external to the processor 330 (e.g., the ALE interface 350, the statistics interface 360, and the egress logic 370 of FIG. 3) during the processing of a packet header. For example, the processor 330 interacts with the ALE 210 (FIG. 2) via the ALE interface 350 (FIG. 3) to initiate a search in the ALE 210 and to retrieve the search results from it. These interactions with the ALE 210 also take place without the processor 330 executing load and store instructions.

【００３４】In one aspect, while executing instructions in the instruction memory 740, the processor 330 constructs a search key starting at a predetermined address within the register file 710. The computational unit 725 executes an instruction that writes a value to an ALE_Command Register, the value specifying the amount of search key data to transfer to the ALE 210. This value effectively serves as a control line to the ALE_Interface state machine 706 of the processor 330: it triggers the ALE_Interface state machine 706 to read the value or other data from the ALE_Command Register, determine the amount of data to be transferred, and transfer the specified data to the ALE interface 350, using direct memory access that is independent of the computational unit 725. While the processor 330 waits for the search results to be returned, it can perform other functions, such as verifying the network protocol (e.g., IP) checksum of the packet header. When the search results from the ALE 210 are available, they are passed via the ALE interface 350 to the ALE_Interface state machine 706. The ALE_Interface state machine 706 writes the search results to a predetermined location in the register file 710 using one or more direct register accesses and signals the computational unit 725 when the write is complete. The computational unit 725 subsequently modifies the packet header in response to the search results.

【００３５】The processor 330 can also issue a statistics update command by writing an address and a length value into a Statistics_Update_Command Register (not shown) of the processor 330. The Statistics_Interface state machine 704 of the processor 330 is thereby triggered to read the data from the Statistics_Update_Command Register, determine the source and amount of data to transfer, and transfer the specified data to the statistics interface 360, using direct memory access that is independent of the computational unit 725.

【００３６】Similarly, when the processor 330 completes the processing of a packet header, the computational unit 725 writes the processed packet header to the Transmit_DMA state machine 532 of the processor 330. The Transmit_DMA state machine 532 then transfers the processed header to a buffer in the egress logic 370 using direct memory access that is independent of the computational unit 725 (step 850). After all processing is complete, software execution in the processor 330 jumps back to address 0 of the instruction memory 740 and resumes executing the infinite-loop instruction described previously while awaiting the arrival of the next packet header (step 860).

【００３７】More particularly, since the packet header does not necessarily reside in a contiguous region of the register file 710 after processing activity has concluded, the computational unit 725 may need to specify the location of each piece of the processed packet header within the register file 710. Accordingly, the computational unit 725 issues one or more writes to a Move_DMA_Command Register (not shown), each specifying the start address and length of one piece of the processed packet header. These writes are stored in a FIFO, in effect as a list of reassembly commands. After the data for all of the fragmented packet header pieces has been captured, the computational unit 725 writes to a Transmit_DMA_Command Register (not shown) to specify the body length of the packet along with other data.

【００３８】The value written to the Transmit_DMA_Command Register triggers the Transmit_DMA state machine 532 in the processor 330 to begin assembling the packet header in accordance with the reassembly commands stored in the above-mentioned FIFO. The Transmit_DMA state machine 532 then sends the assembled packet header, together with certain control information (including the length of the packet body), to the egress logic 370 using direct memory access that is independent of the computational unit 725. The egress logic 370 concatenates the processed packet header received from the Transmit_DMA state machine 532 with the packet body stored in a FIFO of the egress logic 370 and then sends the reconstructed packet to the queuing/dequeuing module 240, as previously described.
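
The header reassembly and concatenation steps of the two preceding paragraphs might be modelled as follows. This is a sketch under assumptions: each reassembly command is represented as a `(start_address, length)` tuple, the register file and the egress FIFO are byte sequences, and the function names are hypothetical.

```python
from collections import deque


def assemble_header(register_file: bytes, move_commands) -> bytes:
    """Concatenate the header pieces named by (start_address, length) commands,
    consuming the commands in FIFO order."""
    fifo = deque(move_commands)
    pieces = []
    while fifo:
        start, length = fifo.popleft()
        pieces.append(register_file[start:start + length])
    return b"".join(pieces)


def egress_concatenate(header: bytes, body_fifo: bytearray, body_length: int) -> bytes:
    """Draw body_length bytes from the egress FIFO and append them to the header."""
    body = bytes(body_fifo[:body_length])
    del body_fifo[:body_length]
    return header + body


# The processed header pieces are scattered across the register file.
regs = b"HDR1....HDR2..H3"
commands = [(0, 4), (8, 4), (14, 2)]
new_header = assemble_header(regs, commands)

# The egress FIFO holds this packet's body followed by the next packet's body.
body_fifo = bytearray(b"bodyNEXT")
out_packet = egress_concatenate(new_header, body_fifo, body_length=4)
```

Because the body length travels with the header as control information, the egress side knows exactly how much of its FIFO belongs to the current packet.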

【００３９】In order to properly reconstruct the packet header with the packet body, the processor 330 obtains the total length of the packet from data embedded in the packet header itself and obtains the length of the packet header from data sent to the processor 330 by the Receive_Data FIFO 510 (FIG. 5) (this corresponds to the value written into the Header_DMA_Length Register 524 of FIG. 5). Based on this information, the processor 330 computes the amount of packet body data that was previously sent to the output FIFO in the egress logic 370 and specifies the packet body length as control information to be sent to the egress logic 370 by the Transmit_DMA state machine 532. In this way, the processor 330 can specify the amount of packet body data that is to be drawn from the output FIFO of the egress logic 370 and appended to the newly assembled packet header formed by the processor 330, thereby reconstructing the modified packet. To ensure that modified packets are reconstructed properly, the processor 330 is granted access to the egress logic 370 in the same order in which the processor 330 was allocated (and thus in the same order in which the packet bodies were written into the output FIFO of the egress logic 370).
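
The body-length computation is a simple subtraction, shown here for the IPv4 case as one possible example. The sketch assumes that the header handed to the processor begins at the IPv4 header (no link-layer bytes in front of it) and that the header length passed in is the value of the Header_DMA_Length Register; the function name is hypothetical.

```python
import struct


def body_length_from_ipv4(header: bytes, header_dma_length: int) -> int:
    """Return the number of body bytes already sitting in the egress FIFO."""
    # IPv4 carries the total datagram length, in bytes, in header bytes 2-3 (big-endian).
    (total_length,) = struct.unpack_from("!H", header, 2)
    if header_dma_length > total_length:
        raise ValueError("header length exceeds total packet length")
    return total_length - header_dma_length


# Version/IHL 0x45, TOS 0, total length 0x0054 = 84 bytes, remaining fields zeroed.
ipv4_header = bytes([0x45, 0x00, 0x00, 0x54]) + bytes(16)
body_len = body_length_from_ipv4(ipv4_header, header_dma_length=20)
```

With an 84-byte datagram and a 20-byte header transfer, 64 bytes of body remain to be drawn from the egress FIFO.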

【００４０】Aspects of the present invention provide a great deal of flexibility in allocating computational resources to incoming packet processing requirements. Assuming, for illustrative purposes, that the forwarding engine 220 contains a total of forty processors 320, 330, 340, the processors can be flexibly allocated to meet the packet processing requirements of a variety of input/output port configurations. For example, in a NIC 160 that has only one logical input port (i.e., port 0), all forty processors 320, 330, 340 can be allocated to process packets for that one port. In this scenario, the code image loaded into the instruction memory 740 of each processor 320, 330, 340 can be identical, so that each processor performs the same algorithm for that one type of input port. In another scenario with four logical input ports, each with a different type of network interface, the processing algorithms required for the various network interfaces may differ. In this case, the forty processors can be allocated as follows: processors [0-9] to port 0, processors [10-19] to port 1, processors [20-29] to port 2, and processors [30-39] to port 3. In addition, four different code images are downloaded, with each unique image corresponding to a particular input port. In yet another scenario, the NIC 160 may include two logical input ports, each with different processing performance requirements. In such a scenario, one of the input ports may consume 75% of the ingress bus bandwidth and have a packet arrival rate that requires 75% of the processor resources, with the other port accounting for the remainder. To support these performance requirements, thirty processors can be allocated to input port 0 and ten processors to input port 1.
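
The allocation scenarios above can be written down as per-port bitmasks over the forty processors. The bitmask representation and the helper names `pool_mask` and `pools_from_counts` are assumptions of this sketch; the text only specifies which processors serve which port.

```python
def pool_mask(first: int, last: int) -> int:
    """Bitmask with bits first..last (inclusive) set, one bit per processor."""
    return ((1 << (last - first + 1)) - 1) << first


def pools_from_counts(counts):
    """Assign contiguous runs of processors to ports 0, 1, 2, ... in order."""
    pools, start = {}, 0
    for port, count in enumerate(counts):
        pools[port] = pool_mask(start, start + count - 1)
        start += count
    return pools


one_port = pools_from_counts([40])        # all forty processors serve port 0
four_ports = pools_from_counts([10] * 4)  # processors [0-9], [10-19], [20-29], [30-39]
two_ports = pools_from_counts([30, 10])   # 75% / 25% split of forty processors
```

Changing the split between ports only requires new mask values (and, where the algorithms differ, a different code image per pool); the processors themselves are unchanged.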

【００４１】Allocating one processor to each received packet simplifies the programming model for NICs 160, 180 that incorporate multiple processors as part of their forwarding engine 220. Furthermore, as noted above, the reduction in die size realized by systems incorporating the present invention allows additional processors to be included within the forwarding engine ASICs of the NICs 160, 180, which in turn helps ensure that packets can be forwarded at the wire rate of the network 110. The invention is readily scalable by adding processors to a given forwarding engine ASIC, by increasing the clock rate of the processors, and by aggregating the processing pools of multiple ASICs. Note that, in providing this capability, the hardware architecture of the invention preserves the order of packets arriving via the network interfaces, so that the reassembled packets can be transmitted from the forwarding engine in the proper order.

【００４２】The processor pool aggregation technique is particularly advantageous in situations where the line rate at which the NIC 160 of the communication device 150 receives a packet data stream over the communication network 110 could (without this aggregation technique) exceed the processing capacity of the NIC 160, resulting in dropped packets and degraded quality of service. The aggregation technique allows idle processors to be allocated from one or more forwarding engines. For example, the NIC 160 may contain multiple forwarding engine ASICs, each having a pool of processors that can be allocated to process packets arriving on any input port of the NIC 160. Alternatively, pools of processors in additional forwarding engine ASICs residing on other NICs 180 within the communication device 150 can be allocated to the NIC 160 that is experiencing the heavy network load.

【００４３】Although the present invention has been described with reference to specific details, it is not intended that such details be regarded as limitations upon the scope of the invention, except as and to the extent that they are included in the accompanying claims.

[Brief description of drawings]

[FIG. 1] FIG. 1 schematically illustrates a communication device that connects a communication network to other networks, such as LANs, MANs and WANs.

[FIG. 2] FIG. 2 schematically illustrates several components of a network interface card installed within the communication device of FIG. 1, according to one embodiment of the invention.

[FIG. 3] FIG. 3 schematically illustrates several components of a forwarding engine that forms part of the network interface card of FIG. 2, according to one embodiment of the invention.

[FIG. 4] FIG. 4 provides a flow diagram of the steps performed when operating the forwarding engine of FIG. 3, according to one embodiment of the invention.

[FIG. 5] FIG. 5 schematically illustrates several components of the ingress logic of FIG. 3 and of a processor of the forwarding engine, which perform direct memory and direct register accesses, according to one embodiment of the invention.

[FIG. 6] FIG. 6 provides a flow diagram of the steps performed during operation of the ingress logic and processor of FIG. 5, according to one embodiment of the invention.

[FIG. 7] FIG. 7 schematically illustrates a more detailed set of components forming the processor of FIG. 5, according to one embodiment of the invention.

[FIG. 8] FIG. 8 provides a flow diagram of the steps performed when operating the processor components shown in FIG. 7, according to one embodiment of the invention.

Continuation of front page
(81) Designated states: EP (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, TR), OA (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG), AP (GH, GM, KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), EA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU, CZ, DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG, UZ, VN, YU, ZA, ZW
(72) Inventor: Monroe, Donald W., 75 Patch Meadow Lane, Carlisle, Massachusetts 01741, United States
(72) Inventor: Sodder, Arnold N., 44 Oak Cliff Road, Newtonville, Massachusetts 02460, United States
F-term (reference): 5K030 GA03 HA08 HD01 KA03 KA12 KA15

Claims

[Claims]

1. A method of processing a packet, the method comprising the steps of: receiving the packet; identifying a packet header portion of the packet; transmitting the packet header to a register file accessible to a processor; and processing the packet header with the processor without invoking at least one of a load instruction and a store instruction.

2. The method of claim 1, wherein the transmitting step is performed without invoking at least one of a load instruction and a store instruction.

3. The method of claim 1, further comprising the steps of: dividing the packet into the packet header portion and a packet body portion; transmitting the packet header to the register file using direct register access; and transmitting the packet body to an output buffer.

4. The method of claim 3, further comprising the steps of: selecting an output port for transmitting the packet; merging the processed packet header with the packet body in the output buffer; and forwarding the merged packet from the output buffer to the selected output port for transmission therefrom.

5. The method of claim 1, further comprising the steps of: providing a plurality of identical processors for executing a general instruction set, each processor storing the instruction set locally in the processor; selecting a processor from the plurality of processors to process the packet header; and causing the selected processor to process the packet header.

6. The method of claim 5, wherein selecting the processor is performed by a state machine responsive to receiving the packet at an input port.

7. The method of claim 5, wherein the step of causing the selected processor to process the packet header is performed by at least one state machine configured to write the packet header to at least one fixed location in the register file accessible to the selected processor.

8. The method of claim 5, further comprising downloading a general instruction set into instruction memory within each of the plurality of processors.

9. A method of processing a packet header of a packet received via a communication network, the method comprising the steps of: transmitting the packet header to at least one fixed location in a register file; providing a processor associated with the register file, the processor repeatedly executing an instruction in an infinite loop, the instruction being stored at a first known location in an instruction memory associated with the processor; causing the processor, in response to the transmission of the packet header, to execute instructions from a second known location in the instruction memory; processing the packet header in the at least one fixed location in the register file in accordance with the instructions beginning at the second known location in the instruction memory; and resetting the processor, after the packet header has been processed, to repeatedly execute the instruction stored at the first known location in the instruction memory.

10. The method of claim 9, wherein the processing step comprises processing the packet header without invoking at least one of a load instruction and a store instruction.

11. The method of claim 9, further comprising: receiving the packet at an input port connected to the communication network; selecting the processor from a plurality of candidate processors associated with the input port; dividing the packet into the packet header and a packet body; and executing a DRA command issued by a state machine connected to the register file to send the packet header to the at least one fixed location in the register file associated with the selected processor.

12. The method of claim 11, further comprising downloading a general instruction set into instruction memory in each of the plurality of candidate processors.

13. A packet processing system for processing a packet received via a communication network, comprising: an input port configured to receive the packet via the communication network; a processor associated with the input port; a register file accessible to the processor; and an ingress element connected to the input port, the processor, and the register file, the ingress element being configured to send at least a portion of the packet to the register file by invoking a DRA command, wherein the processor, in response to the DRA command, processes the at least a portion of the packet in the register file without invoking at least one of a load instruction and a store instruction.

14. The packet processing system of claim 13, wherein the ingress element is configured to select the processor from a plurality of candidate processors associated with the input port.

15. The packet processing system of claim 14, further comprising a plurality of instruction memories, each of the plurality of instruction memories being associated with a corresponding processor of the plurality of candidate processors, the plurality of instruction memories containing the same instruction set.

16. The packet processing system of claim 13, wherein the at least a portion of the packet corresponds to a header of the packet.

17. The packet processing system of claim 16, wherein the ingress element comprises a state machine configured to write the packet header to a fixed location within the register file.

18. A packet processing system for processing a packet header of a packet received via a communication network, comprising: an input port connected to the communication network; an ingress element connected to the input port and configured to receive and parse the packet to obtain the packet header; a register file connected to the ingress element for storing, at at least one fixed location, the packet header received from the ingress element; an instruction memory configured to return instructions from at least first and second addresses; and a processor connected to the ingress element, the register file, and the instruction memory, wherein the processor repeatedly executes an instruction stored at the first address of the instruction memory and, in response to a signal from the ingress element, executes instructions from the second address of the instruction memory in order to process the packet header in the register file.

19. An information processing system comprising: a processor having an internal register file area for manipulating data and a unit; an ingress element for sending unprocessed data to the internal register file area; and an egress element for removing processed data from the internal register file area, wherein operation of the processor is restricted to manipulating the data in the internal register file area.

20. The system of claim 19, further comprising at least one state machine for controlling the operation of the ingress and egress elements, the state machine being responsive to an instruction in the internal register file area and, in accordance with the instruction, moving data into and out of the internal register file area using direct access to the internal register file area.

21. The system of claim 20, further comprising a network interface that receives data from a communication network, the interface providing the received data to the ingress element.

22. A method of processing information, comprising: providing a processor having an internal register file area for manipulating data and a unit; sending unprocessed data to the internal register file area and removing processed data from the internal register file area using direct access to the internal register file area; and restricting operation of the processor to manipulating the data in the internal register file area.

23. The method of claim 22, further comprising: providing at least one state machine for controlling the sending of data to, and the removal of data from, the internal register file area using direct access to the internal register file area; and causing the processor to signal the state machine by writing a value in a control register, the state machine being responsive to the value and performing the direct access by state machine logic.
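The signalling arrangement of claim 23 can be pictured with a minimal software sketch. Everything named here is an assumption made for illustration: the command codes `CMD_LOAD_NEXT` and `CMD_DRAIN`, the register slots 0 and 1, and the `RegisterFile` and `TransferStateMachine` classes are hypothetical and are not taken from the claims.

```python
# Hypothetical model of claim 23: the processor writes a value to a control
# register, and a state machine reacts by moving data in or out directly.
CMD_NONE, CMD_LOAD_NEXT, CMD_DRAIN = 0, 1, 2   # assumed command values


class RegisterFile:
    def __init__(self, size=8):
        self.data = [0] * size    # internal register file area
        self.control = CMD_NONE   # control register written by the processor


class TransferStateMachine:
    def __init__(self, regfile, inbox):
        self.rf = regfile
        self.inbox = list(inbox)  # unprocessed data waiting outside
        self.outbox = []          # processed data removed from the registers

    def tick(self):
        """Inspect the control register once and perform the requested move."""
        cmd = self.rf.control
        if cmd == CMD_LOAD_NEXT and self.inbox:
            # Direct write of unprocessed data into slot 0 (assumed input slot).
            self.rf.data[0] = self.inbox.pop(0)
        elif cmd == CMD_DRAIN:
            # Direct read of processed data from slot 1 (assumed output slot).
            self.outbox.append(self.rf.data[1])
        self.rf.control = CMD_NONE  # acknowledge by clearing the command
```

In this sketch the processor side never copies data across the register file boundary itself; it only sets `control` and operates on `data`, while `tick()` stands in for the state machine logic.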

24. The method of claim 22, wherein the unprocessed data is provided by a communication network having a line data rate and the processor processes the data at a rate equal to the line data rate.

25. The method of claim 24, wherein the unprocessed data is in the form of packets.

26. A method of processing a packet stream containing a temporal sequence of packets, the method comprising: providing a plurality of identical processors executing a general instruction set, each processor storing the instruction set locally; receiving the packets; for each packet, (i) identifying a packet header portion of the packet, (ii) selecting a processor from the plurality of processors to process the packet header based on processor availability, and (iii) causing the selected processor to process the packet header using the locally stored instructions; and collecting the processed packets and reproducing the packet stream according to the temporal sequence.
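The steps of claim 26 can be sketched as a small simulation. The model is an assumption throughout: the `Packet` fields, the per-packet `costs`, the "earliest free processor" selection rule, and the toy `process_header` routine are hypothetical and serve only to show why processed packets may finish out of order and must be re-sequenced on the way out.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int       # position in the temporal sequence (assumed field)
    header: int    # stand-in for the packet header
    body: bytes    # packet body, untouched by the processors


def process_header(header):
    """Toy routine standing in for the instruction set shared by all processors."""
    return header - 1


def process_stream(packets, costs, num_processors=2):
    """Return (completion order of seq numbers, packets in restored order)."""
    free_at = [0] * num_processors   # time at which each processor is free
    finished = []                    # (finish_time, processed packet)
    for now, (pkt, cost) in enumerate(zip(packets, costs)):
        # (ii) select a processor based on availability: earliest free one.
        proc = min(range(num_processors), key=lambda i: free_at[i])
        start = max(now, free_at[proc])
        free_at[proc] = start + cost
        # (iii) the selected processor handles the header.
        done = Packet(pkt.seq, process_header(pkt.header), pkt.body)
        finished.append((free_at[proc], done))
    completion_order = [p.seq for _, p in sorted(finished, key=lambda t: t[0])]
    # Egress: collect processed packets and restore the temporal sequence.
    restored = sorted((p for _, p in finished), key=lambda p: p.seq)
    return completion_order, restored
```

With three packets whose assumed costs are 5, 1, and 1 time units and two processors, the first packet finishes last, yet the returned stream is again ordered by `seq`.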

27. The method of claim 26, wherein the plurality of processors are physically located on a plurality of integrated circuits.

28. A system for processing a packet stream containing a temporal sequence of packets, comprising: a plurality of identical processors executing a general instruction set, each processor including a local instruction memory containing the instruction set; an input port for receiving the packets; an ingress logic unit connected to the input port and the processors, the ingress logic unit being configured to, for each packet, (i) identify a packet header portion, and (ii) select a processor from the plurality of processors to process the packet header based on processor availability, the selected processor processing the packet header using the locally stored instructions; and an egress logic unit that collects the processed packets and reproduces the packet stream according to the temporal sequence.

29. The system of claim 28, wherein the plurality of processors are physically located on a plurality of integrated circuits.