JP2000242613A

JP2000242613A - Meta-address architecture and addressing method for dynamically reconfigurable computing

Info

Publication number: JP2000242613A
Application number: JP2000034825A
Authority: JP
Inventors: Baxter Michael; バクスターマイケル
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1999-02-23
Filing date: 2000-02-14
Publication date: 2000-09-08
Anticipated expiration: 2020-02-14
Also published as: JP4285877B2

Abstract

PROBLEM TO BE SOLVED: To perform reprogramming efficiently. SOLUTION: A meta-addressing architecture that specifies a local memory destination for data packets in a network of dynamically reprogrammable processing machines comprises: a plurality of addressing machines 14, each having a unique geographic address, which perform interrupts, generate and transmit meta-addresses containing a geographic address and a local address, and queue messages; a plurality of dynamically reprogrammable processing machines 12, each coupled to at least one addressing machine, which store, retrieve, and process data from a local memory device according to a received local address; a plurality of memory devices associated with the dynamically reprogrammable processing machines 12; and an interconnect device 16, coupled to the addressing machines 14, which routes data between the addressing machines according to the geographic address contained in the meta-address.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to computer architecture, and more particularly to systems and methods for reconfigurable computing, namely a meta-address architecture and an addressing method for dynamically reconfigurable computing.

[0002] This application claims priority from a U.S. continuation-in-part of U.S. Patent Application No. 09/031,323, entitled "System and Method for Dynamically Reconfigurable Computing Using a Processing Unit Having Changeable Internal Hardware Organization," filed February 26, 1998, which is a divisional of U.S. Patent No. 5,794,062, filed April 17, 1995.

[0003]

2. Description of the Related Art

Advances in computer architecture are driven by the demand for better computational performance. Solving different kinds of computational problems quickly and accurately generally requires different kinds of computational resources. When the range of problem types is limited, computational performance can be improved by using computational resources built specifically for the problem types under consideration. For example, using digital signal processing (DSP) hardware together with a general-purpose computer can significantly improve certain signal processing capabilities. When the computer itself is built specifically for the problem types under consideration, computational performance for those problem types is improved further, or possibly even optimized, relative to the available computational resources. Current parallel and massively parallel computers, which offer superior processing power for specific types of problems of O(n²) or greater complexity, are an example of this case.

[0004] Superior computational performance is necessary, but it must be balanced against the need to minimize system cost and against the need to maximize system productivity across the widest possible range of present and potential future applications. In general, specialized hardware is more expensive than general-purpose hardware, so incorporating computational resources dedicated to a limited number of problem types into a computer system works against keeping system cost low. Designing and producing a special-purpose computer is extremely expensive in both engineering time and hardware cost. When dedicated hardware is used to increase computational performance, the performance advantage diminishes as computational needs change. In the prior art, as computational needs change, new dedicated hardware or new dedicated systems are designed and manufactured, so that undesirably large, non-reusable design and manufacturing costs are incurred again and again. Using computational resources dedicated to particular problem types therefore leads to inefficient use of the available silicon resources when computational needs change. For these reasons, attempting to improve computational performance with dedicated hardware is undesirable.

[0005] Various attempts have been made in the past to use reprogrammable or reconfigurable hardware both to improve computational performance and to maximize the range of problem types to which a system applies. The first such prior art approach is the downloadable microcode computer architecture. In a downloadable microcode architecture, the behavior of fixed, nonreconfigurable hardware resources can be selectively altered by using a particular version of microcode. An example of such an architecture is the IBM System/360. Because the fundamental computational hardware in such prior art systems is not itself reconfigurable, these systems do not provide optimized computational performance when a broad range of problem types is considered.

[0006] A second prior art approach to improving computational performance and maximizing problem-type applicability is to use reconfigurable hardware coupled to a nonreconfigurable host processor or host system. This approach most commonly uses one or more reconfigurable processors coupled to a nonreconfigurable host. It can be classified as an "Attached Reconfigurable Processor (ARP)" architecture, in which some portion of the hardware within a processor set attached to the host is reconfigurable. Examples of current ARP systems that use a set of reconfigurable processors coupled to a host system include SPLASH-1 and SPLASH-2, designed by the Supercomputing Research Center (Bowie, Maryland); the WILDFIRE Custom Configurable Computer (a commercial version of SPLASH-2) from Annapolis Micro Systems (Annapolis, Maryland); and the ECV-1 from Virtual Computer Corporation (Reseda, California). In many computation-intensive problems, a significant amount of time is spent executing relatively small portions of program code. In general, ARP architectures are used to provide a reconfigurable computational accelerator for such portions of program code.

[0007]

Problems to Be Solved by the Invention

Unfortunately, a computational model based on one or more reconfigurable computational accelerators suffers from significant drawbacks, as described in detail below.

[0008] <First Disadvantage> The first disadvantage of the attached reconfigurable processor (ARP) architecture arises because ARP systems attempt to provide an optimized implementation of a particular algorithm in reconfigurable hardware at a particular time.

[0009] For example, the design philosophy behind Virtual Computer Corporation's ECV-1 is to convert a particular algorithm into a particular configuration of reconfigurable hardware resources so as to provide optimal computational performance for that algorithm. Reconfigurable hardware resources are used solely to provide optimal performance for a particular algorithm. Using reconfigurable hardware resources for more general purposes, such as managing instruction execution, is avoided. For a given algorithm, the reconfigurable hardware resources are therefore considered from the viewpoint of individual gates coupled together to yield optimal performance.

[0010] Some attached reconfigurable processor (ARP) systems rely on a programming model in which a "program" includes both conventional program instructions and special-purpose instructions that specify how the various reconfigurable hardware resources are interconnected. Because ARP systems treat reconfigurable hardware resources in a manner specific to gate-level algorithms, these special-purpose instructions must provide explicit detail about the nature of each reconfigurable hardware resource used and the manner in which it is coupled to other reconfigurable hardware resources. This makes programs complex. To reduce programming complexity, attempts have been made to use a programming model in which a program includes both conventional high-level programming language instructions and high-level special-purpose instructions. That is, current ARP systems attempt to use a compilation system capable of compiling both kinds of instructions. The target output of such a compilation system is assembly-language code for the conventional high-level programming language instructions and hardware description language (HDL) code for the special-purpose instructions. Unfortunately, automatically determining a set of reconfigurable hardware resources and an interconnection scheme that yields optimal computational performance for the particular algorithm under consideration is an NP-hard problem. A long-term goal of some ARP systems is to develop a compilation system that can compile an algorithm directly into an optimized interconnection scheme for a set of gates. Developing such a compilation system is, however, an extremely difficult task, particularly when multiple types of algorithms are considered.

[0011] <Second Disadvantage> The second disadvantage of the attached reconfigurable processor (ARP) architecture arises because an ARP apparatus distributes the computational work associated with the algorithm for which it is configured across multiple reconfigurable logic devices.

[0012] For example, in an attached reconfigurable processor (ARP) apparatus implemented with a set of field-programmable gate arrays (FPGAs) and configured to implement a parallel multiplication accelerator, the computational work associated with parallel multiplication is distributed across the entire set of FPGAs. The size of the algorithm for which the ARP apparatus can be configured is therefore limited by the number of reconfigurable logic devices present. The maximum data-set size that the ARP apparatus can handle is limited in the same way. Because some algorithms have data dependencies, examining the source code does not necessarily make the limits of the ARP apparatus explicit. In general, data-dependent algorithms are avoided.

[0013] Furthermore, because attached reconfigurable processor (ARP) architectures teach the distribution of computational work across multiple reconfigurable logic devices, accommodating a new or even slightly modified algorithm requires that reconfiguration be done en masse. That is, multiple reconfigurable logic devices must be reconfigured. This limits the maximum rate at which reconfiguration can occur for alternative problems or cascaded subproblems.

[0014] <Third Disadvantage> The third disadvantage of the attached reconfigurable processor (ARP) architecture arises because one or more portions of program code are executed on the host.

[0015] That is, an attached reconfigurable processor (ARP) apparatus is not an independent computing system in its own right and does not execute entire programs, so interaction with a host is required. Because some program code executes on the nonreconfigurable host, the available silicon resources are not maximally utilized over the time frame of program execution. In particular, while the host is executing instructions, the silicon resources of the ARP apparatus are idle or inefficiently used. Likewise, while the ARP apparatus is processing data, the silicon resources of the host are in general inefficiently used. To readily execute multiple entire programs, the silicon resources within a system must be grouped into readily reusable resources. As described above, ARP systems treat reconfigurable hardware resources as a set of gates optimally interconnected to implement a particular algorithm at a particular time. Because reuse requires a certain degree of algorithmic independence, ARP systems provide no means of treating a particular set of reconfigurable hardware resources as a resource that can be readily reused from one algorithm to another.

[0016] An attached reconfigurable processor (ARP) apparatus cannot treat the currently executing host program as data, and in general cannot adapt itself to its computing environment. An ARP apparatus is not made to simulate itself by executing its own host programs. Furthermore, an ARP apparatus is not made to compile its own hardware description language (HDL) or application programs upon itself, directly using the reconfigurable hardware resources from which it is constructed. ARP apparatus are therefore architecturally limited with respect to self-contained computing models that teach independence from a host processor.

[0017] Because an attached reconfigurable processor (ARP) apparatus functions as a computational accelerator, it generally cannot perform independent input/output (I/O) processing. Typically, an ARP apparatus requires host interaction for I/O processing. The performance of an ARP apparatus may therefore be I/O-limited. Those skilled in the art will recognize that an ARP apparatus can be configured to accelerate a specific I/O problem. However, because the entire ARP apparatus is configured for a single specific problem, it cannot balance I/O processing and data processing without compromising one or the other. Moreover, an ARP apparatus provides no means for interrupt processing. Because ARP apparatus are directed at maximizing computational acceleration, and interrupts negatively affect computational acceleration, ARP disclosures describe no such interrupt mechanism.

[0018] <Fourth Disadvantage> The fourth disadvantage of the attached reconfigurable processor (ARP) architecture arises because there exist software applications whose inherent data parallelism is difficult to exploit using an ARP apparatus.

[0019] One such example is a hardware description language (HDL) compilation application in which net-name symbol resolution is required for a very large netlist.

[0020] <Fifth Disadvantage> The fifth disadvantage of the attached reconfigurable processor (ARP) architecture is that it is essentially a SIMD (Single Instruction Stream, Multiple Data Stream) computer architecture model.

[0021] The attached reconfigurable processor (ARP) architecture is therefore architecturally less efficient than one or more innovative prior art nonreconfigurable systems. For each particular configuration, an ARP system reflects only a small part of the process of executing a program, chiefly the arithmetic logic for arithmetic computation, to the extent of the computational power that the available reconfigurable hardware can provide. By contrast, in the system design of the SYMBOL machine at Fairchild in 1971, the entire computer used a unique hardware context for every aspect of program execution. As a result, SYMBOL encompassed every element of a computer's system application, including the host portion taught by ARP systems.

[0022] <Other Disadvantages> The attached reconfigurable processor (ARP) architecture has other disadvantages as well.

[0023] For example, an attached reconfigurable processor (ARP) apparatus has no effective means of providing independent timing to multiple reconfigurable logic devices. Similarly, cascaded ARP apparatus have no effective clock distribution means for providing independently timed units. As another example, it is difficult to accurately correlate execution time with the source code statements for which acceleration is attempted. To calculate the net system clock rate accurately, the ARP apparatus must be modeled with computer-aided design (CAD) tools after hardware description language (HDL) compilation, and arriving at such a basic parameter is a time-consuming process.

[0024] An equally significant problem with conventional architectures is their use of virtual or shared memory. These designs teach the use of a unified address space, which requires more complex addressing operations and thereby slows memory access and reduces efficiency. For example, to access individual bits of a memory device in a system using virtual memory, the physical address space of the memory must first be partitioned into logical addresses, and virtual addresses must then be mapped onto those logical addresses. Only then can each bit of memory be accessed. In shared memory systems, memory operations are further complicated because the processor typically performs an address validation operation before permitting access to memory. Finally, the processor must arbitrate among multiple processes attempting to access the same region of memory simultaneously, by providing some form of prioritization system.

[0025] To address the many problems that arise from using shared and virtual memory, many conventional systems use memory management units (MMUs) to perform most memory management functions, such as translating logical addresses to virtual addresses. However, the interaction between the MMU and software makes memory access operations even more complex. In addition, the kinds of operations an MMU can perform are very limited. An MMU cannot handle interrupts, cannot queue messages, and cannot perform complex addressing operations, so all of these must be performed by the processor. When a shared memory or virtual memory system is used in a computer architecture with multiple parallel processors, these drawbacks are magnified. Not only must the hardware/software interaction be managed as described above, but the coherence and consistency of the data in memory must also be maintained, by both software and hardware, as multiple processors attempt to access the shared memory. Adding processors makes translating virtual addresses to logical addresses more difficult still. This complexity in memory access operations inevitably degrades system performance, and the degradation grows as processors are added and the system becomes larger.

[0026] One example of a conventional system is the cache-coherent non-uniform memory access (ccNUMA) computer architecture. ccNUMA machines use complex and expensive hardware, such as cache controllers and crossbar switches, to maintain for each independent CPU the illusion of a single address space, even though the memory is actually shared by multiple processors. ccNUMA is somewhat scalable, but that scalability is achieved by adding hardware to couple the processors in the system tightly together. This type of system is most advantageous in computing environments where a single program image is shared that requires very high bandwidth for shared-memory I/O operations, as with finite-element grids in scientific computing. Furthermore, ccNUMA is of no use in systems whose processors have dissimilar characteristics. In a ccNUMA architecture, each added processor must be of the same type as the existing processors. ccNUMA is therefore not a viable solution for systems in which processors are optimized to perform different functions and operate differently from one another. Finally, in conventional systems, only standard memory addressing schemes are used to address memory within the system.

[0027] What is needed is a means of addressing memory in a parallel computing environment that provides scalable, simple addressing and has little impact on the processing capability of the system.

[0028]

Means for Solving the Problems

The invention of claim 1, a meta-address architecture for dynamically reconfigurable computing, is a meta-addressing architecture that specifies a local memory destination for data packets in a network of dynamically reprogrammable processing machines, and comprises: a plurality of addressing machines, each having a unique geographic address, for performing interrupts, generating and transmitting meta-addresses that include a geographic address and a local address, and queuing messages; a plurality of dynamically reprogrammable processing machines, each coupled to at least one of the addressing machines, for storing, retrieving, and processing data from a local memory device according to a received local address; a plurality of memory devices, each associated with one of the dynamically reprogrammable processing machines; and an interconnect device, coupled to the addressing machines, for routing data among the addressing machines according to the geographic address contained in the meta-address.
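As a rough illustration of the arrangement recited in claim 1, the sketch below models a meta-address as a pair of a geographic address and a local address, and an interconnect that routes packets by the geographic address alone. The names `MetaAddress`, `AddressingMachine`, and `Interconnect`, the dictionary standing in for local memory, and the FIFO message queue are all assumptions made for illustration; the claim does not prescribe any of them.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetaAddress:
    geographic: int  # identifies the destination addressing machine
    local: int       # location within that machine's local memory


@dataclass
class AddressingMachine:
    geographic: int                               # unique geographic address
    memory: dict = field(default_factory=dict)    # hypothetical local memory
    inbox: deque = field(default_factory=deque)   # queued incoming messages

    def drain(self):
        """Store every queued packet at the local address it names."""
        while self.inbox:
            meta, payload = self.inbox.popleft()
            self.memory[meta.local] = payload


class Interconnect:
    """Routes packets using only the geographic part of the meta-address."""

    def __init__(self, machines):
        self.by_geo = {m.geographic: m for m in machines}

    def send(self, meta, payload):
        self.by_geo[meta.geographic].inbox.append((meta, payload))


machines = [AddressingMachine(0), AddressingMachine(1)]
net = Interconnect(machines)
net.send(MetaAddress(geographic=1, local=0x10), "packet")
machines[1].drain()
```

In this sketch the interconnect never inspects the local address; only the destination machine interprets it, which is one way to read the split between the two address parts.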

[0029] The invention of claim 2 is the meta-address architecture for dynamically reconfigurable computing of claim 1, wherein at least one of the addressing machines comprises: an address decoder for decoding a received meta-address into a geographic address and a local address; and a control device, coupled to the dynamically reprogrammable processing machine, the local memory device, and the address decoder, for retrieving meta-address information from the local memory in response to receiving an unconditional instruction from the dynamically reprogrammable processing machine, assembling a data packet according to the retrieved meta-address, receiving the geographic address and the local address from the address decoder, and transmitting the data packet to the dynamically reprogrammable processing machine in response to a determination that the decoded geographic address corresponds to the associated geographic address.

[0030] The invention of claim 3 is the meta-address architecture for dynamically reconfigurable computing of claim 1, further comprising a plurality of architecture description memory devices, each coupled to one of the dynamically reprogrammable processing machines and storing the geographic address for the machine to which it is coupled.

[0031] The invention of claim 4 is the meta-address architecture for dynamically reconfigurable computing of claim 2, wherein the addressing machine further includes an interrupt handler coupled to an input/output device, the interrupt handler comprising: a recognition unit for identifying an interrupt request; a comparator for comparing the identified interrupt request with a stored list of interrupt requests in order to verify the validity of the interrupt request; and interrupt logic for processing a validated interrupt request according to stored interrupt handling instructions.
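
The three parts of the interrupt handler in claim 4 (recognition, comparison against a stored list, and interrupt logic) can be sketched in software as follows. This is only an illustration under stated assumptions: the claim describes hardware, and the class name, the 8-bit request encoding, and the callable handlers are all hypothetical.

```python
from typing import Callable, Dict, Optional


class InterruptHandler:
    """Software sketch of the recognizer / comparator / interrupt logic of claim 4."""

    def __init__(self, handlers: Dict[int, Callable[[], str]]):
        # Stored list of valid interrupt requests together with their stored
        # handling instructions (modeled here as Python callables).
        self._handlers = dict(handlers)

    def recognize(self, raw_signal: int) -> int:
        # Recognition unit. Assumption: the request ID is the low 8 bits.
        return raw_signal & 0xFF

    def is_valid(self, request_id: int) -> bool:
        # Comparator: check the identified request against the stored list.
        return request_id in self._handlers

    def service(self, raw_signal: int) -> Optional[str]:
        # Interrupt logic: process only requests that have been validated.
        request_id = self.recognize(raw_signal)
        if not self.is_valid(request_id):
            return None
        return self._handlers[request_id]()
```

A request whose identifier is not in the stored list is ignored rather than serviced, which mirrors the validity check described in the claim.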

[0032] The invention of claim 5 is the meta-address architecture for dynamically reconfigurable computing of claim 1, wherein the meta-address is 80 bits wide, the geographic address is 16 bits wide, and the local address is 64 bits wide.
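
The field widths in claim 5 can be illustrated with a short packing and unpacking sketch. The widths (16 + 64 = 80 bits) come from the claim; the bit ordering, with the geographic address in the high-order bits, and the function names are assumptions made for this illustration only.

```python
from typing import Tuple

GEO_BITS = 16                       # geographic address width (claim 5)
LOCAL_BITS = 64                     # local address width (claim 5)
META_BITS = GEO_BITS + LOCAL_BITS   # 80-bit meta-address


def pack_meta_address(geo: int, local: int) -> int:
    """Combine a geographic and a local address into one 80-bit value."""
    if not 0 <= geo < (1 << GEO_BITS):
        raise ValueError("geographic address must fit in 16 bits")
    if not 0 <= local < (1 << LOCAL_BITS):
        raise ValueError("local address must fit in 64 bits")
    # Assumption: the geographic address occupies the high-order bits.
    return (geo << LOCAL_BITS) | local


def unpack_meta_address(meta: int) -> Tuple[int, int]:
    """Split an 80-bit meta-address into (geographic, local)."""
    if not 0 <= meta < (1 << META_BITS):
        raise ValueError("meta-address must fit in 80 bits")
    return meta >> LOCAL_BITS, meta & ((1 << LOCAL_BITS) - 1)
```

With 16 geographic bits, up to 65,536 addressing machines can be distinguished, each in front of a 64-bit local address space.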

[0033] The invention of claim 6, an addressing method, is a method for processing instructions in a parallel processor architecture that uses local processing machines, each coupled to a local addressing machine and a local memory, in which the local addressing machines are each identified by a unique geographic identification and are interconnected by an interconnect device. The method comprises the steps of: receiving a program instruction; determining whether the received program instruction requires a remote operation; storing remote component information in the local memory in response to a required remote operation; and issuing an unconditional instruction to the local addressing machine to initiate the remote operation.
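
The processing-machine side of claim 6 can be sketched as below. The claim does not say how an instruction is recognized as remote, so this sketch assumes, purely for illustration, that an instruction is a dictionary carrying a hypothetical `target_geo` field; the memory key and the list standing in for the channel to the addressing machine are likewise hypothetical.

```python
from typing import Dict, List, Tuple


def process_instruction(
    instr: dict,
    local_geo: int,
    local_memory: Dict[str, Tuple[int, int, int]],
    issued: List[str],
) -> bool:
    """Return True if the instruction triggered a remote operation."""
    # Step 2: decide whether a remote operation is required.
    # Assumption: a target geographic address different from ours means remote.
    target_geo = instr.get("target_geo", local_geo)
    if target_geo == local_geo:
        return False  # purely local; nothing to hand to the addressing machine

    # Step 3: store the remote component information in local memory.
    local_memory["remote_info"] = (local_geo, target_geo, instr["target_addr"])

    # Step 4: issue the unconditional instruction to the local addressing
    # machine, modeled here as appending the memory key to a list.
    issued.append("remote_info")
    return True
```

The processing machine only stages the information and signals the addressing machine; building the meta-address and the packet is left to the addressing machine.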

[0034] The invention of claim 7 is the addressing method of claim 6, wherein the local addressing machine performs the steps of: receiving the unconditional instruction from the local processing machine; retrieving from the local memory remote component information that includes a local geographic address, a remote geographic address, and a remote local memory address; generating a meta-address according to the retrieved remote component information; generating a data packet according to the generated meta-address; and sending the data packet to the interconnect device.
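
The sending steps of claim 7 can be sketched as follows. The three fields of the remote component information come from the claim; the packet layout, the use of a 64-bit local field with the geographic address in the high-order bits, and all names are assumptions for illustration. The interconnect device is modeled as a plain list.

```python
from dataclasses import dataclass
from typing import Dict, List

LOCAL_BITS = 64  # assumed local address width


@dataclass(frozen=True)
class RemoteComponentInfo:
    local_geo: int          # geographic address of the sender
    remote_geo: int         # geographic address of the destination
    remote_local_addr: int  # local memory address at the destination


@dataclass(frozen=True)
class Packet:
    meta_address: int
    source_geo: int
    payload: bytes


def on_unconditional_instruction(
    local_memory: Dict[str, RemoteComponentInfo],
    key: str,
    payload: bytes,
    interconnect: List[Packet],
) -> Packet:
    # Retrieve the remote component information from local memory.
    info = local_memory[key]
    # Generate the meta-address (assumption: geographic bits are high-order).
    meta = (info.remote_geo << LOCAL_BITS) | info.remote_local_addr
    # Generate the data packet and send it to the interconnect device.
    packet = Packet(meta, info.local_geo, payload)
    interconnect.append(packet)
    return packet
```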

[0035] The invention of claim 8, an addressing method, is a method for addressing memory in a parallel computing environment in which a local processing unit is coupled to a local memory, a local address machine, and an interconnect device, wherein the local address machine performs the steps of: receiving a data packet; decoding the data packet into a geographic address and a local address; comparing the geographic address with an associated geographic address; and transmitting the data packet to the local processor in response to the geographic address matching the associated geographic address.

[0036] The invention of claim 9 is the addressing method of claim 8, wherein the step of transmitting the data packet to the local processor includes storing the data packet in a queue for processing by the local processor.
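
The receiving steps of claims 8 and 9 (decode, compare, queue) can be sketched as below. This is a simplified software model under assumptions: the packet is represented by its meta-address and payload, the 16/64-bit split with the geographic address in the high-order bits is assumed, and the class and method names are hypothetical.

```python
from collections import deque

LOCAL_BITS = 64                      # assumed local address width
LOCAL_MASK = (1 << LOCAL_BITS) - 1


class AddressingMachine:
    def __init__(self, geo: int):
        self.geo = geo        # the geographic address associated with this machine
        self.queue = deque()  # packets waiting for the local processor

    def receive(self, meta_address: int, payload: bytes) -> bool:
        """Accept the packet if it is addressed to this machine."""
        # Decode the meta-address into geographic and local parts.
        geo = meta_address >> LOCAL_BITS
        local = meta_address & LOCAL_MASK
        # Compare the decoded geographic address with our own.
        if geo != self.geo:
            return False
        # Queue the packet for processing by the local processor (claim 9).
        self.queue.append((local, payload))
        return True
```

Queuing decouples packet arrival from the local processor's schedule, so the processor can drain the queue when it is ready.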

[0037] The invention of claim 10 is the addressing method of claim 8, further comprising the steps of: receiving data from the local processor; retrieving remote operation data from the local memory according to the received data; generating a meta-address from the retrieved data; generating a data packet according to the generated meta-address; and transmitting the data packet to the interconnect device.

[0038] The invention of claim 11 is the addressing method of claim 10, wherein the step of retrieving remote operation data comprises retrieving a remote geographic address and a remote local memory address.

[0039] The invention of claim 12 is the addressing method of claim 11, further comprising the step of retrieving a source geographic address from the local memory.

[0040] The invention of claim 13 is the addressing method of claim 12, which uses an architecture description memory that is coupled to each processor and stores the geographic address for the local processor to which it is coupled, and wherein the source geographic address is retrieved from this architecture description memory.

[0041] The invention of claim 14, an addressing method, is a method for processing instructions in a parallel processor architecture that uses local processing machines, each coupled to a local addressing machine and a local memory, in which the local addressing machines are each identified by a unique geographic identification and are interconnected by an interconnect device. In this method, the local addressing machine performs the steps of: receiving an unconditional instruction from the local processing machine; retrieving from the local memory remote component information that includes a local geographic address, a remote geographic address, and a remote local memory address; generating a meta-address according to the retrieved remote component information; generating a data packet according to the generated meta-address; and sending the data packet to the interconnect device.

[0042] The invention of claim 15, an addressing method, is a method for addressing memory in a parallel computing environment in which a local processing unit is coupled to a local memory, a local address machine, and an interconnect device, wherein the local address machine performs the steps of: receiving data from the local processor; retrieving remote operation data from the local memory according to the received data; generating a meta-address from the retrieved data; generating a data packet according to the generated meta-address; and transmitting the data packet to the interconnect device.

【００４３】[0043]

DESCRIPTION OF THE PREFERRED EMBODIMENTS <Overview> In the present invention, a set of S-machines, a T-machine corresponding to each S-machine, a General-Purpose Interconnect Matrix (GPIM), a set of I/O T-machines, a set of I/O devices, and a master time-base unit form a system for scalable, parallel, dynamically reconfigurable computing. Each S-machine is a dynamically reconfigurable computer that includes a memory, a first local time-base unit, and a Dynamically Reconfigurable Processing Unit (DRPU). The DRPU is implemented using a reprogrammable logic device configured as an Instruction Fetch Unit (IFU), a Data Operate Unit (DOU), and an Address Operate Unit (AOU), each of which is selectively reconfigured during program execution in response to a reconfiguration interrupt or to the selection of a reconfiguration directive embedded in a set of program instructions. Each reconfiguration interrupt and each reconfiguration directive references a configuration data set that specifies a DRPU hardware organization optimized for implementing a particular Instruction Set Architecture (ISA). The IFU directs reconfiguration operations, instruction fetch and decode operations, and memory access operations, and issues control signals to the DOU and the AOU to facilitate instruction execution. The DOU performs data computations, and the AOU performs address computations. Each T-machine is a data transfer device that includes a Common Interface and Control Unit (CICU), one or more interconnect I/O units, and a second local time-base unit. The GPIM is a scalable interconnect network that facilitates parallel communication among the T-machines. Together, the set of T-machines and the GPIM facilitate parallel communication among the S-machines. The T-machines also control the transfer of data among the S-machines in the network and provide the required addressing operations. A meta-address is used to provide scalable bit-addressing capability to each S-machine.

[0044] <Specific Embodiment> FIG. 1 is a block diagram of a preferred embodiment of a system 10 for scalable, parallel, dynamically reconfigurable computing constructed in accordance with the present invention. The system 10 preferably includes at least one S-machine 12, a T-machine 14 corresponding to each S-machine 12, a General-Purpose Interconnect Matrix (GPIM) 16, at least one I/O T-machine 18, one or more I/O devices 20, and a master time-base unit 22. In the preferred embodiment, the system 10 includes multiple S-machines 12, and therefore multiple T-machines 14, as well as multiple I/O T-machines 18 and multiple I/O devices 20.

[0045] Each S-machine 12, T-machine 14, and I/O T-machine 18 includes a master timing input coupled to the timing output of the master time-base unit 22. Each S-machine 12 includes an input and an output coupled to its corresponding T-machine 14. Each T-machine 14 includes, in addition to the input and output coupled to its corresponding S-machine 12, a routing input and a routing output coupled to the GPIM 16. Similarly, each I/O T-machine 18 includes an input and an output coupled to an I/O device 20, as well as a routing input and a routing output coupled to the GPIM 16.

[0046] As described in detail below, each S-machine 12 is a dynamically reconfigurable computer. The GPIM 16 forms a point-to-point parallel interconnect means that facilitates communication among the T-machines 14. The T-machines 14 and the GPIM 16 together form a point-to-point parallel interconnect means for data transfer among the S-machines 12. Similarly, the GPIM 16, the set of T-machines 14, and the set of I/O T-machines 18 form a point-to-point parallel interconnect means for I/O transfers between the S-machines 12 and each I/O device 20. The master time-base unit 22 includes an oscillator that supplies a master timing signal to each S-machine 12 and each T-machine 14.

[0047] In an exemplary embodiment, each S-machine 12 is implemented using a Xilinx XC4013 (Xilinx, Inc., San Jose, California) Field Programmable Gate Array (FPGA) coupled to 64 megabytes of random access memory (RAM). Each T-machine 14, like each I/O T-machine 18, is implemented using approximately 50% of the reconfigurable hardware resources of a Xilinx XC4013 FPGA. The GPIM 16 is implemented as a toroidal interconnect mesh. The master time-base unit 22 is a clock oscillator coupled to clock distribution circuitry that provides a system-wide frequency reference, as described in the U.S. patent application entitled "System and Method for Phase-Synchronous, Flexible Frequency Clocking and Messaging." The T-machines 14, the S-machines 12, and the I/O T-machines 18 preferably transfer information in accordance with ANSI/IEEE Standard 1596-1992, which defines the Scalable Coherent Interface (SCI).
[0048] In the preferred embodiment, the system 10 includes multiple S-machines 12 operating in parallel. The structure and function of an individual S-machine 12 are described in detail below with reference to FIGS. 2 through 17. FIG. 2 is a block diagram of a preferred embodiment of an S-machine 12. The S-machine 12 includes a first local time-base unit 30, a DRPU 32 for executing program instructions, and a memory 34. The first local time-base unit 30 includes a timing input that forms the master timing input of the S-machine. The first local time-base unit 30 also includes a timing output that delivers a first local timing signal, or clock, via a first timing signal line 40 to a timing input of the DRPU 32 and to a timing input of the memory 34. The DRPU 32 includes a control signal output coupled to a control signal input of the memory 34 via a memory control line 42; an address output coupled to an address input of the memory 34 via an address line 44; and a bidirectional data port coupled to a bidirectional data port of the memory 34 via a memory I/O line 46. The DRPU 32 further includes a bidirectional data port coupled to a bidirectional data port of its corresponding T-machine 14 via an external control line 48. As shown in FIG. 2, the memory control line 42 is X bits wide, the address line 44 is M bits wide, the memory I/O line 46 is (N × k) bits wide, and the external control line 48 is Y bits wide.

[0049] In the preferred embodiment, the first local time-base unit 30 receives the master timing signal from the master time-base unit 22. The first local time-base unit 30 generates the first local timing signal from the master timing signal and delivers it to the DRPU 32 and the memory 34. In the preferred embodiment, the first local timing signal differs from one S-machine 12 to another. Thus, the DRPU 32 and the memory 34 within a given S-machine 12 operate at a clock rate independent of that of the DRPU 32 and the memory 34 in other S-machines 12. The first local timing signal is preferably phase-synchronous with the master timing signal. In the preferred embodiment, the first local time-base unit 30 is implemented using phase-locked frequency-conversion circuitry, including phase-lock detection circuitry implemented using reconfigurable hardware resources. Those skilled in the art will recognize that, in an alternative embodiment, the first local time-base unit 30 could be implemented as part of a clock distribution tree.
[0050] The memory 34 is preferably implemented as RAM and stores program instructions, program data, and configuration data for the DRPU 32. The memory 34 of any given S-machine 12 is preferably accessible to every other S-machine 12 in the system 10 via the GPIM 16. Moreover, each S-machine 12 preferably has a uniform memory address space. In the preferred embodiment, the program instructions stored in the memory 34 selectively include reconfiguration directives directed to the DRPU 32. FIG. 3 is an exemplary program listing 50 that includes reconfiguration directives. As shown in FIG. 3, the exemplary program listing 50 includes a set of outer-loop portions 52, a first inner-loop portion 54, a second inner-loop portion 55, a third inner-loop portion 56, a fourth inner-loop portion 57, and a fifth inner-loop portion 58. Those skilled in the art will readily recognize that the term "inner loop" refers to an iterative portion of a program that performs a particular set of related operations, and that the term "outer loop" refers to those portions of a program that mainly perform general-purpose operations and/or transfer control from one inner-loop portion to another. In general, the inner-loop portions 54, 55, 56, 57, 58 of a program perform specific operations on potentially large data sets. In an image processing application, for example, the first inner-loop portion 54 might perform a color-format conversion on image data, and the second through fifth inner-loop portions 55, 56, 57, 58 might perform linear filtering, convolution, pattern searching, and compression operations. Those skilled in the art will recognize that a contiguous sequence of inner-loop portions 55, 56, 57, 58 can be regarded as a software pipeline. Each outer-loop portion 52 is responsible for data input and output and/or for directing the transfer of data and control from the first inner-loop portion 54 to the second inner-loop portion 55. Those skilled in the art will further recognize that a given inner-loop portion 54, 55, 56, 57, 58 may include one or more reconfiguration directives. In general, for any given program, the outer-loop portions 52 of the program listing 50 include a variety of general-purpose instructions, whereas the inner loops 54, 56 of the program listing 50 consist of relatively few types of instructions, which are used to execute a specific instruction set.

[0051] In the exemplary program listing 50, a first reconfiguration directive appears at the beginning of the first inner-loop portion 54, and a second reconfiguration directive appears at the end of the first inner-loop portion 54. Similarly, a third reconfiguration directive appears at the beginning of the second inner-loop portion 55; a fourth at the beginning of the third inner-loop portion 56; a fifth at the beginning of the fourth inner-loop portion 57; and a sixth and a seventh at the beginning and end, respectively, of the fifth inner-loop portion 58. Each reconfiguration directive preferably references a configuration data set that specifies an internal DRPU hardware organization dedicated to, and optimized for, the implementation of a particular Instruction Set Architecture (ISA). An ISA is a primitive or core set of instructions that can be used to program a computer. An ISA defines instruction formats, opcodes, data formats, addressing modes, execution control flags, and program-accessible registers. Those skilled in the art will recognize that this corresponds to the conventional definition of an ISA. In the present invention, the DRPU 32 of each S-machine can be rapidly configured at run time to directly implement multiple ISAs, using a unique configuration data set for each desired ISA. That is, each ISA is implemented with a unique internal DRPU hardware organization specified by a corresponding configuration data set. Thus, in the present invention, the first through fifth inner-loop portions 54, 55, 56, 57, 58 each correspond to a unique ISA, namely ISA 1, ISA 2, ISA 3, ISA 4, and ISA k, respectively. Those skilled in the art will recognize that successive ISAs need not each be unique. ISA k could therefore be ISA 1, ISA 2, ISA 3, or ISA 4, or it could be a different ISA. The set of outer-loop portions 52 also corresponds to a unique ISA, namely ISA 0. In the preferred embodiment, the selection of successive reconfiguration directives during program execution is data-dependent. Once a particular reconfiguration directive has been selected, program instructions are subsequently executed according to the corresponding ISA, on the unique DRPU hardware configuration specified by the corresponding configuration data set.
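
The relationship between reconfiguration directives, configuration data sets, and the active ISA can be sketched with a toy model. It is only an illustration: the program encoding as (kind, argument) pairs, the byte strings standing in for configuration data sets, and the class and function names are all hypothetical, and loading is modeled as simply switching a label.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DRPU:
    # ISA name -> configuration data set (placeholder bytes, not real bitstreams)
    config_sets: Dict[str, bytes]
    active_isa: str = "ISA0"      # assumption: start in the outer-loop ISA
    reconfigurations: int = 0

    def reconfigure(self, isa: str) -> None:
        # A directive references a configuration data set; loading it changes
        # the hardware organization and therefore the ISA that is implemented.
        if isa not in self.config_sets:
            raise KeyError(f"no configuration data set for {isa}")
        self.active_isa = isa
        self.reconfigurations += 1


def run(drpu: DRPU, program: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (active ISA, operation) for each ordinary instruction executed."""
    trace = []
    for kind, arg in program:
        if kind == "reconf":
            drpu.reconfigure(arg)   # embedded reconfiguration directive
        else:
            trace.append((drpu.active_isa, arg))
    return trace
```

For example, a directive at the start of an inner loop switches to that loop's ISA, and a directive at its end switches back to the outer-loop ISA, as with the first and second directives around inner-loop portion 54.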

[0052] In the present invention, a given ISA can be classified as an inner-loop ISA or an outer-loop ISA according to the number and types of instructions it contains. An ISA that includes several instructions and is useful for performing general-purpose operations is an outer-loop ISA, whereas an ISA that includes relatively few instructions and is directed to executing specific types of instructions is an inner-loop ISA. Because an outer-loop ISA is directed to general-purpose operations, it is most useful when sequential execution of program instructions is desired. The execution performance of an outer-loop ISA is preferably characterized in clock cycles per executed instruction. In contrast, because an inner-loop ISA is directed to executing specific types of instructions, it is most useful when parallel execution of program instructions is desired. The execution performance of an inner-loop ISA is preferably characterized in instructions executed per clock cycle, or in computational results obtained per clock cycle.
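
The two performance measures are reciprocals of one another, which the following sketch makes explicit. The function names and the sample figures in the usage note are illustrative assumptions, not values from the source.

```python
def cycles_per_instruction(cycles: int, instructions: int) -> float:
    """Outer-loop metric: clock cycles spent per executed instruction."""
    return cycles / instructions


def instructions_per_cycle(instructions: int, cycles: int) -> float:
    """Inner-loop metric: instructions (or results) completed per clock cycle."""
    return instructions / cycles
```

As a hypothetical example, an outer-loop ISA that takes 4000 cycles for 1000 instructions scores 4.0 cycles per instruction, while an inner-loop ISA that completes 8000 operations in 1000 cycles scores 8.0 per cycle; the second figure exceeds 1 only because operations execute in parallel.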

【００５３】当業者は、プログラム命令の逐次実行及び
並列実行に関するこれまでの説明が単一の動的再構成処
理装置（ＤＲＰＵ）３２内でのプログラム命令の実行に
関連していることを認めるであろう。システム１０に多
重Ｓマシン１２が存在することによって、特定の動的再
構成処理装置（ＤＲＰＵ）３２によって各プログラム命
令シーケンスが実行される場合、多重プログラム命令シ
ーケンスを任意の時間に並列実行することが容易にな
る。各動的再構成処理装置（ＤＲＰＵ）３２は、特定の
時間にそれぞれ特定の内部ループ命令セットアーキテク
チャ（ＩＳＡ）または外部ループ命令セットアーキテク
チャ（ＩＳＡ）を実動化するための並列ハードウェアま
たは直列ハードウェアを含むように構成されている。任
意の動的再構成処理装置（ＤＲＰＵ）３２の内部ハード
ウェア構成は、実行される一連のプログラム命令内に埋
込まれた１つまたはそれ以上の再構成指示の選択に従っ
て経時的に変化する。Those skilled in the art will recognize that the preceding description of sequential and parallel execution of program instructions pertains to execution within a single dynamic reconfiguration processor (DRPU) 32. The presence of multiple S-machines 12 in the system 10 facilitates the parallel execution of multiple program instruction sequences at any given time, where each program instruction sequence is executed by a particular DRPU 32. Each DRPU 32 is configured to include parallel or serial hardware for implementing a particular inner-loop ISA or outer-loop ISA, respectively, at a particular time. The internal hardware configuration of any given DRPU 32 changes over time according to the selection of one or more reconfiguration instructions embedded in the sequence of program instructions being executed.

【００５４】好ましい実施例では、各命令セットアーキ
テクチャ（ＩＳＡ）とその対応する内部動的再構成処理
装置（ＤＲＰＵ）ハードウェア編成は、１組の利用可能
な再構成ハードウェアリソースに対して特定のクラスの
計算上の問題について最適の計算性能を備えるよう設計
されている。上に述べたように、また下記に詳しく説明
するように、外部ループ命令セットアーキテクチャ（Ｉ
ＳＡ）に対応する内部動的再構成処理装置（ＤＲＰＵ）
ハードウェア編成は、プログラム命令の逐次実行につい
て最適化されるのが好ましい。また内部ループ命令セッ
トアーキテクチャ（ＩＳＡ）に対応する内部動的再構成
処理装置（ＤＲＰＵ）ハードウェア編成は、プログラム
命令の並列実行について最適化されるのが好ましい。模
範汎用外部ループ命令セットアーキテクチャ（ＩＳＡ）
を参考資料Ａに示し、畳込み演算専用の模範内部ループ
命令セットアーキテクチャ（ＩＳＡ）を参考資料Ｂに示
す。In the preferred embodiment, each ISA and its corresponding internal DRPU hardware organization are designed to provide optimum computational performance for a particular class of computational problems, relative to a set of available reconfigurable hardware resources. As mentioned above and as described in more detail below, the internal DRPU hardware organization corresponding to an outer-loop ISA is preferably optimized for sequential execution of program instructions, and the internal DRPU hardware organization corresponding to an inner-loop ISA is preferably optimized for parallel execution of program instructions. An exemplary general-purpose outer-loop ISA is given in Reference Material A, and an exemplary inner-loop ISA dedicated to convolution operations is given in Reference Material B.

【００５５】各再構成指示を除いて、図３の模範プログ
ラムリスト５０は、従来の高レベル言語文、たとえばＣ
プログラミング言語に従って書かれた文で構成されてい
ることが好ましい。当業者は、一連のプログラム命令に
１つまたはそれ以上の再構成指示を含むには、再構成指
示に対応するために修正されたコンパイラが必要である
ことを認めるであろう。図４は、一連のプログラム命令
のコンパイル中に実行される先行技術コンパイル演算の
フローチャートである。ここで、先行技術のコンパイル
演算は、Free Software Foundation（Cambridge，マサ
チューセッツ）によって作成されたＧＮＵＣコンパイ
ラ（ＧＣＣ：ＧＮＵＣ Compiler）によって実行される
ものにほぼ相当している。当業者は、下記に説明する先
行技術コンパイル演算が他のコンパイラについて容易に
一般化できることを認めるであろう。先行技術コンパイ
ル演算はステップ５００で始まり、コンパイラフロント
エンドが一連のプログラム命令から次の高レベル文を選
択する。次にステップ５０２で、コンパイラフロントエ
ンドは選択した高レベル文に対応する中間レベルのコー
ドを生成する。これは、ＧＮＵＣコンパイラ（ＧＣ
Ｃ）の場合には、レジスタ転送レベル（ＲＴＬ：Regist
er Transfer Level）文に相当する。ステップ５０２の
あとステップ５０４で、コンパイラフロントエンドはさ
らに別の高レベル文を検討する必要があるかどうかを決
定する。検討する必要があれば、この好適な方法はステ
ップ５００に戻る。Except for each reconfiguration instruction, the exemplary program listing 50 of FIG. 3 preferably consists of conventional high-level language statements, for example statements written in the C programming language. Those skilled in the art will recognize that including one or more reconfiguration instructions in a sequence of program instructions requires a compiler modified to handle them. FIG. 4 is a flowchart of prior-art compilation operations performed during compilation of a sequence of program instructions. Here, the prior-art compilation operations correspond roughly to those performed by the GNU C Compiler (GCC) produced by the Free Software Foundation (Cambridge, Massachusetts). Those skilled in the art will recognize that the prior-art compilation operations described below can be readily generalized to other compilers. The prior-art compilation operations begin at step 500, where the compiler front end selects the next high-level statement from the sequence of program instructions. Next, at step 502, the compiler front end generates intermediate-level code corresponding to the selected high-level statement; in the case of GCC, this corresponds to register transfer level (RTL) statements. After step 502, at step 504, the compiler front end determines whether further high-level statements require consideration. If so, the preferred method returns to step 500.

【００５６】ステップ５０４でコンパイラフロントエン
ドが他のどの高レベル文も検討する必要がないと決定し
たときは、次にステップ５０６でコンパイラバックエン
ドが従来のレジスタ割当て演算を実行する。ステップ５
０６のあとステップ５０８で、コンパイラバックエンド
は現在のレジスタ転送レベル（ＲＴＬ）文グループ内で
検討するために次のレジスタ転送レベル（ＲＴＬ）文を
選択する。次にステップ５１０でコンパイラバックエン
ドは現在のレジスタ転送レベル（ＲＴＬ）文グループが
１組のアセンブリ言語文に翻訳することのできる方法を
定めるルールが存在するかどうかを決定する。このよう
なルールが存在しないときには、この好適な方法はステ
ップ５０８に戻り、現在のレジスタ転送レベル（ＲＴ
Ｌ）文グループに含めるためにさらに別のレジスタ転送
レベル（ＲＴＬ）文を選択する。現在のレジスタ転送レ
ベル（ＲＴＬ）文グループに対応するルールが存在する
ときには、ステップ５１２でコンパイラバックエンドは
そのルールに従って１組のアセンブリ言語文を生成す
る。ステップ５１２のあと、コンパイラバックエンドは
次のレジスタ転送レベル（ＲＴＬ）文グループのコンテ
クストにおいて次のレジスタ転送レベル（ＲＴＬ）文を
検討する必要があるかどうかを決定する。検討する必要
があるときには、この好適な方法はステップ５０８に戻
る。必要がなければ、この好適な方法は終了する。If at step 504 the compiler front end determines that no other high-level statements require consideration, then at step 506 the compiler back end performs conventional register allocation operations. After step 506, at step 508, the compiler back end selects the next RTL statement for consideration within the current RTL statement group. Next, at step 510, the compiler back end determines whether a rule exists that specifies how the current RTL statement group can be translated into a set of assembly language statements. If no such rule exists, the preferred method returns to step 508 to select a further RTL statement for inclusion in the current RTL statement group. If a rule corresponding to the current RTL statement group exists, then at step 512 the compiler back end generates a set of assembly language statements according to that rule. After step 512, the compiler back end determines whether a next RTL statement requires consideration in the context of a next RTL statement group. If so, the preferred method returns to step 508; otherwise, the preferred method ends.
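
The prior-art flow of steps 500 to 512 can be pictured with a small Python sketch. This is a toy model under stated assumptions, not GCC's implementation: RTL statements are plain strings, a "rule" is a hypothetical dictionary entry mapping a group of RTL statements to assembly text, the mnemonics are invented, and register allocation (step 506) is omitted.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rule table: a group of RTL statements -> assembly statements.
RuleTable = Dict[Tuple[str, ...], List[str]]


def front_end(statements: List[str], lower: Callable[[str], List[str]]) -> List[str]:
    rtl: List[str] = []
    for stmt in statements:        # steps 500 and 504: take each high-level statement
        rtl.extend(lower(stmt))    # step 502: generate intermediate (RTL) code
    return rtl


def back_end(rtl: List[str], rules: RuleTable) -> List[str]:
    # Step 506 (register allocation) is left out of this sketch.
    asm: List[str] = []
    group: List[str] = []
    for stmt in rtl:                          # step 508: grow the current group
        group.append(stmt)
        emitted = rules.get(tuple(group))     # step 510: does a rule match the group?
        if emitted is not None:
            asm.extend(emitted)               # step 512: emit assembly for the group
            group = []                        # start the next group
    if group:
        raise ValueError(f"no rule matches RTL group {group}")
    return asm


# Toy lowering and rules, invented for illustration.
lowering = {"x = a + b": ["load a", "load b", "add", "store x"]}
rules: RuleTable = {
    ("load a", "load b", "add"): ["ADD r2, r0, r1"],
    ("store x",): ["ST r2, x"],
}
asm = back_end(front_end(["x = a + b"], lowering.__getitem__), rules)
```

The group only grows until some rule matches it, which mirrors the loop between steps 508 and 510.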

【００５７】本発明は、動的再構成計算のためのコンパ
イラを含んでいることが好ましい。図５と図６は、動的
再構成計算のためのコンパイラによって実行される好ま
しいコンパイル演算のフローチャートである。好ましい
コンパイル演算はステップ６００から始まり、動的再構
成計算のためのコンパイラのフロントエンドが一連のプ
ログラム命令内の次の高レベル文を選択する。次にステ
ップ６０２で動的再構成計算のためのコンパイラのフロ
ントエンドは、選択された高レベル文が再構成指示であ
るかどうかを決定する。再構成指示であるときには、ス
テップ６０４で動的再構成計算のためのコンパイラのフ
ロントエンドはレジスタ転送レベル（ＲＴＬ）再構成文
を生成し、ステップ６００に戻る。好ましい実施例で
は、レジスタ転送レベル（ＲＴＬ）再構成文は命令セッ
トアーキテクチャ（ＩＳＡ）識別を含む非標準レジスタ
転送レベル（ＲＴＬ）文である。ステップ６０２で、選
択した高レベルプログラム文が再構成指示ではないとき
には、次にステップ６０６で動的再構成計算のためのコ
ンパイラのフロントエンドは従来の方法で１組のレジス
タ転送レベル（ＲＴＬ）文を生成する。ステップ６０６
のあと、ステップ６０８で動的再構成計算のためのコン
パイラのフロントエンドはさらに別の高レベル文を検討
する必要があるかどうかを決定する。検討する必要があ
るときには、この好適な方法はステップ６００に戻る。
そうでないときにはこの好適な方法はステップ６１０に
進み、バックエンド演算を開始する。The present invention preferably includes a compiler for dynamic reconfiguration computation. FIGS. 5 and 6 are flowcharts of the preferred compilation operations performed by this compiler. The preferred compilation operations begin at step 600, where the front end of the compiler for dynamic reconfiguration computation selects the next high-level statement in the sequence of program instructions. Next, at step 602, the front end determines whether the selected high-level statement is a reconfiguration instruction. If it is, then at step 604 the front end generates an RTL reconfiguration statement and returns to step 600. In the preferred embodiment, the RTL reconfiguration statement is a non-standard RTL statement that includes an ISA identification. If at step 602 the selected high-level statement is not a reconfiguration instruction, then at step 606 the front end generates a set of RTL statements in a conventional manner. After step 606, at step 608, the front end determines whether further high-level statements require consideration. If so, the preferred method returns to step 600. Otherwise, the preferred method proceeds to step 610 to begin back-end operations.
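
As a rough illustration of front-end steps 600 to 608, the Python sketch below tags each intermediate statement as either an ordinary RTL statement or an RTL reconfiguration statement carrying an ISA identification. The `#reconfig` spelling of the reconfiguration instruction, the tuple encoding, and the `lower` callback are all assumptions made for this example; the specification does not define them here.

```python
from typing import Callable, List, Tuple

RECONFIG_PREFIX = "#reconfig "   # hypothetical source syntax, assumed for this sketch

# ("reconfig", isa_id) models the non-standard RTL reconfiguration statement;
# ("op", text) models a conventional RTL statement.
Rtl = Tuple[str, str]


def front_end(statements: List[str], lower: Callable[[str], List[str]]) -> List[Rtl]:
    rtl: List[Rtl] = []
    for stmt in statements:                        # steps 600 and 608
        if stmt.startswith(RECONFIG_PREFIX):       # step 602: reconfiguration instruction?
            isa_id = stmt[len(RECONFIG_PREFIX):].strip()
            rtl.append(("reconfig", isa_id))       # step 604: RTL reconfiguration statement
        else:
            rtl.extend(("op", r) for r in lower(stmt))   # step 606: conventional RTL
    return rtl                                     # handed to the back end (step 610)


program = ["#reconfig inner_conv", "y = f(x)"]
rtl = front_end(program, lambda s: ["rtl:" + s])
```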

【００５８】ステップ６１０で、動的再構成計算のため
のコンパイラのバックエンドはレジスタ割当て演算を実
行する。本発明の好ましい実施例では、各命令セットア
ーキテクチャ（ＩＳＡ）は命令セットアーキテクチャ
（ＩＳＡ）ごとのレジスタアーキテクチャが互いに一致
するように定められている。したがって、レジスタ割当
て演算は従来の方法で実行される。当業者は、一般に、
命令セットアーキテクチャ（ＩＳＡ）ごとのレジスタア
ーキテクチャが互いに一致することが絶対的要件ではな
いことを認めるであろう。次にステップ６１２で動的再
構成計算のためのコンパイラのバックエンドは、現在検
討中のレジスタ転送レベル（ＲＴＬ）文グループ内で次
のレジスタ転送レベル（ＲＴＬ）文を選択する。次にス
テップ６１４で動的再構成計算のためのコンパイラのバ
ックエンドは、選択したレジスタ転送レベル（ＲＴＬ）
文がレジスタ転送レベル（ＲＴＬ）再構成文であるかど
うかを決定する。選択したレジスタ転送レベル（ＲＴ
Ｌ）文がレジスタ転送レベル（ＲＴＬ）再構成文でない
ときには、ステップ６１８で動的再構成計算のためのコ
ンパイラのバックエンドは、現在検討中のレジスタ転送
レベル（ＲＴＬ）文グループについてのルールが存在す
るかどうかを決定する。存在しなければ、この好適な方
法はステップ６１２に戻り、現在検討中のレジスタ転送
レベル（ＲＴＬ）文グループに含めるために次のレジス
タ転送レベル（ＲＴＬ）文グループを選択する。ステッ
プ６１８で現在検討中のレジスタ転送レベル（ＲＴＬ）
文グループについてのルールが存在するときには、次に
ステップ６２０で動的再構成計算のためのコンパイラの
バックエンドはこのルールに従って現在検討中のレジス
タ転送レベル（ＲＴＬ）文グループに対応する１組のア
センブリ言語文を生成する。ステップ６２０のあと、ス
テップ６２２で動的再構成計算のためのコンパイラのバ
ックエンドは、次のレジスタ転送レベル（ＲＴＬ）文グ
ループのコンテクストにおいて、さらに別のレジスタ転
送レベル（ＲＴＬ）文を検討する必要があるかどうかを
決定する。検討する必要があればこの好適な方法はステ
ップ６１２に戻り、そうでなければこの好適な方法は終
了する。At step 610, the back end of the compiler for dynamic reconfiguration computation performs register allocation operations. In the preferred embodiment of the present invention, each ISA is defined such that the register architecture is consistent from one ISA to another; register allocation is therefore performed in a conventional manner. Those skilled in the art will recognize that, in general, consistency of register architecture across ISAs is not an absolute requirement. Next, at step 612, the back end selects the next RTL statement within the RTL statement group currently under consideration. At step 614, the back end determines whether the selected RTL statement is an RTL reconfiguration statement. If it is not, then at step 618 the back end determines whether a rule exists for the RTL statement group currently under consideration. If no rule exists, the preferred method returns to step 612 to select the next RTL statement for inclusion in the RTL statement group currently under consideration. If at step 618 a rule exists for that RTL statement group, then at step 620 the back end generates a set of assembly language statements corresponding to the group according to the rule. After step 620, at step 622, the back end determines whether a further RTL statement requires consideration in the context of a next RTL statement group. If so, the preferred method returns to step 612; otherwise, the preferred method ends.

【００５９】ステップ６１４で、選択したレジスタ転送
レベル（ＲＴＬ）文がレジスタ転送レベル（ＲＴＬ）再
構成文であるときには、ステップ６１６で動的再構成計
算のためのコンパイラのバックエンドはレジスタ転送レ
ベル（ＲＴＬ）再構成文内の命令セットアーキテクチャ
（ＩＳＡ）識別に対応する１組のルールセットを選択す
る。本発明では、各命令セットアーキテクチャ（ＩＳ
Ａ）について独自のルールが存在することが好ましい。
従って各ルールセットは、特定の命令セットアーキテク
チャ（ＩＳＡ）に従ってレジスタ転送レベル（ＲＴＬ）
文グループをアセンブリ言語文に変換するための１つま
たはそれ以上のルールを提供する。ステップ６１６のあ
と、好適な方法はステップ６１８に進む。任意の命令セ
ットアーキテクチャ（ＩＳＡ）に対応するルールセット
は、レジスタ転送レベル（ＲＴＬ）再構成文を、ソフト
ウェア割込みを生じるような１組のアセンブリ言語命令
に翻訳するためのルールを含んでいることが好ましい。
このソフトウェア割込みの結果、再構成ハンドラーが実
行されるが、これについては下記に詳しく説明する。If at step 614 the selected RTL statement is an RTL reconfiguration statement, then at step 616 the back end of the compiler for dynamic reconfiguration computation selects the rule set corresponding to the ISA identification within the RTL reconfiguration statement. In the present invention, a unique rule set preferably exists for each ISA. Each rule set therefore provides one or more rules for converting groups of RTL statements into assembly language statements in accordance with a particular ISA. After step 616, the preferred method proceeds to step 618. The rule set corresponding to any given ISA preferably includes a rule for translating the RTL reconfiguration statement into a set of assembly language instructions that produce a software interrupt. This software interrupt results in the execution of a reconfiguration handler, as described in detail below.
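
The back-end behavior of steps 612 to 622, including the rule-set switch at step 616, might be sketched as follows. This is a simplified Python model under assumptions: RTL statements are `(kind, text)` tuples, each per-ISA rule set is a hypothetical dictionary, the assembly mnemonics (including `SWI` for the software interrupt) are invented, and register allocation (step 610) is omitted.

```python
from typing import Dict, List, Tuple

Rtl = Tuple[str, str]                          # ("op", text) or ("reconfig", isa_id)
RuleSet = Dict[Tuple[Rtl, ...], List[str]]     # RTL statement group -> assembly


def back_end(rtl: List[Rtl], rule_sets: Dict[str, RuleSet], initial_isa: str) -> List[str]:
    rules = rule_sets[initial_isa]
    asm: List[str] = []
    group: List[Rtl] = []
    for stmt in rtl:                           # step 612: next RTL statement
        if stmt[0] == "reconfig":              # step 614: RTL reconfiguration statement?
            rules = rule_sets[stmt[1]]         # step 616: select that ISA's rule set
        group.append(stmt)
        emitted = rules.get(tuple(group))      # step 618: rule for the current group?
        if emitted is not None:
            asm.extend(emitted)                # step 620: emit assembly
            group = []
    if group:
        raise ValueError(f"no rule matches RTL group {group}")
    return asm


# Invented rule sets; each one also translates its own reconfiguration
# statement into assembly that raises a software interrupt.
rule_sets: Dict[str, RuleSet] = {
    "outer": {
        (("op", "load a"), ("op", "add")): ["ADD r0, a"],
        (("reconfig", "outer"),): ["SWI reconfig_outer"],
    },
    "inner": {
        (("op", "mac"),): ["MAC acc, x, h"],
        (("reconfig", "inner"),): ["SWI reconfig_inner"],
    },
}
rtl = [("op", "load a"), ("op", "add"), ("reconfig", "inner"), ("op", "mac")]
asm = back_end(rtl, rule_sets, initial_isa="outer")
```

In this sketch a reconfiguration statement is expected to arrive at a group boundary; a partially built group that spans a rule-set switch would simply fail to match.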

【００６０】上記に説明した方法では、動的再構成計算
のためのコンパイラは選択的にまた自動的にコンパイル
演算中に多重命令セットアーキテクチャ（ＩＳＡ）に従
ってアセンブリ言語文を生成する。言い換えれば、コン
パイル中、動的再構成計算のためのコンパイラはそれぞ
れ異なる命令セットアーキテクチャ（ＩＳＡ）に従って
１組のプログラム命令をコンパイルする。動的再構成計
算のためのコンパイラは、図５と図６を用いて上に説明
したような好ましいコンパイル演算を実行するよう修正
した従来型コンパイラであることが好ましい。当業者
は、必要とされる修正は複雑ではないが、このような修
正は先行技術コンパイル技術及び先行技術再構成計算技
術から見て自明ではないことを認めるであろう。In the manner described above, the compiler for dynamic reconfiguration computation selectively and automatically generates assembly language statements in accordance with multiple ISAs during compilation operations. In other words, during compilation the compiler compiles a set of program instructions according to different ISAs. The compiler for dynamic reconfiguration computation is preferably a conventional compiler modified to perform the preferred compilation operations described above with reference to FIGS. 5 and 6. Those skilled in the art will recognize that, although the required modifications are not complex, such modifications are non-obvious in view of both prior-art compilation techniques and prior-art reconfigurable computing techniques.

【００６１】図７は、動的再構成処理装置（ＤＲＰＵ）
３２の好ましい実施例の構成図である。動的再構成処理
装置（ＤＲＰＵ）３２は、命令取出し装置（ＩＦＵ）６
０と、データ演算装置（ＤＯＵ）６２と、アドレス演算
装置（ＡＯＵ）６４とを含んでいる。命令取出し装置
（ＩＦＵ）６０と、データ演算装置（ＤＯＵ）６２と、
アドレス演算装置（ＡＯＵ）６４のそれぞれは、第１タ
イミング信号ライン４０に結合されたタイミング入力部
を含んでいる。命令取出し装置（ＩＦＵ）６０は、メモ
リ制御ライン４２に結合されたメモリ制御出力部と、メ
モリ入出力ライン４６に結合されたデータ入力部と、外
部制御ライン４８に結合された双方向制御ポートとを含
んでいる。命令取出し装置（ＩＦＵ）６０はさらに、第
１制御ライン７０を経てデータ演算装置（ＤＯＵ）６２
の第１制御入力部に結合された第１制御出力部と、第２
制御ライン７２を経てアドレス演算装置（ＡＯＵ）６４
の第１制御入力部に結合された第２制御出力部とを含ん
でいる。命令取出し装置（ＩＦＵ）６０は、第３制御ラ
イン７４を経てデータ演算装置（ＤＯＵ）６２の第２制
御入力部とアドレス演算装置（ＡＯＵ）６４の第２制御
入力部に結合された第１制御出力部を含んでいる。デー
タ演算装置（ＤＯＵ）６２とアドレス演算装置（ＡＯ
Ｕ）６４とは、それぞれメモリ入出力ライン４６に結合
された双方向データポートを含んでいる。最後にアドレ
ス演算装置（ＡＯＵ）６４は、動的再構成処理装置（Ｄ
ＲＰＵ）のアドレス出力部を形成するアドレス出力部を
含んでいる。FIG. 7 is a block diagram of a preferred embodiment of the dynamic reconfiguration processor (DRPU) 32. The DRPU 32 includes an instruction fetch unit (IFU) 60, a data operation unit (DOU) 62, and an address operation unit (AOU) 64. Each of the IFU 60, the DOU 62, and the AOU 64 includes a timing input coupled to the first timing signal line 40. The IFU 60 includes a memory control output coupled to the memory control line 42, a data input coupled to the memory input/output line 46, and a bidirectional control port coupled to the external control line 48. The IFU 60 further includes a first control output coupled to a first control input of the DOU 62 via a first control line 70, and a second control output coupled to a first control input of the AOU 64 via a second control line 72. The IFU 60 also includes a third control output coupled to a second control input of the DOU 62 and a second control input of the AOU 64 via a third control line 74. The DOU 62 and the AOU 64 each include a bidirectional data port coupled to the memory input/output line 46. Finally, the AOU 64 includes an address output that forms the address output of the DRPU.

【００６２】動的再構成処理装置（ＤＲＰＵ）３２は、
再構成論理装置または再プログラマブル論理装置、たと
えばＸｉｌｉｎｘＸＣ４０１３（Xilinx, Inc., サン
ノゼ，カリフォルニア）またはＡＴ＆ＴＯＲＣＡＩ
Ｃ０７（ＡＴ＆Ｔ Microelectronics, Allentown, ペ
ンシルバニア）などのフィールドプログラマブルゲート
アレイ（ＦＰＧＡ）を用いて実装されるのが好ましい。
再プログラマブル論理装置は、複数の、１）選択的再プログラマブル論理ブロック、または構成
可能論理ブロック（ＣＬＢ：Selectively Reprogramabl
e Logic Blocks or Configurable Logick Blocks）と、２）選択的再プログラマブル入出力ブロック（ＩＯＢ：
Ｉ／Ｏ Blocks）と、３）選択的再プログラマブル相互結合構造と、４）データ記憶リソースと、５）３値バッファリソースと、６）ワイヤード論理関数能力と、を備えていることが好ましい。各論理ブロック（ＣＬ
Ｂ）は、論理関数を生成し、データを記憶し、信号のル
ーティングを行うための選択的再構成回路を含んでいる
ことが好ましい。当業者は、使用中の再プログラマブル
論理装置の正確な設計に応じて、再構成データ記憶回路
が論理ブロック（ＣＬＢ）とは別の１個またはそれ以上
のデータ記憶ブロック（ＤＳＢ：Data Storage Block）
に含まれることもあることを認めるであろう。ここで
は、フィールドプログラマブルゲートアレイ（ＦＰＧ
Ａ）内の再構成データ記憶回路は、論理ブロック（ＣＬ
Ｂ）内に取入れられている。すなわち、データ記憶ブロ
ック（ＤＳＢ）の存在は想定されていない。当業者は、
上に説明した論理ブロック（ＣＬＢ）ベース再構成デー
タ記憶回路を利用する１個またはそれ以上の構成部分
が、データ記憶ブロック（ＤＳＢ）が存在する場合には
データ記憶ブロック（ＤＳＢ）ベース回路も利用できる
ことを認めるであろう。各入出力ブロック（ＩＯＢ）
は、論理ブロック（ＣＬＢ）とフィールドプログラマブ
ルゲートアレイ（ＦＰＧＡ）出力ピンとの間でデータを
転送するための選択的再構成回路を含んでいることが好
ましい。構成データセットは、論理ブロック（ＣＬＢ）
内で実行される関数を指定することによって動的再構成
処理装置（ＤＲＰＵ）ハードウェア構成または編成を定
め、また、１）論理ブロック（ＣＬＢ）内、２）論理ブロック（ＣＬＢ）相互間、３）入出力ブロック（ＩＯＢ）内、４）入出力ブロック（ＩＯＢ）相互間、及び、５）論理ブロック（ＣＬＢ）と入出力ブロック（ＩＯ
Ｂ）との間の相互結合を定める。当業者は、構成データセットによ
って、メモリ制御ライン４２と、アドレスライン４４
と、メモリ入出力ライン４６と、外部制御ライン４８の
それぞれにおけるビット数が再構成可能であることを認
めるであろう。再構成データセットは、システム１０の
中の１個またはそれ以上のＳマシン３４に記憶されるこ
とが好ましい。当業者は、動的再構成処理装置（ＤＲＰ
Ｕ）３２がフィールドプログラマブルゲートアレイ（Ｆ
ＰＧＡ）ベース実装に限定されないことを認めるであろ
う。たとえば動的再構成処理装置（ＤＲＰＵ）３２は、
１つまたはそれ以上のルックアップテーブルをおそらく
含むＲＡＭベース状態マシンとして実装することができ
る。あるいは動的再構成処理装置（ＤＲＰＵ）３２は、
複合プログラマブル論理装置（ＣＰＬＤ）を用いて実装
することができる。しかし当業者は、システム１０のＳ
マシン１２の一部が再構成可能ではない動的再構成処理
装置（ＤＲＰＵ）１２を含むことができることを認める
であろう。The dynamic reconfiguration processor (DRPU) 32 is preferably implemented using a reconfigurable or reprogrammable logic device, for example a field programmable gate array (FPGA) such as a Xilinx XC4013 (Xilinx, Inc., San Jose, California) or an AT&T ORCA 1C07 (AT&T Microelectronics, Allentown, Pennsylvania). The reprogrammable logic device preferably provides a plurality of: 1) selectively reprogrammable logic blocks, or configurable logic blocks (CLBs); 2) selectively reprogrammable input/output blocks (IOBs); 3) selectively reprogrammable interconnect structures; 4) data storage resources; 5) three-state buffer resources; and 6) wired-logic function capabilities. Each CLB preferably includes selectively reconfigurable circuitry for generating logic functions, storing data, and routing signals. Those skilled in the art will recognize that, depending on the exact design of the reprogrammable logic device in use, reconfigurable data storage circuitry may also be included in one or more data storage blocks (DSBs) separate from the CLBs. Here, the reconfigurable data storage circuitry within the FPGA is taken to reside within the CLBs; that is, the presence of DSBs is not assumed. Those skilled in the art will recognize that one or more components that use the CLB-based reconfigurable data storage circuitry described above could also use DSB-based circuitry where DSBs are present. Each IOB preferably includes selectively reconfigurable circuitry for transferring data between the CLBs and the FPGA output pins. A configuration data set defines a DRPU hardware configuration or organization by specifying the functions performed within the CLBs and by specifying the interconnections: 1) within CLBs; 2) between CLBs; 3) within IOBs; 4) between IOBs; and 5) between CLBs and IOBs. Those skilled in the art will recognize that, by means of a configuration data set, the number of bits in each of the memory control line 42, the address line 44, the memory input/output line 46, and the external control line 48 is reconfigurable. The reconfiguration data sets are preferably stored in one or more S-machines 34 within the system 10. Those skilled in the art will recognize that the DRPU 32 is not limited to an FPGA-based implementation. For example, the DRPU 32 could be implemented as a RAM-based state machine that possibly includes one or more look-up tables. Alternatively, the DRPU 32 could be implemented using a complex programmable logic device (CPLD). Those skilled in the art will also recognize, however, that some of the S-machines 12 of the system 10 may include a DRPU 12 that is not reconfigurable.

【００６３】好ましい実施例では、命令取出し装置（Ｉ
ＦＵ）６０と、データ演算装置（ＤＯＵ）６２と、アド
レス演算装置（ＡＯＵ）６４はそれぞれ動的に再構成可
能である。したがって、その内部ハードウェア構成はプ
ログラム実行中に選択的に変更することができる。命令
取出し装置（ＩＦＵ）６０は、命令取出し・復号演算
と、メモリアクセス演算と、動的再構成処理装置（ＤＲ
ＰＵ）再構成演算とを指示し、命令の実行を容易に行う
ためにデータ演算装置（ＤＯＵ）６２とアドレス演算装
置（ＡＯＵ）６４に制御信号を送る。データ演算装置
（ＤＯＵ）６２は、データ計算に関する演算を実行し、
アドレス演算装置（ＡＯＵ）６４はアドレス計算に関す
る演算を実行する。命令取出し装置（ＩＦＵ）６０と、
データ演算装置（ＤＯＵ）６２と、アドレス演算装置
（ＡＯＵ）６４のそれぞれの内部構造と演算については
下記に詳しく説明する。In the preferred embodiment, the instruction fetch unit (IFU) 60, the data operation unit (DOU) 62, and the address operation unit (AOU) 64 are each dynamically reconfigurable; their internal hardware configurations can therefore be selectively modified during program execution. The IFU 60 directs instruction fetch and decode operations, memory access operations, and DRPU reconfiguration operations, and issues control signals to the DOU 62 and the AOU 64 to facilitate instruction execution. The DOU 62 performs operations involving data computation, and the AOU 64 performs operations involving address computation. The internal structure and operation of each of the IFU 60, the DOU 62, and the AOU 64 are described in detail below.

【００６４】図８は、命令取出し装置（ＩＦＵ）６０の
好ましい実施例の構成図である。命令取出し装置（ＩＦ
Ｕ）６０は、命令状態シーケンサ（ＩＳＳ：Instructio
n State Sequencer）１００と、アーキテクチャ記述メ
モリ１０１と、メモリアクセスロジック１０２と、再構
成ロジック１０４と、割込みロジック１０６と、取出し
制御装置１０８と、命令バッファ１１０と、復号制御装
置１１２と、命令復号器１１４と、操作コード記憶レジ
スタセット１１６と、レジスタファイル（ＲＦ：Regist
er File）アドレスレジスタセット１１８と、定数レジ
スタセット１２０と、プロセス制御レジスタセット１２
２とを含んでいる。命令状態シーケンサ（ＩＳＳ）１０
０は、それぞれ命令取出し装置（ＩＦＵ）６０の第１及
び第２制御出力部を形成する第１及び第２制御出力部を
含んでおり、また命令取出し装置（ＩＦＵ）６０のタイ
ミング入力部を形成するタイミング入力部を含んでい
る。また命令状態シーケンサ（ＩＳＳ）１００は、取出
し／復号制御ライン１３０を経て取出し制御装置１０８
の制御入力部と復号制御装置１１２の制御入力部とに結
合された取出し／復号制御出力部を含んでいる。さらに
命令状態シーケンサ（ＩＳＳ）１００は、双方向制御ラ
イン１３２を経てメモリアクセスロジック１０２と、再
構成ロジック１０４と、割込みロジック１０６のそれぞ
れの第１双方向制御ポートに結合された双方向制御ポー
トを含んでいる。また命令状態シーケンサ（ＩＳＳ）１
００は、操作コードライン１４２を経て、操作コード記
憶レジスタセット１１６の出力部に結合された操作コー
ド入力部を含んでいる。最後に命令状態シーケンサ（Ｉ
ＳＳ）１００は、処理データライン１４４を経て、プロ
セス制御レジスタセット１２２の双方向制御ポートに結
合された双方向制御ポートを含んでいる。FIG. 8 is a block diagram of a preferred embodiment of the instruction fetch unit (IFU) 60. The IFU 60 includes an instruction state sequencer (ISS) 100, an architecture description memory 101, memory access logic 102, reconfiguration logic 104, interrupt logic 106, a fetch controller 108, an instruction buffer 110, a decoding controller 112, an instruction decoder 114, an operation code storage register set 116, a register file (RF) address register set 118, a constant register set 120, and a process control register set 122. The ISS 100 includes first and second control outputs that form the first and second control outputs of the IFU 60, respectively, and a timing input that forms the timing input of the IFU 60. The ISS 100 also includes a fetch/decode control output coupled to a control input of the fetch controller 108 and a control input of the decoding controller 112 via a fetch/decode control line 130. The ISS 100 further includes a bidirectional control port coupled to a first bidirectional control port of each of the memory access logic 102, the reconfiguration logic 104, and the interrupt logic 106 via a bidirectional control line 132. The ISS 100 also includes an operation code input coupled to an output of the operation code storage register set 116 via an operation code line 142. Finally, the ISS 100 includes a bidirectional control port coupled to a bidirectional control port of the process control register set 122 via a processing data line 144.

【００６５】メモリアクセスロジック１０２と、再構成
ロジック１０４と、割込みロジック１０６は、それぞれ
外部制御ライン４８に結合された第２双方向制御ポート
を含んでいる。さらにメモリアクセスロジック１０２
と、再構成ロジック１０４と、割込みロジック１０６
は、それぞれ実装制御ライン１３１を経てアーキテクチ
ャ記述メモリ１０１のデータ出力部に結合されたデータ
入力部を含んでいる。メモリアクセスロジック１０２
は、さらに命令取出し装置（ＩＦＵ）６０のメモリ制御
出力部を形成する制御出力部を含み、また割込みロジッ
ク１０６はさらに処理データライン１４４に結合された
出力部を含んでいる。命令バッファ１１０は、命令取出
し装置（ＩＦＵ）６０のデータ入力部を形成するデータ
入力部と、取出し制御ライン１３４を経て取出し制御装
置１０８の制御出力部に結合された制御入力部と、命令
ライン１３６を経て命令復号器１１４の入力部に結合さ
れた出力部とを含んでいる。命令復号器１１４は、復号
制御ライン１３８を経て復号制御装置１１２の制御出力
部に結合された制御入力部と、復号命令ライン１４０を
経て、１）操作コード記憶レジスタ１１６の入力部と、２）レジスタファイル（ＲＦ）アドレスレジスタセット
１１８の入力部と、３）定数レジスタセット１２０の入力部に結合された出
力部と、を含んでいる。レジスタファイル（ＲＦ）アドレスレジ
スタセット１１８と定数レジスタセット１２０は、それ
ぞれ命令取出し装置（ＩＦＵ）６０の第３制御出力部７
４を形成する出力部を含んでいる。The memory access logic 102, the reconfiguration logic 104, and the interrupt logic 106 each include a second bidirectional control port coupled to the external control line 48. The memory access logic 102, the reconfiguration logic 104, and the interrupt logic 106 also each include a data input coupled to a data output of the architecture description memory 101 via an implementation control line 131. The memory access logic 102 further includes a control output that forms the memory control output of the instruction fetch unit (IFU) 60, and the interrupt logic 106 further includes an output coupled to the processing data line 144. The instruction buffer 110 includes a data input that forms the data input of the IFU 60, a control input coupled to a control output of the fetch controller 108 via a fetch control line 134, and an output coupled to an input of the instruction decoder 114 via an instruction line 136. The instruction decoder 114 includes a control input coupled to a control output of the decoding controller 112 via a decoding control line 138, and an output coupled via a decoded instruction line 140 to: 1) an input of the operation code storage register set 116; 2) an input of the register file (RF) address register set 118; and 3) an input of the constant register set 120. The RF address register set 118 and the constant register set 120 each include an output that together form the third control output 74 of the IFU 60.

【００６６】アーキテクチャ記述メモリ１０１は、現在
の動的再構成処理装置（ＤＲＰＵ）構成を特徴付けるア
ーキテクチャ指定信号を記憶する。このアーキテクチャ
指定信号は、１）デフォルト構成データセットに対する基準と、２）許容される構成データセットリストに対する基準
と、３）現在検討中の命令セットアーキテクチャ（ＩＳＡ）
に対応する構成データセットに対する基準、すなわち現
在の動的再構成処理装置（ＤＲＰＵ）構成を定める構成
データセットに対する基準と、４）命令取出し装置（ＩＦＵ）６０が存在するＳマシン
１２に関連したＴマシン１４内の１個またはそれ以上の
相互結合入出力装置３０４を識別する相互結合アドレス
リスト（これについては、図１８を用いて下記に詳しく
説明する）と、５）割込み待ち時間と、命令取出し装置（ＩＦＵ）６０
が割込みにどのように応答するかを定める割込み精度情
報とを指定する１組の割込み応答信号と、６）アトミックメモリアドレスインクリメントを定める
メモリアクセス定数と、を含んでいることが好ましい。
好ましい実施例では、各構成データセットは、読出し専
用メモリ（ＲＯＭ）として構成された１組の論理ブロッ
ク（ＣＬＢ）としてアーキテクチャ記述メモリ１０１を
実動化する。アーキテクチャ記述メモリ１０１の内容を
定めるアーキテクチャ指定信号は、各構成データセット
に含まれることが好ましい。したがって、各構成データ
セットが特定の命令セットアーキテクチャ（ＩＳＡ）に
対応するので、アーキテクチャ記述メモリ１０１の内容
は、現在検討中の命令セットアーキテクチャ（ＩＳＡ）
によって異なる。所定の命令セットアーキテクチャ（Ｉ
ＳＡ）について、アーキテクチャ記述メモリ１０１の内
容へのプログラムアクセスは、命令セットアーキテクチ
ャ（ＩＳＡ）にメモリ読出し命令を含めることによって
容易に行われることが好ましい。これによってプログラ
ム実行中に現在の動的再構成処理装置（ＤＲＰＵ）構成
に関する情報をプログラムが検索することができる。The architecture description memory 101 stores architecture specification signals that characterize the current DRPU configuration. The architecture specification signals preferably include: 1) a reference to a default configuration data set; 2) a reference to a list of allowable configuration data sets; 3) a reference to the configuration data set corresponding to the ISA currently under consideration, that is, the configuration data set that defines the current DRPU configuration; 4) an interconnect address list that identifies one or more interconnect input/output devices 304 within the T-machine 14 associated with the S-machine 12 in which the IFU 60 resides (described in detail below with reference to FIG. 18); 5) a set of interrupt response signals that specify interrupt latency and interrupt precision information defining how the IFU 60 responds to interrupts; and 6) a memory access constant that defines an atomic memory address increment. In the preferred embodiment, each configuration data set implements the architecture description memory 101 as a set of CLBs configured as read-only memory (ROM). The architecture specification signals that define the contents of the architecture description memory 101 are preferably included in each configuration data set. Thus, because each configuration data set corresponds to a particular ISA, the contents of the architecture description memory 101 vary according to the ISA currently under consideration. For a given ISA, program access to the contents of the architecture description memory 101 is preferably facilitated by including a memory read instruction in the ISA. This allows a program to retrieve information about the current DRPU configuration during program execution.
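
One way to picture the six kinds of architecture specification signals is as a read-only record that a program queries through a memory read. The Python sketch below is purely illustrative: the field names, field widths, word ordering, units, and sample values are assumptions and are not defined by the specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)  # frozen: loosely models the ROM-like, read-only contents
class ArchitectureDescription:
    default_config_ref: int                   # 1) reference to the default configuration data set
    allowed_configs_ref: int                  # 2) reference to the list of allowable data sets
    current_config_ref: int                   # 3) reference to the data set for the current ISA
    interconnect_addresses: Tuple[int, ...]   # 4) interconnect address list
    interrupt_latency: int                    # 5) interrupt latency (unit assumed: cycles)
    interrupt_precise: bool                   # 5) interrupt precision (assumed boolean)
    memory_access_increment: int              # 6) atomic memory address increment

    def read(self, index: int):
        """Stand-in for an ISA memory read instruction: return word `index`."""
        words = (
            self.default_config_ref,
            self.allowed_configs_ref,
            self.current_config_ref,
            self.interconnect_addresses,
            self.interrupt_latency,
            self.interrupt_precise,
            self.memory_access_increment,
        )
        return words[index]


# Made-up contents for one hypothetical configuration.
adm = ArchitectureDescription(0x100, 0x200, 0x300, (4, 7), 12, True, 4)
next_address = 0x1000 + adm.read(6)   # advance an address by the atomic increment
```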

【００６７】本発明では、再構成ロジック１０４は一連
の再構成演算を制御する状態マシンであり、これによっ
て構成データセットに応じて動的再構成処理装置（ＤＲ
ＰＵ）３２の再構成が容易に行われる。再構成ロジック
１０４は、再構成信号を受取り次第、再構成演算を開始
することが好ましい。下記に詳しく説明するように、再
構成信号は、外部制御ライン４８で受取った再構成割込
みに応じて割込みロジック１０６が発生させた信号であ
るか、またはプログラムに埋込まれた再構成指示に応じ
て命令状態シーケンサ（ＩＳＳ）１００が発生させた信
号である。再構成演算によって、アーキテクチャ記述メ
モリ１０１によって参照されるデフォルト構成データを
用いて電源オン／リセット後の当初の動的再構成処理装
置（ＤＲＰＵ）構成が得られる。また再構成演算によっ
て、当初の動的再構成処理装置（ＤＲＰＵ）構成が確定
したあとの選択的動的再構成処理装置（ＤＲＰＵ）再構
成が得られる。再構成演算が完了すると再構成ロジック
１０４は完了信号を発する。好ましい実施例では、再構
成ロジック１０４は、再プログラマブル論理装置自体へ
の構成データセットのローディングを制御する非再構成
ロジックであり、したがって再構成演算のシーケンスは
再プログラマブル論理装置のメーカーによって定められ
る。したがって、再構成演算は当業者に既知である。In the present invention, the reconfiguration logic 104 is a state machine that controls a sequence of reconfiguration operations, thereby facilitating reconfiguration of the DRPU 32 according to a configuration data set. The reconfiguration logic 104 preferably initiates the reconfiguration operations upon receipt of a reconfiguration signal. As described in detail below, the reconfiguration signal is generated either by the interrupt logic 106 in response to a reconfiguration interrupt received on the external control line 48, or by the ISS 100 in response to a reconfiguration instruction embedded in a program. The reconfiguration operations provide an initial DRPU configuration following a power-on/reset condition, using the default configuration data set referenced by the architecture description memory 101. The reconfiguration operations also provide selective DRPU reconfiguration after the initial DRPU configuration has been established. Upon completion of the reconfiguration operations, the reconfiguration logic 104 issues a completion signal. In the preferred embodiment, the reconfiguration logic 104 is non-reconfigurable logic that controls the loading of configuration data sets into the reprogrammable logic device itself; the sequence of reconfiguration operations is therefore defined by the manufacturer of the reprogrammable logic device, and the reconfiguration operations will be known to those skilled in the art.

【００６８】Each dynamic reconfiguration processor (DRPU) configuration is preferably given by a configuration data set that defines a particular hardware organization for implementing the corresponding instruction set architecture (ISA). In the preferred embodiment, the instruction fetch unit (IFU) 60 includes each of the components described above, regardless of the DRPU configuration. At a basic level, the functionality provided by each component within the IFU 60 is independent of the ISA currently under consideration. In the preferred embodiment, however, the detailed structure and functionality of one or more components of the IFU 60 vary according to the characteristics of the ISA for which it is configured. The structure and functionality of the architecture description memory 101 and the reconfiguration logic 104 preferably remain constant from one DRPU configuration to another. The structure and functionality of the other components of the IFU 60, and the manner in which they vary with the type of ISA, are described in detail below.

【００６９】The process control register set 122 stores signals and data used by the instruction state sequencer (ISS) 100 during instruction execution. In the preferred embodiment, the process control register set 122 includes a register for storing a process control word, a register for storing an interrupt vector, and a register for storing a reference to a configuration data set. The process control word preferably includes a plurality of condition flags that can be selectively set and reset according to conditions that arise during instruction execution. The process control word additionally includes a plurality of transition control signals that define one or more manners in which interrupts can be serviced, as described in detail below. In the preferred embodiment, the process control register set 122 is implemented as a set of logic blocks (CLBs) configured for data storage and gating logic.
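
As an illustration only, the register set described above can be modelled as a small data structure. The following Python sketch is not part of the disclosure: the flag names, the reduction of the transition control signals to one "interruptible" bit per state, and the handler address are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# ISS state names as used in the state diagram of FIG. 9.
ISS_STATES = ("F", "D", "E", "M", "W", "Y")


@dataclass
class ProcessControlWord:
    """Condition flags plus transition control signals (layout is assumed)."""
    # Condition flags; the names "zero" and "carry" are illustrative only.
    condition_flags: dict = field(
        default_factory=lambda: {"zero": False, "carry": False}
    )
    # Transition control signals, reduced here to one "interruptible" bit
    # per ISS state.
    interruptible: dict = field(
        default_factory=lambda: {state: False for state in ISS_STATES}
    )


@dataclass
class ProcessControlRegisterSet:
    """The three registers named in the text, modelled as plain fields."""
    control_word: ProcessControlWord = field(default_factory=ProcessControlWord)
    interrupt_vector: int = 0       # address of the interrupt handler
    config_data_set_ref: int = 0    # reference to a configuration data set


regs = ProcessControlRegisterSet()
regs.control_word.condition_flags["zero"] = True   # set by some instruction
regs.control_word.interruptible["W"] = True        # only write-back interruptible
regs.interrupt_vector = 0x0400                     # hypothetical handler address
```

Each register set instance carries its own flags and signals, so changing one instance does not affect another.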

【００７０】The instruction state sequencer (ISS) 100 is preferably a state machine that controls the operation of the fetch control unit 108, the decode control unit 112, the data operation unit (DOU) 62, and the address operation unit (AOU) 64, and that issues memory read and memory write signals to the memory access logic 102 to facilitate instruction execution. FIG. 9 is a state diagram showing a preferred set of states supported by the ISS 100. After power-on or reset, or immediately after a reconfiguration, the ISS 100 begins operation in state P. In response to the completion signal issued by the reconfiguration logic 104, the ISS 100 advances to state S, in which it initializes program state information if a power-on or reset has occurred, or restores program state information if a reconfiguration has occurred. The ISS 100 next advances to state F, in which it performs instruction fetch operations. In the instruction fetch operations, the ISS 100 issues a memory read signal to the memory access logic 102, issues a fetch signal to the fetch control unit 108, and issues an increment signal to the AOU 64 to increment the next instruction program address register (NIPAR) 232, as described in detail below with reference to FIGS. 15 and 16. After state F, the ISS 100 advances to state D and initiates instruction decoding operations. In state D, the ISS 100 issues a decode signal to the decode control unit 112. Also in state D, the ISS 100 retrieves the operation code corresponding to the decoded instruction from the operation code storage register set 116. Based on the retrieved operation code, the ISS 100 advances to state E or state M to perform instruction execution operations: it advances to state E when the instruction can be executed in a single clock cycle, and otherwise advances to state M for multi-cycle execution. In the instruction execution operations, the ISS 100 generates DOU control signals, AOU control signals, and/or signals directed to the memory access logic 102 to facilitate execution of the instruction corresponding to the retrieved operation code. After state E or M, the ISS 100 advances to state W. In state W, the ISS 100 generates DOU control signals, AOU control signals, and/or memory write signals to facilitate storage of the instruction execution result. State W is therefore referred to as the write-back state. Those skilled in the art will recognize that states F, D, E or M, and W make up a complete instruction execution cycle. After state W, the ISS 100 advances to state Y when instruction execution must be suspended. State Y corresponds to an idle state, which may be required, for example, when a T machine 14 needs to access the memory 34 of the S machine. After state Y, or after state W when instruction execution is to continue, the ISS 100 returns to state F to begin another instruction execution cycle.
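
The normal (non-interrupt) state sequence described above can be summarized in a short Python sketch. It is illustrative only: the function and parameter names are assumptions, and state M is collapsed to a single step even though it would in practice span several clock cycles.

```python
def next_iss_state(state, *, single_cycle=True, must_idle=False):
    """Return the state that follows `state` when no interrupt is serviced."""
    if state == "P":          # power-on, reset, or reconfiguration
        return "S"            # entered when the completion signal arrives
    if state == "S":          # initialize or restore program state
        return "F"
    if state == "F":          # instruction fetch
        return "D"
    if state == "D":          # decode, then choose E or M from the opcode
        return "E" if single_cycle else "M"
    if state in ("E", "M"):   # single-cycle or multi-cycle execution
        return "W"
    if state == "W":          # write-back
        return "Y" if must_idle else "F"
    if state == "Y":          # idle, e.g. while memory is accessed externally
        return "F"
    raise ValueError(f"unknown state: {state!r}")


def trace(start, steps, **kwargs):
    """Collect the sequence of states visited, starting from `start`."""
    states = [start]
    for _ in range(steps):
        states.append(next_iss_state(states[-1], **kwargs))
    return states
```

For example, `trace("P", 6)` walks from power-on through one complete single-cycle instruction and back to fetch.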

【００７１】As shown in FIG. 9, the state diagram also includes state I, which is defined as the interrupt service state. In the present invention, the instruction state sequencer (ISS) 100 receives an interrupt notification signal from the interrupt logic 106. As described in detail below with reference to FIG. 10, the interrupt logic 106 generates transition control signals and stores them in the process control word within the process control register set 122. The transition control signals preferably indicate which of states F, D, E, M, W, and Y are interruptible, the level of interrupt precision required in each interruptible state, and, for each interruptible state, the next state at which instruction execution is to continue after state I. When the ISS 100 receives an interrupt notification signal while in a given state, it advances to state I if the transition control signals indicate that the current state is interruptible. Otherwise, the ISS 100 proceeds as if no interrupt signal had been received until it reaches an interruptible state.
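
The deferral rule described above (enter state I only from an interruptible state, otherwise keep going until one is reached) can be sketched in Python. The function name and the representation of the transition control signals as a set of state names are assumptions made for this example.

```python
def service_point(path, interruptible, notified_at):
    """Return the index in `path` at which the ISS would enter state I.

    path          -- sequence of ISS states visited, e.g. ["F", "D", "E", "W"]
    interruptible -- set of states the transition control signals allow
    notified_at   -- index in `path` at which the notification arrives
    Returns None if no interruptible state is reached on this path.
    """
    for index in range(notified_at, len(path)):
        if path[index] in interruptible:
            return index
    return None
```

With the path F, D, E, W and a notification arriving during D, the interrupt is serviced immediately if D is interruptible, and is otherwise held until W.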

【００７２】Once the instruction state sequencer (ISS) 100 has advanced to state I, it preferably accesses the process control register set 122 to set an interrupt masking flag and to retrieve the interrupt vector. After retrieving the interrupt vector, the ISS 100 preferably services the current interrupt by performing a conventional subroutine jump to the interrupt handler specified by the interrupt vector.

【００７３】In the present invention, reconfiguration of the dynamic reconfiguration processor (DRPU) 32 is initiated in response to either 1) a reconfiguration interrupt asserted on the external control line 48, or 2) the execution of a reconfiguration directive within a sequence of program instructions. In the preferred embodiment, both the reconfiguration interrupt and the execution of a reconfiguration directive result in a subroutine jump to a reconfiguration handler. The reconfiguration handler preferably saves program state information and issues a configuration data set address and a reconfiguration signal to the reconfiguration logic 104.
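
The two trigger paths and the common handler described above can be sketched as follows. This Python fragment is illustrative only: the class `ReconfigurationLogic`, the source labels, and the dictionary used for program state are hypothetical stand-ins, not structures defined in this document.

```python
class ReconfigurationLogic:
    """Stand-in for reconfiguration logic 104: records what it is asked to load."""

    def __init__(self):
        self.requested_address = None
        self.signalled = False

    def start(self, config_data_set_address):
        self.requested_address = config_data_set_address
        self.signalled = True


def reconfiguration_handler(program_state, config_address, save_area, logic):
    """Save program state, then hand the address and signal to the logic."""
    save_area.append(dict(program_state))   # copy, so later changes do not leak in
    logic.start(config_address)


def trigger(source, program_state, config_address, save_area, logic):
    """Both trigger sources end in the same subroutine jump to the handler."""
    if source not in ("external_interrupt", "embedded_directive"):
        raise ValueError(f"unknown reconfiguration source: {source!r}")
    reconfiguration_handler(program_state, config_address, save_area, logic)
```

Saving a copy of the program state before signalling the reconfiguration logic mirrors the order given in the text, so that state S can later restore it.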

【００７４】When the current interrupt is not a reconfiguration interrupt, the instruction state sequencer (ISS) 100 advances, once the interrupt has been serviced, to the next state indicated by the transition control signals, thereby resuming, completing, or initiating an instruction execution cycle.

【００７５】In the preferred embodiment, the set of states supported by the instruction state sequencer (ISS) 100 varies according to the characteristics of the instruction set architecture (ISA) for which the dynamic reconfiguration processor (DRPU) 32 is configured. Thus, state M is not present for an ISA in which one or more instructions can be executed in a single clock cycle, as is the case for a typical inner-loop ISA. The state diagram of FIG. 9 preferably defines the states supported by the ISS 100 for implementing a general-purpose outer-loop ISA. For the implementation of an inner-loop ISA, the ISS 100 preferably supports multiple sets of states F, D, E, and W in parallel, which facilitates pipelined control of instruction execution in a manner readily understood by those skilled in the art. In the preferred embodiment, the ISS 100 is implemented as a logic block (CLB)-based state machine that supports the states described above, or a subset of them, according to the ISA currently under consideration.

【００７６】The interrupt logic 106 preferably comprises a state machine that generates transition control signals and performs interrupt notification operations in response to an interrupt signal received via the external control line 48. FIG. 10 is a state diagram showing a preferred set of states supported by the interrupt logic 106. The interrupt logic 106 begins operation in state P, which corresponds to a power-on, reset, or reconfiguration condition. In response to the completion signal issued by the reconfiguration logic 104, the interrupt logic 106 advances to state A and retrieves the interrupt response signals from the architecture description memory 101. The interrupt logic 106 then generates the transition control signals from the interrupt response signals and stores them in the process control register set 122. In the preferred embodiment, the interrupt logic 106 includes a logic block (CLB)-based programmable logic array (PLA) that receives the interrupt response signals and generates the transition control signals. After state A, the interrupt logic 106 advances to state B to wait for an interrupt signal. Upon receiving an interrupt signal while the interrupt masking flag in the process control register set 122 is reset, the interrupt logic 106 advances to state C. In state C, the interrupt logic 106 determines the origin of the interrupt, the interrupt priority, and the interrupt handler address. If the interrupt signal is a reconfiguration interrupt, the interrupt logic 106 advances to state R and stores a configuration data set address in the process control register set 122. After state R, or after state C if the interrupt signal is not a reconfiguration interrupt, the interrupt logic 106 advances to state N and stores the interrupt handler address in the process control register set 122. The interrupt logic 106 next advances to state X and issues an interrupt notification signal to the instruction state sequencer (ISS) 100. After state X, the interrupt logic 106 returns to state B to wait for the next interrupt signal.
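
The state sequence of the interrupt logic described above (P, A, B, C, optionally R, then N and X) can be sketched in Python. The function names and the boolean inputs are assumptions made for the example; the side effects of each state (storing addresses, issuing the notification) are only noted in comments.

```python
def next_state(state, *, completion=False, interrupt=False,
               masked=False, reconfiguration=False):
    """One transition of the interrupt logic state machine (states per FIG. 10)."""
    if state == "P":    # power-on, reset, or reconfiguration
        return "A" if completion else "P"
    if state == "A":    # read interrupt response signals, store transition control
        return "B"
    if state == "B":    # wait for an unmasked interrupt signal
        return "C" if interrupt and not masked else "B"
    if state == "C":    # determine origin, priority, and handler address
        return "R" if reconfiguration else "N"
    if state == "R":    # store the configuration data set address
        return "N"
    if state == "N":    # store the interrupt handler address
        return "X"
    if state == "X":    # notify the ISS, then wait again
        return "B"
    raise ValueError(f"unknown state: {state!r}")


def run(state, steps, **inputs):
    """Apply the same inputs for `steps` transitions and return the visited states."""
    visited = [state]
    for _ in range(steps):
        visited.append(next_state(visited[-1], **inputs))
    return visited
```

A reconfiguration interrupt takes the extra step through state R, while a masked interrupt leaves the logic waiting in state B.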

【００７７】In the preferred embodiment, the level of interrupt latency specified by the interrupt response signals, and hence by the transition control signals, varies according to the instruction set architecture (ISA) for which the dynamic reconfiguration processor (DRPU) 32 is currently configured. For example, an ISA for high-performance real-time motion control requires rapid and predictable interrupt response. The configuration data set corresponding to such an ISA therefore preferably includes interrupt response signals indicating that low-latency interrupt servicing is required. The corresponding transition control signals preferably identify multiple instruction state sequencer (ISS) states as interruptible, so that an interrupt can suspend an instruction execution cycle before the cycle completes. In contrast to an ISA for real-time motion control, an ISA for image convolution operations requires an interrupt response capability that maximizes the number of convolution operations performed per unit time. The configuration data set corresponding to the image convolution ISA preferably includes interrupt response signals specifying that high-latency interrupt servicing is required. The corresponding transition control signals preferably identify state W as interruptible. When the DRPU 32 is configured to implement the image convolution ISA and the ISS 100 supports multiple sets of states F, D, E, and W in parallel, the transition control signals preferably identify each state W as interruptible and further specify that interrupt servicing is to be delayed until each parallel instruction execution cycle has completed its state W operations. This ensures that all instructions in progress are executed before the interrupt is serviced, thereby maintaining an appropriate level of pipelined execution performance.
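
The relationship between the required latency level and the resulting transition control signals can be sketched in Python. This is a simplified illustration: the labels "low" and "high", the dictionary layout, and the choice to make every state interruptible in the low-latency case are assumptions, since the text only states that multiple states are interruptible.

```python
ISS_STATES = ("F", "D", "E", "M", "W", "Y")


def transition_control(latency, parallel_cycles=False):
    """Derive simplified transition control signals from a latency requirement."""
    if latency == "low":
        # Assumption: every state is interruptible; the text only says "multiple".
        interruptible = set(ISS_STATES)
    elif latency == "high":
        interruptible = {"W"}
    else:
        raise ValueError(f"unknown latency level: {latency!r}")
    # With parallel F, D, E, W cycles, wait until every cycle has finished W.
    wait_for_all = latency == "high" and parallel_cycles
    return {"interruptible": interruptible, "wait_for_all_w": wait_for_all}


def may_service(control, state, w_done_per_cycle=()):
    """Decide whether a pending interrupt may be serviced in `state`."""
    if state not in control["interruptible"]:
        return False
    if control["wait_for_all_w"]:
        return all(w_done_per_cycle)
    return True
```

In the pipelined high-latency case, `may_service` returns true only once every parallel execution cycle reports that its state W operations are finished.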

【００７８】Like the level of interrupt latency, the level of interrupt precision specified by the interrupt response signals also varies according to the instruction set architecture (ISA) for which the dynamic reconfiguration processor (DRPU) 32 is configured. For example, when state M is defined as an interruptible state for an outer-loop ISA that supports interruptible multi-cycle operations, the interrupt response signals preferably specify that precise interrupts are required. The transition control signals then specify that interrupts received in state M are to be treated as precise interrupts, so that multi-cycle operations can be restarted successfully. As another example, for an ISA that supports fault-free pipelined arithmetic operations, the interrupt response signals preferably specify that imprecise interrupts are required. The transition control signals then specify that interrupts received in state W are to be treated as imprecise interrupts.

【００７９】For any given instruction set architecture (ISA), the interrupt response signals are defined, or programmed, by a portion of the ISA's corresponding configuration data set. Through programmable interrupt response signals and the generation of corresponding transition control signals, the present invention facilitates the implementation of an optimum interrupt scheme for each ISA. Those skilled in the art will recognize that most prior art computer architectures do not allow interrupt capabilities to be specified flexibly, namely programmable state transition enabling, programmable interrupt latency, and programmable interrupt precision. In the preferred embodiment, the interrupt logic 106 is implemented as a logic block (CLB)-based state machine that supports the states described above.

【００８０】The fetch control unit 108 directs the loading of instructions into the instruction buffer 110 in response to the fetch signal issued by the instruction state sequencer (ISS) 100. In the preferred embodiment, the fetch control unit 108 is implemented as a conventional one-hot encoded state machine using flip-flops within a set of logic blocks (CLBs). Those skilled in the art will recognize that in an alternate embodiment, the fetch control unit 108 could be configured as a conventional encoded state machine or as a ROM-based state machine. The instruction buffer 110 temporarily stores instructions loaded from the memory 34. For the implementation of an outer-loop instruction set architecture (ISA), the instruction buffer 110 is preferably implemented as a conventional RAM-based first-in, first-out (FIFO) buffer using multiple CLBs. For the implementation of an inner-loop ISA, the instruction buffer 110 is preferably implemented as a set of flip-flop registers using multiple flip-flops within a set of input/output blocks (IOBs), or using multiple flip-flops within both IOBs and CLBs.
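
One-hot encoding assigns one flip-flop (one bit) to each state, so exactly one bit is set at any time. The Python sketch below illustrates the encoding only; the three state names and the transition conditions are hypothetical and are not taken from this document.

```python
# One flip-flop (bit) per state; the state names are hypothetical.
IDLE, REQUEST, LOAD = 0b001, 0b010, 0b100


def is_one_hot(value):
    """True when exactly one bit of `value` is set."""
    return value > 0 and (value & (value - 1)) == 0


def step(state, fetch_signal, data_ready):
    """Advance the sketch fetch controller by one clock."""
    if not is_one_hot(state):
        raise ValueError(f"illegal one-hot state: {state:#05b}")
    if state == IDLE:
        return REQUEST if fetch_signal else IDLE
    if state == REQUEST:
        return LOAD if data_ready else REQUEST
    if state == LOAD:
        return IDLE    # instruction has been written into the buffer
    raise ValueError(f"unused state bit: {state:#05b}")
```

Compared with a binary-encoded state machine, the one-hot form uses more flip-flops but needs very little decoding logic per state, which suits flip-flop-rich logic blocks.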

【００８１】The decode control unit 112 directs the transfer of instructions from the instruction buffer 110 into the instruction decoder 114 in response to the decode signal issued by the instruction state sequencer (ISS) 100. For an inner-loop instruction set architecture (ISA), the decode control unit 112 is preferably implemented as a ROM-based state machine comprising a logic block (CLB)-based ROM coupled to a CLB-based register. For an outer-loop ISA, the decode control unit 112 is preferably implemented as a CLB-based encoded state machine. For each instruction received as input, the instruction decoder 114 outputs, in a conventional manner, a corresponding operation code, a register file address, and optionally one or more constants. For an inner-loop ISA, the instruction decoder 114 is preferably configured to decode a group of instructions received as input. In the preferred embodiment, the instruction decoder 114 is implemented as a CLB-based decoder configured to decode each of the instructions included in the ISA currently under consideration.
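
The three decoder outputs named above (operation code, register file address, optional constant) can be illustrated with a short Python sketch. The 16-bit instruction word and its field layout are purely hypothetical; the document does not specify an instruction format here.

```python
from typing import NamedTuple


class Decoded(NamedTuple):
    opcode: int
    rf_addresses: tuple   # (destination register, source register)
    constant: int


def decode(word):
    """Split a hypothetical 16-bit instruction word into its fields.

    Assumed layout: bits 15-12 opcode, 11-8 destination register,
    7-4 source register, 3-0 constant.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    opcode = (word >> 12) & 0xF
    dest = (word >> 8) & 0xF
    src = (word >> 4) & 0xF
    constant = word & 0xF
    return Decoded(opcode, (dest, src), constant)
```

Under this assumed layout, the word `0x3A25` decodes to opcode 3, registers 10 and 2, and constant 5.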

【００８２】The operation code storage register set 116 temporarily stores each operation code output by the instruction decoder 114 and outputs each operation code to the instruction state sequencer (ISS) 100. When an outer-loop instruction set architecture (ISA) is implemented in the dynamic reconfiguration processor (DRPU) 32, the operation code storage register set 116 is preferably implemented using an optimum number of flip-flop register banks. The flip-flop register banks receive from the instruction decoder 114 signals representing class or group codes derived from the operation code literal bit-fields of instructions previously queued through the instruction buffer 110. The flip-flop register banks store these class or group codes according to a decoding scheme that minimizes the complexity of the ISS. For an inner-loop ISA, the operation code storage register set 116 stores operation code indication signals derived directly from the operation code literal bit-fields output by the instruction decoder 114. Inner-loop ISAs necessarily have small operation code literal bit-fields, which minimizes the implementation requirements for buffering, decoding, and operation code indication for instruction sequencing by the instruction buffer 110, the instruction decoder 114, and the operation code storage register set 116, respectively. In summary, for an outer-loop ISA, the operation code storage register set 116 is preferably implemented as a small group of flip-flop register banks characterized by a bit width equal to, or a fraction of, the operation code literal size. For an inner-loop ISA, the operation code storage register set 116 is preferably a smaller and more unified flip-flop register bank than for an outer-loop ISA. The flip-flop register bank can be smaller in the inner-loop case because an inner-loop ISA has very few instructions compared with an outer-loop ISA.
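
The idea of storing a class or group code rather than the full operation code can be illustrated with a minimal Python sketch. The 6-bit opcode width, the two class codes, and the rule that the top opcode bit marks multi-cycle instructions are assumptions made for the example only.

```python
SINGLE_CYCLE, MULTI_CYCLE = 0, 1   # hypothetical class codes


def class_code(opcode):
    """Map a hypothetical 6-bit operation code onto a 1-bit class code."""
    if not 0 <= opcode < 64:
        raise ValueError("opcode must fit in 6 bits")
    # Assumption: the top bit of the opcode marks multi-cycle instructions.
    return MULTI_CYCLE if opcode & 0b100000 else SINGLE_CYCLE


def execute_state(stored_class_code):
    """The sequencer inspects one stored bit instead of the full opcode."""
    return "M" if stored_class_code == MULTI_CYCLE else "E"
```

Because the sequencer branches on a one-bit class code instead of a six-bit opcode, its next-state logic stays small, which is the kind of simplification the decoding scheme aims for.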

【００８３】The register file (RF) address register set 118 and the constant register set 120 temporarily store each register file address and each constant, respectively, output by the instruction decoder 114. In the preferred embodiment, the operation code storage register set 116, the RF address register set 118, and the constant register set 120 are each implemented as a set of logic blocks (CLBs) configured for data storage.

【００８４】The memory access logic 102 is memory control circuitry that directs and synchronizes the transfer of data between the memory 34, the data operation unit (DOU) 62, and the address operation unit (AOU) 64 according to the atomic memory address size specified in the architecture description memory 101. The memory access logic 102 additionally directs and synchronizes the transfer of data and commands between the S machine 12 and a given T machine 14. In the preferred embodiment, the memory access logic 102 supports burst-mode memory accesses and is preferably implemented as a conventional RAM controller using logic blocks (CLBs). Those skilled in the art will recognize that during reconfiguration, the input and output pins of the reconfigurable logic device are tri-stated, which allows resistive terminations to define unasserted logic levels so that the memory 34 is not disturbed. In an alternate embodiment, the memory access logic 102 could be implemented external to the dynamic reconfiguration processor (DRPU) 32.

【００８５】FIG. 11 is a block diagram of a preferred embodiment of the data operation unit (DOU) 62. The DOU 62 performs operations on data according to DOU control signals, register file (RF) addresses, and constants received from the instruction state sequencer (ISS) 100. The DOU 62 includes a DOU crossbar switch 150, storage/alignment logic 152, and data operation logic 154. Each of these has a control input coupled to the first control output of the instruction fetch unit (IFU) 60 via the first control line 70. The DOU crossbar switch 150 has a bidirectional data port that forms the bidirectional data port of the DOU; a constant input coupled to the third control line 74; a first data feedback input coupled to the data output of the data operation logic 154 via the first data line 160; a second data feedback input coupled to the data output of the storage/alignment logic 152 via the second data line 164; and a data output coupled to the data input of the storage/alignment logic 152 via the third data line 162. In addition to its data output, the storage/alignment logic 152 has an address input coupled to the third control line 74. The data operation logic 154 additionally has a data input coupled to the output of the storage/alignment logic 152 via the second data line 164.

【００８６】The data operation logic 154 performs arithmetic, shift, and/or logical operations on the data received at its data input in response to the DOU control signals received at its control input. The storage/alignment logic 152 comprises data storage elements that temporarily store operands, constants, and partial results associated with data computations, as directed by the RF addresses and DOU control signals received at its address input and control input, respectively. The DOU crossbar switch 150 is preferably a conventional crossbar switch network that, according to the DOU control signals received at its control input, facilitates loading data from the memory 34, transferring results output by the data operation logic 154 to the storage/alignment logic 152 or the memory 34, and loading constants output by the instruction fetch unit (IFU) 60 into the storage/alignment logic 152. In the preferred embodiment, the detailed structure of the data operation logic 154 depends on the types of operations supported by the instruction set architecture (ISA) currently under consideration. That is, the data operation logic 154 contains the circuitry needed to perform the arithmetic and/or logical operations specified by the data processing instructions of that ISA. Likewise, the detailed structures of the storage/alignment logic 152 and the DOU crossbar switch 150 depend on the ISA currently under consideration. The ISA-dependent structures of the data operation logic 154, the storage/alignment logic 152, and the DOU crossbar switch 150 are described in detail below with reference to FIGS. 12 and 13.

【００８７】For an outer loop instruction set architecture (ISA), the data operation unit (DOU) 62 is preferably configured to perform sequential operations on data. FIG. 12 is a block diagram of a first exemplary embodiment of the DOU 61, configured to implement a general-purpose outer loop ISA. A general-purpose outer loop ISA requires hardware configured to perform mathematical operations such as multiplication, addition, and subtraction; Boolean operations such as AND, OR, and NOT; shift operations; and rotate operations. Accordingly, for a general-purpose outer loop ISA implementation, the data operation logic 154 preferably comprises a conventional arithmetic logic unit (ALU)/shifter 184 having a first input, a second input, a control input, and an output. The storage/alignment logic 152 preferably comprises a first RAM 180 and a second RAM 182, each having a data input, a data output, an address-select input, and an enable input. The DOU crossbar switch 150 preferably comprises a conventional crossbar switch network having both bidirectional and unidirectional crossbar couplings, with the inputs and outputs already described with reference to FIG. 11. Those skilled in the art will recognize that an efficient implementation of the DOU crossbar switch 150 for an outer loop ISA may include multiplexers, tri-state buffers, logic block (CLB) based logic, direct wiring, or any subset of these elements joined in any combination by reconfigurable coupling means. For an outer loop ISA, the DOU crossbar switch 150 is implemented to expedite sequential data movement in the shortest possible time, while also providing a maximum number of distinct data-movement crossbar couplings to support general-purpose outer loop instructions.

【００８８】The data input of the first RAM 180, like the data input of the second RAM 182, is coupled to the data output of the DOU crossbar switch 150 via the third data line 162. The address-select inputs of the first RAM 180 and the second RAM 182 are coupled to receive register file (RF) addresses from the instruction fetch unit (IFU) 60 via the third control line 74. Similarly, the enable inputs of the first RAM 180 and the second RAM 182 are coupled to receive DOU control signals via the first control line 70. The data outputs of the first RAM 180 and the second RAM 182 are coupled to the first input and the second input of the ALU/shifter 184, respectively, and are also coupled to the second data feedback input of the DOU crossbar switch 150. The control input of the ALU/shifter 184 is coupled to receive DOU control signals via the first control line 70, and the output of the ALU/shifter 184 is coupled to the first data feedback input of the DOU crossbar switch 150. The couplings to the remaining inputs and outputs of the DOU crossbar switch 150 are identical to those described above with reference to FIG. 11.

【００８９】To facilitate the execution of data operation instructions, the instruction fetch unit (IFU) 60 issues DOU control signals, register file (RF) addresses, and constants to the DOU 61 while the instruction state sequencer (ISS) is in state E or M. The first RAM 180 and the second RAM 182 provide a first and a second register file, respectively, for temporary data storage. Individual locations within the first RAM 180 and the second RAM 182 are selected according to the RF addresses received at each RAM's address-select input. Similarly, loading of the first RAM 180 and the second RAM 182 is controlled by the DOU control signals that each RAM receives at its write-enable input. In the preferred embodiment, at least one of the first RAM 180 and the second RAM 182 includes a pass-through capability to facilitate transferring data directly from the DOU crossbar switch 150 to the ALU/shifter 184. The ALU/shifter 184 performs arithmetic, logical, or shift operations on a first operand received from the first RAM 180 and/or a second operand received from the second RAM 182, as directed by the DOU control signals received at its control input. The DOU crossbar switch 150 selectively routes (1) data between the memory 34 and the first RAM 180 and second RAM 182; (2) results from the ALU/shifter 184 to the first RAM 180 and second RAM 182, or to the memory 34; (3) data stored in the first RAM 180 or the second RAM 182 to the memory 34; and (4) constants from the IFU 60 to the first RAM 180 and second RAM 182. As noted above, when either the first RAM 180 or the second RAM 182 has pass-through capability, the DOU crossbar switch 150 also selectively routes data from the memory 34, or from the output of the ALU/shifter 184, directly back to the ALU/shifter 184. The DOU crossbar switch 150 performs a particular routing operation according to the DOU control signals received at its control input. In the preferred embodiment, the ALU/shifter 184 is implemented using the logic function generators within a set of logic blocks (CLBs) together with the circuitry dedicated to mathematical operations within the reconfigurable logic device. The first RAM 180 and the second RAM 182 are each preferably implemented using the data storage circuitry present within a set of CLBs. The DOU crossbar switch 150 is preferably implemented in the manner described above.
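
The outer loop datapath described above can be illustrated with a small behavioral model. The following is only a sketch under stated assumptions, not the patented implementation: the class name `OuterLoopDOU`, the function `alu_shifter`, the 16-bit word width, the eight-entry register files, and the use of Boolean write-enable arguments in place of DOU control signals are all hypothetical choices made for illustration. It also assumes that each RAM may be given its own RF address.

```python
from dataclasses import dataclass, field

WORD_BITS = 16                     # assumed word width; the text does not specify one
MASK = (1 << WORD_BITS) - 1


def alu_shifter(op: str, a: int, b: int = 0) -> int:
    """Stand-in for ALU/shifter 184 (arithmetic, Boolean, shift, rotate)."""
    ops = {
        "add": lambda: a + b,
        "sub": lambda: a - b,
        "mul": lambda: a * b,
        "and": lambda: a & b,
        "or": lambda: a | b,
        "not": lambda: ~a,
        "shl": lambda: a << b,
        "shr": lambda: a >> b,
        "rol": lambda: (a << (b % WORD_BITS)) | (a >> (WORD_BITS - b % WORD_BITS)),
    }
    return ops[op]() & MASK        # wrap every result to the word width


@dataclass
class OuterLoopDOU:
    memory: dict                                            # stand-in for memory 34
    ram1: list = field(default_factory=lambda: [0] * 8)     # first RAM 180
    ram2: list = field(default_factory=lambda: [0] * 8)     # second RAM 182

    def _write(self, value, rf_addr, we1, we2):
        # The write-enable flags play the role of the DOU control signals.
        if we1:
            self.ram1[rf_addr] = value & MASK
        if we2:
            self.ram2[rf_addr] = value & MASK

    def load(self, mem_addr, rf_addr, we1=True, we2=True):
        """Route (1): memory 34 -> register files."""
        self._write(self.memory[mem_addr], rf_addr, we1, we2)

    def load_constant(self, constant, rf_addr, we1=True, we2=True):
        """Route (4): constant from the IFU -> register files."""
        self._write(constant, rf_addr, we1, we2)

    def execute(self, op, rf_a, rf_b, rf_dest, we1=True, we2=False):
        """Read one operand per RAM, run the ALU/shifter, route (2) the result back."""
        result = alu_shifter(op, self.ram1[rf_a], self.ram2[rf_b])
        self._write(result, rf_dest, we1, we2)
        return result

    def store(self, rf_addr, mem_addr, from_ram2=False):
        """Route (3): register file -> memory 34."""
        source = self.ram2 if from_ram2 else self.ram1
        self.memory[mem_addr] = source[rf_addr]


dou = OuterLoopDOU(memory={0: 6, 1: 7})
dou.load(0, rf_addr=0, we2=False)     # operand A into the first register file
dou.load(1, rf_addr=0, we1=False)     # operand B into the second register file
dou.execute("mul", 0, 0, rf_dest=1)   # 6 * 7, result written back to the first RAM
dou.store(1, mem_addr=2)
print(dou.memory[2])                  # prints 42
```

Each method corresponds to one of the four routing cases listed for the DOU crossbar switch 150; the pass-through path from the crossbar directly to the ALU/shifter is not modelled.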

【００９０】FIG. 13 is a block diagram of a second exemplary embodiment of the data operation unit (DOU) 63, configured to implement an inner loop instruction set architecture (ISA). In general, an inner loop ISA supports relatively few, specialized operations and is preferably used to perform a common set of operations on large data sets. Optimum computational performance for an inner loop ISA is therefore obtained with hardware configured to perform operations in parallel. Accordingly, in the second exemplary embodiment of the DOU 63, the data operation logic 154, the storage/alignment logic 152, and the DOU crossbar switch 150 are configured to perform pipelined computations. The data operation logic 154 comprises a pipelined functional unit 194 having a plurality of inputs, a control input, and an output. The storage/alignment logic 152 comprises (1) a set of conventional flip-flop arrays 192, each having a data input, a data output, and a control input; and (2) a data selector 190 having a control input, a data input, and a number of data outputs corresponding to the number of flip-flop arrays 192. The DOU crossbar switch 150 comprises a conventional crossbar switch network having duplex unidirectional crossbar couplings. In the second exemplary embodiment of the DOU 63, the DOU crossbar switch 150 preferably has the inputs and outputs already described with reference to FIG. 11, with the exception of the second data feedback input. As with the outer loop ISA, an efficient implementation of the DOU crossbar switch 150 for an inner loop ISA may include multiplexers, tri-state buffers, logic block (CLB) based logic, direct wiring, or a subset of these elements coupled in a reconfigurable manner. For an inner loop ISA, the DOU crossbar switch 150 is preferably implemented to maximize parallel data movement in the shortest possible time, while also providing a minimum number of distinct data-movement crossbar couplings to support heavily pipelined inner loop ISA instructions.

【００９１】The data input of the data selector 190 is coupled to the data output of the DOU crossbar switch 150 via the third data line 162. The control input of the data selector 190 is coupled to receive register file (RF) addresses via the third control line 74, and each output of the data selector 190 is coupled to the data input of a corresponding flip-flop array 192. The control input of each flip-flop array 192 is coupled to receive DOU control signals via the first control line 70, and the data output of each flip-flop array 192 is coupled to an input of the functional unit 194. The control input of the functional unit 194 is coupled to receive DOU control signals via the first control line 70, and the output of the functional unit 194 is coupled to the first data feedback input of the DOU crossbar switch 150. The couplings of the remaining inputs and outputs of the DOU crossbar switch 150 are identical to those already described with reference to FIG. 11.

【００９２】In operation, the functional unit 194 performs pipelined operations on the data received at its data inputs according to the DOU control signals received at its control input. Those skilled in the art will recognize that the functional unit 194 may be a multiply/accumulate unit, a threshold determination unit, an image rotation unit, an edge enhancement unit, or any other type of functional unit suited to performing pipelined operations on partitioned data. The data selector 190 routes data from the output of the DOU crossbar switch 150 to a given flip-flop array 192 according to the register file (RF) address received at its control input. Each flip-flop array 192 preferably comprises a set of serially coupled data latches for aligning data, spatially and temporally, relative to the data contents of another flip-flop array 192, as directed by the control signals received at its control input. The DOU crossbar switch 150 selectively routes (1) data from the memory 34 to the data selector 190; (2) results from the functional unit 194 to the data selector 190 or the memory 34; and (3) constants from the instruction fetch unit (IFU) 60 to the data selector 190. Those skilled in the art will recognize that an inner loop instruction set architecture (ISA) may have a set of "built-in" constants. In an implementation of such an inner loop ISA, the storage/alignment logic 152 preferably includes a logic block (CLB) based ROM holding the built-in constants, which eliminates the need to route constants from the IFU 60 to the storage/alignment logic 152 via the DOU crossbar switch 150. In the preferred embodiment, the functional unit 194 is implemented using logic function generators and circuitry dedicated to mathematical operations within a set of CLBs. Each flip-flop array 192 is preferably implemented using flip-flops within a set of CLBs, and the data selector 190 is preferably implemented using logic function generators and data selection circuitry within a set of CLBs. Finally, the DOU crossbar switch 150 is preferably implemented in the manner already described for an inner loop ISA.
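
As a rough illustration of the inner loop datapath, the sketch below models the data selector 190, two flip-flop arrays 192, and a functional unit 194 acting as a multiply/accumulate unit. It is a simplified behavioral model, not the circuit itself: the class names `FlipFlopArray` and `InnerLoopDOU`, the helper `dot`, the choice of exactly two arrays, and the `depth` parameter (the number of serially coupled latches) are assumptions introduced here for illustration.

```python
from collections import deque


class FlipFlopArray:
    """Serially coupled latches: a value reaches the output after `depth` shifts."""

    def __init__(self, depth: int = 1):
        self.latches = deque([0] * depth, maxlen=depth)

    def shift_in(self, value: int) -> None:
        self.latches.append(value)      # the oldest latch content falls off the left

    @property
    def output(self) -> int:
        return self.latches[0]          # oldest value still held


class InnerLoopDOU:
    def __init__(self, depths=(1, 1)):
        # One flip-flop array 192 per operand stream (two assumed here).
        self.arrays = [FlipFlopArray(d) for d in depths]
        self.accumulator = 0

    def select(self, rf_addr: int, value: int) -> None:
        """Data selector 190: the RF address picks the destination array."""
        self.arrays[rf_addr].shift_in(value)

    def step(self) -> int:
        """Functional unit 194, modelled here as multiply/accumulate."""
        a, b = (array.output for array in self.arrays)
        self.accumulator += a * b
        return self.accumulator


def dot(xs, ys):
    """Stream two operand sequences through the model and return the sum of products."""
    dou = InnerLoopDOU()
    for x, y in zip(xs, ys):
        dou.select(0, x)
        dou.select(1, y)
        dou.step()
    return dou.accumulator


print(dot([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6 = 32
```

Giving one array a larger `depth` delays that operand stream by the extra number of shifts, which is one simple way to picture the temporal alignment role the text assigns to the flip-flop arrays.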

【００９３】FIG. 14 is a block diagram of a preferred embodiment of the address operation unit (AOU) 64. The AOU 64 performs operations on addresses according to AOU control signals, register file (RF) addresses, and constants received from the instruction fetch unit (IFU) 60. The AOU 64 includes an AOU crossbar switch 200, storage/count logic 202, address operation logic 204, and an address multiplexer 206. Each of these has a control input coupled to the second control output of the IFU 60 via the second control line 72. The AOU crossbar switch 200 has a bidirectional data port that forms the bidirectional data port of the AOU; an address feedback input coupled to the address output of the address operation logic 204 via the first address line 210; a constant input coupled to the third control line 74; and an address output coupled to the address input of the storage/count logic 202 via the second address line 212. In addition to its address input and control input, the storage/count logic 202 has an RF address input coupled to the third control line 74 and an address output coupled to the address input of the address operation logic 204 via the third address line 214. The address multiplexer 206 has a first input coupled to the first address line 210, a second input coupled to the third address line 214, and an output that forms the address output of the AOU 64.

【００９４】The address operation logic 204 performs arithmetic operations on the addresses received at its address input, as directed by the AOU control signals received at its control input. The storage/count logic 202 temporarily stores addresses and address computation results. According to the AOU control signals received at its control input, the AOU crossbar switch 200 facilitates loading addresses from the memory 34, transferring results output by the address operation logic 204 to the storage/count logic 202 or the memory 34, and loading constants output by the instruction fetch unit (IFU) 60 into the storage/count logic 202. The address multiplexer 206 selectively outputs an address received from the storage/count logic 202 or from the address operation logic 204 to the address output of the AOU 64, as directed by the AOU control signals received at its control input. In the preferred embodiment, the detailed structures of the AOU crossbar switch 200, the storage/count logic 202, and the address operation logic 204 depend on the type of instruction set architecture (ISA) currently under consideration, as described below with reference to FIGS. 15 and 16.

【００９５】FIG. 15 is a block diagram of a first exemplary embodiment of the address operation unit (AOU) 65, configured to implement a general-purpose outer loop instruction set architecture (ISA). A general-purpose outer loop ISA requires hardware for performing operations such as addition, subtraction, increment, and decrement on the contents of a program counter and on addresses stored in the storage/count logic 202. In the first exemplary embodiment of the AOU 65, the address operation logic 204 preferably comprises a next instruction program address register (NIPAR) 232 having an input, an output, and a control input; an arithmetic unit 234 having a first input, a second input, a third input, a control input, and an output; and a multiplexer 230 having a first input, a second input, a control input, and an output. The storage/count logic 202 preferably comprises a third RAM 220 and a fourth RAM 222, each having an input, an output, an address-select input, and an enable input. The address multiplexer 206 preferably comprises a multiplexer having a first input, a second input, a third input, a control input, and an output. The AOU crossbar switch 200 preferably comprises a conventional crossbar switch network having duplex unidirectional crossbar couplings and the inputs and outputs already described with reference to FIG. 14. An efficient implementation of the AOU crossbar switch 200 may include multiplexers, tri-state buffers, logic block (CLB) based logic, direct wiring, or any subset of such elements joined by reconfigurable couplings. For an outer loop ISA, the AOU crossbar switch 200 is preferably implemented to maximize sequential data movement in the shortest possible time, while also providing a maximum number of unique address-movement crossbar couplings to support general-purpose outer loop address operation instructions.

【００９６】The inputs of the third RAM 220 and the fourth RAM 222 are each coupled to the output of the AOU crossbar switch 200 via the second address line 212. The address-select inputs of the third RAM 220 and the fourth RAM 222 are coupled to receive register file (RF) addresses from the instruction fetch unit (IFU) 60 via the third control line 74. The enable inputs of the third RAM 220 and the fourth RAM 222 are coupled to receive AOU control signals via the second control line 72. The output of the third RAM 220 is coupled to the first input of the multiplexer 230, the first input of the arithmetic unit 234, and the first input of the address multiplexer 206. Similarly, the output of the fourth RAM 222 is coupled to the second input of the multiplexer 230, the second input of the arithmetic unit 234, and the second input of the address multiplexer 206. The control inputs of the multiplexer 230, the NIPAR 232, and the arithmetic unit 234 are each coupled to the second control line 72. The output of the arithmetic unit 234 forms the output of the address operation logic 204 and is therefore coupled to the address feedback input of the AOU crossbar switch 200 and to the third input of the address multiplexer 206. The couplings to the remaining inputs and outputs of the AOU crossbar switch 200 and the address multiplexer 206 are identical to those already described with reference to FIG. 14.
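
The FIG. 15 address datapath can be pictured with the following behavioral sketch. It is an illustration under assumptions rather than the actual design: the class name `OuterLoopAOU`, its method names, the 16-bit address width, the eight-entry register files, the string-valued selectors standing in for AOU control signals, and the choice to let the arithmetic unit take any two of its three sources as operands are all hypothetical.

```python
class OuterLoopAOU:
    def __init__(self, size: int = 8, addr_bits: int = 16):
        self.mask = (1 << addr_bits) - 1    # assumed address width
        self.ram3 = [0] * size              # third RAM 220
        self.ram4 = [0] * size              # fourth RAM 222
        self.nipar = 0                      # NIPAR 232
        self.result = 0                     # last output of arithmetic unit 234

    def load(self, address, rf_addr, we3=True, we4=True):
        """AOU crossbar switch 200 -> register files, gated by write enables."""
        if we3:
            self.ram3[rf_addr] = address & self.mask
        if we4:
            self.ram4[rf_addr] = address & self.mask

    def load_nipar(self, rf_addr, from_ram4=False):
        """Multiplexer 230 picks one RAM output and loads it into NIPAR 232."""
        self.nipar = (self.ram4 if from_ram4 else self.ram3)[rf_addr]

    def increment_nipar(self):
        self.nipar = (self.nipar + 1) & self.mask

    def _operand(self, source, rf_addr):
        return {"ram3": self.ram3[rf_addr],
                "ram4": self.ram4[rf_addr],
                "nipar": self.nipar}[source]

    def compute(self, op, src_a, src_b="ram4", rf_addr=0):
        """Arithmetic unit 234: add, sub, inc, dec on RAM outputs and/or NIPAR."""
        a = self._operand(src_a, rf_addr)
        b = self._operand(src_b, rf_addr)
        value = {"add": a + b, "sub": a - b, "inc": a + 1, "dec": a - 1}[op]
        self.result = value & self.mask
        return self.result

    def address_out(self, select, rf_addr=0):
        """Address multiplexer 206: RAM 220, RAM 222, or arithmetic unit output."""
        return {"ram3": self.ram3[rf_addr],
                "ram4": self.ram4[rf_addr],
                "alu": self.result}[select]


aou = OuterLoopAOU()
aou.load(0x0100, rf_addr=0, we4=False)   # base address into the third RAM
aou.load(0x0020, rf_addr=0, we3=False)   # offset into the fourth RAM
aou.compute("add", "ram3", "ram4")       # base + offset
print(hex(aou.address_out("alu")))       # 0x120
aou.load_nipar(0)                        # NIPAR <- 0x0100
aou.increment_nipar()                    # address of the next instruction
print(hex(aou.nipar))                    # 0x101
```

The three-way selection in `address_out` mirrors the three inputs of the address multiplexer 206 in this embodiment: the two RAM outputs and the arithmetic unit output.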

【００９７】アドレス演算命令の実行を容易にするため
に、命令取出し装置（ＩＦＵ）６０は命令状態シーケン
サ（ＩＳＳ）が状態ＥまたはＭの時に、アドレス演算装
置（ＡＯＵ）制御信号と、レジスタファイル（ＲＦ）ア
ドレスと、定数とをアドレス演算装置（ＡＯＵ）６４に
発する。第３ＲＡＭ２２０と第４ＲＡＭ２２２とは、そ
れぞれアドレスの一時記憶のための第１及び第２レジス
タファイルを提供する。第３ＲＡＭ２２０と第４ＲＡＭ
２２２内の各記憶位置は、各ＲＡＭのアドレス選択入力
部で受取ったレジスタファイル（ＲＦ）アドレスに従っ
て選択される。第３ＲＡＭ２２０と第４ＲＡＭ２２２と
のローディングは、その書込みイネーブル入力部で第３
ＲＡＭ２２０と第４ＲＡＭ２２２とがそれぞれ受取るア
ドレス演算装置（ＡＯＵ）制御によって制御される。マ
ルチプレクサ２３０は、その制御入力部で受取ったアド
レス演算装置（ＡＯＵ）制御信号の指示に従って、第３
ＲＡＭ２２０と第４ＲＡＭ２２２とによるアドレス出力
をＮＩＰＡＲ２３２に選択的にルーティングする。ＮＩ
ＰＡＲ２３２は、マルチプレクサ２３０の出力部から受
取ったアドレスをロードしてその制御入力部で受取った
アドレス演算装置（ＡＯＵ）制御信号に応じてその内容
をインクリメントする。好ましい実施例では、ＮＩＰＡ
Ｒ２３２は、実行すべき次のプログラム命令のアドレス
を記憶する。演算装置２３４は、第３ＲＡＭ２２０と第
４ＲＡＭ２２２とから受取ったアドレスに対して、及び
／またはＮＩＰＡＲ２３２の内容に対して加算、減算、
インクリメント、及びデクリメントを含む算術演算を実
行する。アドレス演算装置（ＡＯＵ）クロスバースイッ
チ２００は選択的に、１）アドレスをメモリ３４から第３ＲＡＭ２２０と第４
ＲＡＭ２２２とへルーティングし、２）演算装置２３４によるアドレス計算出力の結果をメ
モリ３４または第３ＲＡＭ２２０と第４ＲＡＭ２２２と
へルーティングする。アドレス演算装置（ＡＯＵ）クロスバースイッチ２００
は、制御入力部で受取ったアドレス演算装置（ＡＯＵ）
制御信号に従って特定のルーティング演算を実行する。
アドレスマルチプレクサ２０６は、この制御入力部で受
取ったアドレス演算装置（ＡＯＵ）制御の指示に従って
第３ＲＡＭ２２０によるアドレス出力と、第４ＲＡＭ２
２２によるアドレス出力と、または演算装置２３４によ
るアドレス計算出力の結果とをアドレス演算装置（ＡＯ
Ｕ）のアドレス出力部に選択的にルーティングする。To facilitate the execution of address operation instructions, the instruction fetch unit (IFU) 60 issues address operation unit (AOU) control signals, register file (RF) addresses, and constants to the AOU 64 while the instruction state sequencer (ISS) is in state E or M. The third RAM 220 and the fourth RAM 222 provide first and second register files, respectively, for the temporary storage of addresses. Each storage location within the third RAM 220 and the fourth RAM 222 is selected according to the RF address received at that RAM's address selection input. Loading of the third RAM 220 and the fourth RAM 222 is controlled by the AOU control signals that each RAM receives at its write enable input. The multiplexer 230 selectively routes the address output by the third RAM 220 or the fourth RAM 222 to the NIPAR 232, as directed by the AOU control signals received at its control input. The NIPAR 232 loads the address received from the output of the multiplexer 230 and increments its contents in response to the AOU control signals received at its control input. In the preferred embodiment, the NIPAR 232 stores the address of the next program instruction to be executed. The arithmetic unit 234 performs arithmetic operations, including addition, subtraction, increment, and decrement, upon the addresses received from the third RAM 220 and the fourth RAM 222 and/or upon the contents of the NIPAR 232. The AOU crossbar switch 200 selectively routes 1) addresses from the memory 34 to the third RAM 220 and the fourth RAM 222, and 2) the results of address computations output by the arithmetic unit 234 to the memory 34 or to the third RAM 220 and the fourth RAM 222. The AOU crossbar switch 200 performs a particular routing operation according to the AOU control signals received at its control input. The address multiplexer 206 selectively routes the address output by the third RAM 220, the address output by the fourth RAM 222, or the result of an address computation output by the arithmetic unit 234 to the address output of the AOU, as directed by the AOU control signals received at its control input.
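
By way of illustration only, the data path described above may be modeled in software. The following Python sketch is not part of the disclosure: the class name `AouModel`, its method names, the register file depth of 16, and the 32-bit address width are all assumptions chosen for the example. For brevity the arithmetic unit 234 here operates only on the two register file outputs, although the text also allows it to operate on the contents of the NIPAR 232.

```python
class AouModel:
    """Behavioral sketch of the first AOU embodiment (all names hypothetical)."""

    ADDR_MASK = 0xFFFFFFFF  # assumed 32-bit addresses

    def __init__(self, depth=16):
        self.ram3 = [0] * depth   # third RAM 220: first register file
        self.ram4 = [0] * depth   # fourth RAM 222: second register file
        self.nipar = 0            # NIPAR 232: next instruction address
        self.alu_out = 0          # output of arithmetic unit 234

    def load(self, rf_addr, value, write3=False, write4=False):
        # The crossbar delivers an address to both register files; the
        # write enables stand in for the AOU control signals.
        if write3:
            self.ram3[rf_addr] = value & self.ADDR_MASK
        if write4:
            self.ram4[rf_addr] = value & self.ADDR_MASK

    def load_nipar(self, rf_addr, select):
        # Multiplexer 230: select 0 picks RAM 220, anything else RAM 222.
        self.nipar = self.ram3[rf_addr] if select == 0 else self.ram4[rf_addr]

    def step_nipar(self):
        # The NIPAR increments its contents under AOU control.
        self.nipar = (self.nipar + 1) & self.ADDR_MASK

    def compute(self, rf_addr, op):
        # Arithmetic unit 234: add, subtract, increment, or decrement.
        a, b = self.ram3[rf_addr], self.ram4[rf_addr]
        results = {"add": a + b, "sub": a - b, "inc": a + 1, "dec": a - 1}
        self.alu_out = results[op] & self.ADDR_MASK
        return self.alu_out

    def address_out(self, rf_addr, select):
        # Address multiplexer 206: 0 = RAM 220, 1 = RAM 222, 2 = unit 234.
        return (self.ram3[rf_addr], self.ram4[rf_addr], self.alu_out)[select]


aou = AouModel()
aou.load(2, 0x1000, write3=True)   # base address into the first register file
aou.load(2, 0x0020, write4=True)   # offset into the second register file
aou.compute(2, "add")
print(hex(aou.address_out(2, 2)))  # prints 0x1020
```

The model keeps one method per hardware element so that each call corresponds to one control action issued by the IFU 60.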

【００９８】好ましい実施例では、第３ＲＡＭ２２０と
第４ＲＡＭ２２２とはそれぞれ１組の論理ブロック（Ｃ
ＬＢ）内に存在するデータ記憶回路を用いて実動化され
る。マルチプレクサ２３０とアドレスマルチプレクサ２
０６とは、それぞれ１組の論理ブロック（ＣＬＢ）内に
存在するデータ選択回路を用いて実動化されるのが好ま
しく、ＮＩＰＡＲ２３２は１組の論理ブロック（ＣＬ
Ｂ）内に存在するデータ記憶回路を用いて実装されるこ
とが好ましい。演算装置２３４は、１組の論理ブロック
（ＣＬＢ）内の数学的演算用の論理関数発生器と回路と
を用いて実装されることが好ましい。最後に、アドレス
演算装置（ＡＯＵ）クロスバースイッチ２００は、すで
に説明した方法で実装されることが好ましい。In the preferred embodiment, the third RAM 220 and the fourth RAM 222 are each implemented using the data storage circuitry present within a set of configurable logic blocks (CLBs). The multiplexer 230 and the address multiplexer 206 are each preferably implemented using the data selection circuitry present within a set of CLBs, and the NIPAR 232 is preferably implemented using the data storage circuitry present within a set of CLBs. The arithmetic unit 234 is preferably implemented using the logic function generators and the circuitry for mathematical operations within a set of CLBs. Finally, the address operation unit (AOU) crossbar switch 200 is preferably implemented in the manner previously described.

【００９９】図１６は、内部ループ命令セットアーキテ
クチャ（ＩＳＡ）の実装のために構成されたアドレス演
算装置（ＡＯＵ）６６の第２模範実施例の構成図であ
る。内部ループ命令セットアーキテクチャ（ＩＳＡ）は
きわめて限られた数のアドレス演算を実行するためのハ
ードウェアを必要とし、また少なくとも１個のソースア
ドレスポインタと、対応する数の宛先アドレスポインタ
を維持するためのハードウェアを必要とする。きわめて
限られた数のアドレス演算または１個のアドレス演算が
必要な内部ループ処理の種類には、画像データのブロッ
ク演算、ラスター演算、またはサーペンタイン演算と、
ビットリバーサル演算と、環状バッファデータに対する
演算と、可変長データパーシング演算とが含まれる。こ
こでは、１回のアドレス演算すなわち、インクリメント
演算を検討する。当業者は、インクリメント演算を実行
するハードウェアが本来デクリメント演算も実行でき、
これによってさらに別のアドレス演算能力が得られるこ
とを認めるであろう。アドレス演算装置（ＡＯＵ）６６
の第２模範実施例では、記憶／計数ロジック２０２は、
入力部と、出力部と、制御入力部とを有する少なくとも
１個のソースアドレスレジスタ２５２と、入力部と、出
力部と、制御入力部とを有する少なくとも１個の宛先ア
ドレスレジスタ２５４と、入力部と、制御入力部と、現
存のソースアドレスレジスタ２５２及び宛先アドレスレ
ジスタ２５４の総数に等しい数の出力部とを有するデー
タセレクタ２５０とを含んでいる。ここでは、１個のソ
ースアドレスレジスタ２５２と、１個の宛先アドレスレ
ジスタ２５４を検討する。したがって、データセレクタ
２５０は、第１出力部と第２出力部とを含んでいる。ア
ドレス演算ロジック２０４は、入力部と、出力部と、制
御入力部とを有するＮＩＰＡＲ２３２と、データセレク
タと等しい数の入力部と、制御入力部と、出力部とを有
するマルチプレクサ２６０とを含んでいる。ここでマル
チプレクサ２６０は、第１入力部と第２入力部とを含ん
でいる。アドレスマルチプレクサ２０６は、データセレ
クタ出力部より１つ多い入力部と、制御入力部と、出力
部とを有するマルチプレクサを含んでいることが好まし
い。したがってここでは、アドレスマルチプレクサ２０
６は、第１入力部と、第２入力部と、第３入力部とを含
んでいる。アドレス演算装置（ＡＯＵ）クロスバースイ
ッチ２００は、双方向及び単方向クロスバー結合部を有
し、また図１４を用いてすでに説明した入力部と出力部
とを有する従来型のクロスバースイッチネットワークを
含んでいることが好ましい。アドレス演算装置（ＡＯ
Ｕ）クロスバースイッチ２００の効率的な実動化には、
マルチプレクサと、３値バッファと、論理ブロック（Ｃ
ＬＢ）ベースロジックと、直接配線と、または再構成結
合部によって結合されたこのような構成部分のサブセッ
トが含まれる。内部ループ命令セットアーキテクチャ
（ＩＳＡ）については、アドレス演算装置（ＡＯＵ）ク
ロスバースイッチ２００は最短時間で並列アドレス移動
を最大にするよう実動化されるのが好ましいが、内部ル
ープ操作コードを支援するために、最大数の一意のアド
レス移動クロスバー結合部も提供する。FIG. 16 is a block diagram of a second exemplary embodiment of the address operation unit (AOU) 66, configured to implement an inner-loop instruction set architecture (ISA). An inner-loop ISA requires hardware for performing only a very limited number of address operations, together with hardware for maintaining at least one source address pointer and a corresponding number of destination address pointers. The types of inner-loop processing that require a very limited number of address operations, or only a single address operation, include block, raster, or serpentine operations upon image data; bit-reversal operations; operations upon circular buffer data; and variable-length data parsing operations. Here, a single address operation, namely an increment operation, is considered. Those skilled in the art will recognize that hardware that performs an increment operation can inherently perform a decrement operation as well, which provides an additional address operation capability. In the second exemplary embodiment of the AOU 66, the storage/counting logic 202 includes at least one source address register 252 having an input, an output, and a control input; at least one destination address register 254 having an input, an output, and a control input; and a data selector 250 having an input, a control input, and a number of outputs equal to the total number of source address registers 252 and destination address registers 254 present. Here, a single source address register 252 and a single destination address register 254 are considered; the data selector 250 therefore has a first output and a second output. The address operation logic 204 includes the NIPAR 232, which has an input, an output, and a control input, and a multiplexer 260 having a number of inputs equal to the number of data selector outputs, a control input, and an output. Here, the multiplexer 260 has a first input and a second input. The address multiplexer 206 preferably comprises a multiplexer having one more input than the number of data selector outputs, a control input, and an output. Here, therefore, the address multiplexer 206 has a first input, a second input, and a third input. The AOU crossbar switch 200 preferably comprises a conventional crossbar switch network having bidirectional and unidirectional crossbar couplings, and having the inputs and outputs previously described with reference to FIG. 14. Efficient implementations of the AOU crossbar switch 200 may include multiplexers, tri-state buffers, configurable logic block (CLB)-based logic, direct wiring, or subsets of these elements joined by reconfigurable couplings. For an inner-loop ISA, the AOU crossbar switch 200 is preferably implemented to maximize parallel address movement in a minimum amount of time, while also providing a maximum number of unique address movement crossbar couplings to support inner-loop operation codes.
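
As a purely illustrative aside, two of the image access patterns named above can be generated with nothing more than pointer increments and decrements. The Python sketch below assumes that "serpentine" denotes a back-and-forth scan in which alternate rows are traversed in opposite directions; that reading, the function names, and the row-major layout with `base`, `width`, and `height` parameters are assumptions and are not taken from the disclosure.

```python
def raster_addresses(base, width, height):
    """Row-major order: every row is scanned left to right."""
    for row in range(height):
        for col in range(width):
            yield base + row * width + col


def serpentine_addresses(base, width, height):
    """Back-and-forth order: odd rows are scanned right to left."""
    for row in range(height):
        cols = range(width) if row % 2 == 0 else range(width - 1, -1, -1)
        for col in cols:
            yield base + row * width + col


print(list(raster_addresses(0, 3, 2)))      # prints [0, 1, 2, 3, 4, 5]
print(list(serpentine_addresses(0, 3, 2)))  # prints [0, 1, 2, 5, 4, 3]
```

Within a row, consecutive serpentine addresses differ by exactly +1 or -1, which is consistent with the observation that increment hardware can also decrement.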

【０１００】データセレクタ２５０の入力部は、アドレ
ス演算装置（ＡＯＵ）クロスバースイッチ２００の出力
部に結合されている。データセレクタ２５０の第１及び
第２出力部は、それぞれソースアドレスレジスタ２５２
の入力部と宛先アドレスレジスタ２５４の入力部とに結
合されている。ソースアドレスレジスタ２５２と宛先ア
ドレスレジスタ２５４との制御入力部は、第２制御ライ
ン７２を経てアドレス演算装置（ＡＯＵ）制御信号を受
取るために結合されている。ソースアドレスレジスタ２
５２の出力部は、マルチプレクサ２６０の第１入力部と
アドレスマルチプレクサ２０６の第１入力部とに結合さ
れている。同様に、宛先アドレスレジスタ２５４の出力
部は、マルチプレクサ２６０の第２入力部とアドレスマ
ルチプレクサ２０６の第２入力部とに結合されている。
ＮＩＰＡＲ２３２の入力部は、マルチプレクサ２６０の
出力部に結合されており、ＮＩＰＡＲ２３２の制御入力
部は第２制御ライン７２を経てアドレス演算装置（ＡＯ
Ｕ）制御信号を受取るために結合されており、ＮＩＰＡ
Ｒ２３２の出力部はアドレス演算装置（ＡＯＵ）クロス
バースイッチ２００のアドレスフィードバック入力部と
アドレスマルチプレクサ２０６の第３入力部とに結合さ
れている。アドレス演算装置（ＡＯＵ）クロスバースイ
ッチ２００の残りの入力部と出力部とへの結合部は、図
１４を用いて上記に説明したものと同一である。The input of the data selector 250 is coupled to the output of the address operation unit (AOU) crossbar switch 200. The first and second outputs of the data selector 250 are coupled to the input of the source address register 252 and the input of the destination address register 254, respectively. The control inputs of the source address register 252 and the destination address register 254 are coupled to receive AOU control signals via the second control line 72. The output of the source address register 252 is coupled to the first input of the multiplexer 260 and to the first input of the address multiplexer 206. Similarly, the output of the destination address register 254 is coupled to the second input of the multiplexer 260 and to the second input of the address multiplexer 206. The input of the NIPAR 232 is coupled to the output of the multiplexer 260; the control input of the NIPAR 232 is coupled to receive AOU control signals via the second control line 72; and the output of the NIPAR 232 is coupled to the address feedback input of the AOU crossbar switch 200 and to the third input of the address multiplexer 206. The couplings to the remaining inputs and outputs of the AOU crossbar switch 200 are identical to those described above with reference to FIG. 14.

【０１０１】演算では、データセレクタ２５０は、その
制御入力部で受取ったレジスタファイル（ＲＦ）アドレ
スに従ってアドレス演算装置（ＡＯＵ）クロスバースイ
ッチから受取ったアドレスをソースアドレスレジスタ２
５２または宛先アドレスレジスタ２５４にルーティング
する。ソースアドレスレジスタ２５２は、その制御入力
部に存在するアドレス演算装置（ＡＯＵ）制御信号に応
じてその入力部に存在するアドレスをロードする。宛先
アドレスレジスタ２５４は、同様の方法でその入力部に
存在するアドレスをロードする。マルチプレクサ２６０
は、その制御入力部で受取ったアドレス演算装置（ＡＯ
Ｕ）制御信号に従ってソースアドレスレジスタ２５２ま
たは宛先アドレスレジスタ２５４から受取ったアドレス
をＮＩＰＡＲ２３２の入力部にルーティングする。ＮＩ
ＰＡＲ２３２は、その制御入力部で受取ったアドレス演
算装置（ＡＯＵ）制御信号に応じてその入力部に存在す
るアドレスをロードし、その内容をインクリメントする
か、またはデクリメントする。アドレス演算装置（ＡＯ
Ｕ）クロスバースイッチ２００は選択的に、１）アドレスをメモリ３４からデータセレクタ２５０に
ルーティングし、２）ＮＩＰＡＲ２３２の内容をメモリ３４またはデータ
セレクタ２５０にルーティングする。アドレス演算装置（ＡＯＵ）クロスバースイッチ２００
は、その制御入力部で受取ったアドレス演算装置（ＡＯ
Ｕ）制御信号に従って特定のルーティング演算を実行す
る。アドレスマルチプレクサ２０６は、その制御入力部
で受取ったアドレス演算装置（ＡＯＵ）制御信号の指示
に従って、ソースアドレスレジスタ２５２、宛先アドレ
スレジスタ２５４、またはＮＩＰＡＲ２３２の内容をア
ドレス演算装置（ＡＯＵ）のアドレス出力部に選択的に
ルーティングする。In operation, the data selector 250 routes an address received from the address operation unit (AOU) crossbar switch 200 to either the source address register 252 or the destination address register 254, according to the register file (RF) address received at its control input. The source address register 252 loads the address present at its input in response to the AOU control signals present at its control input. The destination address register 254 loads the address present at its input in the same manner. The multiplexer 260 routes the address received from the source address register 252 or the destination address register 254 to the input of the NIPAR 232, according to the AOU control signals received at its control input. In response to the AOU control signals received at its control input, the NIPAR 232 loads the address present at its input and increments or decrements its contents. The AOU crossbar switch 200 selectively routes 1) addresses from the memory 34 to the data selector 250, and 2) the contents of the NIPAR 232 to the memory 34 or to the data selector 250. The AOU crossbar switch 200 performs a particular routing operation according to the AOU control signals received at its control input. The address multiplexer 206 selectively routes the contents of the source address register 252, the destination address register 254, or the NIPAR 232 to the address output of the AOU, as directed by the AOU control signals received at its control input.
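
For illustration only, the inner-loop operation just described may be sketched as a small software model. The class `InnerLoopAou`, the helper `copy_block`, and the string selectors are hypothetical names introduced for this example; address width and control signal encoding are not modeled.

```python
class InnerLoopAou:
    """Sketch of the second AOU embodiment (all names hypothetical)."""

    def __init__(self):
        self.src = 0     # source address register 252
        self.dst = 0     # destination address register 254
        self.nipar = 0   # NIPAR 232

    def load(self, target, address):
        # Data selector 250 steers the crossbar output to one register.
        if target == "src":
            self.src = address
        elif target == "dst":
            self.dst = address
        else:
            raise ValueError(target)

    def bump(self, which, delta=1):
        # Multiplexer 260 feeds the chosen pointer to the NIPAR, which
        # increments (delta=+1) or decrements (delta=-1) it. The crossbar
        # feedback path then returns the result via data selector 250.
        self.nipar = (self.src if which == "src" else self.dst) + delta
        self.load(which, self.nipar)
        return self.nipar

    def address_out(self, select):
        # Address multiplexer 206: choose src, dst, or NIPAR contents.
        return {"src": self.src, "dst": self.dst, "nipar": self.nipar}[select]


def copy_block(memory, src, dst, count):
    """Copy `count` words using only pointer increments."""
    aou = InnerLoopAou()
    aou.load("src", src)
    aou.load("dst", dst)
    for _ in range(count):
        memory[aou.address_out("dst")] = memory[aou.address_out("src")]
        aou.bump("src")
        aou.bump("dst")
    return memory


mem = list(range(8)) + [0] * 8
copy_block(mem, 0, 8, 4)
print(mem[8:12])  # prints [0, 1, 2, 3]
```

A block copy is used as the driver because it needs exactly one source pointer, one destination pointer, and one address operation, which matches the minimal configuration considered in the text.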

【０１０２】好ましい実施例では、ソースアドレスレジ
スタ２５２と宛先アドレスレジスタ２５４とは、それぞ
れ１組の論理ブロック（ＣＬＢ）内に存在するデータ記
憶回路を用いて実動化される。ＮＩＰＡＲ２３２は、１
組の論理ブロック（ＣＬＢ）内のインクリメント／デク
リメントロジック及びフリップフロップを用いて実動化
されることが好ましい。データセレクタ２５０と、マル
チプレクサ２３０と、アドレスマルチプレクサ２０６と
は、それぞれ１組の論理ブロック（ＣＬＢ）内に存在す
るデータ選択回路を用いて実動化されることが好まし
い。最後にアドレス演算装置（ＡＯＵ）クロスバースイ
ッチ２００は、内部ループ命令セットアーキテクチャ
（ＩＳＡ）についてすでに述べた方法で実動化されるこ
とが好ましい。当業者は、一部のアプリケーションで
は、外部ループデータ演算装置（ＤＯＵ）構成を備えた
内部ループアドレス演算装置（ＡＯＵ）構成に依存する
命令セットアーキテクチャ（ＩＳＡ）を用いることが、
またはその逆（内部ループアドレス演算装置（ＡＯＵ）
構成を備えた外部ループデータ演算装置（ＤＯＵ）構
成）が有利であることを認めるであろう。たとえば連想
ストリング探索命令セットアーキテクチャ（ＩＳＡ）
は、外部ループアドレス演算装置（ＡＯＵ）構成を備え
た内部ループデータ演算装置（ＤＯＵ）構成を利用する
と有利であろう。別の例として、ヒストグラム演算を実
行するための命令セットアーキテクチャ（ＩＳＡ）は、
内部ループアドレス演算装置（ＡＯＵ）構成を備えた外
部ループデータ演算装置（ＤＯＵ）構成を利用すると有
利であろう。In the preferred embodiment, the source address register 252 and the destination address register 254 are each implemented using the data storage circuitry present within a set of configurable logic blocks (CLBs). The NIPAR 232 is preferably implemented using increment/decrement logic and flip-flops within a set of CLBs. The data selector 250, the multiplexer 230, and the address multiplexer 206 are each preferably implemented using the data selection circuitry present within a set of CLBs. Finally, the address operation unit (AOU) crossbar switch 200 is preferably implemented in the manner previously described for an inner-loop instruction set architecture (ISA). Those skilled in the art will recognize that for certain applications it may be advantageous to use an ISA that relies upon an inner-loop AOU configuration combined with an outer-loop data operation unit (DOU) configuration, or vice versa (an inner-loop DOU configuration combined with an outer-loop AOU configuration). For example, an associative string search ISA would benefit from an inner-loop DOU configuration combined with an outer-loop AOU configuration. As another example, an ISA for performing histogram operations would benefit from an outer-loop DOU configuration combined with an inner-loop AOU configuration.

【０１０３】有限の再構成ハードウェアリソースを、動
的再構成処理装置（ＤＲＰＵ）３２の各構成部分間で割
当てなければならない。再構成ハードウェアリソースは
数が限られているので、たとえばこれらを命令取出し装
置（ＩＦＵ）６０に割当てるとデータ演算装置（ＤＯ
Ｕ）６２及びアドレス演算装置（ＡＯＵ）６４によって
達成可能な最大計算性能レベルに影響を与える。再構成
ハードウェアリソースを、命令取出し装置（ＩＦＵ）６
０と、データ演算装置（ＤＯＵ）６２と、アドレス演算
装置（ＡＯＵ）６４との間で割当てる方法は、任意の瞬
間で実装される命令セットアーキテクチャ（ＩＳＡ）の
種類に応じて異なる。命令セットアーキテクチャ（ＩＳ
Ａ）が複雑になると、次第に複雑になる復号演算及び制
御演算を容易に行うために、より多くの再構成ハードウ
ェアリソースを命令取出し装置（ＩＦＵ）６０に割当て
なければならなくなり、データ演算装置（ＤＯＵ）６２
とアドレス演算装置（ＡＯＵ）６４との間で利用できる
再構成ハードウェアリソースは少なくなる。したがっ
て、データ演算装置（ＤＯＵ）６２とアドレス演算装置
（ＡＯＵ）６４とによって達成可能な最大計算性能は、
命令セットアーキテクチャ（ＩＳＡ）の複雑性が増すと
低下する。一般に、外部ループ命令セットアーキテクチ
ャ（ＩＳＡ）は内部ループ命令セットアーキテクチャ
（ＩＳＡ）より多くの命令を含み、したがってその実装
は、復号回路と制御回路においてかなり複雑となる。た
とえば汎用６４ビットプロセッサを規定する外部ループ
命令セットアーキテクチャ（ＩＳＡ）は、データ圧縮の
みに用いられる内部ループ命令セットアーキテクチャ
（ＩＳＡ）より多くの命令を含むことになると考えられ
る。A finite amount of reconfigurable hardware resources must be allocated among the elements of the dynamically reconfigurable processing unit (DRPU) 32. Because the reconfigurable hardware resources are limited in number, allocating them to the instruction fetch unit (IFU) 60, for example, affects the maximum level of computational performance achievable by the data operation unit (DOU) 62 and the address operation unit (AOU) 64. The manner in which the reconfigurable hardware resources are allocated among the IFU 60, the DOU 62, and the AOU 64 varies according to the type of instruction set architecture (ISA) being implemented at any given moment. As an ISA becomes more complex, more reconfigurable hardware resources must be allocated to the IFU 60 to facilitate increasingly complex decoding and control operations, leaving fewer reconfigurable hardware resources available to the DOU 62 and the AOU 64. The maximum computational performance achievable by the DOU 62 and the AOU 64 therefore decreases as ISA complexity increases. In general, an outer-loop ISA contains more instructions than an inner-loop ISA, and its implementation is therefore considerably more complex in terms of decoding and control circuitry. For example, an outer-loop ISA that defines a general-purpose 64-bit processor would contain more instructions than an inner-loop ISA used solely for data compression.

【０１０４】図１７（ａ）は、外部ループ命令セットア
ーキテクチャ（ＩＳＡ）のための、命令取出し装置（Ｉ
ＦＵ）６０と、データ演算装置（ＤＯＵ）６２と、アド
レス演算装置（ＡＯＵ）６４との間での再構成ハードウ
ェアリソースの模範割当てを示す図である。外部ループ
命令セットアーキテクチャ（ＩＳＡ）のための再構成ハ
ードウェアリソースの模範割当てでは、命令取出し装置
（ＩＦＵ）６０と、データ演算装置（ＤＯＵ）６２と、
アドレス演算装置（ＡＯＵ）６４はそれぞれ利用できる
再構成ハードウェアリソースの約３分の１を割当てられ
る。内部ループ命令セットアーキテクチャ（ＩＳＡ）を
実装するために動的再構成処理装置（ＤＲＰＵ）３２を
再構成すべきときには、内部ループ命令セットアーキテ
クチャ（ＩＳＡ）によって支援される命令の数とアドレ
ス命令の種類が限られるため、命令取出し装置（ＩＦ
Ｕ）６０とアドレス演算装置（ＡＯＵ）６４とを実装す
るのに必要な再構成ハードウェアリソースは少なくて済
む。図１７（ｂ）は、内部ループ命令セットアーキテク
チャ（ＩＳＡ）のための、命令取出し装置（ＩＦＵ）６
０と、データ演算装置（ＤＯＵ）６２と、アドレス演算
装置（ＡＯＵ）６４との間での再構成ハードウェアリソ
ースの模範割当てを示す図である。内部ループ命令セッ
トアーキテクチャ（ＩＳＡ）のための再構成ハードウェ
アリソースの模範割当てでは、命令取出し装置（ＩＦ
Ｕ）６０は再構成ハードウェアリソースの約５〜１０％
を用いて実装され、アドレス演算装置（ＡＯＵ）６４は
再構成ハードウェアリソースの約１０〜２５％を用いて
実装される。したがって、再構成ハードウェアリソース
の約７０〜８０％はデータ演算装置（ＤＯＵ）６２の実
装に利用できる。このことは、内部ループ命令セットア
ーキテクチャ（ＩＳＡ）に関連したデータ演算装置（Ｄ
ＯＵ）６２の内部構造が、内部ループ命令セットアーキ
テクチャ（ＩＳＡ）に関連したデータ演算装置（ＤＯ
Ｕ）６２の内部構造より複雑であってもよく、したがっ
てはるかに高い性能を発揮できることを意味している。FIG. 17(a) is a diagram showing an exemplary allocation of reconfigurable hardware resources among the instruction fetch unit (IFU) 60, the data operation unit (DOU) 62, and the address operation unit (AOU) 64 for an outer-loop instruction set architecture (ISA). In the exemplary allocation for an outer-loop ISA, the IFU 60, the DOU 62, and the AOU 64 are each allocated approximately one-third of the available reconfigurable hardware resources. When the dynamically reconfigurable processing unit (DRPU) 32 is to be reconfigured to implement an inner-loop ISA, fewer reconfigurable hardware resources are needed to implement the IFU 60 and the AOU 64, because the number of instructions and the types of address instructions supported by the inner-loop ISA are limited. FIG. 17(b) is a diagram showing an exemplary allocation of reconfigurable hardware resources among the IFU 60, the DOU 62, and the AOU 64 for an inner-loop ISA. In the exemplary allocation for an inner-loop ISA, the IFU 60 is implemented using approximately 5 to 10% of the reconfigurable hardware resources, and the AOU 64 is implemented using approximately 10 to 25%. Approximately 70 to 80% of the reconfigurable hardware resources therefore remain available for implementing the DOU 62. This means that the internal structure of the DOU 62 associated with the inner-loop ISA can be more complex than the internal structure of the DOU 62 associated with the outer-loop ISA, and can therefore deliver much higher performance.
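
The effect of the two exemplary allocations can be made concrete with a short calculation. In the Python sketch below, the total of 1200 configurable logic blocks, the function name `allocate`, and the specific inner-loop shares of 8%, 17%, and 75% are assumptions chosen from within the ranges stated above; none of these values appears in the disclosure.

```python
def allocate(total_clbs, shares):
    """Split a CLB budget according to fractional shares that sum to 1."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {unit: round(total_clbs * share) for unit, share in shares.items()}


TOTAL_CLBS = 1200  # hypothetical device size

# Outer-loop ISA: roughly one third each (FIG. 17(a)).
outer_loop = allocate(TOTAL_CLBS, {"IFU": 1 / 3, "DOU": 1 / 3, "AOU": 1 / 3})

# Inner-loop ISA: one assumed point inside the stated ranges (FIG. 17(b)).
inner_loop = allocate(TOTAL_CLBS, {"IFU": 0.08, "AOU": 0.17, "DOU": 0.75})

print(outer_loop)  # prints {'IFU': 400, 'DOU': 400, 'AOU': 400}
print(inner_loop)  # prints {'IFU': 96, 'AOU': 204, 'DOU': 900}
print(inner_loop["DOU"] / outer_loop["DOU"])  # prints 2.25
```

Under these assumed figures the DOU 62 receives 2.25 times as many logic blocks in the inner-loop configuration, which illustrates why a more complex and faster data path becomes possible.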

【０１０５】当業者は、別の実施例で動的再構成処理装
置（ＤＲＰＵ）３２がデータ演算装置（ＤＯＵ）６２ま
たはアドレス演算装置（ＡＯＵ）６４を除外できること
を認めるであろう。たとえば別の実施例では、動的再構
成処理装置（ＤＲＰＵ）３２はアドレス演算装置（ＡＯ
Ｕ）６４を含まなくても良い。その場合、データ演算装
置（ＤＯＵ）６２はデータとアドレスの両方に対して演
算を実行することになる。（上記で）検討した特定の動
的再構成処理装置（ＤＲＰＵ）実施例とは無関係に、動
的再構成処理装置（ＤＲＰＵ）３２の各構成部分を実装
するために有限数の再構成ハードウェアリソースを割当
てなければならない。再構成ハードウェアリソースは、
利用できる再構成ハードウェアリソースの全スペースに
対して現在検討中の命令セットアーキテクチャ（ＩＳ
Ａ）について最適のまたは最適に近い能力が達成される
ように割当てるのが好ましい。Those skilled in the art will recognize that in an alternate embodiment the dynamically reconfigurable processing unit (DRPU) 32 could exclude either the data operation unit (DOU) 62 or the address operation unit (AOU) 64. For example, in an alternate embodiment the DRPU 32 need not include an AOU 64, in which case the DOU 62 would perform operations upon both data and addresses. Regardless of the particular DRPU embodiment considered above, a finite number of reconfigurable hardware resources must be allocated to implement the elements of the DRPU 32. The reconfigurable hardware resources are preferably allocated such that optimum or near-optimum performance is achieved for the instruction set architecture (ISA) currently under consideration, relative to the total space of available reconfigurable hardware resources.

【０１０６】当業者は、命令取出し装置（ＩＦＵ）６０
と、データ演算装置（ＤＯＵ）６２と、アドレス演算装
置（ＡＯＵ）６４との各構成部分の詳細な構造が上記に
説明した実施例に限定されないことを認めるであろう。
所定の命令セットアーキテクチャ（ＩＳＡ）について、
対応する構成データセットは、命令取出し装置（ＩＦ
Ｕ）６０と、データ演算装置（ＤＯＵ）６２と、アドレ
ス演算装置（ＡＯＵ）６４内の各構成部分の内部構造
が、利用できる再構成ハードウェアリソースに対して計
算性能を最大限にするように定められるのが好ましい。Those skilled in the art will recognize that the detailed structure of each element of the instruction fetch unit (IFU) 60, the data operation unit (DOU) 62, and the address operation unit (AOU) 64 is not limited to the embodiments described above. For a given instruction set architecture (ISA), the corresponding configuration data set is preferably defined such that the internal structure of each element within the IFU 60, the DOU 62, and the AOU 64 maximizes computational performance relative to the available reconfigurable hardware resources.

【０１０７】図１８は、Ｔマシンの好ましい実施例の構
成図である。Ｔマシン１４は、第２ローカルタイムベー
ス装置３００と、共用インタフェース制御装置（ＣＩＣ
Ｕ：Common Interface and Control Unit）３０２と、
１組の相互結合入出力装置３０４とを含んでいる。第２
ローカルタイムベース装置３００は、Ｔマシンのマスタ
タイミング入力部を形成するタイミング入力部を含んで
いる。共通インタフェース制御装置（ＣＩＣＵ）３０２
は、第２タイミング信号ライン３１０を経て第２ローカ
ルタイムベース装置３００のタイミング出力部に結合さ
れたタイミング入力部と、アドレスライン４４に結合さ
れたアドレス出力部と、メモリ入出力ライン４６に結合
された第１双方向制御ポートと、外部制御ライン４８に
結合された双方向制御ポートと、メッセージ転送ライン
３１２を経て現存する各相互結合入出力装置３０４の双
方向データポートに結合された第２双方向データポート
とを含んでいる。各相互結合入出力装置３０４は、メッ
セージ入力ライン３１４を経て汎用相互結合マトリック
ス（ＧＰＩＭ）１６に結合された入力部と、メッセージ
出力ライン３１６を経て汎用相互結合マトリックス（Ｇ
ＰＩＭ）１６に結合された出力部とを含んでいる。FIG. 18 is a block diagram of a preferred embodiment of a T-machine 14. The T-machine 14 includes a second local time base device 300, a common interface and control unit (CICU) 302, and a set of interconnect I/O devices 304. The second local time base device 300 includes a timing input that forms the master timing input of the T-machine. The CICU 302 includes a timing input coupled to the timing output of the second local time base device 300 via a second timing signal line 310; an address output coupled to the address line 44; a first bidirectional data port coupled to the memory I/O line 46; a bidirectional control port coupled to the external control line 48; and a second bidirectional data port coupled via a message transfer line 312 to the bidirectional data port of each interconnect I/O device 304 present. Each interconnect I/O device 304 includes an input coupled to the general-purpose interconnect matrix (GPIM) 16 via a message input line 314, and an output coupled to the GPIM 16 via a message output line 316.

【０１０８】Ｔマシン１４内の第２ローカルタイムベー
ス装置３００は、マスタタイムベース装置２２からマス
タタイミング信号を受取り、第２ローカルタイミング信
号を生成する。第２ローカルタイムベース装置３００
は、第２ローカルタイミング信号を共通インタフェース
制御装置（ＣＩＣＵ）３０２に送り、これによってそれ
が存在するＴマシン１４についてのタイミング基準を提
供する。第２ローカルタイミング信号は、マスタタイミ
ング信号と位相同期であることが好ましい。システム１
０では、各Ｔマシンの第２ローカルタイムベース装置３
００は同一の周波数で作動することが好ましい。当業者
は、別の実施例では、１個またはそれ以上の第２ローカ
ルタイムベース装置３００が異なる周波数で作動するこ
とを認めるであろう。第２ローカルタイムベース装置３
００は、論理ブロック（ＣＬＢ）ベース位相ロック検出
回路を含む従来型の位相ロック周波数変換回路を用いて
実装されることが好ましい。当業者は、別の実施例では
第２ローカルタイムベース装置３００がクロック分散ツ
リーの一部として実装できることを認めるであろう。The second local time base device 300 within the T-machine 14 receives the master timing signal from the master time base device 22 and generates a second local timing signal. The second local time base device 300 delivers the second local timing signal to the common interface and control unit (CICU) 302, thereby providing a timing reference for the T-machine 14 in which it resides. The second local timing signal is preferably phase-synchronized with the master timing signal. Within the system 10, the second local time base device 300 of each T-machine preferably operates at the same frequency. Those skilled in the art will recognize that in an alternate embodiment, one or more second local time base devices 300 could operate at different frequencies. The second local time base device 300 is preferably implemented using conventional phase-locked frequency conversion circuitry, including configurable logic block (CLB)-based phase-lock detection circuitry. Those skilled in the art will recognize that in an alternate embodiment, the second local time base device 300 could be implemented as a portion of a clock distribution tree.

【０１０９】共通インタフェース制御装置（ＣＩＣＵ）
３０２は、その対応するＳマシン１２と特定の相互結合
入出力装置３０４との間のメッセージの転送を指示し、
このメッセージにはコマンドとおそらくデータとが含ま
れる。好ましい実施例では、指定された相互結合入出力
装置３０４がシステム１０の内部または外部にあるいず
れかのＴマシン１４または入出力Ｔマシン１８内に存在
しても良い。好ましい実施例では、各相互結合入出力装
置３０４は相互結合入出力装置３０４を一意的に識別す
る相互結合アドレスを割当てられているのが好ましい。
所定のＴマシン内の相互結合入出力装置３０４について
の相互結合アドレスは、対応するＳマシンのアーキテク
チャ記述メモリ１０１に記憶される。The common interface and control unit (CICU) 302 directs the transfer of messages between its corresponding S-machine 12 and a specified interconnect I/O device 304, where a message includes a command and possibly data. In the preferred embodiment, the specified interconnect I/O device 304 may reside within any T-machine 14 or I/O T-machine 18, whether internal or external to the system 10. Each interconnect I/O device 304 is preferably assigned an interconnect address that uniquely identifies it. The interconnect addresses for the interconnect I/O devices 304 within a given T-machine are stored in the architecture description memory 101 of the corresponding S-machine.

【０１１０】共通インタフェース制御装置（ＣＩＣＵ）
３０２は、それぞれメモリ入出力ライン４６と外部制御
信号ライン４８とを経てその対応するＳマシン１２から
データとコマンドとを受取る。受取った各コマンドは、
目的相互結合アドレスと実行すべき特定の種類の演算を
指定するコマンドコードとを含んでいることが好まし
い。好ましい実施例では、コマンドコードによって一意
的に識別される種類の演算には、１）データ読出し演算と、２）データ書込み演算と、３）再構成割込み転送を含む割込み信号転送と、を含んでいる。目的相互結合アドレスは、データとコマ
ンドとを転送すべき目的相互結合入出力装置３０４を識
別する。共通インタフェース制御装置（ＣＩＣＵ）３０
２は、従来の方法で１組のパケットベースメッセージと
して各コマンドと関連データを転送することが好まし
く、各メッセージには目的相互結合アドレスとコマンド
コードとが含まれている。The common interface and control unit (CICU) 302 receives data and commands from its corresponding S-machine 12 via the memory I/O line 46 and the external control signal line 48, respectively. Each command received preferably includes a target interconnect address and a command code that specifies the particular type of operation to be performed. In the preferred embodiment, the types of operations uniquely identified by command codes include 1) data read operations; 2) data write operations; and 3) interrupt signal transfers, including reconfiguration interrupt transfers. The target interconnect address identifies the target interconnect I/O device 304 to which the data and commands are to be transferred. The CICU 302 preferably transfers each command and any associated data as a set of packet-based messages in a conventional manner, where each message includes the target interconnect address and the command code.
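
As a non-limiting illustration of the packet-based transfer described above, the following Python sketch splits one command and its data into messages that each carry the target interconnect address and the command code. The type names, the numeric command code values, and the `max_payload` limit of 8 bytes are hypothetical; the disclosure does not specify a message layout.

```python
from dataclasses import dataclass
from enum import Enum


class CommandCode(Enum):
    READ = 1        # data read operation (numeric value assumed)
    WRITE = 2       # data write operation (numeric value assumed)
    INTERRUPT = 3   # interrupt transfer, incl. reconfiguration interrupts


@dataclass(frozen=True)
class Message:
    target: int            # target interconnect address
    code: CommandCode      # command code repeated in every message
    payload: bytes = b""


def packetize(target, code, data=b"", max_payload=8):
    """Split one command and its data into a list of messages."""
    if not data:
        return [Message(target, code)]
    return [
        Message(target, code, data[i:i + max_payload])
        for i in range(0, len(data), max_payload)
    ]


msgs = packetize(5, CommandCode.WRITE, bytes(range(20)))
print(len(msgs))                       # prints 3
print([len(m.payload) for m in msgs])  # prints [8, 8, 4]
```

Repeating the address and command code in every message lets each message be routed and interpreted independently of the others in its group.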

【０１１１】共通インタフェース制御装置（ＣＩＣＵ）
３０２は、その対応するＳマシン１２からデータとコマ
ンドとを受取る他に、メッセージ転送ライン３１２に結
合された各相互結合入出力装置３０４からメッセージを
受取る。好ましい実施例では、共通インタフェース制御
装置（ＣＩＣＵ）３０２は、関連メッセージグループを
単一のコマンドとデータシーケンスに変換する。コマン
ドがその対応するＳマシン１２内の動的再構成処理装置
（ＤＲＰＵ）３２に向けられているときには、共通イン
タフェース制御装置（ＣＩＣＵ）３０２は外部制御信号
ライン４８を経てコマンドを発する。コマンドがその対
応するＳマシン１２内のメモリ３４に向けられていると
きには、共通インタフェース制御装置（ＣＩＣＵ）３０
２は外部制御信号ライン４８を経て適切なメモリ制御信
号を発し、またメモリアドレスライン４４を経てメモリ
アドレス信号を発する。データは、メモリ入出力ライン
４６を経て転送される。好ましい実施例では、共通イン
タフェース制御装置（ＣＩＣＵ）３０２は、ＡＮＳＩ／
ＩＥＥＥ規格１５９６−１９９２に定められた従来型の
ＳＣＩ切替装置によって実行される演算に類似した演算
を実行するための論理ブロック（ＣＬＢ）ベース回路を
含んでいる。In addition to receiving data and commands from its corresponding S-machine 12, the common interface and control unit (CICU) 302 receives messages from each interconnect I/O device 304 coupled to the message transfer line 312. In the preferred embodiment, the CICU 302 converts a group of related messages into a single command and data sequence. If the command is directed to the dynamically reconfigurable processing unit (DRPU) 32 within its corresponding S-machine 12, the CICU 302 issues the command via the external control signal line 48. If the command is directed to the memory 34 within its corresponding S-machine 12, the CICU 302 issues the appropriate memory control signals via the external control signal line 48 and a memory address signal via the memory address line 44. Data is transferred via the memory I/O line 46. In the preferred embodiment, the CICU 302 includes configurable logic block (CLB)-based circuitry for performing operations analogous to those performed by a conventional SCI switching unit as defined by ANSI/IEEE Standard 1596-1992.
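
The inbound path described above may be sketched, under several stated assumptions, as follows. The text does not say how the CICU 302 determines whether a command is directed to the DRPU 32 or to the memory 34, nor how a memory address is conveyed; in this hypothetical Python model, interrupt commands are assumed to go to the DRPU, write commands to memory, and the address is simply passed as a parameter. All function names and the tuple message format are likewise illustrative.

```python
def reassemble(messages):
    """Merge a group of related (target, code, payload) messages.

    Returns one (code, data) pair. All messages in a group are assumed
    to carry the same target address and the same command code.
    """
    targets = {target for target, _, _ in messages}
    codes = {code for _, code, _ in messages}
    if len(targets) != 1 or len(codes) != 1:
        raise ValueError("messages do not form one related group")
    data = b"".join(payload for _, _, payload in messages)
    return codes.pop(), data


def dispatch(code, data, memory, address, drpu_commands):
    """Deliver a reassembled command to the DRPU or to memory."""
    if code == "interrupt":
        # Assumed: interrupts are issued to the DRPU (control line 48).
        drpu_commands.append((code, data))
        return "drpu"
    if code == "write":
        # Assumed: writes go to memory (address line 44, I/O line 46).
        memory[address:address + len(data)] = data
        return "memory"
    raise ValueError(f"unsupported command code: {code}")


memory = bytearray(16)
drpu_commands = []
code, data = reassemble([(7, "write", b"ab"), (7, "write", b"cd")])
print(dispatch(code, data, memory, 4, drpu_commands))  # prints memory
print(bytes(memory[4:8]))                              # prints b'abcd'
```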

【０１１２】各相互結合入出力装置３０４は、共通イン
タフェース制御装置（ＣＩＣＵ）３０２からメッセージ
を受取り、共通インタフェース制御装置（ＣＩＣＵ）３
０２から受取った制御信号の指示に従って、このメッセ
ージを汎用相互結合マトリックス（ＧＰＩＭ）１６を経
て別の相互結合入出力装置３０４に転送する。好ましい
実施例では、相互結合入出力装置３０４は、ＡＮＳＩ／
ＩＥＥＥ規格１５９６−１９９２に定められたＳＣＩノ
ードに基づいている。図１９は、相互結合入出力装置３
０４の好ましい実施例の構成図である。相互結合入出力
装置３０４は、アドレス復号器３２０と、入力ＦＩＦＯ
バッファ３２２と、バイパスＦＩＦＯバッファ３２４
と、出力ＦＩＦＯバッファ３２６と、マルチプレクサ３
２８とを含んでいる。アドレス復号器３２０は、相互結
合入出力装置の入力部を形成する入力部と、入力ＦＩＦ
Ｏバッファ３２２に結合された第１出力部と、バイパス
ＦＩＦＯバッファ３２４に結合された第２出力部とを含
んでいる。入力ＦＩＦＯバッファ３２２は、メッセージ
を共通インタフェース制御装置（ＣＩＣＵ）３０２に転
送するためのメッセージ転送ライン３１２に結合された
出力部を含んでいる。出力ＦＩＦＯバッファ３２６は、
共通インタフェース制御装置（ＣＩＣＵ）３０２からメ
ッセージを受取るためのメッセージ転送ライン３１２に
結合された入力部と、マルチプレクサ３２８の第１入力
部に結合された出力部とを含んでいる。バイパスＦＩＦ
Ｏバッファ３２４は、マルチプレクサ３２８の第２入力
部に結合された出力部を含んでいる。最後にマルチプレ
クサ３２８は、メッセージ転送ライン３１２に結合され
た制御入力部と、相互結合入出力装置の出力部を形成す
る出力部とを含んでいる。Each interconnect I/O device 304 receives messages from the common interface and control unit (CICU) 302 and, as directed by control signals received from the CICU 302, transfers those messages to another interconnect I/O device 304 via the general-purpose interconnect matrix (GPIM) 16. In the preferred embodiment, the interconnect I/O device 304 is based upon an SCI node as defined by ANSI/IEEE Standard 1596-1992. FIG. 19 is a block diagram of a preferred embodiment of an interconnect I/O device 304. The interconnect I/O device 304 includes an address decoder 320, an input FIFO buffer 322, a bypass FIFO buffer 324, an output FIFO buffer 326, and a multiplexer 328. The address decoder 320 includes an input that forms the input of the interconnect I/O device, a first output coupled to the input FIFO buffer 322, and a second output coupled to the bypass FIFO buffer 324. The input FIFO buffer 322 includes an output coupled to the message transfer line 312 for transferring messages to the CICU 302. The output FIFO buffer 326 includes an input coupled to the message transfer line 312 for receiving messages from the CICU 302, and an output coupled to a first input of the multiplexer 328. The bypass FIFO buffer 324 includes an output coupled to a second input of the multiplexer 328. Finally, the multiplexer 328 includes a control input coupled to the message transfer line 312, and an output that forms the output of the interconnect I/O device.

【０１１３】The interconnect I/O device 304 receives messages at the input of the address decoder 320. The address decoder 320 determines whether the destination interconnect address specified in a received message is identical to the interconnect address of the interconnect I/O device 304 in which it resides. If so, the address decoder 320 routes the message to the input FIFO buffer 322; otherwise, it routes the message to the bypass FIFO buffer 324. In the preferred embodiment, the address decoder 320 comprises a decoder and a data selector implemented using input/output blocks (IOBs) and logic blocks (CLBs).

【０１１４】The input FIFO buffer 322 is a conventional FIFO buffer that transfers messages received at its input to the message transfer line 312. The bypass FIFO buffer 324 and the output FIFO buffer 326 are both conventional FIFO buffers that transfer messages received at their inputs to the multiplexer 328. The multiplexer 328 is a conventional multiplexer that, according to the control signal received at its control input, routes either a message received from the bypass FIFO buffer 324 or a message received from the output FIFO buffer 326 to the general-purpose interconnect matrix (GPIM) 16. In the preferred embodiment, the input FIFO buffer 322, the bypass FIFO buffer 324, and the output FIFO buffer 326 are each implemented using a set of logic blocks (CLBs). The multiplexer 328 is preferably implemented using a set of logic blocks (CLBs) and input/output blocks (IOBs).
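The data path through the interconnect I/O device 304 can be illustrated with a small software model. The following is only a sketch under stated assumptions: the class name `InterconnectIO`, the dictionary message layout with a `dest` key, and the boolean `select_output` control signal are hypothetical, and the patent describes CLB-based hardware rather than software.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class InterconnectIO:
    """Toy model of interconnect I/O device 304 (names are illustrative)."""
    address: int                                        # unique interconnect address
    input_fifo: deque = field(default_factory=deque)    # 322: toward the local controller
    bypass_fifo: deque = field(default_factory=deque)   # 324: traffic passing through
    output_fifo: deque = field(default_factory=deque)   # 326: from the local controller

    def receive(self, message: dict) -> None:
        """Address decoder 320: compare the destination with our own address."""
        if message["dest"] == self.address:
            self.input_fifo.append(message)
        else:
            self.bypass_fifo.append(message)

    def send(self, select_output: bool):
        """Multiplexer 328: forward one message from the selected FIFO, or None."""
        source = self.output_fifo if select_output else self.bypass_fifo
        return source.popleft() if source else None
```

A message whose destination matches the device's address is queued for the local controller, while any other message is held in the bypass queue until the multiplexer forwards it onto the matrix.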

【０１１５】FIG. 20 is a block diagram of a preferred embodiment of the I/O T-machine 18. The I/O T-machine 18 includes a third local time base device 360, a common custom interface controller 362, and an interconnect I/O device 304. The third local time base device 360 has a timing input that forms the master timing input of the I/O T-machine. The interconnect I/O device 304 has an input coupled to the general-purpose interconnect matrix (GPIM) 16 via a message input line 314 and an output coupled to the GPIM 16 via a message output line 316. The common custom interface controller 362 has a timing input coupled to the timing output of the third local time base device 360 via a third timing signal line 370, a first bidirectional data port coupled to the bidirectional data port of the interconnect I/O device 304, and a set of couplings to the I/O device 20. In the preferred embodiment, the set of couplings to the I/O device 20 includes a second bidirectional data port coupled to the bidirectional data port of the I/O device 20, an address output coupled to the address input of the I/O device 20, and a bidirectional control port coupled to the bidirectional control port of the I/O device 20. Those skilled in the art will readily recognize that the couplings to the I/O device 20 depend on the type of I/O device 20 to which the common custom interface controller 362 is coupled.

【０１１６】The third local time base device 360 receives the master timing signal from the master time base device 22 and generates a third local timing signal. The third local time base device 360 delivers the third local timing signal to the common custom interface controller 362, thereby providing a timing reference for the I/O T-machine in which it resides. In the preferred embodiment, the third local timing signal is phase-synchronized with the master timing signal. The third local time base device 360 of each I/O T-machine preferably operates at the same frequency. In an alternative embodiment, one or more third local time base devices 360 can operate at different frequencies. The third local time base device 360 is preferably implemented using conventional phase-locked frequency conversion circuitry that includes logic block (CLB) based phase-lock detection circuitry. As with the first local time base device 30 and the second local time base device 300, in an alternative embodiment the third local time base device 360 can be implemented as part of a clock distribution tree.

【０１１７】The structure and function of the interconnect I/O device 304 within the I/O T-machine 18 are preferably identical to those already described for the T-machine 14. The interconnect I/O device 304 within the I/O T-machine 18 is assigned a unique interconnect address in a manner analogous to that used for each interconnect I/O device 304 within any T-machine 14.

【０１１８】The common custom interface controller 362 directs the transfer of messages between the I/O device 20 to which it is coupled and the interconnect I/O device 304, where a message contains a command and possibly data. The common custom interface controller 362 receives data and commands from its corresponding I/O device 20. Each command received from the I/O device 20 preferably includes a destination interconnect address and a command code that specifies the particular type of operation to be performed. In the preferred embodiment, the types of operations uniquely identified by command codes include 1) data requests, 2) data transfer acknowledgments, and 3) interrupt signal transfers. The destination interconnect address identifies the destination interconnect I/O device 304 in the system 10 to which the data and commands are to be transferred. The common custom interface controller 362 preferably transfers each command and any associated data as a set of packet-based messages in a conventional manner, with each message including the destination interconnect address and the command code.
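As an illustration of the packet-based transfer just described, the sketch below splits one command and its data into messages that each repeat the destination interconnect address and the command code. The numeric code values, the `Message` and `packetize` names, and the `max_payload` limit are assumptions introduced for illustration; the description does not specify encodings or packet sizes.

```python
from dataclasses import dataclass
from enum import Enum


class CommandCode(Enum):
    # The three operation types named in the text; numeric values are assumed.
    DATA_REQUEST = 1
    DATA_TRANSFER_ACK = 2
    INTERRUPT_TRANSFER = 3


@dataclass(frozen=True)
class Message:
    dest_address: int      # destination interconnect address
    command: CommandCode   # command code, repeated in every message
    payload: bytes         # slice of the associated data (may be empty)


def packetize(dest_address: int, command: CommandCode, data: bytes,
              max_payload: int = 4) -> list:
    """Split one command plus data into a list of packet-style messages."""
    chunks = [data[i:i + max_payload] for i in range(0, len(data), max_payload)]
    if not chunks:
        chunks = [b""]  # a command without data still yields one message
    return [Message(dest_address, command, chunk) for chunk in chunks]
```

Because every message carries the address and command code, each one can be routed and interpreted on its own; reassembly at the receiver is simply the concatenation of the payloads in order.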

【０１１９】In addition to receiving data and commands from its corresponding I/O device 20, the common custom interface controller 362 receives messages from its associated interconnect I/O device 304. In the preferred embodiment, the common custom interface controller 362 converts a group of related messages into a single command and data sequence in accordance with the communication protocol supported by its corresponding I/O device 20. In the preferred embodiment, the common custom interface controller 362 includes a logic block (CLB) based I/O device controller coupled to logic block (CLB) based circuitry for performing operations analogous to those performed by a conventional SCI switching device as defined in ANSI/IEEE Standard 1596-1992.

【０１２０】The general-purpose interconnect matrix (GPIM) 16 is a conventional interconnect mesh that facilitates point-to-point parallel message routing between interconnect I/O devices 304. In the preferred embodiment, the GPIM 16 is a wire-based k-ary n-cube static interconnect network. FIG. 21 is a block diagram of an exemplary embodiment of the GPIM 16. In FIG. 21, the GPIM 16 is a toroidal interconnect mesh, that is, a k-ary 2-cube, comprising a plurality of first communication channels 380 and a plurality of second communication channels 382. Each first communication channel 380 includes a plurality of node connections 384, as does each second communication channel 382. Each interconnect I/O device 304 in the system 10 is preferably coupled to the GPIM 16 such that its message input line 314 and message output line 316 join consecutive node connections 384 within a given first communication channel 380 or second communication channel 382. In the preferred embodiment, each T-machine 14 includes one interconnect I/O device 304 coupled to a first communication channel 380 and one interconnect I/O device 304 coupled to a second communication channel 382, each in the manner described above. The common interface controller (CICU) 302 within the T-machine 14 preferably facilitates the routing of information between its interconnect I/O device 304 coupled to the first communication channel 380 and its interconnect I/O device 304 coupled to the second communication channel 382. Thus, for a T-machine 14 having one interconnect I/O device 304 coupled to the first communication channel labeled 380c in FIG. 21 and another coupled to the second communication channel labeled 382c, the CICU 302 of that T-machine facilitates information routing between the first communication channel 380c and the second communication channel 382c.

【０１２１】The general-purpose interconnect matrix (GPIM) 16 thus facilitates the routing of multiple messages between interconnect I/O devices 304 in parallel. For the two-dimensional GPIM 16 of FIG. 21, each T-machine 14 preferably includes one interconnect I/O device 304 for a first communication channel 380 and one interconnect I/O device 304 for a second communication channel 382. Those skilled in the art will recognize that in an embodiment in which the GPIM 16 has more than two dimensions, the T-machine 14 preferably includes more than two interconnect I/O devices 304. The GPIM 16 is preferably implemented as a k-ary 2-cube having a 16-bit data path.
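To make the k-ary 2-cube topology concrete, the sketch below traces one possible path of a message across the torus. It is an illustration only: the description does not state a routing algorithm, so the dimension-order scheme, the assumption that each channel is a one-directional ring, and the `(x, y)` coordinate labels are all hypothetical. The hop from the first loop to the second corresponds to a T-machine's controller passing the message between its two interconnect I/O devices.

```python
def route(src: tuple, dst: tuple, k: int) -> list:
    """Dimension-order path on a k-ary 2-cube with one-directional rings.

    src and dst are (x, y) node coordinates with 0 <= x, y < k.
    The message first circulates along its first-channel ring until the
    x coordinate matches, then along a second-channel ring until y matches.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:          # bypass nodes on the first-channel ring
        x = (x + 1) % k         # modulo gives the torus wrap-around
        path.append((x, y))
    while y != dst[1]:          # then travel on the second-channel ring
        y = (y + 1) % k
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(route((2, 3), (0, 0), k=4))
```

Each intermediate node on the path would see a non-matching destination address and pass the message through its bypass buffer.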

【０１２２】In the above description, the various components of the present invention are preferably implemented using reconfigurable hardware resources. Manufacturers of reconfigurable logic devices commonly publish guidelines for implementing conventional digital hardware using reprogrammable or reconfigurable hardware resources. For example, the 1994 Xilinx Programmable Logic Device Data Book (Xilinx, Inc., San Jose, California) includes the following application notes: XAPP 005.002, "Register-Based FIFO"; XAPP 044.00, "High-Performance RAM-Based FIFO"; XAPP 013.001, "Using the Dedicated Carry Logic in the XC4000"; XAPP 018.00, "Estimating the Performance of XC4000 Adders and Counters"; XAPP 028.001, "Frequency/Phase Comparator for Phase-Locked Loops"; XAPP 031.000, "Using XC4000 RAM"; XAPP 036.001, "Four-Port DRAM Controller . . ."; and XAPP 039.001, "18-Bit Pipelined Accumulator". Material published by Xilinx also includes articles in "XCELL", a quarterly publication for users of Xilinx programmable logic. For example, the third issue of 1994 (Issue 14) contains a detailed article on implementing fast integer multipliers.

【０１２３】The system 10 described herein is a scalable, parallel computer architecture for dynamically implemented multiple instruction set architectures (ISAs). Any S-machine 12 can execute an entire computer program by itself, independently of another S-machine 12 or of external hardware resources such as a host computer. Within any S-machine 12, multiple ISAs are implemented one after another during program execution, in response to reconfiguration interrupts and/or reconfiguration directives embedded in the program. Because the system 10 preferably includes multiple S-machines 12, multiple programs are preferably executed simultaneously, and each program may be independent. For the same reason, multiple ISAs are implemented simultaneously (that is, in parallel) at all times other than during system initialization or reconfiguration. In other words, at any given time multiple sets of program instructions are executed simultaneously, each set according to a corresponding ISA. Each such ISA is unique.

【０１２４】The S-machines 12 communicate with one another and with the I/O devices 20 via the T-machines 14, the general-purpose interconnect matrix (GPIM) 16, and the I/O T-machines 18. Although each S-machine 12 is a complete computer in itself, capable of independent operation, any S-machine 12 can function as a master S-machine 12 for the other S-machines 12 or for the entire system 10, sending data and/or commands to other S-machines 12, to one or more T-machines 14, to one or more I/O T-machines 18, and to one or more I/O devices 20.

【０１２５】The system 10 of the present invention is therefore particularly useful for problems that can be divided, spatially and temporally, into one or more data-parallel subproblems, for example image processing, medical data processing, calibrated color matching, database computation, document processing, associative search engines, and network servers. For computational problems with large arrays of operands, data parallelism exists when an algorithm can be applied in such a way that parallel computing techniques yield an effective computational speedup. Data-parallel problems have a known complexity, expressed as O(n^k). The value of k depends on the problem; for example, k = 2 for image processing and k = 3 for medical data processing. In the present invention, each S-machine 12 is preferably used to exploit data parallelism at the level of program instruction groups. Because the system 10 includes multiple S-machines 12, the system 10 is preferably used to exploit data parallelism at the level of entire programs.
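A short arithmetic sketch shows what O(n^k) growth implies when a data-parallel problem is spread over several S-machines. It assumes, purely for illustration, a constant factor of 1 and a perfectly even split with no communication cost; neither assumption comes from the description, and the function names are hypothetical.

```python
def operation_count(n: int, k: int) -> int:
    """Idealized operation count for a problem of size n and exponent k."""
    return n ** k


def per_machine_share(n: int, k: int, machines: int) -> int:
    """Operations per machine under an even split, rounded up."""
    total = operation_count(n, k)
    return -(-total // machines)  # ceiling division on integers


if __name__ == "__main__":
    # k = 2 (image processing in the text) for a 512-sample edge, 16 machines
    print(operation_count(512, 2), per_machine_share(512, 2, 16))
```

Doubling n multiplies the work by 2^k, which is why the text distinguishes problems by their exponent.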

【０１２６】The system 10 of the present invention provides a great deal of computational power because the instruction processing hardware of each S-machine 12 can be completely reconfigured to optimize the computational capabilities of that hardware for the computation required at any given moment. Each S-machine 12 can be reconfigured independently of any other S-machine 12. The system 10 advantageously treats each configuration data set, and hence each instruction set architecture (ISA), as a programmed boundary or interface between software and the reconfigurable hardware described herein. The architecture of the present invention further facilitates the high-level structuring of reconfigurable hardware to selectively address the concerns of actual systems in situ. These concerns include the manner in which interrupts affect instruction processing, the need for deterministic-latency responses that facilitate real-time processing and computer performance, and the need for selectable responses to fault handling.

【０１２７】Unlike other computer architectures, the present invention teaches the maximal utilization of silicon resources at all times. The present invention provides a parallel computer system that can be scaled to any desired size at any time, up to massively parallel systems comprising thousands of S-machines 12. Such architectural scalability is possible because S-machine-based instruction processing is intentionally separated from T-machine-based data communication. This model of separating instruction processing from data communication is extremely well suited to data-parallel computation. The internal structure of the S-machine hardware is preferably optimized for the time flow of instructions, while the internal structure of the T-machine hardware is preferably optimized for efficient data communication. The set of S-machines 12 and the set of T-machines 14 are each separable, configurable components in the spatial and temporal partitioning of data-parallel computation.

【０１２８】With the present invention, future reconfigurable hardware may be exploited to construct systems having even greater computational capabilities while maintaining the overall structure described herein. In other words, the system 10 of the present invention is technologically scalable. Nearly all reconfigurable logic devices in current use employ memory-based complementary metal-oxide-semiconductor (CMOS) technology. Advances in device capacity follow trends in semiconductor memory technology. In future systems, a reconfigurable logic device used to construct an S-machine 12 would contain a partition of internal hardware resources in accordance with the inner-loop and outer-loop instruction set architectures (ISAs) described herein. Larger reconfigurable logic devices simply offer the ability to perform more data-parallel computation within a single device. For example, a larger functional unit 194 within the second exemplary embodiment of the data operation unit (DOU) 63, described above with reference to FIG. 13, would accommodate larger image processing kernels. Those skilled in the art will recognize that the technological scalability provided by the present invention is not limited to CMOS-based devices, nor to field-programmable gate array (FPGA) based implementations. The present invention thus provides technological scalability regardless of the particular technology used to achieve reconfigurability or reprogrammability.

【０１２９】FIG. 22 is a flowchart of a preferred method for scalable, parallel, dynamically reconfigurable computing. The method of FIG. 22 is preferably performed within each S-machine 12 in the system 10. The preferred method begins at step 1000 of FIG. 22, in which the reconfiguration logic 104 retrieves a configuration data set corresponding to an instruction set architecture (ISA). Next, in step 1002, the reconfiguration logic 104 configures each element within the instruction fetch unit (IFU) 60, the data operation unit (DOU) 62, and the address operation unit (AOU) 64 according to the configuration data set retrieved in step 1000, thereby producing a dynamic reconfiguration processor (DRPU) hardware organization that implements the ISA currently under consideration. Following step 1002, in step 1004 the interrupt logic 106 retrieves the interrupt response signals stored in the architecture description memory 101 and generates a corresponding set of transition control signals that define how the current DRPU configuration responds to interrupts. The instruction state sequencer (ISS) 100 then initializes program state information in step 1006, and subsequently initiates an instruction execution cycle in step 1008.

【０１３０】Next, in step 1010, the instruction state sequencer (ISS) 100 or the interrupt logic 106 determines whether reconfiguration is required. The ISS 100 determines that reconfiguration is required when a reconfiguration directive is selected during program execution. The interrupt logic 106 determines that reconfiguration is required in response to a reconfiguration interrupt. If reconfiguration is required, the preferred method proceeds to step 1012, in which a reconfiguration handler saves program state information. The program state information preferably includes a reference to the configuration data set corresponding to the current dynamic reconfiguration processor (DRPU) configuration. After step 1012, the preferred method returns to step 1000 to retrieve the next configuration data set, as referenced by the reconfiguration directive or the reconfiguration interrupt.

【０１３１】If reconfiguration is not required in step 1010, the interrupt logic 106 determines in step 1014 whether a non-reconfiguration interrupt requires servicing. If so, the instruction state sequencer (ISS) 100 next determines in step 1020 whether, based on the transition control signals, a transition to an interrupt service state is allowed from the current ISS state within the instruction execution cycle. If the transition to the interrupt service state is not allowed, the ISS 100 advances to the next state in the instruction execution cycle and returns to step 1020. If the transition control signals allow a transition from the current ISS state to the interrupt service state, the ISS 100 advances to the interrupt service state in step 1024. In step 1024, the ISS 100 saves program state information and executes the program instructions that service the interrupt. Following step 1024, the preferred method returns to step 1008, either to resume the current instruction execution cycle if it had not been completed or to initiate the next instruction execution cycle.

【０１３２】If no non-reconfiguration interrupt requires servicing in step 1014, the preferred method proceeds to step 1016 and determines whether execution of the current program is complete. If execution of the current program is to continue, the preferred method returns to step 1008 to initiate another instruction execution cycle. Otherwise, the preferred method ends.
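The control flow of FIG. 22, as described in steps 1000 through 1024, can be summarized as a small trace generator. This is a sketch under stated assumptions rather than the patented hardware behavior: the event labels ("continue", "reconfigure", "interrupt", "done"), the function name `trace_fig22`, and the `interrupt_wait` parameter are hypothetical, and each instruction execution cycle is reduced to a single event.

```python
def trace_fig22(events, interrupt_wait=0):
    """Return the sequence of FIG. 22 step numbers visited.

    events: one label per instruction execution cycle; an exhausted list
        is treated as "done" (program complete).
    interrupt_wait: number of sequencer states in which the transition to
        the interrupt service state is not yet allowed (step 1020 repeats).
    """
    pending = list(events)
    trace = []
    needs_configuration = True
    while True:
        if needs_configuration:
            # retrieve data set, configure units, set up interrupt
            # responses, initialize program state
            trace += [1000, 1002, 1004, 1006]
            needs_configuration = False
        trace.append(1008)                    # start an instruction cycle
        event = pending.pop(0) if pending else "done"
        trace.append(1010)                    # reconfiguration required?
        if event == "reconfigure":
            trace.append(1012)                # save state, back to step 1000
            needs_configuration = True
            continue
        trace.append(1014)                    # other interrupt pending?
        if event == "interrupt":
            trace += [1020] * (interrupt_wait + 1)  # wait until allowed
            trace.append(1024)                # service the interrupt
            continue                          # back to step 1008
        trace.append(1016)                    # program complete?
        if event == "done":
            return trace
```

For example, `trace_fig22(["continue", "reconfigure", "interrupt", "done"])` visits step 1000 twice, once at start-up and once after the reconfiguration event.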

【０１３３】The present invention incorporates a meta-addressing mechanism for performing the memory operations required by its architecture. According to the invention, the T-machine 14 serves as the addressing machine. The T-machine 14 performs interrupt handling, message queuing, meta-address generation, and overall control of data packet transfer. FIG. 23 is a block diagram of a data packet 1800 according to the present invention. The data packet 1800 includes a data portion 1824, a command portion 1820, a source geographic address 1816, a size delimiter 1812, a destination local memory address 1808, and a destination geographic address 1804. The meta-address 1828 comprises the destination geographic address 1804 and the destination local memory address 1808. The destination local memory address 1808 specifies where in local memory 34 the data of the data packet 1800 is to be written. The destination geographic address 1804, also called the interconnect address, specifies which T-machine 14 is to receive the data packet 1800. The source geographic address 1816 identifies the T-machine 14 that generated the data packet 1800.
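
The packet layout described for FIG. 23 can be sketched as a simple record. This is only an illustration: the class name, the field names, and the use of Python integers and bytes are assumptions, and the text does not specify field order or encoding.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataPacket:
    """Illustrative model of data packet 1800 (field names are hypothetical)."""
    dest_geo: int      # destination geographic address 1804
    dest_local: int    # destination local memory address 1808
    size: int          # size delimiter 1812
    source_geo: int    # source geographic address 1816
    command: int       # command portion 1820
    data: bytes        # data portion 1824

    def meta_address(self) -> Tuple[int, int]:
        """Meta-address 1828: the two destination fields taken together."""
        return (self.dest_geo, self.dest_local)


packet = DataPacket(dest_geo=3, dest_local=0x1000, size=4,
                    source_geo=1, command=0, data=b"\x00\x01\x02\x03")
```

Here `packet.meta_address()` yields the pair that tells the interconnect which T-machine should receive the packet and where in that machine's local memory the data belongs.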

【０１３４】Any pair of a source geographic address 1816 and a destination geographic address 1804 uniquely determines one path to a 2⁶⁴-bit local address space. More than one such path may exist in the system, however, and these paths can operate in parallel. An S-machine 12 may have a number of T-machines 14 coupled to it, up to a number commensurate with the local memory bandwidth, or any number that takes queuing effects into account. The present invention therefore permits scalability by indeterminate powers of two, permits the processors in the system to be non-homogeneous, and permits the number of unique paths to each S-machine 12 to be scaled arbitrarily. This kind of scalability is important in many applications, such as distributed image processing, in which a pyramid or tree of dynamically reconfigurable processing elements may be configured to provide greater communication bandwidth to the higher levels of the system. If desired, this pyramid architecture is implemented by allowing more uniform-speed T-machines 14 to access the higher levels of the pyramid of S-machines 12, giving the greater addressing capability to the S-machines 12 that need it most. Because system resources can thereby be concentrated on the tasks that demand the most processing and communication, a cost-effective system results.

【０１３５】In the preferred embodiment, the meta-address is 80 bits wide. In this embodiment, the geographic address is 16 bits wide and the local memory address is 64 bits wide. A 16-bit geographic address allows 65,536 geographic addresses to be specified. A 64-bit local memory address allows 2⁶⁴ individually addressable bits to be specified within each local memory 34. Each S-machine 12 includes a local memory 34 configured for that particular S-machine 12. Because the S-machines 12 and their memories 34 are isolated from one another, the memories need not be uniform in size or structure, and no global coherency or consistency has to be maintained. As long as the program instructions of the source S-machine 12 are written with knowledge of the architecture of the local memory 34 of the destination S-machine 12, and as long as they specify memory locations accurately, the local memory 34 of the destination S-machine 12 can be addressed easily regardless of its size or layout. This modularity allows the architecture to be scaled up or down with a variety of components without raising problems. Integrating a new S-machine is also greatly simplified. When a new S-machine 12 is added to the system, a new geographic address is selected for that S-machine 12, and the new address is given to the programs that need to use the new S-machine 12. Once the new address has been incorporated into the programs designed to use the new S-machine 12, the S-machine 12 is integrated with no further problems to resolve and no calculations to perform.
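
The width arithmetic of the preferred embodiment (16 + 64 = 80 bits) can be checked with a short sketch. The function names are hypothetical, and placing the geographic address in the high-order bits is an assumption; the text states only the field widths.

```python
from typing import Tuple

GEO_BITS = 16     # geographic address width in the preferred embodiment
LOCAL_BITS = 64   # local memory address width in the preferred embodiment


def pack_meta_address(geo: int, local: int) -> int:
    """Combine both fields into one 80-bit integer (geo in the high bits: an assumption)."""
    if not 0 <= geo < (1 << GEO_BITS):
        raise ValueError("geographic address does not fit in 16 bits")
    if not 0 <= local < (1 << LOCAL_BITS):
        raise ValueError("local memory address does not fit in 64 bits")
    return (geo << LOCAL_BITS) | local


def unpack_meta_address(meta: int) -> Tuple[int, int]:
    """Split an 80-bit meta-address back into (geographic, local)."""
    return meta >> LOCAL_BITS, meta & ((1 << LOCAL_BITS) - 1)
```

With these widths, the largest packed value occupies exactly 80 bits, and the 16-bit field distinguishes 65,536 geographic addresses, matching the figures given in the text.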

【０１３６】FIG. 24 is a flowchart showing the processing performed by an S-machine 12 of the present invention to request a remote operation. In step 1900, the S-machine 12 receives an instruction. In step 1904, the S-machine 12 determines whether the instruction requires a remote operation. If the instruction does not require a remote operation, the instruction is executed in step 1906. If the instruction does require a remote operation, the remote operation information is stored in local memory in step 1904. The S-machine 12 then proceeds to step 1920, described below. The S-machine 12 determines that an instruction requires a remote operation by examining the state of a flag in the instruction code that indicates whether a remote operation is requested. A remote operation is an operation that requires the use of a different S-machine 12 to obtain its result. The remote operation information is supplied by the program executing on the S-machine 12 and is stored in local memory 34 when a remote operation is to be performed. A fixed memory location in local memory 34 is preferably used to store the remote operation information, so that the T-machine 14 can access the information immediately without first having to obtain an address. The remote operation information generally includes the destination geographic address 1804 of the remote T-machine 14, the destination local memory address 1808 at which data is to be stored in or retrieved from the remote S-machine 12, command information 1820, size information 1812, and data 1824. As soon as the instruction is determined to require a remote operation, all of this information is stored in local memory 34 by the S-machine 12.
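
The decision made by the S-machine in FIG. 24 can be sketched as follows. The flag bit position, the fixed memory location, and the function signature are all hypothetical; the text says only that a flag in the instruction code marks a remote operation and that a fixed location in local memory 34 is preferred.

```python
REMOTE_FLAG = 0x8000        # hypothetical bit position of the remote-operation flag
REMOTE_INFO_BASE = 0x0100   # hypothetical fixed location in local memory 34


def s_machine_step(opcode, remote_info, local_memory, execute):
    """Handle one instruction; return True if it was a remote operation."""
    if opcode & REMOTE_FLAG:                          # step 1904: is the flag set?
        local_memory[REMOTE_INFO_BASE] = remote_info  # store the remote operation info
        return True
    execute(opcode)                                   # step 1906: ordinary instruction
    return False
```

Local memory is modeled as a dictionary keyed by address, which keeps the sketch short; a real memory would of course be a fixed-size array of bits.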

【０１３７】In one embodiment, in step 1912 the S-machine 12 issues an unconditional instruction to the T-machine to indicate that a remote operation is required. An unconditional instruction is a unique command sequence that the T-machine 14 is designed to recognize. The unconditional instruction generally includes the memory address at which the remote operation information is stored in local memory 34 and a size delimiter indicating the size of the addressing information. A program executing on the S-machine 12 can thus request several remote operations at once simply by specifying the starting address of the remote operation information and a series of size delimiters. The T-machine 14 can then process the individual requests sequentially. Next, in step 1920, the S-machine 12 determines whether there are further instructions to execute. If there are, the next instruction is received and executed. The S-machine 12 can therefore continue executing instructions with almost no delay, regardless of any remote operation requests. Because the T-machine 14 performs the data transfers and retrievals, the processing power of the S-machine 12 can be devoted solely to executing instructions. FIG. 25 is a flowchart of the processing performed by a T-machine 14 that receives an unconditional instruction from the S-machine 12. First, in step 2000, the T-machine 14 determines whether the command received from the S-machine 12 on control line 48 is an unconditional instruction. If the command is determined to be an unconditional instruction, in step 2004 the T-machine retrieves the remote operation information from local memory 34 via memory/data line 46. The remote operation information is preferably kept at a fixed location in memory 34, so that the T-machine 14 does not have to determine a new memory address each time it retrieves remote operation information. The remote operation information may also be stored at an arbitrary location in local memory 34; in that case, however, the location of the information must be transmitted as part of the unconditional instruction. After retrieving the remote operation information, the T-machine 14, specifically its common interface and control unit (CICU) 302, generates the meta-address 1828 from the information in step 2008. The destination local memory address 1808 is appended to the destination geographic address 1804 to form the meta-address 1828. Next, in step 2112, the T-machine 14 generates the data packet 1800 from the remaining remote operation information and passes the data packet 1800 to the interconnect device, the general-purpose interconnect matrix (GPIM) 16, for transmission to the requested destination.
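
The T-machine side of FIG. 25 can be sketched in the same style. The dictionary keys, the function name, and the list standing in for the GPIM 16 are assumptions made for illustration; only the order of operations follows the text.

```python
def handle_unconditional(local_memory, info_address, source_geo, interconnect):
    """Build a packet from stored remote operation info and hand it to the interconnect."""
    info = local_memory[info_address]                  # step 2004: fetch the info
    meta = (info["dest_geo"], info["dest_local"])      # step 2008: form meta-address 1828
    packet = {
        "meta": meta,
        "size": info["size"],
        "source_geo": source_geo,
        "command": info["command"],
        "data": info["data"],
    }
    interconnect.append(packet)                        # pass the packet to the GPIM 16
    return packet
```

Because the S-machine only writes the information and issues the command, all of the packet assembly shown here happens outside the instruction stream, which is the point the paragraph makes about freeing the S-machine for computation.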

【０１３８】The source geographic address 1816 may be specified by a program instruction and may therefore be stored in local memory 34 for retrieval by the T-machine 14. Preferably, however, the source geographic address 1816 is stored in an architecture description memory (ADM) 101. The ADM 101 is a modifiable memory that stores the geographic address of the T-machine 14 to which it is coupled. Using the ADM 101, geographic addresses can be changed explicitly throughout the system. In this embodiment of the system, the T-machine 14 retrieves the source geographic address 1816 from the ADM 101, which ensures that the T-machine 14 uses its own most recent source geographic address 1816. In embodiments in which multiple common interface and control units (CICUs) 302 are coupled to each S-machine 12, the geographic address of each CICU 302 is stored in the ADM 101.

【０１３９】FIG. 26 is a flowchart showing the processing performed by a T-machine 14 to receive a data packet transmitted through the interconnect device. In step 2100, the T-machine 14 receives a data packet from the interconnect device. In step 2104, the T-machine 14 decodes the data packet 1800 by parsing the destination geographic address 1804 component of the meta-address 1828. As described above, the address decoder 320 of the T-machine decodes the data packet 1800. In step 2108, the address decoder 320 compares the destination geographic address 1804 with its associated geographic address. In embodiments that use a modifiable architecture description memory (ADM) 101, the address decoder 320 compares the received destination geographic address 1804 with the address stored in the ADM 101. In step 2112, if the address decoder 320 determines that the geographic addresses match, the data packet 1800 is delivered to the location in memory 34 specified by the local memory address 1808. The data packet 1800 is parsed; the data is sent over memory/data line 46, the command is sent over control line 48, and the address information is sent over address line 44. If the addresses do not match, an error message is transmitted through the bypass FIFO 324, the MUX 328, and the general-purpose interconnect matrix (GPIM) 16 to the T-machine 14 identified by the source geographic address 1816 component of the data packet 1800. The same process described above is used whenever a T-machine 14 receives a misaddressed data packet 1800. If the CICU 304 is assembling or disassembling a data packet 1800 when a new data packet 1800 arrives, the T-machine 14 sends the new data packet 1800 to the input FIFO 322, where it waits until the CICU 304 is able to receive and process the data.
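
The receive path of FIG. 26 reduces to one comparison followed by either a memory write or an error return. The sketch below assumes a packet represented as a dictionary and an error record whose format is invented for illustration; the text does not describe the error message contents.

```python
def receive_packet(packet, my_geo, local_memory, interconnect):
    """Deliver a packet addressed to this T-machine, otherwise report an error."""
    dest_geo, dest_local = packet["meta"]              # step 2104: decode meta-address
    if dest_geo == my_geo:                             # step 2108: compare addresses
        local_memory[dest_local] = packet["data"]      # step 2112: write to memory 34
        return True
    # Mismatch: send a (hypothetical) error record back to the originator.
    interconnect.append({
        "dest_geo": packet["source_geo"],
        "command": "ERROR",
        "rejected_meta": packet["meta"],
    })
    return False
```

In an embodiment with an ADM 101, `my_geo` would be the value read from that memory rather than a constant, so that a changed geographic address takes effect on the next comparison.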

【０１４０】In another embodiment, the T-machine 14 is designed to recognize message priorities and interrupts the processing of the S-machine 12 when it is appropriate for the S-machine to handle a new command. In this embodiment, as shown in FIG. 27, the common interface and control unit (CICU) 302 further includes additional components: interrupt logic 2200, a comparator 2204, and a recognizer 2208. FIG. 28 is a flowchart showing the interrupt handling capability of the CICU 302. In step 2300, after the address has been verified by the address decoder 320, the recognizer 2208 parses the data packet 1800 and identifies the command 1820. In step 2304, the recognizer 2208 determines whether the command 1820 is an interrupt request. If the data packet 1800 is an interrupt request, the command 1820 contains an interrupt ID. If the command 1820 does not contain an interrupt ID, the data packet is passed to the CICU 302 in step 2308 for processing as described above.

【０１４１】If the command 1820 contains an interrupt ID, the interrupt ID is sent to the comparator 2204, which is coupled to the memory 34. The memory 34 stores a list of interrupt IDs. Each S-machine preferably has, stored in its associated local memory 34, a list of the interrupt IDs that the S-machine is designed to handle. The list identifies each interrupt, can specify its priority, and contains instructions for servicing it. In step 2312, the comparator 2204 compares the interrupt ID of the received command with the stored list of IDs. If the interrupt ID specified by the command does not match any ID in the list, in step 2320 an error message is transmitted through the bypass FIFO 324 and the MUX 328, and over signal line 314 to the general-purpose interconnect matrix (GPIM) 16, to the destination specified by the source geographic address 1816. If the interrupt ID matches a stored ID, in step 2324 the interrupt logic 2200 processes the interrupt according to the information held in local memory 34 for the stored ID, or according to the information contained in the data packet 1800, and sends the resulting command to the S-machine 12 over control line 48.
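
The recognizer and comparator behaviour of FIG. 28 amounts to a table lookup with three outcomes. In this sketch the command is a dictionary with an optional `interrupt_id` key and the stored list is a dictionary keyed by ID; both representations, and the function name, are assumptions.

```python
def classify_command(command, interrupt_table):
    """Return ('data', None), ('interrupt', entry) or ('error', None)."""
    interrupt_id = command.get("interrupt_id")     # absent: not an interrupt request
    if interrupt_id is None:
        return ("data", None)                      # step 2308: normal packet handling
    entry = interrupt_table.get(interrupt_id)      # step 2312: compare with stored IDs
    if entry is None:
        return ("error", None)                     # step 2320: unknown interrupt ID
    return ("interrupt", entry)                    # step 2324: service the interrupt
```

The entry returned for a known ID would carry whatever the stored list holds for that interrupt, such as its priority and the instructions for servicing it.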

【０１４２】Where priority comparison is available, the interrupt logic 2200 compares the priority of the interrupt request with the priorities of the data packets 1800 currently in the input FIFO 322. If the priority of the interrupt request is higher than that of a data packet 1800 in the input FIFO 322, the interrupt request is placed ahead of the lower-priority data packet 1800. In some cases, an interrupt request may require the S-machine 12 to halt execution. In this case, a priority level is assigned to the process executing on the S-machine 12. If the priority of the interrupt request is higher than the priority of the currently executing process, the interrupt logic 2200 sends a command over control line 48 instructing the S-machine 12 to end its current processing and begin servicing the interrupt request. A complete priority comparison and interrupt handling scheme is thus implemented by the T-machine 14 according to the architecture of the present invention, and the S-machine 12 needs to do almost no additional processing.
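
The two priority comparisons described above can be sketched as follows. The convention that a larger number means a higher priority, the list standing in for the input FIFO 322, and the function names are assumptions; the text does not define a priority encoding.

```python
def enqueue_by_priority(fifo, item, priority):
    """Insert ahead of the first queued entry with strictly lower priority."""
    index = len(fifo)                      # default: append at the tail
    for i, (queued_priority, _) in enumerate(fifo):
        if priority > queued_priority:
            index = i
            break
    fifo.insert(index, (priority, item))


def should_preempt(request_priority, running_priority):
    """True when the interrupt outranks the process running on the S-machine."""
    return request_priority > running_priority
```

Entries of equal priority keep their arrival order under this rule, since a new item only moves ahead of entries whose priority is strictly lower.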

【０１４３】Because the T-machines 14 perform all of the memory operations required by the computer system, the S-machines 12 are free to execute the main instructions of the program. Separating memory operations from instruction execution in both space and time substantially improves the processing capability of a highly parallel multiprocessor system. Because neither virtual memory nor shared memory is used, no hardware consistency or coherency operations are needed. The S-machines 12 can operate at different rates, and the instruction set architectures (ISAs) implemented by the dynamically reconfigurable S-machines 12 can differ from one another. In addition, the field programmable gate arrays (FPGAs) that implement the S-machines 12 can be optimized for particular tasks. For example, in an embedded image computing application, the front-panel LCD screen controller need not be an S-machine 12 optimized for image processing. It nevertheless remains highly desirable that every S-machine 12 in the system be uniformly addressable by any S-machine 12 that needs to communicate with another S-machine, and the present invention provides this as described above. System-wide coherency or consistency is obtained in software, using conventional approaches such as a Message Passing Interface (MPI) run-time library for the S-machines 12 and T-machines 14, or a run-time library for Parallel Virtual Machine (PVM). Both MPI and PVM function as a hardware abstraction layer (HAL); in the present invention, the HAL is for the dynamically reconfigurable S-machines 12 and the fixed T-machines 14. Because memory operations are completely controlled by software, the system is dynamically reconfigurable and is not subject to complex hardware/software interactions. A fully scalable, architecturally reconfigurable computer system with independent, isolated memories and with separate addressing machines and processing machines is thus provided for use in highly parallel computing environments. Meta-addresses allow transparent, fine-grained addressing and allow the communication paths of the computer system to be allocated and reallocated according to system demands. Separating the addressing machines from the processing machines lets the resources of the processing machines be devoted to processing alone, lets the processing machines use diverse instruction set architectures and run at different rates, and lets each be implemented with hardware optimized for it. All of this substantially increases the processing power of the system.

【０１４４】The teachings of the present invention differ markedly from other systems for reprogrammable or reconfigurable computing. In particular, the present invention is not equivalent to a downloadable microcode architecture, because such architectures generally rely on non-reconfigurable control means and non-reconfigurable hardware. The present invention is also clearly distinct from an attached reconfigurable processor (ARP) system, in which a set of reconfigurable hardware is coupled to a non-reconfigurable host processor or host system. An ARP device is subordinate to a host that executes some programs. Consequently, whenever the host or the ARP device is operating on data, the silicon resources of the other are idle or used inefficiently, so the available silicon resources are not fully utilized over the time frame of program execution. In contrast, each S-machine 12 is an independent computer that can readily execute an entire program. Multiple S-machines 12 preferably execute programs simultaneously. The present invention thus teaches maximal utilization of silicon resources at all times, both for each program executing on an individual S-machine 12 and for multiple programs executing on the system 10 as a whole.

【０１４５】An attached reconfigurable processor (ARP) device provides a computational accelerator for a particular algorithm at a particular time, and is implemented as a set of gates optimally interconnected for that algorithm. ARP systems avoid using reconfigurable hardware resources for general-purpose operations such as managing instruction execution. Moreover, ARP systems do not treat a given set of interconnections as a readily reusable resource. In contrast, the present invention teaches a dynamically reconfigurable processing means configured to manage instruction execution efficiently, according to the instruction execution model best suited to the computational needs at a particular time. Each S-machine 12 includes a plurality of reusable resources, for example the instruction state sequencer (ISS) 100, the interrupt logic 106, and the store/align logic 152. The present invention teaches using reconfigurable logic resources at the level of configurable logic blocks (CLBs), input/output blocks (IOBs), and reconfigurable interconnect, rather than at the level of interconnected gates. The present invention therefore does not teach a single gate connection scheme useful for a single algorithm; it teaches the use of reconfigurable high-level logic components that are useful for performing operations across all classes of computational problems.

【０１４６】In general, attached reconfigurable processor (ARP) systems are directed toward translating a particular algorithm into a set of interconnected gates. Some ARP systems attempt to compile high-level instructions into an optimal gate-level hardware configuration, which is in general an NP-hard problem. In contrast, the present invention teaches the use of a compiler for dynamically reconfigurable computing that compiles high-level program instructions into assembly-language instructions according to a variable instruction set architecture (ISA) in a very straightforward manner.

【０１４７】An attached reconfigurable processor (ARP) device generally cannot treat its host program as data, and cannot adapt itself to its computing environment. In contrast, each S-machine 12 of the system 10 can treat its own programs as data and can therefore readily adapt itself to its computing environment. The system 10 can readily simulate itself by executing its own programs. The present invention can, moreover, compile its own compiler.

【０１４８】In the present invention, a single program may contain a first group of instructions belonging to a first instruction set architecture (ISA), a second group of instructions belonging to a second ISA, a third group of instructions belonging to yet another ISA, and so on. The architecture disclosed in this specification executes each such group of instructions using hardware that is configured at run time to implement the ISA to which the instructions belong. No prior-art system or method offers a similar teaching.
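
The idea of one program spanning several ISAs can be illustrated with a toy executor that reconfigures only when the ISA of the next instruction differs from the current one. This is a conceptual sketch, not the patent's mechanism: the function name, the `(isa, instruction)` pair representation, and the two callbacks are all hypothetical.

```python
def run_program(program, configure, execute):
    """Run (isa, instruction) pairs, reconfiguring only when the ISA changes."""
    current_isa = None
    reconfigurations = 0
    for isa, instruction in program:
        if isa != current_isa:
            configure(isa)             # stand-in for run-time hardware reconfiguration
            current_isa = isa
            reconfigurations += 1
        execute(isa, instruction)      # executes on hardware configured for this ISA
    return reconfigurations
```

Grouping instructions by ISA, as the paragraph describes, keeps the number of reconfigurations equal to the number of group boundaries rather than the number of instructions.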

【０１４９】The present invention further teaches a reconfigurable interrupt scheme in which interrupt latency, interrupt precision, and programmable state transition enabling vary according to the instruction set architecture (ISA) currently under consideration. No comparable teaching is found in other computer systems. The present invention additionally teaches, unlike prior-art computer systems, a computer system having a reconfigurable data path bit width, a reconfigurable address bit width, and a reconfigurable control line width.

【０１５０】Although the present invention has been described with reference to several preferred embodiments, those skilled in the art will recognize that various modifications may be made.

【０１５１】<Reference Material A> Instruction Set 0: General-Purpose Outer-Loop Instruction Set Architecture (ISA). 1.0 Programmer's Architecture Model. This section presents the programmer's conceptual overview of the instruction set architecture (ISA) 0 architecture, including its registers, its memory model, its calling conventions for high-level languages, and its interrupt model.

【０１５２】 1.1 Registers
ISA 0 contains sixteen 16-bit general-purpose (data) registers, sixteen address registers, two processor status registers, and one interrupt vector register. The mnemonics of the data and address registers use hexadecimal digits, so the last data register is df and the last address register is af. One of the processor status registers, nipar (Next Instruction Program Address Register), holds the address of the next instruction to be fetched. The other status register, pcw (Processor Control Word), contains the flags and control bits used for program flow and interrupt handling. Its bits are defined in Table 2. Undefined bits are reserved for future use. Four condition flags, Z, N, V, and C, are set as side effects of various instructions. See Section 2.0 for a summary of which flags each instruction affects.

【０１５３】 The T (Trace Mode) and IM (Interrupt Mask) flags control how the processor responds to interrupts and when traps are handled. The interrupt vector register ivec holds the 64-bit address of the interrupt service routine. Interrupts and traps are described in Section 1.4 below.

【０１５４】[0154]

【表１】 [Table 1]

【０１５５】 1.2 Memory Access
The values stored in the 64-bit address registers are used by the memory load/store instructions to access memory in 16-bit and 64-bit increments (see Table 7). Addresses are bit addresses; that is, address 16 refers to the word that begins at bit 16 of memory. Words can be read only on 16-bit boundaries, so the four least significant bits (LSBs) of the address register are ignored when memory is read. See [1] for details of the K_ISA concept. A 64-bit value is stored as four 16-bit words in little-endian order (the least significant 16 bits are stored at the lowest address).
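The addressing rules of Section 1.2 can be illustrated with a small Python model. This is only a sketch: the class name, the dictionary-backed storage, and the assumption that stores ignore the four low address bits in the same way as loads are illustrative choices that the text does not specify.

```python
WORD_BITS = 16  # ISA 0 word size


class BitAddressedMemory:
    """Toy model of bit-addressed memory made of 16-bit words."""

    def __init__(self):
        self.words = {}  # word index -> 16-bit value

    @staticmethod
    def word_index(bit_address):
        # The four least significant address bits are ignored, so every
        # access snaps to a 16-bit boundary.
        return bit_address >> 4

    def load16(self, bit_address):
        return self.words.get(self.word_index(bit_address), 0)

    def store16(self, bit_address, value):
        # Assumption: stores snap to word boundaries just as loads do.
        self.words[self.word_index(bit_address)] = value & 0xFFFF

    def store64(self, bit_address, value):
        # Little-endian: least significant 16 bits at the lowest address.
        base = self.word_index(bit_address)
        for i in range(4):
            self.words[base + i] = (value >> (WORD_BITS * i)) & 0xFFFF

    def load64(self, bit_address):
        base = self.word_index(bit_address)
        return sum(self.words.get(base + i, 0) << (WORD_BITS * i)
                   for i in range(4))
```

In this model, bit addresses 16 through 31 all refer to the same word, and a 64-bit value written at bit address 16 occupies the words at bit addresses 16, 32, 48, and 64.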

【０１５６】[0156]

【表２】 [Table 2]

【０１５７】 1.3 Calling Convention
By convention, C programs use register af as the stack pointer and register ae as the stack frame pointer. The mnemonics sp and fp may be used as aliases for these registers. All other registers are free for general use. The stack grows downward.

【０１５８】 An int is 16 bits, a long is 64 bits, and so is a void *. int values are returned in d0; long and void * values are returned in a0. Registers d0-d4 and a0-a3 may be clobbered by a function; all other general-purpose registers must be preserved across function calls. On entry to a function, the stack pointer points to the return address, so the first argument begins at address sp + 64 (decimal).

【０１５９】 1.4 Traps and Interrupts
ISA 0 supports one interrupt line and software traps from two sources. All of them invoke the same flow-of-control transfer mechanism, described below.

【０１６０】 Externally there is a single INTR signal input and a single iack output. iack becomes active when the interrupt mask bit in pcw is cleared, either by resetting pcw with the xpcw instruction or by returning from the interrupt with the rti instruction, which restores pcw to its original value. The time between an external device signaling an interrupt and the processor servicing it depends on the instruction currently executing and on whether a software trap is present.

【０１６１】 A software trap is triggered either by an explicit trap instruction or by executing an instruction while the T (trace) flag is set. In the latter case, control is transferred to the interrupt service routine after the first instruction that follows the setting of T. When a trap instruction is executed, the processor sets the T flag and enters the interrupt service routine as if the T flag had been set before the instruction was executed. Interrupts are not serviced while the T flag is set. No further traps occur until the T flag is cleared, either by resetting pcw with the xpcw instruction or by restoring pcw from the stack on return from the interrupt with the rti instruction.

【０１６２】 An interrupt is generated by an active level on the external intr signal. While the im flag or the T flag is set, interrupts are masked and pending interrupts are ignored. Once the im and T flags are clear, control is transferred to the interrupt service routine after the first instruction that follows the assertion of intr. On entry to the interrupt service routine, the processor sets the im flag. No further interrupts occur until the im flag is cleared, either by resetting pcw with the xpcw instruction or by restoring pcw from the stack on return from the interrupt with the rti instruction.

【０１６３】 When an interrupt or trap occurs, the processor takes the following steps.
1. Any instruction currently executing is completed.
2. The 16 data registers (d0 first), the 16 address registers (a0 first), pcw, ivec, and nipar are pushed, in that order, onto the stack (pointed to by register af). The value of af pushed onto the stack is its value before servicing of the interrupt or trap began.
3. If this is an interrupt, the interrupt bit in pcw is set to mask further interrupts. If this is a trap instruction, the T flag is set. If this is a trap caused by the T flag, pcw is not modified.
4. The value in the ivec register is loaded into nipar. Execution of instructions in the interrupt handler begins.

【０１６４】 When the rti instruction is executed, the following actions take place.
1. The registers are restored from the stack in the reverse of the order in which they were written.
2. Execution resumes.

【０１６５】 If the interrupt mask flag has not already been cleared, it is cleared by the rti instruction: unless the value of pcw on the stack has been modified, the flag was clear when the service routine was entered. If the T flag was set by executing a trap instruction, it is likewise cleared when rti executes, for the same reason. If the trap was caused by a T flag that was already set before entry to the service routine, the service routine must clear it to acknowledge that the trap occurred. Whenever the interrupt mask flag is cleared, by whatever means, the external output signal iack becomes active for one clock cycle to signal the interrupting external device.
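The entry and return sequences of Section 1.4 can be sketched in Python as follows. The sketch rests on several assumptions: the bit positions `IM_BIT` and `T_BIT` are hypothetical (the pcw bit layout is defined in a table not reproduced here), and a Python list stands in for the memory stack addressed by af, so the effect of the pushes on af itself is not modeled.

```python
from dataclasses import dataclass, field

IM_BIT = 0x0001  # hypothetical position of the interrupt mask flag in pcw
T_BIT = 0x0002   # hypothetical position of the trace flag in pcw


@dataclass
class Cpu:
    d: list = field(default_factory=lambda: [0] * 16)   # data registers
    a: list = field(default_factory=lambda: [0] * 16)   # address registers
    pcw: int = 0
    ivec: int = 0
    nipar: int = 0
    stack: list = field(default_factory=list)  # stand-in for stack memory

    def enter_service(self, cause):
        """cause is 'interrupt', 'trap_instruction', or 'trace'."""
        # Step 2: push d0..df, a0..af, pcw, ivec, nipar, in that order.
        self.stack.extend(self.d + self.a + [self.pcw, self.ivec, self.nipar])
        # Step 3: adjust pcw according to the cause.
        if cause == "interrupt":
            self.pcw |= IM_BIT
        elif cause == "trap_instruction":
            self.pcw |= T_BIT
        # A trap caused by an already-set T flag leaves pcw unchanged.
        # Step 4: continue at the interrupt service routine.
        self.nipar = self.ivec

    def rti(self):
        # Restore in the reverse of the push order.
        self.nipar = self.stack.pop()
        self.ivec = self.stack.pop()
        self.pcw = self.stack.pop()
        for i in reversed(range(16)):
            self.a[i] = self.stack.pop()
        for i in reversed(range(16)):
            self.d[i] = self.stack.pop()
```

Because pcw is pushed before the im or T flag is set, restoring it in `rti` clears that flag again, which matches the behavior described for rti above.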

【０１６６】 2.0 Instruction Classification by Function
The notation conventions are as follows.

【０１６７】[0167]

【表３】 [Table 3]

【０１６８】 2.1 Register Movement

【０１６９】[0169]

【表４】 [Table 4]

【０１７０】 2.2 Logical Operations

【０１７１】[0171]

【表５】 [Table 5]

【０１７２】 2.3 Memory Load/Store

【０１７３】[0173]

【表６】 [Table 6]

【０１７４】 2.4 Arithmetic Operations

【０１７５】[0175]

【表７】 [Table 7]

【０１７６】 2.5 Control Flow

【０１７７】[0177]

【表８】 [Table 8]

【０１７８】 3.0 Alphabetical Reference
The instructions of the ISA 0 instruction set are listed below in alphabetical order. Each mnemonic is given with a short description. Below it is the binary encoding of the instruction; each row of the encoding is one 16-bit word. The affected flags are listed next. Unless otherwise stated, the flags are set according to the data stored in the destination register. nipar is assumed to have been incremented already when execution of the instruction begins. Finally, a textual description of the instruction's semantics is given.

【０１７９】 The notation conventions used in the binary encodings are summarized in the table below. The condition codes are defined in Table 59.

【０１８０】[0180]

【表９】 [Table 9]

【０１８１】[0181]

【表１０】 [Table 10]

【０１８２】 Adds two data registers and leaves the result in the destination register.

【０１８３】[0183]

【表１１】 [Table 11]

【０１８４】 Adds two data registers and the carry flag and leaves the result in the destination register.

【０１８５】[0185]

【表１２】 [Table 12]

【０１８６】 Adds an 8-bit signed (two's complement) constant to a data register and leaves the result in the register.

【０１８７】[0187]

【表１３】 [Table 13]

【０１８８】 Performs a bitwise AND of two data registers and leaves the result in the destination register.

【０１８９】[0189]

【表１４】 [Table 14]

【０１９０】 If the condition is true, adds (offset << K_isa) to nipar.

【０１９１】[0191]

【表１５】 [Table 15]

【０１９２】 Adds (offset << K_isa) to nipar.
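The relative-branch arithmetic described for the two branch instructions above can be written out as a short Python sketch. The function name is hypothetical; the value K_ISA = 4 is taken from the statement in Reference B that ISA 0 uses a K_ISA = 4 memory organization, and the 64-bit wrap-around of nipar is an assumption.

```python
K_ISA = 4  # per Reference B: K_ISA = 4 memory organization for ISA 0


def branch_target(nipar, offset, condition_true=True):
    """Return the new nipar for a relative branch.

    nipar is assumed to point already at the next instruction, as stated
    in Section 3.0. offset is a signed count of 16-bit words.
    """
    if not condition_true:
        return nipar  # branch not taken: fall through
    # Shifting by K_ISA converts a word offset into a bit-address offset.
    return (nipar + (offset << K_ISA)) & 0xFFFFFFFFFFFFFFFF
```

With K_ISA = 4, an offset of 3 moves nipar forward by 48 bit addresses, that is, by three 16-bit words.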

【０１９３】[0193]

【表１６】 [Table 16]

【０１９４】 Conditionally shifts right by 8 bits and masks. Used after a load instruction to align 8-bit data read from a word offset. If the address held in the source address register lies on an 8-bit boundary (bit 2 is set), the value in the data register is shifted right by 8 bits. If the address does not lie on an 8-bit boundary, the upper 8 bits of the register are cleared.

【０１９５】 [Note] The negative flag is set from bit 7 rather than bit 15. This makes sign extension of 8-bit quantities easier.

【０１９６】[0196]

【表１７】 [Table 17]

【０１９７】 Sets the flags for a magnitude comparison of two data registers by subtracting the source register from the destination register. Only the flags are affected.

【０１９８】[0198]

【表１８】 [Table 18]

【０１９９】 Performs a signed division of a 32-bit signed integer by a 16-bit signed integer and returns a 16-bit signed quotient and remainder. The 32-bit dividend is held in two consecutive registers starting at the index of the destination register (little-endian order). The 16-bit divisor is in the source register. The remainder is returned in the destination register, and the quotient is returned in the register following the destination register (modulo 16). An overflow occurs if the quotient exceeds 16 bits.
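The register usage of the signed division described above can be sketched in Python. The function names are hypothetical, and two behaviors are assumptions rather than statements of the text: the quotient is rounded toward zero, and a zero divisor simply raises Python's `ZeroDivisionError`.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def divs(d, dest, src):
    """Signed 32/16 division on a 16-entry register list d.

    The dividend is d[dest] (low word) and d[(dest + 1) % 16] (high word).
    Returns True if the quotient does not fit in 16 bits.
    """
    hi = (dest + 1) % 16  # register index wraps modulo 16
    dividend = to_signed((d[dest] & 0xFFFF) | ((d[hi] & 0xFFFF) << 16), 32)
    divisor = to_signed(d[src], 16)
    # Assumption: truncating division (rounding toward zero).
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    overflow = not -0x8000 <= quotient <= 0x7FFF
    d[dest] = remainder & 0xFFFF  # remainder goes to the destination register
    d[hi] = quotient & 0xFFFF     # quotient goes to the following register
    return overflow
```

When the destination is df, the high word of the dividend and the quotient both live in d0, which is what the modulo-16 rule implies.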

【０２００】[0200]

【表１９】 [Table 19]

【０２０１】 Adds a data register to an address register and leaves the result in the address register.

【０２０２】[0202]

【表２０】 [Table 20]

【０２０３】 Adds an 8-bit signed constant to an address register and leaves the result in the address register.

【０２０４】[0204]

【表２１】 [Table 21]

【０２０５】 Sets the flags for a magnitude comparison of two address registers by subtracting the source register from the destination register. Only the flags are affected.

【０２０６】[0206]

【表２２】 [Table 22]

【０２０７】 Adds two address registers and leaves the result in the destination register.

【０２０８】[0208]

【表２３】 [Table 23]

【０２０９】 Subtracts the source register from the destination register and stores the result in the destination register.

【０２１０】[0210]

【表２４】 [Table 24]

【０２１１】 Load into an address register with post-increment. Reads memory at the address pointed to by the source register and places the value in the destination register, then increments the source register.

【０２１２】[0212]

【表２５】 [Table 25]

【０２１３】 Shifts an address register right by one bit.

【０２１４】[0214]

【表２６】 [Table 26]

【０２１５】 Store from an address register. Writes the 64-bit value in the source register to the memory location pointed to by the destination register. The value is written as four 16-bit words in little-endian order.

【０２１６】[0216]

【表２７】 [Table 27]

【０２１７】 Store from an address register with pre-decrement. Decrements the destination register, then writes the value in the source register to the memory location pointed to by the destination register. The value is written as four 16-bit words in little-endian order.

【０２１８】[0218]

【表２８】 [Table 28]

【０２１９】 Subtracts a data register from an address register and leaves the result in the address register.

【０２２０】[0220]

【表２９】 [Table 29]

【０２２１】 Places the bitwise inverse of the source register in the destination register.

【０２２２】[0222]

【表３０】 [Table 30]

【０２２３】 Conditional jump to an absolute address. See Table 59 for the definition of the condition code bits.

【０２２４】[0224]

【表３１】 [Table 31]

【０２２５】 Unconditional jump to an absolute address. Equivalent to jCC with the condition "always".

【０２２６】[0226]

【表３２】 [Table 32]

【０２２７】 First increments the destination register, then stores the current nipar (which points to the next instruction) at the address pointed to by the destination register (normally the stack pointer). nipar is then loaded with the address in the source register before the next instruction is fetched.

【０２２８】[0228]

【表３３】 [Table 33]

【０２２９】 Shifts a data register left by a constant number of bits.

【０２３０】[0230]

【表３４】 [Table 34]

【０２３１】 Shifts a data register right by a constant number of bits.

【０２３２】[0232]

【表３５】 [Table 35]

【０２３３】 Load a data register from memory. Loads the value pointed to by the source address register into the destination data register.

【０２３４】[0234]

【表３６】 [Table 36]

【０２３５】 Load into a data register with post-increment. Reads memory at the address pointed to by the source address register and places the value in the destination data register, then increments the source register.

【０２３６】[0236]

【表３７】 [Table 37]

【０２３７】 Loads a 16-bit immediate value into a data register.

【０２３８】[0238]

【表３８】 [Table 38]

【０２３９】 Replaces the destination register with the bitwise inverse of the source register and adds the destination register.

【０２４０】[0240]

【表３９】 [Table 39]

【０２４１】 Places the value in the source data register into the destination data register.

【０２４２】[0242]

【表４０】 [Table 40]

【０２４３】 Multiplies the value in the source register by the value in the destination register and stores the result in two consecutive registers starting at the destination register (little-endian order).
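The way a 32-bit product is split over two consecutive registers can be sketched in Python. Treating the operands as signed 16-bit values and wrapping the register index modulo 16 (by analogy with the division instruction) are both assumptions; the function names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul(d, dest, src):
    """Multiply d[dest] by d[src]; store the 32-bit product little-endian."""
    # Assumption: operands are signed 16-bit values.
    product = (to_signed(d[dest], 16) * to_signed(d[src], 16)) & 0xFFFFFFFF
    hi = (dest + 1) % 16          # assumed modulo-16 wrap of the register index
    d[dest] = product & 0xFFFF    # low word in the destination register
    d[hi] = product >> 16         # high word in the following register
```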

【０２４４】[0244]

【表４１】 [Table 41]

【０２４５】 Performs a bitwise OR of two data registers and leaves the result in the destination register.

【０２４６】[0246]

【表４２】 [Table 42]

【０２４７】 Shifts a data register left by one bit. The least significant bit (LSB) is replaced with the value of the carry flag. At the end of the instruction, the original most significant bit (MSB) is placed in the carry flag.
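The one-bit rotate through the carry flag described above amounts to the following operation on a 16-bit register. This is a minimal Python sketch; the function name is hypothetical.

```python
def rotate_left_through_carry(value, carry):
    """Rotate a 16-bit value left by one bit through the carry flag.

    Returns (new_value, new_carry).
    """
    new_carry = (value >> 15) & 1                      # original MSB -> carry
    new_value = ((value << 1) | (carry & 1)) & 0xFFFF  # old carry -> LSB
    return new_value, new_carry
```

Applying the operation to the low word and then to the high word, passing the carry along, shifts a 32-bit quantity held in two registers left by one bit.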

【０２４８】[0248]

【表４３】 [Table 43]

【０２４９】 See Section 1.4 above. Uses the source register as the stack pointer.

【０２５０】[0250]

【表４４】 [Table 44]

【０２５１】 Return from subroutine. Loads nipar from the memory location pointed to by the destination register (normally the stack pointer), then increments the destination register.

【０２５２】[0252]

【表４５】 [Table 45]

【０２５３】 Shifts the destination register left by the number of bits specified by the value in the source register.

【０２５４】[0254]

【表４６】 [Table 46]

【０２５５】 Shifts the destination register right by the number of bits specified by the value in the source register.

【０２５６】[0256]

【表４７】 [Table 47]

【０２５７】 Store from a data register. Writes the value in the source register to the memory location pointed to by the destination register.

【０２５８】[0258]

【表４８】 [Table 48]

【０２５９】 Store from a data register with pre-decrement. Decrements the destination register, then writes the value in the source register to the memory location pointed to by the destination register.

【０２６０】[0260]

【表４９】 [Table 49]

【０２６１】 Subtracts the source register from the destination register and stores the result in the destination register.

【０２６２】[0262]

【表５０】 [Table 50]

【０２６３】 Subtracts the source register from the destination register, then subtracts the carry bit, and stores the result in the destination register.

【０２６４】[0264]

【表５１】 [Table 51]

【０２６５】 Executes the interrupt handler. See Section 1.4. Uses the destination register as the stack pointer.

【０２６６】[0266]

【表５２】 [Table 52]

【０２６７】 Performs an unsigned division of a 32-bit signed integer by a 16-bit signed integer and returns a 16-bit signed quotient and remainder. The 32-bit dividend is held in two consecutive registers starting at the index of the destination register (little-endian order). The divisor is in the source register. The remainder is returned in the destination register, and the quotient is returned in the register following the destination register. An overflow occurs if the quotient exceeds 16 bits.

【０２６８】[0268]

【表５３】 [Table 53]

【０２６９】 Multiplies the value in the source register by the value in the destination register and stores the result in two consecutive registers starting at the destination register (little-endian order).

【０２７０】[0270]

【表５４】 [Table 54]

【０２７１】 Transfers the value in the source address register to four consecutive data registers starting at the destination register. The value is stored in little-endian order, and the destination register index is computed modulo 16, so that the destination register may be any register.

【０２７２】[0272]

【表５５】 [Table 55]

【０２７３】 Transfers the little-endian 64-bit value held in four consecutive data registers into the destination address register. The source register index is computed modulo 16, so that any data register may serve as the starting register.
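The two transfer instructions described above, which move a 64-bit value between an address register and four consecutive data registers, can be sketched in Python as follows. The function names are hypothetical; the register file is modeled as a 16-entry list.

```python
def addr_to_data(a_value, d, dest):
    """Spread a 64-bit value over d[dest..dest+3], wrapping modulo 16."""
    for i in range(4):
        # Little-endian: least significant word in the first register.
        d[(dest + i) % 16] = (a_value >> (16 * i)) & 0xFFFF


def data_to_addr(d, src):
    """Assemble a 64-bit value from d[src..src+3], wrapping modulo 16."""
    return sum((d[(src + i) % 16] & 0xFFFF) << (16 * i) for i in range(4))
```

Starting at de, for example, the four words land in de, df, d0, and d1, which is why the modulo-16 rule allows any starting register.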

【０２７４】[0274]

【表５６】 [Table 56]

【０２７５】 Performs a bitwise exclusive OR of two data registers and leaves the result in the destination register.

【０２７６】[0276]

【表５７】 [Table 57]

【０２７７】 Exchanges the value in the source data register with the pcw register.

【０２７８】[0278]

【表５８】 [Table 58]

【０２７９】 Exchanges the value in the source address register with the ivec register.

【０２８０】 4.0 Condition Codes
The condition code subfield of the operation code takes its values from the table below.

【０２８１】[0281]

【表５９】 [Table 59]

【０２８２】 <Reference B> Instruction Set 1: Pipelined Multiply-Accumulate Instruction Set Architecture (ISA)
ISA 1 - Pipelined Convolution Engine for the XC4013
Introduction
ISA 1 is a pipelined multiply-accumulate array capable of four simultaneous multiply-accumulate operations per instruction cycle. There are eight 8-bit data registers (xd0-xd3 and yd0-yd3), one for each input of the four 8-bit × 8-bit multipliers. The four multiplier outputs are summed through a pipelined adder array into a single final 16-bit sum, and up to four 16-bit registers (m0-m4) can hold results. The ISA 1 architecture assumes flow-through batch processing cycles with main memory. There is no feedback path through the multiplier-accumulator data path for recirculating accumulated results, because the emphasis is on memory data throughput. No provision is made for overflow scaling or extended-precision accumulation. ISA 1 assumes that the coefficients used for convolution filtering yield results whose precision does not exceed 16 bits for all data sets. The multiplier array accepts 8-bit two's complement data inputs and produces a 16-bit two's complement result.

【０２８３】 Access to memory is managed by two 16-bit address registers (a0 and a1), which can be regarded as interchangeable source and destination pointers. Program flow is managed by a standard 64-bit NIPAR register, and a 64-bit interrupt vector register (IVEC) is supported for a single interrupt, such as a frame interrupt or a data-ready interrupt.

【０２８４】 The ISA 1 instruction set is very small. It is aligned to a 16-bit word size, corresponding to the K_ISA = 4 memory organization of the general-purpose outer-loop processor, ISA 0. In ISA 1, up to seven arithmetic operations can be carried out in a single clock cycle. The implementation can retain results at a rate of one per clock over a small window of clocks, can index new source or destination addresses, and moves register data to and from memory in parallel with computation.
ISA 1 Instruction Set
Data Movement
ld (reg-vector)
Up to 14 registers are loaded sequentially from memory according to reg-vector, a 14-bit bitmap right-justified in the instruction word.

【０２８５】 st (reg-vector)
Up to 14 registers are stored sequentially to memory according to reg-vector, a 14-bit bitmap right-justified in the instruction word.

【０２８６】 ld (ivec-data)
The 64-bit address that follows this instruction is loaded into the IVEC register, and NIPAR += 5 is performed so that NIPAR points to the next instruction.

【０２８７】 Program Control
jmp (nipar-data)
The 64-bit address that follows this instruction is loaded into the NIPAR register, which thereby points to the next instruction to be executed.

【０２８８】 Arithmetic Operations
mac (m-reg)
The multiplication result register selected by the 2-bit m-reg code receives the sum of products (xd0*yd0) + (xd1*yd1) + (xd2*yd2) + (xd3*yd3).
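The sum of products computed by mac can be sketched in Python. The function names are hypothetical, and truncating the sum to 16 bits is an assumption: the text states only that no overflow scaling is provided and that coefficients are expected to keep results within 16 bits.

```python
def to_signed8(value):
    """Interpret the low 8 bits of value as a two's complement number."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def mac(xd, yd):
    """Sum of four 8-bit x 8-bit two's complement products.

    xd and yd are the register groups xd0-xd3 and yd0-yd3, given as
    sequences of four raw 8-bit values. Returns a raw 16-bit result.
    """
    total = sum(to_signed8(x) * to_signed8(y) for x, y in zip(xd, yd))
    # Assumption: a result wider than 16 bits is truncated without scaling.
    return total & 0xFFFF
```

For a 4-tap filter, xd would hold four samples and yd four coefficients, and one call corresponds to one output sample.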

【０２８９】 macp (s-vec, d-vec)
The multiplication result register selected by two bits of the 4-bit d-vec code receives the sum of products (xd0*yd0) + (xd1*yd1) + (xd2*yd2) + (xd3*yd3). One further bit of the d-vec code selectively enables a memory write of this result register to address (a1), and the remaining bit of the d-vec code selects whether address register a1 is incremented. The 8-bit s-vec is divided into four 2-bit groups, which specify, for each of the data registers xd0-xd3 in turn, whether a read from memory at address (a0) is performed and whether address register a0 is incremented. Any reads or writes that are specified are performed in parallel with the multiplication. Software must align the instruction pipeline for each batch of data read from and stored to memory.

【０２９０】 Reconfiguration
reconf (ISA-vector)
ISA 1 is switched out of context, and the S-machine is reconfigured for the ISA selected by the ISA-vector bit field in the instruction.

【０２９１】 Table 60 shows the block organization of ISA 1, the pipelined convolution engine for the XC4013.

【０２９２】[0292]

【表６０】 [Table 60]

【０２９３】[0293]

[Effects of the Invention] The present invention relates to a system and method for scalable, parallel, dynamically reconfigurable computing. The system comprises at least one S-machine, a T-machine corresponding to each S-machine, a General Purpose Interconnect Matrix (GPIM), a set of I/O T-machines, one or more I/O devices, and a master time-base unit. In the preferred embodiment, the system includes multiple S-machines. Each S-machine has an input and an output coupled, respectively, to an output and an input of the corresponding T-machine. Each T-machine has a routing input and a routing output coupled to the GPIM, as does each I/O T-machine. An I/O T-machine additionally has an input and an output coupled to an I/O device. Finally, each S-machine, T-machine, and I/O T-machine has a master timing input coupled to a timing output of the master time-base unit.

【０２９４】 The meta-addressing system of the present invention gives the processors in a network bit-addressing capability without requiring the processors themselves to perform processing-intensive address manipulation functions. Separate processing machines and addressing machines are disclosed, each optimized to perform its assigned functions. A processing machine executes instructions, stores data in and retrieves data from local memory, and determines when a remote operation is required. An addressing machine assembles packets of data for transmission, determines the geographic or network address of each packet, and checks the addresses of incoming packets. The addressing machine can also perform interrupt handling and other addressing operations.

【０２９５】１つの実施例では、Ｔマシンはまた本発明
のメタアドレス指定メカニズムも提供する。メタアドレ
スは、システム内のＴマシンの地理的位置を指定し、ロ
ーカルメモリ装置内のデータの位置を指定する。メタア
ドレスのローカルアドレスは、装置のアドレス指定可能
スペースがローカルアドレスのビット数以下である限り
は、装置の実際のメモリサイズとは関係なく、新しい装
置のメモリ内の各ビットをアドレス指定するのに用いら
れる。したがって、単一のメタアドレスを用いて異なる
メモリサイズと構造とを有する装置のアドレス指定を行
うことができる。さらに、メタアドレスを用いているの
で、マルチプロセッサ並列アーキテクチャ内のハードウ
ェアが、システム全体のコヒーレント性と一致性とを保
証する必要はない。In one embodiment, the T machine also provides the meta-addressing mechanism of the present invention. The meta-address specifies the geographic location of a T machine in the system and the location of data within a local memory device. The local address portion of the meta-address can be used to address each bit in a new device's memory, regardless of the device's actual memory size, as long as the device's addressable space does not exceed the range that the bits of the local address can represent. Thus, a single meta-address can be used to address devices having different memory sizes and structures. Furthermore, because meta-addresses are used, the hardware within the multiprocessor parallel architecture does not need to guarantee coherency and consistency across the entire system.
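
As an illustration only, the two-part structure of the meta-address can be sketched in Python. The 16-bit and 64-bit field widths are taken from claim 5; the bit ordering (geographic address in the high-order bits) and all function names are assumptions made for this sketch and are not specified in the description.

```python
GEO_BITS = 16    # geographic address width (claim 5)
LOCAL_BITS = 64  # local address width (claim 5)
LOCAL_MASK = (1 << LOCAL_BITS) - 1


def pack_meta_address(geo: int, local: int) -> int:
    """Combine a geographic and a local address into one 80-bit integer."""
    if not 0 <= geo < (1 << GEO_BITS):
        raise ValueError("geographic address does not fit in 16 bits")
    if not 0 <= local <= LOCAL_MASK:
        raise ValueError("local address does not fit in 64 bits")
    # Assumed layout: geographic address occupies the high-order bits.
    return (geo << LOCAL_BITS) | local


def unpack_meta_address(meta: int) -> tuple[int, int]:
    """Split a meta-address back into (geographic, local)."""
    return meta >> LOCAL_BITS, meta & LOCAL_MASK


def device_is_addressable(device_bits: int) -> bool:
    """True when every bit of a device can be named by a local address."""
    return device_bits <= (1 << LOCAL_BITS)
```

Under these assumptions the same packing routine serves devices of any memory size, since only the local field changes from device to device.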

【０２９６】メタアドレスによって、完全な拡張性が得
られる。新しいＳマシンまたは新しい入出力装置が加え
られると、新しい地理アドレスがこの新しい装置につい
て指定される。本発明では、拡張性は不規則であっても
よく、プロセッサの数の２乗の拡張を行わなければなら
ないという条件がない。利用できるローカルメモリ帯域
幅までの任意の数のアドレス指定マシンを各処理マシン
に結合する能力によって、拡張性はさらに高まる。これ
により、システム設計者は、各処理マシンへの経路の数
を任意に指定することができる。このような柔軟性によ
って、システムのより高いレベルにより広い帯域幅を提
供することができ、システムの最も重要な機能に最も広
い帯域幅を与えるよう最適化されたピラミッド型処理ア
ーキテクチャを構築することができる。The meta-address provides complete scalability. When a new S machine or a new input/output device is added, a new geographic address is assigned to the new device. In the present invention, scalability may be irregular, and there is no requirement that expansion proceed by the square of the number of processors. Scalability is further enhanced by the ability to couple any number of addressing machines to each processing machine, up to the available local memory bandwidth. This allows the system designer to specify the number of paths to each processing machine arbitrarily. This flexibility makes it possible to provide more bandwidth at higher levels of the system and to build a pyramidal processing architecture optimized to give the most bandwidth to the most important functions of the system.

【０２９７】上に説明したように、好ましい実施例によ
ると、Ｔマシンはメタアドレスを生成し、割込みを扱
い、メッセージを待ち行列に待機させるアドレス指定マ
シンである。したがって、Ｓマシンはその処理能力をプ
ログラム命令の実行にのみ集中させることができ、本発
明のマルチプロセッサ並列アーキテクチャの全体的な効
率を大幅に最適化することができる。Ｓマシンは所望の
データを探し出すためのメタアドレスのローカルメモリ
コンポーネントにアクセスするだけでよく、地理アドレ
スは、Ｓマシンに対して透過性である。このアドレス指
定アーキテクチャは、分散メモリ／分散プロセッサ並列
計算システムときわめてよく相互作動する。ローカルメ
モリを分離するアーキテクチャ設計を選択することによ
り、ハードウェアを独立に、また並列に作動することが
できる。本発明によれば、各Ｓマシンは、１つの計算問
題にすべて並列に向けられていたとしても、実行時には
全く異なった再構成命令を有することができる。また、
動的再構成Ｓマシンにより実現される命令セットアーキ
テクチャ（ＩＳＡ）が異なっていても良いばかりでな
く、Ｓマシンを実現するのに用いられる実際のハードウ
ェアが一定のタスクを実行するように最適化されていて
も良い。したがって、単一のシステム内のＳマシンはす
べて異なるレートで作動してもよく、各Ｓマシンはシス
テムリソースの利用を最大限に高めながらその機能を最
適に実行することができる。As described above, according to the preferred embodiment, the T machine is an addressing machine that generates meta-addresses, handles interrupts, and queues messages. The S machine can therefore devote its processing power solely to executing program instructions, which greatly improves the overall efficiency of the multiprocessor parallel architecture of the present invention. The S machine only needs to access the local memory component of the meta-address to find the desired data; the geographic address is transparent to the S machine. This addressing architecture works very well with distributed memory / distributed processor parallel computing systems. By choosing an architectural design that separates local memories, the hardware can operate independently and in parallel. According to the present invention, each S machine can have completely different reconfiguration instructions at run time, even if all S machines are directed in parallel at a single computational problem. Moreover, not only may the instruction set architectures (ISAs) implemented by the dynamically reconfigurable S machines differ, but the actual hardware used to implement each S machine may also be optimized to perform particular tasks. Thus, the S machines in a single system may all operate at different rates, and each S machine can perform its function optimally while maximizing the utilization of system resources.
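
The division of labor between the S machine and the T machine on the sending side can be sketched as follows, loosely following the steps recited in claims 6 and 7. This is a minimal sketch under stated assumptions: the class names, the `payload` field, the `"remote_op"` memory slot, the list standing in for the interconnection device, and the bit layout of the meta-address are all hypothetical and are not taken from the description.

```python
from dataclasses import dataclass

LOCAL_BITS = 64  # local address width (claim 5)


@dataclass
class RemoteInfo:
    """Remote component information left in local memory by the S machine."""
    source_geo: int    # geographic address of the sending T machine
    remote_geo: int    # geographic address of the destination T machine
    remote_local: int  # local memory address at the destination
    payload: bytes     # hypothetical data field


@dataclass
class Packet:
    meta_address: int  # destination geographic address + local address
    source_geo: int
    payload: bytes


class TMachine:
    """Addressing machine: turns remote component information into packets."""

    def __init__(self, local_memory: dict, interconnect: list):
        self.local_memory = local_memory
        self.interconnect = interconnect  # stand-in for the interconnection device

    def on_unconditional_instruction(self, slot: str) -> Packet:
        info = self.local_memory[slot]  # retrieve remote component information
        # Assumed layout: geographic address in the high-order bits.
        meta = (info.remote_geo << LOCAL_BITS) | info.remote_local
        packet = Packet(meta, info.source_geo, info.payload)
        self.interconnect.append(packet)  # send to the interconnection device
        return packet


class SMachine:
    """Processing machine: never builds a meta-address itself."""

    def __init__(self, local_memory: dict, t_machine: TMachine):
        self.local_memory = local_memory
        self.t_machine = t_machine

    def remote_operation(self, info: RemoteInfo) -> Packet:
        self.local_memory["remote_op"] = info  # store remote component information
        return self.t_machine.on_unconditional_instruction("remote_op")
```

In this sketch all address arithmetic lives in `TMachine`, so the processing machine's only address-related work is a write to its own local memory.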

【０２９８】さらに、唯一のメモリ確認によって、正確
な地理アドレスが伝送されていることが確認され、ロー
カルメモリアドレスの確認は提供されない。さらに、こ
の確認は処理マシンではなくアドレス指定マシンによっ
て実行される。仮想アドレス指定は用いられないので、
仮想アドレスを論理アドレスに変換するためのハードウ
ェア／ソフトウェア相互作動は必要ではない。メタアド
レスのアドレスは、物理的アドレスである。このような
予防的及び保全的機能をすべてなくすることにより、シ
ステム全体の処理速度が大幅に向上する。したがって、
メタアドレス指定スキームと組合わせて、コンピュータ
システムの「スペース」管理を、別個の処理マシンによ
り提供されるコンピュータシステムの「時間」管理から
別のアドレス指定マシンに分離することにより、高度並
列計算システムのための一意的なメモリ管理及びアドレ
ス指定システムが提供される。本発明のアーキテクチャ
により、Ｓマシンの作動は優れた柔軟性を有することと
なり、Ｔマシンのレートを一定に保ったまま各Ｓマシン
はそれぞれに最適なレートで作動することができる。シ
ステム全体のデータ通信を最も遠いスペースにも達する
ようにし、きわめて短時間でローカル命令処理を均衡さ
せることができ、これによって高度並列コンピュータシ
ステムによる複雑な問題の解決へのアプローチが改善さ
れる。In addition, the only memory verification performed confirms that the correct geographic address has been transmitted; no verification of the local memory address is provided. Moreover, this verification is performed by the addressing machine, not by the processing machine. Since virtual addressing is not used, no hardware/software interaction is required to translate virtual addresses into logical addresses. The addresses in a meta-address are physical addresses. Eliminating all such preventive and protective functions greatly increases the processing speed of the overall system. Therefore, in combination with the meta-addressing scheme, separating the "space" management of the computer system into separate addressing machines, apart from the "time" management of the computer system provided by separate processing machines, provides a unique memory management and addressing system for highly parallel computing systems. With the architecture of the present invention, the operation of the S machines has great flexibility, and each S machine can operate at its own optimum rate while the rate of the T machines is kept constant. Data communication across the system can reach even the farthest spaces while local instruction processing is balanced in a very short time, which improves the way highly parallel computer systems approach the solution of complex problems.
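
The receive-side check described above, in which only the geographic address is verified and the local address is passed through as a physical address, can be sketched as follows. The class name, the queue representation, the return values, and the bit layout are assumptions for illustration; the steps loosely follow claims 8 and 9.

```python
from collections import deque

LOCAL_BITS = 64  # local address width (claim 5)
LOCAL_MASK = (1 << LOCAL_BITS) - 1


class ReceivingTMachine:
    """Addressing machine on the receiving side of the interconnect."""

    def __init__(self, my_geo: int):
        self.my_geo = my_geo
        self.queue = deque()  # messages waiting for the local S machine

    def receive(self, meta_address: int, payload: bytes) -> bool:
        # Decode the meta-address (assumed layout: geographic part in high bits).
        geo = meta_address >> LOCAL_BITS
        local = meta_address & LOCAL_MASK
        # The only verification: does the geographic address match this machine?
        if geo != self.my_geo:
            return False
        # The local address is used as-is: no translation, no bounds check.
        self.queue.append((local, payload))
        return True
```

A packet addressed to another geographic address is simply not accepted; nothing on this path consults a page table or validates the local address.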

[Brief description of the drawings]

【図１】本発明に基づいて構築された、拡張性、並列、
動的再構成計算のためのシステムの好ましい構成例を示
すブロック図である。FIG. 1 is a block diagram showing a preferred configuration example of a system for scalable, parallel, dynamic reconfiguration computation constructed in accordance with the present invention.

【図２】本発明のＳマシンの好ましい構成例を示すブロ
ック図である。FIG. 2 is a block diagram showing a preferred configuration example of an S machine of the present invention.

【図３】再構成指示を含む模範的プログラムリストの模
式図である。FIG. 3 is a schematic diagram of an exemplary program list including a reconfiguration instruction.

【図４】一連のプログラム命令のコンパイル中に実行さ
れる先行技術コンパイル作業のフローチャートである。FIG. 4 is a flowchart of a prior art compilation operation performed during compilation of a series of program instructions.

【図５】動的再構成計算のためにコンパイラによって実
行される好ましいコンパイル作業のフローチャートであ
る。FIG. 5 is a flowchart of a preferred compilation operation performed by a compiler for dynamic reconfiguration calculations.

【図６】動的再構成計算のためにコンパイラによって実
行される好ましいコンパイル作業のフローチャートであ
る。FIG. 6 is a flowchart of a preferred compilation operation performed by a compiler for dynamic reconfiguration calculations.

【図７】本発明の動的再構成処理装置（ＤＲＰＵ）の好
ましい構成例を示すブロック図である。FIG. 7 is a block diagram showing a preferred configuration example of a dynamic reconfiguration processing device (DRPU) of the present invention.

【図８】本発明の命令取出し装置（ＩＦＵ）の好ましい
構成例を示すブロック図である。FIG. 8 is a block diagram showing a preferred configuration example of an instruction fetch unit (IFU) of the present invention.

【図９】本発明の命令状態シーケンサ（ＩＳＳ）によっ
て支援される好ましい１組の状態を示す模式図である。FIG. 9 is a schematic diagram illustrating a preferred set of states supported by the instruction state sequencer (ISS) of the present invention.

【図１０】本発明の割込みロジックによって支援される
好ましい１組の状態を示す模式図である。FIG. 10 is a schematic diagram illustrating a preferred set of states supported by the interrupt logic of the present invention.

【図１１】本発明のデータ演算装置（ＤＯＵ）の好まし
い構成例を示すブロック図である。FIG. 11 is a block diagram showing a preferred configuration example of a data operation device (DOU) of the present invention.

【図１２】汎用外部ループ命令セットアーキテクチャ
（ＩＳＡ）を実動化するために構成されたデータ演算装
置（ＤＯＵ）の第１模範実施例の構成図である。FIG. 12 is a block diagram of a first exemplary embodiment of a data operation unit (DOU) configured to implement a general-purpose outer loop instruction set architecture (ISA).

【図１３】内部ループ命令セットアーキテクチャ（ＩＳ
Ａ）を実動化するために構成されたデータ演算装置（Ｄ
ＯＵ）の第２模範実施例の構成ブロック図である。FIG. 13 is a configuration block diagram of a second exemplary embodiment of a data operation unit (DOU) configured to implement an inner loop instruction set architecture (ISA).

【図１４】本発明のアドレス演算装置（ＡＯＵ）の好ま
しい構成例を示すブロック図である。FIG. 14 is a block diagram showing a preferred configuration example of an address arithmetic unit (AOU) of the present invention.

【図１５】汎用外部ループ命令セットアーキテクチャ
（ＩＳＡ）を実動化するために構成されたアドレス演算
装置（ＡＯＵ）の第１模範実施例の構成ブロック図であ
る。FIG. 15 is a configuration block diagram of a first exemplary embodiment of an address operation unit (AOU) configured to implement a general-purpose outer loop instruction set architecture (ISA).

【図１６】内部ループ命令セットアーキテクチャ（ＩＳ
Ａ）を実動化するために構成されたアドレス演算装置
（ＡＯＵ）の第２模範実施例の構成ブロック図である。FIG. 16 is a configuration block diagram of a second exemplary embodiment of an address operation unit (AOU) configured to implement an inner loop instruction set architecture (ISA).

【図１７】（ａ）は、外部ループ命令セットアーキテク
チャ（ＩＳＡ）のための命令取出し装置（ＩＦＵ）と、
データ演算装置（ＤＯＵ）と、アドレス演算装置（ＡＯ
Ｕ）との間での再構成ハードウェアリソースの模範的割
当てを示す模式図、（ｂ）は、内部ループ命令セットア
ーキテクチャ（ＩＳＡ）のための命令取出し装置（ＩＦ
Ｕ）と、データ演算装置（ＤＯＵ）と、アドレス演算装
置（ＡＯＵ）との間での再構成ハードウェアリソースの
模範的割当てを示す模式図である。FIG. 17(a) is a schematic diagram showing an exemplary allocation of reconfigurable hardware resources among an instruction fetch unit (IFU), a data operation unit (DOU), and an address operation unit (AOU) for an outer loop instruction set architecture (ISA); FIG. 17(b) is a schematic diagram showing an exemplary allocation of reconfigurable hardware resources among an instruction fetch unit (IFU), a data operation unit (DOU), and an address operation unit (AOU) for an inner loop instruction set architecture (ISA).

【図１８】本発明のＴマシンの好ましい構成例を示すブ
ロック図である。FIG. 18 is a block diagram illustrating a preferred configuration example of a T machine according to the present invention.

【図１９】本発明の相互結合入出力装置の構成例を示す
ブロック図である。FIG. 19 is a block diagram showing a configuration example of a mutual coupling input / output device of the present invention.

【図２０】本発明の入出力Ｔマシンの好ましい構成例を
示すブロック図である。FIG. 20 is a block diagram showing a preferred configuration example of an input / output T machine of the present invention.

【図２１】本発明の汎用相互結合マトリックス（ＧＰＩ
Ｍ）の好ましい構成例を示すブロック図である。FIG. 21 is a block diagram showing a preferred configuration example of the general-purpose interconnect matrix (GPIM) of the present invention.

【図２２】本発明に基づく、拡張性、並列、動的再構成
計算のための好ましい方法のフローチャートである。FIG. 22 is a flowchart of a preferred method for scalable, parallel, dynamic reconfiguration computation according to the present invention.

【図２３】本発明に基づくデータパケットの好ましい構
成例を示す模式図である。FIG. 23 is a schematic diagram showing a preferred configuration example of a data packet based on the present invention.

【図２４】本発明に基づくデータ要求を発生させるため
の好ましい方法のフローチャートである。FIG. 24 is a flowchart of a preferred method for generating a data request according to the present invention.

【図２５】本発明に基づくデータを送るための好ましい
方法のフローチャートである。FIG. 25 is a flowchart of a preferred method for sending data according to the present invention.

【図２６】本発明に基づくデータを受取るための好まし
い方法のフローチャートである。FIG. 26 is a flowchart of a preferred method for receiving data according to the present invention.

【図２７】本発明に基づく割込み処理演算を実行する相
互結合入出力装置の好ましい構成例を示すブロック図で
ある。FIG. 27 is a block diagram showing a preferred configuration example of an interconnected input / output device that executes an interrupt processing operation according to the present invention.

【図２８】本発明に基づく割込みを扱うための好ましい
方法のフローチャートである。FIG. 28 is a flowchart of a preferred method for handling interrupts according to the present invention.

[Explanation of symbols]

１２動的プログラム処理マシン（Ｓマシン）１４アドレス指定マシン（Ｔマシン）１６相互結合装置（汎用相互結合マトリック
ス）３４メモリ装置、ローカルメモリ（メモリ）１０１アーキテクチャ記述メモリ１０６，２２００割込みロジック３２０アドレス復号器１８００データパケット１８１６地理アドレス２２０８メタアドレス２２０８認識装置２２０４コンパレータ
12 dynamically reprogrammable processing machine (S machine)
14 addressing machine (T machine)
16 interconnection device (general-purpose interconnect matrix)
34 memory device, local memory (memory)
101 architecture description memory
106, 2200 interrupt logic
320 address decoder
1800 data packet
1816 geographic address
2208 meta-address
2208 recognition device
2204 comparator

Claims

[Claims]

1. A meta-address architecture for dynamic reconfiguration calculations, for specifying a local memory destination for data packets in a network of dynamically reprogrammable processing machines, comprising: a plurality of addressing machines, each having a unique geographic address, for performing interrupt handling, generating and transmitting a meta-address that includes a geographic address and a local address, and queuing messages; a plurality of dynamically reprogrammable processing machines, each coupled to at least one of the addressing machines, for storing, retrieving, and processing data from a local memory device in response to a received local address; a plurality of memory devices, each associated with one of the dynamically reprogrammable processing machines; and an interconnection device, coupled to the addressing machines, for routing data between the addressing machines according to the geographic address included in the meta-address.

2. The meta-address architecture for dynamic reconfiguration calculations according to claim 1, wherein at least one of the addressing machines comprises: an address decoder for decoding a meta-address into a geographic address and a local address; and a controller, coupled to the dynamically reprogrammable processing machine, the local memory device, and the address decoder, for retrieving meta-address information from the local memory in response to receipt of an unconditional instruction from the dynamically reprogrammable processing machine, assembling a data packet according to the retrieved meta-address, receiving a geographic address and a local address from the address decoder, and transmitting a data packet to the dynamically reprogrammable processing machine in response to determining that the decoded geographic address corresponds to the associated geographic address.

3. The meta-address architecture for dynamic reconfiguration calculations according to claim 1, further comprising a plurality of architecture description memory devices, each coupled to one of the dynamically reprogrammable processing machines and storing the geographic address for that dynamically reprogrammable processing machine.

4. The meta-address architecture for dynamic reconfiguration calculations according to claim 2, wherein the addressing machine further comprises an interrupt handler coupled to an input/output device, the interrupt handler comprising: a recognition device for identifying an interrupt request; a comparator for comparing the identified interrupt request with a stored list of interrupt requests in order to verify the validity of the interrupt request; and interrupt logic for processing the validated interrupt request according to stored interrupt processing instructions.

5. The meta-address architecture for dynamic reconfiguration calculations according to claim 1, wherein the meta-address is 80 bits wide, the geographic address is 16 bits wide, and the local address is 64 bits wide.

6. An addressing method for processing instructions in a parallel processor architecture using local addressing machines and local processing machines coupled to local memories, wherein each local addressing machine is identified by a unique geographic identification and the local addressing machines are interconnected by an interconnection device, the method comprising: receiving a program instruction; determining whether the received program instruction requires a remote operation; storing remote component information in the local memory; and issuing an unconditional instruction to the local addressing machine to initiate the remote operation.

7. The addressing method according to claim 6, wherein the local addressing machine performs: receiving the unconditional instruction from the local processing machine; retrieving from the local memory the remote component information, which includes a local geographic address, a remote geographic address, and a remote local memory address; generating a meta-address according to the retrieved remote component information; generating a data packet according to the generated meta-address; and sending the data packet to the interconnection device.

8. An addressing method for addressing memory in a parallel computing environment in which a local processor is coupled to a local memory, a local address machine, and an interconnection device, wherein the local address machine performs: receiving a data packet; decoding the data packet into a geographic address and a local address; comparing the geographic address with the associated geographic address; and transmitting the data packet to the local processor in response to the geographic address matching the associated geographic address.

9. The method of claim 8, wherein transmitting a data packet to the local processor includes storing the data packet in a queue for processing by the local processor.

10. The addressing method according to claim 8, further comprising: receiving data from the local processor; retrieving remote operation data from the local memory according to the received data; generating a meta-address from the retrieved data; generating a data packet according to the generated meta-address; and transmitting the data packet to the interconnection device.

11. The addressing method of claim 10, wherein retrieving the remote operation data comprises retrieving a remote geographic address and a remote local memory address.

12. The method of claim 11, further comprising retrieving a source geographic address from said local memory.

13. The addressing method of claim 12, further comprising: using an architecture description memory that is coupled to each processor and stores a geographic address for the coupled local processor; and retrieving the source geographic address from the architecture description memory.

14. An addressing method for processing instructions in a parallel processor architecture using local addressing machines and local processing machines coupled to local memories, wherein each local addressing machine is identified by a unique geographic identification and the local addressing machines are interconnected by an interconnection device, the method comprising, at the local addressing machine: receiving an unconditional instruction from the local processing machine; retrieving from the local memory remote component information that includes a local geographic address, a remote geographic address, and a remote local memory address; generating a meta-address according to the retrieved remote component information; generating a data packet according to the generated meta-address; and sending the data packet to the interconnection device.

15. An addressing method for addressing memory in a parallel computing environment in which a local processor is coupled to a local memory, a local address machine, and an interconnection device, wherein the local address machine performs: receiving data from the local processor; retrieving remote operation data from the local memory according to the received data; generating a meta-address from the retrieved data; generating a data packet according to the generated meta-address; and transmitting the data packet to the interconnection device.