JPH0628325A

JPH0628325A - Array processor communication network

Info

Publication number: JPH0628325A
Application number: JP5094770A
Authority: JP
Inventors: Paul Amba Wilkinson; Peter M Kogge
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1992-05-22
Filing date: 1993-04-22
Publication date: 1994-02-04
Anticipated expiration: 2011-09-11
Also published as: JP2533282B2

Abstract

PURPOSE: To provide H-DOT, an approach that reduces the scale of network implementation. CONSTITUTION: The H-DOT approach can implement any of several mesh configurations from the same two-port basic elements. H-DOT takes two or more links between two points and combines their functions to form one network with two or more systems and attachments. For example, in a two-dimensional mesh, two adjacent north-south links and two adjacent east-west links are coupled to form one network having four processing elements and attachments. With the H-DOT approach, each processing element is connected to the next by a link that provides two vertical routes and two horizontal routes, so that the interconnection of processing elements appears H-shaped. A dot-OR operation is performed for the other interconnections of the array.
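The abstract describes H-DOT in terms of the north-south and east-west links of a conventional two-dimensional mesh. As a point of reference only, the sketch below models that conventional mesh and not H-DOT itself. It computes the N/S/E/W neighbours of a PE and the number of separate point-to-point links such a mesh needs. The function names and the no-wrap-around boundary are assumptions made for illustration.

```python
def mesh_neighbors(row: int, col: int, rows: int, cols: int) -> dict:
    """Return the N/S/E/W neighbours of PE (row, col) in a rows x cols 2-D mesh.

    Edge PEs simply have fewer links; no wrap-around (torus) links are assumed.
    """
    candidates = {
        "north": (row - 1, col),
        "south": (row + 1, col),
        "east": (row, col + 1),
        "west": (row, col - 1),
    }
    # Keep only the neighbours that fall inside the mesh.
    return {
        direction: (r, c)
        for direction, (r, c) in candidates.items()
        if 0 <= r < rows and 0 <= c < cols
    }


def mesh_link_count(rows: int, cols: int) -> int:
    """Separate point-to-point links in a rows x cols mesh:
    east-west links within each row plus north-south links within each column."""
    return rows * (cols - 1) + cols * (rows - 1)


print(mesh_neighbors(0, 0, 4, 4))  # corner PE: only south and east
print(mesh_link_count(4, 4))
```

A 4 × 4 mesh built this way needs 24 individual links. The abstract's point is that H-DOT couples adjacent links of this kind to reduce the scale of the link network.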

Description

Detailed Description of the Invention

[0001]

FIELD OF THE INVENTION: The present invention relates to computers and computer systems. More particularly, it relates to a communication network for array processors that interconnects an array of processors and SIMD, MIMD, and SIMIMD processing elements, and that provides a network for passing information through the array of processors.

[0002] Glossary
- ALU: The ALU is the arithmetic logic unit portion of a processor.
- Array: An array refers to an arrangement of elements in one or more dimensions. An array can include an ordered set of data items (array elements) which, in languages such as FORTRAN (FORmula TRANslator), are identified by a single name. In other languages, the name of such an ordered set of data items refers to an ordered collection or set of data elements, all of which have identical attributes. A program array generally has dimensions specified by a number or dimension attribute. In some languages, the declarator of the array may also specify the size of each dimension. In some languages, an array is an arrangement of elements in a table. In the hardware sense, an array is a collection of structures (functional elements) that are generally identical in a massively parallel architecture. In data-parallel computing, array elements are elements to which operations can be assigned. When running in parallel, each can execute the required operations independently. Generally, an array may be thought of as a grid of processing elements. Sections of the array may be assigned sectional data, so that the sectional data can be moved around in a regular grid pattern. However, data can also be indexed or assigned to an arbitrary location in the array.
- Array Director: An Array Director is a unit programmed as a controller for an array. It performs the function of a master controller for a grouping of functional elements arranged in an array.
- Array Processor: There are two principal types of array processor: multiple instruction multiple data (MIMD) and single instruction multiple data (SIMD). In a MIMD array processor, each processing element in the array executes its own unique instruction stream with its own data. In a SIMD array processor, each processing element in the array is restricted to the same instruction via a common instruction stream; however, the data associated with each processing element is unique. Our preferred array processor has other characteristics. We call it the Advanced Parallel Array Processor and use the acronym APAP.
- Asynchronous: Asynchronous means without a regular time relationship. The execution of a function is unpredictable with respect to the execution of other functions, which occur with no regular or predictable time relationship to it. In control situations, the controller addresses a location to which control is passed while data waits for an idle element being addressed. This permits operations to remain in a sequence while not coinciding in time with any event.
- BOPS/GOPS: BOPS and GOPS are acronyms with the same meaning: billions of operations per second. See GOPS.
- Circuit Switched/Store Forward: These terms refer to two mechanisms for moving data packets through a network of nodes. Store Forward is a mechanism whereby a data packet is received by each intermediate node, stored in its memory, and then forwarded towards its destination. Circuit Switched is a mechanism whereby an intermediate node is commanded to logically connect its input port to an output port. Data packets can then pass directly through the node towards their destination without entering the intermediate node's memory.
- Cluster: A cluster is a station (or functional unit) consisting of a control unit (cluster controller) and the hardware attached to it, which may be terminals, functional units, or virtual components. Our cluster includes an array of PMEs, sometimes called a node array. A cluster usually has 512 PMEs.
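The Circuit Switched/Store Forward entry above distinguishes the two mechanisms by whether intermediate nodes buffer the packet in memory. The minimal sketch below makes that difference observable. The node names, the per-node memory lists, and the function names are hypothetical and do not come from the source.

```python
from collections import defaultdict


def store_and_forward(path: list, packet: str, memory: dict) -> list:
    """Deliver packet along path; every intermediate node first stores it.

    Returns the names of intermediate nodes whose memory held the packet.
    """
    buffered_at = []
    for node in path[1:-1]:
        memory[node].append(packet)   # received and stored in node memory
        buffered_at.append(node)
        memory[node].remove(packet)   # then forwarded toward the destination
    memory[path[-1]].append(packet)   # delivered
    return buffered_at


def circuit_switched(path: list, packet: str, memory: dict) -> list:
    """Command each intermediate node to join input to output, then send.

    Returns the connection settings; no intermediate memory is touched.
    """
    settings = [(node, "input->output") for node in path[1:-1]]  # set up circuit
    memory[path[-1]].append(packet)   # packet passes straight through
    return settings


mem = defaultdict(list)
print(store_and_forward(["A", "B", "C", "D"], "p1", mem))
print(circuit_switched(["A", "B", "C", "D"], "p2", mem))
```

In both cases the destination ends up holding the packet. Only store-and-forward passes it through the memories of nodes B and C on the way.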

[0003] Our entire PME node array consists of a set of clusters, each supported by a cluster controller (CC).
- Cluster Controller: A cluster controller is a device that controls input/output (I/O) operations for one or more devices or functional units connected to it. A cluster controller is usually controlled by a program stored and executed in the unit, as in the IBM 3601 Finance Communication Controller. It may instead be controlled entirely by hardware, as in the IBM 3272 Control Unit.
- Cluster Synchronizer: A cluster synchronizer is a functional unit that manages the operations of all or part of a cluster to keep its elements operating synchronously. This lets the functional units maintain a particular time relationship with the execution of a program.
- Controller: A controller is a device that directs the transmission of data and instructions over the links of an interconnection network. Its operation is controlled by a program executed by a processor to which the controller is connected, or by a program executed within the device.
- CMOS: CMOS is an acronym for complementary metal-oxide semiconductor technology. It is commonly used to manufacture dynamic random access memories (DRAMs); NMOS is another technology used to manufacture DRAMs. We prefer CMOS, but the technology used to manufacture the APAP is not limited to any particular semiconductor technology.
- Dotting: Dotting refers to joining three or more leads by physically connecting them together. Most backpanel buses share this connection approach. The term relates to the conventional OR dot, but here it identifies multiple data sources that can be combined onto a bus by a very simple protocol.
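The Dotting entry above relates to the conventional OR dot: when several sources are physically joined to one bus, every receiver sees the OR of whatever is being driven. The sketch below models that behaviour. The "very simple protocol" is assumed here to mean "at most one source drives per cycle"; the source does not spell it out. The bus width and function names are also illustrative assumptions.

```python
from typing import Optional, Sequence


def dotted_bus(drivers: Sequence[Optional[int]], width: int = 8) -> int:
    """Value seen by every receiver on a dotted (wired-OR) bus.

    Each entry is what one source drives this cycle; None means the source
    is not driving. Physically joined leads behave as the bitwise OR of all
    active drivers, truncated to the bus width.
    """
    mask = (1 << width) - 1
    value = 0
    for d in drivers:
        if d is not None:
            value |= d & mask
    return value


def single_driver_ok(drivers: Sequence[Optional[int]]) -> bool:
    """Assumed simple protocol: at most one source drives per cycle, so the
    OR on the bus is exactly that source's value."""
    return sum(d is not None for d in drivers) <= 1


print(dotted_bus([0x5A, None, None]))  # one driver: its value passes unchanged
print(dotted_bus([0x0F, None, 0xF0]))  # two drivers: values are ORed together
```

When the protocol is respected, the OR is harmless and the bus acts as a shared broadcast medium. When two sources collide, their bits merge, which is why some arbitration rule is needed.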

[0004] Our input/output (I/O) zipper concept can be used to let the input port of a node be driven either by the output port of another node or by data from the system bus. Conversely, data from a node is available both as input to other nodes and as input to the system bus. Note that data is not output to the system bus and to other nodes simultaneously; the two transfers take place in different cycles.
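The zipper paragraph above says a node's input port can be fed from another node or from the system bus. A node's output can reach either another node or the bus, but only in different cycles. The sketch below encodes those rules as three cycle types. The enum names and the two-value return convention are hypothetical choices for illustration, not the patent's design.

```python
from enum import Enum
from typing import Optional, Tuple


class Cycle(Enum):
    NODE_TO_NODE = "node output -> neighbour input"
    BUS_TO_NODE = "system bus -> node input"
    NODE_TO_BUS = "node output -> system bus"


def zipper_step(cycle: Cycle, node_out: int, bus_data: int) -> Tuple[Optional[int], Optional[int]]:
    """Route one cycle through a hypothetical zipper.

    Returns (value presented at the receiving node's input port,
             value placed on the system bus); None means nothing is driven.
    A node's output goes to another node or to the bus, never both at once.
    """
    if cycle is Cycle.NODE_TO_NODE:
        return node_out, None
    if cycle is Cycle.BUS_TO_NODE:
        return bus_data, None
    if cycle is Cycle.NODE_TO_BUS:
        return None, node_out
    raise ValueError(f"unknown cycle type: {cycle}")


for c in Cycle:
    print(c.value, zipper_step(c, node_out=7, bus_data=9))
```

Making the cycle type explicit keeps "to the bus" and "to another node" mutually exclusive within any one cycle, as the text requires.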

[0005] Dotting is used in the H-DOT discussion. There, two-port PEs, PMEs, or pickets can be used in arrays of various organizations by taking advantage of dotting. Several topologies are described, including 2D and 3D meshes, base 2 N-cubes, sparse base 4 N-cubes, and sparse base 8 N-cubes.
- DRAM: DRAM is an acronym for dynamic random access memory, the common storage used by computers for main memory. However, the term DRAM also applies to memory used as a cache or as memory other than main memory.
- Floating Point: A floating-point number is expressed in two parts: a fixed-point or fraction part, and an exponent part to some assumed radix or base. The exponent indicates the actual placement of the decimal point. In a typical floating-point representation, the real number 0.0001234 is represented as 0.1234-3 (that is, 0.1234 × 10^-3), where 0.1234 is the fixed-point part and -3 is the exponent. In this example, the floating-point radix or base is 10. The radix is the implicit fixed positive integer base, greater than unity. It is raised to the power given explicitly by the exponent (or represented by the characteristic) in the floating-point representation. The result is then multiplied by the fixed-point part to obtain the real number represented. Numeric literals can be expressed in floating-point notation as well as in real-number notation.
- FLOPS: FLOPS refers to floating-point instructions per second. Floating-point operations include ADD, SUB, MPY, and DIV (add, subtract, multiply, divide), and often many others. The floating-point-instructions-per-second figure is often calculated using the add and multiply instructions, generally assuming a 50/50 mix. An operation includes the generation of the exponent and fraction and any required fraction normalization. We could address 32- or 48-bit floating-point formats (or longer ones, but we have not counted them in the mix). A floating-point operation implemented with fixed-point instructions (normal or RISC) requires multiple instructions. Some use a 10-to-1 ratio when figuring performance, while one specific study has shown a ratio of 6.25 to be more appropriate. Various architectures have different ratios.
- Functional Unit: A functional unit is an entity of hardware, software, or both, capable of accomplishing a purpose.
- Gbytes: A gigabyte is one billion bytes; Gbytes/s means one billion bytes per second.
- GIGAFLOPS: 10^9 floating-point instructions per second.
- GOPS and PETAOPS: GOPS and BOPS have the same meaning: billions of operations per second. PETAOPS means 10^15 (a thousand trillion) operations per second, a potential of current machines. For our APAP machine, these terms mean about the same as BIPS and GIPS (billions of instructions per second). In some machines one instruction can cause two or more operations (for example, both an add and a multiply), but ours does not. Conversely, one operation can take many instructions; for example, we use multiple instructions to perform 64-bit arithmetic. In counting operations, however, we did not elect to count log operations. GOPS may be the preferred term for describing performance, but usage has not been consistent; see MIPS/MOPS, BIPS/BOPS, and MegaFLOPS/GigaFLOPS/TeraFLOPS/PetaFLOPS.
- ISA: ISA means Instruction Set Architecture.
- Link: A link is an element that may be physical or logical. A physical link is the physical connection joining elements or units. In computer programming, a link is an instruction or address that passes control and parameters between separate portions of a program. In a multisystem, a link is the connection between two systems. It may be specified by program code that identifies the link by a real or virtual address. In general, then, a link includes the physical medium, any protocol, and the associated devices and programming, and is both logical and physical.
- MFLOPS: MFLOPS means 10^6 floating-point instructions per second.
- MIMD: MIMD refers to a processor array architecture in which each processor in the array has its own instruction stream (hence Multiple Instruction streams). These streams execute Multiple Data streams, one located per processing element.
- Module: A module is a discrete, identifiable program unit, or a functional unit of hardware designed for use with other components. A collection of PEs (processing elements) contained in a single electronic chip is also called a module.
- Node: In general, a node is the junction of links. In a generic array of PEs, one PE can be a node. A node can also contain a collection of PEs called a module. In accordance with the present invention, a node is formed from an array of PMEs (processor memory elements), and we refer to such a set of PMEs as a node. A node is preferably 8 PMEs.
- Node Array: A collection of modules made up of PMEs is sometimes called a node array; it is an array of nodes made up of modules. A node array usually has more than a few PMEs, but the term encompasses a plurality.
- PDE: PDE stands for partial differential equation.
- PDE Relaxation Solution Process: The PDE relaxation solution process is a method of solving a PDE. Solving PDEs consumes most of the supercomputing power in use in this field, which makes it a good example of the relaxation process. There are many ways to solve a PDE, and more than one of the numerical methods includes relaxation. For example, if a PDE is solved by the finite element method, relaxation consumes most of the computation time.

Consider an example from heat transfer. Given hot gas flowing inside a chimney and a cold wind blowing outside, how will the temperature gradient within the chimney bricks develop? Treat the bricks as small segments, and write an equation describing how heat flows between segments as a function of their temperature difference. This converts the heat-transfer PDE into a finite element problem. If all elements except the inner and outer ones are set to room temperature, and the boundary segments are set to the hot-gas and cold-wind temperatures, the problem is ready for relaxation. The program then models time by updating the temperature variable in each segment according to the amount of heat flowing into or out of that segment. Many cycles are needed, each processing every segment in the model. Only then does the set of temperature values across the chimney relax to the temperature distribution that would occur in the physical chimney. If the objective were to model the cooling of the gas in the chimney, the elements would have to be extended to the gas equations, and the inner boundary conditions would be linked to another finite element model. The process would then continue. Note that the heat flow depends on the temperature difference between a segment and its neighbors, so the process uses the inter-PE communication paths to distribute the temperature values. It is this near-neighbor communication pattern that makes PDE relaxation well suited to parallel computing.
- Picket: A picket is an element in the array of elements that makes up an array processor. It consists of a data flow (ALU and registers), memory, control, and the portion of the communication matrix associated with the element. The unit represents 1/n of an array processor made up of parallel processor and memory elements, together with their control and their portion of the array's internal communication mechanism. A picket is a form of processor memory element (PME). Our PME chip design processor logic can implement the picket logic described in the related applications, or can have the logic for an array of processors formed as a node. The term picket is similar to the commonly used array term PE (processing element). It denotes an element of the processing array, preferably a combined processing element and local memory that processes bit-parallel bytes of information in a clock cycle. The preferred embodiment includes a byte-wide data flow processor, 32 KB or more of memory, primitive controls, and ties for communication with other pickets.
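The PDE relaxation entry above describes the chimney-brick heat problem in words. The minimal one-dimensional sketch below shows why each update needs only near-neighbour values. It assumes a uniform wall, a fixed fraction `k` of each temperature difference flowing per sweep, and a stopping tolerance. All of these are illustrative choices, not taken from the source. Every sweep reads only the previous sweep's values, which is what lets each segment live on its own PE.

```python
def relax_wall(n_segments: int, inside: float, outside: float, room: float,
               k: float = 0.25, tol: float = 1e-6, max_sweeps: int = 100_000):
    """Relax the temperature profile across a 1-D chimney wall.

    Segment 0 is held at the hot-gas temperature and segment n-1 at the
    cold-wind temperature; interior segments start at room temperature.
    Each sweep moves a fraction k of the temperature difference with each
    neighbour into a segment (explicit heat-flow update, stable for
    0 < k <= 0.5). Returns (temperatures, sweeps used).
    """
    if n_segments < 2:
        raise ValueError("need at least the two boundary segments")
    t = [room] * n_segments
    t[0], t[-1] = inside, outside
    sweep = 0
    for sweep in range(1, max_sweeps + 1):
        new = t[:]  # every segment updates from the previous sweep's values
        for i in range(1, n_segments - 1):
            # Heat flowing in from both near neighbours.
            new[i] = t[i] + k * ((t[i - 1] - t[i]) + (t[i + 1] - t[i]))
        change = max(abs(a - b) for a, b in zip(new, t))
        t = new
        if change < tol:
            break
    return t, sweep


temps, sweeps = relax_wall(6, inside=300.0, outside=0.0, room=20.0)
print([round(x, 3) for x in temps], sweeps)
```

The profile relaxes towards a straight line between the two boundary temperatures (300, 240, 180, 120, 60, 0 for six segments). That is the steady-state answer for a uniform wall, and reaching it takes many sweeps, as the entry notes.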

[0006] The term "picket" comes from Tom Sawyer and his white fence, although a military picket line is also a close functional analogy.
- Picket Chip: A picket chip contains multiple pickets on a single silicon chip.
- Picket Processor System (or Subsystem): A picket processor is a total system consisting of an array of pickets, a communication network, an I/O system, and a SIMD controller. The SIMD controller consists of a microprocessor, a canned-routine processor, and a microcontroller that runs the array.
- Picket Architecture: The Picket Architecture is the preferred embodiment of a SIMD architecture, with features that accommodate several diverse kinds of problems, including:

[0007] set associative processing; parallel numerically intensive processing; and physical array processing similar to images.
- Picket Array: A picket array is a collection of pickets arranged in a geometric order, a regular array.
- PME (Processor Memory Element): PME stands for processor memory element. We use the term PME for a system element or unit with a single processor, memory, and I/O capability, which forms one element of our parallel array processor. The term processor memory element encompasses a picket. A processor memory element is 1/n of a processor array. It comprises a processor, its associated memory, a control interface, and a portion of the array communication network mechanism. Such an element can have the connectivity of a regular array, as in a picket processor, or can be part of a subarray, as in the multiprocessor memory element node described above.
- Routing: Routing is the assignment of the physical path by which a message reaches its destination. A routing assignment has a source or origin and a destination, and these elements or addresses have a temporary relationship or affinity. Message routing is often based on a key obtained from a table of assignments. In a network, a destination is any station or network-addressable unit addressed as the destination of information sent by a path control address that identifies the link. The destination field identifies the destination by a message header destination code.
- SIMD: A processor array architecture in which all processors in the array are commanded from a Single Instruction stream to execute Multiple Data streams, one located per processing element.
- SIMDMIMD or SIMD/MIMD: SIMDMIMD or SIMD/MIMD refers to a machine with a dual function: it can switch from MIMD to SIMD for a period of time to handle some complex instructions, and thus has two modes. The Thinking Machines, Inc. Connection Machine model CM-2 could be placed as the front end or back end of a MIMD machine. This let programmers operate different modes for executing different portions of a problem, sometimes referred to as dual mode. Such machines have existed since the Illiac, and they use a bus that interconnects the master CPU (central processing unit) with the other processors. The master control processor can interrupt the processing of the other CPUs, which can run independent program code. During an interruption, provision must be made for checkpointing (closing and saving the current state of the controlled processors).
- SIMIMD: SIMIMD is a processor array architecture in which all processors in the array are commanded from a Single Instruction stream to execute Multiple Data streams, one located per processing element. Within this construct, data-dependent operations within each picket that mimic instruction execution are controlled by the SIMD instruction stream.

[0008] It is a single instruction stream machine that can sequence multiple instruction streams (one per picket) using the SIMD instruction stream, and operate on multiple data streams (one per picket). SIMIMD can be executed by a processor memory element system.
- SISD: SISD is an acronym for Single Instruction Single Data.
- Swapping: Swapping exchanges the data content of one area of storage with that of another area of storage.
- Synchronous Operation: Synchronous operation in a MIMD machine is a mode of operation in which each action is related to an event (usually a clock), which can be a specific event that occurs regularly in the program sequence. An operation is dispatched to a number of PEs, which then go off and perform the function independently. Control is not returned to the controller until the operation is complete.

[0009] If the request is to an array of functional units, the controller generates the request to the elements in the array, and the requested operation must be completed before control is returned to the controller.
- TERAFLOPS: TERAFLOPS means 10^12 floating-point instructions per second.
- VLSI: VLSI is an acronym for very large scale integration (as applied to integrated circuits).
- Zipper: A zipper is a new function provided by the invention. It allows links to be made from devices that are external to the normal interconnection of an array configuration.
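The SIMIMD entry above says that data-dependent operations in each picket mimic instruction execution under a single SIMD stream. The sketch below shows one common way to picture this. The controller broadcasts every operation in turn, and each picket commits a result only for the operation named by its own locally stored opcode. The opcodes, the accumulator-and-operand model, and the function name are assumptions made for illustration, not the patent's mechanism.

```python
def simimd_step(opcodes: list, acc: list, operand: list) -> list:
    """One 'virtual instruction' step on all pickets under a single SIMD stream.

    Each picket holds its own opcode as data. The controller broadcasts every
    supported operation in turn; a picket commits a result only when the
    broadcast operation matches its local opcode (data-dependent masking).
    Pickets whose opcode matches nothing keep their accumulator unchanged.
    """
    ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
    }
    result = acc[:]
    for name, fn in ops.items():           # the single SIMD instruction stream
        for p in range(len(acc)):          # all pickets act in lock step
            if opcodes[p] == name:         # enable derived from local data
                result[p] = fn(acc[p], operand[p])
    return result


print(simimd_step(["add", "mul", "sub", "nop"], [1, 2, 3, 4], [10, 10, 10, 10]))
```

Each picket appears to follow its own instruction stream, but every cycle is still a single broadcast operation. The cost is that the controller must step through every operation any picket might need.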

[0010]

BACKGROUND OF THE INVENTION: In the never-ending quest for faster computers, engineers are linking hundreds, and even thousands, of low-cost microprocessors together in parallel. The aim is to build super supercomputers that divide and conquer complex problems that stump today's machines. Such machines are called massively parallel. We have created a new way to build massively parallel systems. The many improvements we have made should be considered against the background of much work by others; a summary of the field is given in the other applications referenced. In this connection, see the related applications for our Parallel Associative Processor System, U.S. Ser. No. 601,594, and our Advanced Parallel Array Processor (APAP). System trade-offs are needed to select the architecture that best suits a particular application, but so far no solution has been satisfactory. Our ideas make it easier to provide a solution.

[0011] The interrelationship of processors in an array and the methods used to communicate between them have been the focus of considerable research, as cited in the literature on arrays of this type. Much of that research has concentrated on minimizing the number of steps needed to move a message between any two elements of an array. Much has also concentrated on communication with neighbors to support image processing and other highly regular problems of this kind. In short, SIMD- or MIMD-type parallel array processors need a highly organized and effective connection network for communication between processing elements (PEs).

[0012] The communication network may be required to communicate synchronously, with all pickets transferring data in the same direction at the same time. It may instead be required to communicate randomly, with each picket sending messages to random locations at random times. The latter approach is called routed transfer.

[0013] Both synchronous and routed (router) transfers need to be addressed in either a MIMD or a SIMD array control architecture, while keeping the complexity of this communication low.

[0014] Several communication topologies have been described in the literature and implemented in various ways in array machines. The basic communication topology is the simple left-right connectivity of a linear array. In a linear array, each two-port PE (processing element) communicates with the processing element on its left or right through a point-to-point network. In the more extensive conventional mesh topologies of two or more dimensions, the communication network is implemented with direct links between a source element and its neighbors in each implemented dimension. Each element therefore has two links per dimension of the array. In a prior-art two-dimensional array with a NEWS (north, east, west, south) network, each element has four links to other elements, and each added dimension requires two more links on every element of the mesh. Inside each element of a conventional mesh is a router function that receives and transmits messages or data packets over the appropriate links. In some multi-base, multi-dimensional hypercube implementations, the hypercube represents something close to the ultimate in processor array communication networks. In a binary hypercube, for example, the number of ports quickly reaches significant proportions.

[0015] The processing elements in prior-art arrays need enough ports to reach the required elements over point-to-point network links. The number depends on the topology and extent of the network implemented. Some processing elements need 4 ports, some 6, some 8, and some 30 (a 15-dimensional binary hypercube with 32K elements). Furthermore, to meet ever-increasing data-rate demands, each link can contain from one to perhaps fifty parallel lines.
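The port counts above follow a simple rule. A conventional point-to-point element needs two ports per dimension, and each port costs as many signal pins as its link has lines. The Python sketch below only does that arithmetic. It assumes the text's counting convention (two ports per dimension, so a 15-dimensional binary hypercube of 32K elements is counted as 30 ports). It also uses the upper figure of 50 lines per link as an illustrative width. The function names are hypothetical.

```python
def ports_per_element(dimensions: int) -> int:
    """Ports a conventional point-to-point mesh element needs, counting two
    links per dimension as the text does."""
    if dimensions < 1:
        raise ValueError("need at least one dimension")
    return 2 * dimensions


def signal_pins(dimensions: int, lines_per_link: int) -> int:
    """Signal pins per element when every link is `lines_per_link` wires wide."""
    return ports_per_element(dimensions) * lines_per_link


def hypercube_dimensions(elements: int) -> int:
    """Dimensions of a binary hypercube with `elements` nodes (a power of two)."""
    if elements < 1 or elements & (elements - 1):
        raise ValueError("a binary hypercube has a power-of-two node count")
    return elements.bit_length() - 1


if __name__ == "__main__":
    for dims in (1, 2, 3, 4, hypercube_dimensions(32 * 1024)):
        print(f"{dims:>2}-D element: {ports_per_element(dims):>2} ports, "
              f"{signal_pins(dims, 50):>4} signal pins at 50 lines per link")
```

Even before link width is considered, the port count grows linearly with dimension. With wide links, the pin count becomes the packaging limit described next.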

[0016] When we incorporate these topologies into hardware, packaging the arrays into chips, cards, drawers, racks, and rooms quickly draws our attention to the number of links and the number of signal pins in each link. Wafer technology now allows more circuits per chip. As a result, arrays of parallel processors have become increasingly useful, and higher-density arrays have become desirable.

[0017] The main purpose of this application is to use a dotted network to interconnect two-port processing elements. This implements many of today's topologies while effectively reducing packaging pin counts. Packaging an array of pickets in any mesh configuration presents several packaging problems. Most of them concern the limited number of available package pins or the desire to reduce the number of pins required.

[0018] The patent prior art includes patents describing SIMD and other networks in general. For example, US Pat. No. 4,270,170 describes a SIMD array connected by a NEWS network. In that array, dotted three-branch networks interconnect chips that each contain four processing elements. Simultaneous transfer in one of several directions is provided. Using the dotted network reduces the number of ports on a chip from eight to six, a 25% reduction in pin count. Only 2D networks are described in this patent. Its global routing technique reaches every element in the array and is steered by two lines that encode the four directions common to NEWS networks. The patent represents a prior-art dotted network used to reduce pin and port counts. However, each network addresses only three branches instead of four, so it needs more than three ports per processing element (3.5 on average). This leads to higher pin and port counts than we achieve. This patent describes only 2D NEWS networks. We have shown that extensions to other dimensions and configurations are desirable, but the patent does not address these issues. Clearly, the array of this patent requires simultaneous transfer of data in a particular direction.
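Two lines are enough to encode four directions because 2 bits give 4 codes. The quoted 25% saving is (8 - 6)/8. The short Python sketch below checks both facts. The bit pattern assigned to each direction is a hypothetical choice, not the one used in US Pat. No. 4,270,170.

```python
from typing import Tuple

# Hypothetical assignment of 2-bit codes to the four NEWS directions.
DIRECTIONS = ("N", "E", "W", "S")


def encode(direction: str) -> Tuple[int, int]:
    """Levels to drive on the two direction lines for one NEWS direction."""
    code = DIRECTIONS.index(direction)
    return (code >> 1) & 1, code & 1


def decode(lines: Tuple[int, int]) -> str:
    """Recover the direction from the levels on the two lines."""
    high, low = lines
    return DIRECTIONS[(high << 1) | low]


def percent_reduction(before: int, after: int) -> float:
    """Percentage saved when a count drops from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for d in DIRECTIONS:
        print(d, encode(d))
    print(percent_reduction(8, 6))  # ports per chip, eight down to six
```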

[0019] US Pat. No. 4,468,727 describes an array processor with an integrated array of radiation sensors, so that image processing is performed on the same monolithic substrate as the sensors. Processing elements are interconnected by charge-coupled gates on the north, east, west, and south edges of each processing element. This patent therefore represents just one of many NEWS arrays. This array processor has no dotted communication network.

[0020] US Pat. No. 4,805,091 shows a preferred example of the hypercube interconnection network used by machines made by Thinking Machines, Inc., such as the Connection Machine. It describes applying the binary hypercube to the packaging of processing elements into chips, cards, boards, and frames, with each packaging level implemented as a hypercube of higher (or lower) dimension. This patent does not describe any form of dotted mechanism, although it does describe the binary hypercube. The present invention, a dotted communication network for array processors, can be used to implement binary hypercubes. This patent is therefore another example of where the invention is applicable. However, it does not describe the dotted bus we illustrate.

[0021] US Pat. No. 4,985,832 is another example of a SIMD array processing system with a routing network. It describes small groups of processing elements that communicate through shared memory. It also describes a NEWS mesh that provides regular array processing, and a mechanism by which processing elements can share large broadcast communication tasks. Finally, it describes a random routing network consisting of several "butterfly" stages followed by a 16×16 crossbar switch. However, this patent focuses on random routing, including the crossbar switch chips and their fault-tolerance aspects. It covers a number of communication techniques but does not mention a dotted mechanism.

[0022] US Pat. No. 4,910,665 describes a two-dimensional SIMD array processor interconnection technique in which each processing element has direct access to eight of its neighbors. The communication medium is a dotted network that interconnects four neighboring elements at a corner, and each processing element has four such dotted networks. The X-DOT proposal of this patent and the dotted connection called H-DOT both allow four processing elements to be connected by one dotted network. However, this patent covers only mesh topologies and their extension to a toroid, and it concentrates on the circuitry inside the processing element. We believe that improvements centered on connectivity and routing are needed.
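Here is why four corner nets give eight neighbors. If each dotted net joins the four PEs of a 2×2 block, the four blocks that share a corner with a PE together cover the 3×3 square around it. The Python sketch below checks this under that assumed geometry. It models connectivity only, not the circuitry of US Pat. No. 4,910,665, and its function names are hypothetical.

```python
from itertools import product


def corner_nets(x: int, y: int) -> list:
    """Lower-left corners of the four 2x2 blocks sharing a corner with PE (x, y).
    Each block stands for one dotted net joining its four PEs."""
    return [(x - dx, y - dy) for dx, dy in product((0, 1), repeat=2)]


def net_members(origin: tuple) -> set:
    """The four PEs attached to the net whose 2x2 block starts at `origin`."""
    ox, oy = origin
    return {(ox + dx, oy + dy) for dx, dy in product((0, 1), repeat=2)}


def reachable(x: int, y: int) -> set:
    """PEs that (x, y) reaches directly through its four corner nets."""
    reach = set()
    for origin in corner_nets(x, y):
        reach |= net_members(origin)
    reach.discard((x, y))
    return reach


if __name__ == "__main__":
    print(len(corner_nets(5, 5)), "nets,", len(reachable(5, 5)), "neighbors")
```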

[0023]

[Problems to Be Solved by the Invention]

The improvement of the present invention lies in a dotted network structure, H-DOT, which reduces the scale of network implementations. The preferred embodiment of the invention applies to a number of topologies. Furthermore, H-DOT-connected processing elements allow an array of processors to be scaled in size and extended to additional dimensions while retaining the basic two-port array processing element. Both synchronous and routed transfer control can be included in the present invention. The routing algorithm of the invention is described below.

[0024] New machines with parallel arrays on a chip and multibyte word-transfer capability are available today. However, pin count is becoming a serious problem as mesh communication paths become more parallel. The present invention can be used to connect microcomputers to parallel communication paths.

[0025] The H-DOT concept of the present invention is highly suitable because it allows parallel communication in a variety of topologies. H-DOT is an approach for applying processing elements with two ports (in some cases more) to many kinds of interconnection topologies. Pin count can be reduced effectively while the basic element remains a two-port device. As a result, our approach can extend the mesh to additional dimensions with the same two-port element. In the preferred embodiment of the invention, each processing element (PE) is limited to exactly two ports regardless of the desired configuration. Each processing element (picket) therefore connects to exactly two nets.

[0026] We provide that each net can extend to several other pickets in multiple dimensions.

[0027] Pin count is reduced, and our mechanized routing algorithm takes advantage of the features described above. The routing algorithm that handles a message after it has been received is straightforward. If the message is addressed to this element, the element keeps it. Otherwise, it passes the message on to the other port.
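The receive-side rule for a two-port element fits in a few lines. In this Python sketch, the header field `dest`, the port numbering 0/1, and the ring used to exercise the rule are assumptions made for illustration. The sketch does not reproduce the mechanized algorithm of the invention.

```python
from dataclasses import dataclass


@dataclass
class Message:
    dest: int      # destination PE address carried in the (hypothetical) header
    payload: str


def route(my_addr: int, arrived_on: int, msg: Message):
    """Two-port rule: keep a message addressed to this PE, otherwise pass it
    out of the port it did not arrive on (ports are numbered 0 and 1)."""
    if msg.dest == my_addr:
        return ("keep", None)
    return ("forward", 1 - arrived_on)


def deliver_on_ring(n: int, src: int, msg: Message) -> int:
    """Apply `route` at each PE of an n-PE ring (port 1 leads to the next
    address, port 0 to the previous one); return the number of hops taken."""
    if not 0 <= msg.dest < n:
        raise ValueError("destination is not on this ring")
    addr, arrived_on, hops = src, 0, 0  # treat the injected message as arriving on port 0
    while True:
        action, out_port = route(addr, arrived_on, msg)
        if action == "keep":
            return hops
        if out_port == 1:
            addr, arrived_on = (addr + 1) % n, 0  # enters the next PE on its port 0
        else:
            addr, arrived_on = (addr - 1) % n, 1  # enters the previous PE on its port 1
        hops += 1


if __name__ == "__main__":
    print(deliver_on_ring(8, 2, Message(dest=5, payload="hello")))  # prints 3
```

The decision uses only the PE's own address and the arriving port. That is why the same rule can serve any topology built from two-port elements.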

[0028] The advantages of the H-DOT approach are:
- pin reduction;
- a simplified router (routing) algorithm;
- shorter transit time through the network;
- potentially fewer hops (nodes);
- use of higher-level signals in lightly loaded systems;
- the ability to be configured into standard nets; and
- effective fault tolerance through alternative routing.

[0029]

[Means for Solving the Problems]

[0030] We have created a new way to build massively parallel processors and other computer systems by creating new "chips" and systems designed according to our new concepts. This application relates to such a system. Our views are set out through the various concepts we teach in this and the related applications. The components shown in each application are combined in our system to create a new system. They can also be combined with existing technology.

[0031] This and the related applications describe in detail the picket processor and the so-called Advanced Parallel Array Processor (APAP). Notably, the picket processor uses PMEs (processor memory elements). Picket processors may be especially useful in military applications where very small arrays are desired. In this respect, the picket processor differs somewhat from the preferred embodiment of our Advanced Parallel Array Processor (APAP). However, the two have much in common, and the aspects and features we have provided can be employed in these differing machines.

[0032] The term picket refers to the 1/nth element of an array processor, consisting of a processor, memory, and the communication elements used for intercommunication within the array.

[0033] The picket concept also applies to 1/nth of an APAP processing array.

[0034] The picket concept may differ from the APAP in data width, memory size, and number of registers. There is a further difference in the massively parallel embodiment that is an alternative to the APAP. There, a picket is configured with connectivity to 1/nth of a regular array, whereas a PME in the APAP is part of a subarray. Both systems execute SIMIMD. The picket processor is built as a SIMD machine with MIMD in the PE, so it can execute SIMIMD directly. The MIMD APAP structure, by contrast, can execute SIMIMD by using MIMD PEs controlled to emulate SIMD. Both machines use PMEs.

[0035] Both systems provide an array processing unit for an array of N elements interconnected by an array communication network. In this unit, 1/nth of the processor array consists of a processing element, its associated memory, a control bus interface, and part of the array communication network.

[0036] The parallel array processor has a dual operating mode capability. Its processing units can be commanded to operate in either or both of two modes, SIMD and MIMD, and can move freely between them. When SIMD is the organizing mode, the processing unit can command each element to execute its own instructions in SIMIMD mode. When MIMD is the execution mode of the processing-unit organization, the processing unit can synchronize selected elements of the array to simulate MIMD execution. We call this MIMD-SIMD.

[0037] In both systems, the parallel array processor provides the array communication network with paths for passing information between elements of the array. Information movement is directed in one of two ways. In the first, the array controller specifies that all messages move to the same destination at the same time, so the data being moved do not define their own destination. In the second, each message is self-routed by a header at its beginning that determines its destination.
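The two ways of directing information can be contrasted in a short Python sketch. Controller-directed movement is modeled as every element shifting its word one position in the same direction at once. Wraparound at the array ends is an assumption. Self-routing is modeled as a packet whose first byte is the destination address. The one-byte header format and all function names are hypothetical.

```python
def synchronous_step(data: list, direction: int) -> list:
    """Controller-directed move: every element passes its word one position in
    the same direction at once (+1 = right, -1 = left). The words carry no
    address; wraparound at the ends is an assumption of this sketch."""
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    n = len(data)
    moved = [None] * n
    for i, word in enumerate(data):
        moved[(i + direction) % n] = word
    return moved


def make_packet(dest: int, payload: bytes) -> bytes:
    """Self-routed message: a one-byte destination header (hypothetical format)
    followed by the payload."""
    if not 0 <= dest <= 255:
        raise ValueError("destination must fit in a one-byte header")
    return bytes([dest]) + payload


def header_dest(packet: bytes) -> int:
    """Read the destination from the header at the start of the message."""
    return packet[0]


if __name__ == "__main__":
    print(synchronous_step([10, 20, 30, 40], 1))  # [40, 10, 20, 30]
    packet = make_packet(7, b"data")
    print(header_dest(packet), packet[1:])        # 7 b'data'
```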

[0038] A segment of the parallel array processor comprises multiple copies of the processing unit on a single semiconductor chip. Each copy segment of the array includes the portion of the array communication network associated with that segment. It also includes buffers, multiplexers, and controls that let the segment portion of the array connect seamlessly to other segments of the array, extending the array communication network.

[0039] A control bus or path from the controller is provided for each processing unit, so that the control bus extends to every element of the array and controls its operation.

[0040] Each processing-element segment of the parallel array contains multiple copies of the processing memory element within a single semiconductor chip. It also includes part of the array control bus and register buffers that support communication of control to the array segment on that chip.

[0041] Both systems can perform mesh moves or routed moves. Normally the APAP implements a dual interconnection structure. The eight elements on a chip are interrelated in one way, while the chips are interrelated in another. Programmable routing on the chip allows links to be set up between the PMEs (processor memory elements) described above, but the nodes can be, and normally are, related in another way. On chip, the normal APAP configuration is essentially a 2×4 mesh, while the node interconnection can be a routed sparse octal N-cube. Both systems have intercommunication paths between PEs (or PMEs) that allow a matrix to be built from point-to-point paths.

[0042] One aspect of the present invention is a parallel SIMD or MIMD array processor communication network. It has a plurality of processing elements interconnected in an array topology, with an interconnection network configuration that mechanizes the interconnections between processor array elements. This configuration allows a processing element to interconnect with the next processing element in each of a plurality of directions through an apparently H-shaped connection. That connection is a link that provides two paths to the next processing element in each dimension.

[0043]

DESCRIPTION OF THE PREFERRED EMBODIMENT The basic communication topology that can be used to send data from processing element (PE) to processing element is the simple left-right connectivity of the linear array shown in FIG. 1, which represents a prior-art linear array. Each two-port processing element communicates with the processing element on its left or right through a point-to-point network. The basic idea of the two-port processing element underlies the description of the H-DOT structure of the present invention.

[0044] In the more extensive conventional mesh topologies, the communication network is implemented with direct links (interconnections) between source and destination elements. Each element therefore has two links for each dimension of the array. For example, in a conventional two-dimensional array with NEWS (north, east, west, south) communication, each element has four links to other elements, as shown in FIG. 2. If another dimension is added, two more links must be added to each element of the mesh. Inside each element of a conventional mesh is a router (routing) function that receives messages or data packets and transmits them over the appropriate links.
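The router function inside a conventional NEWS mesh element must choose one of its four direct links for each outgoing message. The Python sketch below uses dimension-order routing: it corrects the east-west offset first, then the north-south offset. This is a common textbook rule chosen here only for illustration. The text does not specify which rule conventional meshes use. North is taken as +y, and the mesh is assumed not to wrap around.

```python
# Unit step for each of the four NEWS links; north is +y in this sketch.
LINKS = {"N": (0, 1), "E": (1, 0), "W": (-1, 0), "S": (0, -1)}


def neighbors(x: int, y: int, width: int, height: int) -> dict:
    """Direct-link neighbors of element (x, y) in a non-wrapping mesh."""
    result = {}
    for name, (dx, dy) in LINKS.items():
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            result[name] = (nx, ny)
    return result


def next_link(current: tuple, dest: tuple):
    """Dimension-order choice: fix x first, then y; None means deliver here."""
    (cx, cy), (tx, ty) = current, dest
    if tx != cx:
        return "E" if tx > cx else "W"
    if ty != cy:
        return "N" if ty > cy else "S"
    return None


def path(src: tuple, dest: tuple) -> list:
    """Sequence of links a message takes from src to dest under next_link."""
    links, cur = [], src
    while (link := next_link(cur, dest)) is not None:
        links.append(link)
        dx, dy = LINKS[link]
        cur = (cur[0] + dx, cur[1] + dy)
    return links


if __name__ == "__main__":
    print(neighbors(0, 0, 4, 4))   # a corner element uses only two of its links
    print(path((0, 0), (2, 1)))    # ['E', 'E', 'N']
```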

[0045] The communication network may be required to provide either "synchronous communication" or "random communication". In synchronous communication, all pickets transmit data in the same direction at the same time. In random communication, each picket sends messages to random locations at random times. The latter approach is called routed transmission. Both types of communication may need to be addressed in either a MIMD or a SIMD array control architecture, while keeping the communication complexity simple.

[0046] Each link can include from one to perhaps fifty parallel lines to meet ever-increasing data-rate requirements.

[0047] Packaging an array of pickets with any mesh configuration presents several packaging problems. Most of them relate to the limited availability of package pins or the desire to minimize the number of pins required.

[0048] The Parallel Associative Processor System, cited above as a related application, is described in the parent patent application with an emphasis on a linear left-right communication mesh. However, several interconnection meshes are possible, as follows:

[0049]
- Left/right (L/R) mesh
- NEWS mesh
- Slide bus
- Shuffle network
- Simple crossbar
- Base-2 N-cube
- Slide crossbar
- Base-8 N-cube

[0050] We describe how to implement several of these array interconnection topologies with the H-DOT idea, greatly reducing the port and pin counts of a given topology.

[0051] Approach: H-DOT

[0052] Control of mesh communication falls into two categories. When communication is regular, some form of global control directs all pickets to do the same thing, either simultaneously or in steps of some kind. This type of communication is commonly associated with SIMD control organizations, but it works equally effectively with MIMD-organized arrays.

[0053] In the preferred embodiment of the present invention, H-DOT can implement any of several mesh configurations, all based on the same basic two-port element. H-DOT can take more than two point-to-point links and combine their functions into one network with attachments to more than two processing elements. For example, in a 2D mesh, two adjacent N-S (north-south) links and two adjacent E-W (east-west) links are combined. Together they form one network with attachments to four processing elements. Numerically, this four-port network replaces four two-port networks, reducing the pin count by 50%. This is shown in FIG. 3, which illustrates a 2D mesh implemented with the H-DOT interconnection technique. Furthermore, as shown, the N-S links share the same wires, so an array-wide synchronous transmission requires two cycles. One cycle can serve the even-numbered processing elements and the other the odd-numbered ones. With H-DOT, the processing-element interconnection is dot-ORed with the other interconnections of the array, as shown in FIG. 3. Each processing element therefore connects to the next one through a link that provides two vertical paths and two horizontal paths, an apparently H-shaped connection link. In the preferred embodiment of the invention, a square processing element represents one or more processing elements, for example a single picket or a chip with eight or sixteen pickets. See the related applications for further details of suitable processing elements. For the purposes of this application, however, a square processing element may represent a conventional microcomputer.
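The pin arithmetic above can be checked on a concrete layout. FIG. 3 is not reproduced here, so the Python sketch below assumes one plausible H-DOT geometry on a torus. Each net joins the four PEs of a 2×2 block. Only blocks whose lower-left corner (i, j) has i + j even are used, like the dark squares of a checkerboard. Under that assumption, every PE has exactly two ports, and every N-S and E-W neighbor pair shares exactly one net. The array then needs N/2 four-port nets instead of 2N two-port links. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def hdot_nets(width: int, height: int) -> list:
    """Hypothetical H-DOT layout on a width x height torus: one four-PE net per
    2x2 block whose lower-left corner (i, j) has i + j even."""
    if width < 4 or height < 4 or width % 2 or height % 2:
        raise ValueError("this sketch assumes even dimensions of at least 4")
    nets = []
    for i, j in product(range(width), range(height)):
        if (i + j) % 2 == 0:
            nets.append(frozenset(((i + dx) % width, (j + dy) % height)
                                  for dx, dy in product((0, 1), repeat=2)))
    return nets


def ports_per_pe(nets: list) -> Counter:
    """How many nets (ports) each PE is attached to."""
    return Counter(pe for net in nets for pe in net)


def each_pair_shares_one_net(nets: list, width: int, height: int) -> bool:
    """True if every E-W and N-S neighbor pair is joined by exactly one net."""
    for x, y in product(range(width), range(height)):
        for other in (((x + 1) % width, y), (x, (y + 1) % height)):
            if sum((x, y) in net and other in net for net in nets) != 1:
                return False
    return True


def pin_reduction_percent(conventional_ports: int = 4, hdot_ports: int = 2) -> float:
    """Per-PE port (and hence pin) saving versus a conventional NEWS element."""
    return 100.0 * (conventional_ports - hdot_ports) / conventional_ports


if __name__ == "__main__":
    w, h = 8, 8
    nets = hdot_nets(w, h)
    print(len(nets), "four-port nets vs", 2 * w * h, "two-port links")
    print(set(ports_per_pe(nets).values()), each_pair_shares_one_net(nets, w, h))
    print(pin_reduction_percent(), "% fewer ports per PE")
```

In this model, each net carries two N-S pairs (its left and right columns) on the same wires. That is consistent with the two-cycle array-wide synchronous transfer described above.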

[0054] The advantages of the H-DOT approach have been summarized above, but, as is always the case, the H-DOT approach also has drawbacks. Its effects are listed here. They are minor in some environments but significant in other applications:
1. In synchronous transmission, all elements pass data in the same direction at the same time. A conventional implementation needs only one clock for this, whereas an H-DOT implementation needs two clocks (or cycles).

[0055] 2. When routed transmission is in progress, the H-DOT approach introduces temporary blocking, which becomes significant when communication traffic is heavy. No messages are lost as a result, but transmission takes longer.
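Both effects can be illustrated with a toy model of shared wires. In the Python sketch below, `clocks_needed` assumes each net carries one transfer per clock. A synchronous transfer therefore takes as many clocks as the busiest net has transfers. `simulate_shared_net` queues messages that find the net busy rather than dropping them, showing that blocking adds delay but loses nothing. Both functions are hypothetical. They ignore wire delay, arbitration logic, and buffer limits.

```python
from collections import defaultdict, deque


def clocks_needed(transfers, net_of) -> int:
    """Clocks for one synchronous transfer step when each net can carry only
    one transfer per clock: the answer is the load on the busiest net."""
    load = defaultdict(int)
    for t in transfers:
        load[net_of(t)] += 1
    return max(load.values(), default=0)


def simulate_shared_net(arrivals):
    """Routed traffic on one shared net. arrivals[t] messages want the net at
    clock t; one crosses per clock and the rest wait in a queue (temporary
    blocking). Returns (messages delivered, per-message waiting times)."""
    queue, waits, t = deque(), [], 0
    while t < len(arrivals) or queue:
        if t < len(arrivals):
            queue.extend([t] * arrivals[t])
        if queue:
            waits.append(t - queue.popleft())
        t += 1
    return len(waits), waits


if __name__ == "__main__":
    # Two northward transfers inside one 2x2 group of PEs.
    north = [((0, 0), (0, 1)), ((1, 0), (1, 1))]
    print(clocks_needed(north, lambda t: t))       # dedicated links: 1
    print(clocks_needed(north, lambda t: "net0"))  # one shared H-DOT net: 2
    print(simulate_shared_net([1, 0, 1, 0]))       # light load: (2, [0, 0])
    print(simulate_shared_net([3, 0, 0]))          # burst: (3, [0, 1, 2])
```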

[0056] If communication is of the random type, each message carries a destination address and travels through the network, possibly stopping at intermediate nodes. Each element must therefore execute some form of routing algorithm. The router may be independent of the parallel processor's control organization, or it may be incorporated into it.

[0057] For a two-port element, some (though not many) of the algorithms must determine how to handle a message received by the element: the message either belongs to this element or is transmitted out of the other port. This algorithm is universal for all forms of mesh implemented with two-port pickets or processing elements.

[0058] Other algorithms determine whether to accept a message that is available at a port. In a two-dimensional NEWS network, four ports can access a message, but only one port actually receives it. The algorithm that decides which port receives a message is based on getting the message to its destination while avoiding dead routes. This acceptance algorithm depends on the array configuration and is described below for each particular array.

[0059] H-DOT is described in detail for the following mesh topologies: the NEWS (north, east, west, south) mesh, the 3-D mesh, the binary n-cube, the base-4 n-cube, and the octal n-cube.

[0060] 3-D NEWS Array

[0061] This first description uses the basic two-port picket employed in the EW (east-west) slide bus shown in FIG. 1 to achieve a two-dimensional NEWS array.

[0062] In general, H-DOT uses a wired dot-OR to connect one port from each of four (or more) different pickets. Another dot-OR connects the other port of one of these pickets with three (or more) other pickets. FIG. 3 shows the basic H-DOT pattern for a two-dimensional mesh.
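FIG. 3 is not reproduced here, so the following Python sketch uses a hypothetical model of the basic pattern. Port 0 of the PE at (x, y) joins the dot-OR net of the aligned 2 × 2 block that contains it. Port 1 joins the net of the 2 × 2 block shifted by one position in both X and Y. The function names and the net numbering are illustrative assumptions. The sketch checks two things: with only two ports per PE, every north, east, west, and south neighbor shares a net, and no net has more than four taps.

```python
from itertools import product

def port_nets(x, y):
    """Return the two dot-OR net ids for the PE at (x, y).

    Hypothetical staggered model: port 0 joins the aligned 2 x 2 block,
    port 1 joins the 2 x 2 block shifted by one in both X and Y.
    """
    return {(0, x // 2, y // 2), (1, (x + 1) // 2, (y + 1) // 2)}

def taps_per_net(n):
    """Count how many PE ports land on each net of an n x n array."""
    taps = {}
    for x, y in product(range(n), repeat=2):
        for net in port_nets(x, y):
            taps[net] = taps.get(net, 0) + 1
    return taps

def news_neighbors_share_a_net(n):
    """True if every north/east/west/south neighbor pair shares a net."""
    for x, y in product(range(n), repeat=2):
        for dx, dy in ((0, -1), (1, 0), (-1, 0), (0, 1)):  # N, E, W, S
            nx, ny = x + dx, y + dy
            if 0 <= nx < n and 0 <= ny < n:
                if not port_nets(x, y) & port_nets(nx, ny):
                    return False
    return True

print(news_neighbors_share_a_net(8))  # True
print(max(taps_per_net(8).values()))  # 4
```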

[0063] One advantage of the H-DOT approach is that the number of pins on each device is halved. In addition, the number of links among the parts of a 2-D mesh is reduced by more than half compared with the number required by a conventional 2-D (four-port) mesh. FIG. 3 shows that a single H-DOT link provides two vertical paths and two horizontal paths.
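The pin and link savings can be counted with a small sketch. For the conventional case it assumes an N × N mesh without wraparound. For the H-DOT case it assumes a hypothetical staggered model: each PE's two ports join the net of its aligned 2 × 2 block and the net of the block shifted by one in X and Y. Every dot-OR net with at least two taps is counted as a link. The model and the counting convention are assumptions, not read from FIG. 3.

```python
from itertools import product

def port_nets(x, y):
    """Hypothetical staggered H-DOT model: the two net ids for the PE at (x, y)."""
    return {(0, x // 2, y // 2), (1, (x + 1) // 2, (y + 1) // 2)}

def conventional_counts(n):
    """(ports per PE, point-to-point links) for an n x n NEWS mesh without wraparound."""
    return 4, 2 * n * (n - 1)

def hdot_counts(n):
    """(ports per PE, dot-OR nets with at least two taps) for the hypothetical model."""
    taps = {}
    for x, y in product(range(n), repeat=2):
        for net in port_nets(x, y):
            taps[net] = taps.get(net, 0) + 1
    return 2, sum(1 for count in taps.values() if count >= 2)

print(conventional_counts(8))  # (4, 112)
print(hdot_counts(8))          # (2, 37)
```

Under these assumptions, an 8 × 8 array needs 37 nets instead of 112 point-to-point links. That is consistent with a reduction of more than half.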

[0064] Another advantage of the H-DOT approach is its scalability. As shown in FIG. 4, the 2-D (two-dimensional) mesh above can be extended to a three-dimensional mesh by implementing a double H-DOT. In this case, four elements per 3-D array connect to a DOT network that looks like a double H, with one H on top of the other. The inset of FIG. 4 shows a single double H-DOT for clarity. A close look at FIG. 4 shows that these double H-DOTs are staggered in each dimension. This lets each processing element connect with each neighboring element in each direction while preserving the integrity of the double H-DOT connections.
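FIG. 4 is not reproduced here. The following sketch uses a hypothetical model of the staggered double H-DOT. Port 0 of each element joins the net of the aligned 2 × 2 × 2 block that contains it, and port 1 joins the net of the block shifted by one position along every axis. The function names and the model are assumptions. The sketch checks that two ports still reach all six 3-D neighbors, and that each double-H net has eight taps.

```python
from itertools import product

def port_nets(coord):
    """Hypothetical staggered model in any dimension: the two net ids for a PE."""
    return {(0,) + tuple(c // 2 for c in coord),
            (1,) + tuple((c + 1) // 2 for c in coord)}

def check_mesh(n, dims):
    """Return (all axis neighbors share a net, largest number of taps on one net)."""
    taps = {}
    all_connected = True
    for coord in product(range(n), repeat=dims):
        for net in port_nets(coord):
            taps[net] = taps.get(net, 0) + 1
        for axis in range(dims):
            for step in (-1, 1):
                nb = list(coord)
                nb[axis] += step
                if 0 <= nb[axis] < n and not port_nets(coord) & port_nets(tuple(nb)):
                    all_connected = False
    return all_connected, max(taps.values())

print(check_mesh(6, 3))  # (True, 8): six neighbors reached with two ports
```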

[0065] H-DOT communication requires more communication cycles than point-to-point links (interconnects). A 2-D H-DOT requires one or two cycles for each picket to send data to the picket on a particular side. In addition, the double H forms four paths in each dimension, so all pickets together require four cycles to send data to the picket in a particular direction.
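The cycle counts can be estimated with a hypothetical model. Each element's two ports join dot-OR nets formed by aligned blocks of two elements per axis, and by the same blocks shifted by one position on every axis. The sketch assumes each net carries one transfer per cycle and every element sends to its neighbor along one axis at the same time. Under those assumptions, the number of cycles is the largest number of senders sharing one net. The model and function names are assumptions. The sketch gives the worst case of two cycles for a 2-D array and four for a 3-D array.

```python
from itertools import product

def port_nets(coord):
    """Hypothetical staggered model: the two net ids for the PE at `coord`."""
    return {(0,) + tuple(c // 2 for c in coord),
            (1,) + tuple((c + 1) // 2 for c in coord)}

def cycles_for_shift(n, dims, axis=0):
    """Cycles for every PE to send one step along `axis` at once.

    Assumes each net carries one transfer per cycle, so the answer is the
    largest number of senders that must share a single net.
    """
    senders = {}
    for coord in product(range(n), repeat=dims):
        dest = list(coord)
        dest[axis] += 1
        if dest[axis] >= n:
            continue  # edge PEs have no neighbor in this direction
        (net,) = port_nets(coord) & port_nets(tuple(dest))  # exactly one shared net
        senders[net] = senders.get(net, 0) + 1
    return max(senders.values())

print(cycles_for_shift(8, 2))  # 2
print(cycles_for_shift(8, 3))  # 4
```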

[0066] Routing Algorithm

[0067] The NEWS network is designed for imaging and other processing of regularly arranged data. In this case, the array controller directs all elements to pass data to the element on the same side at the same time.

[0068] The NEWS network can also be used for random communication. In this case, there must be a router with a routing algorithm for initiating, receiving, and passing messages. With the H-DOT approach, an element must decide whether to receive a message from an active port. Referring to FIG. 3, if element A sees a message at its left port, element A receives the message if the destination address is:

[0069] (a) its own address (Xd = Xo, Yd = Yo), OR
(b) in the upper-right quadrant of the mesh, that is, destination X address ≥ own X address or destination Y address ≤ own Y address.
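The acceptance test for element A can be written as a small predicate. This Python sketch assumes integer mesh coordinates, with X growing to the right and Y growing downward, so "upper right" means Xd ≥ Xo and Yd ≤ Yo. It reads the quadrant as requiring both conditions, which is an interpretation. The function name is hypothetical. Elements in positions B, C, and D would each need their own variant.

```python
def a_accepts_from_left(own, dest):
    """Hypothetical acceptance test for an element in position A of FIG. 3.

    own, dest: (x, y) mesh coordinates; Y is assumed to grow downward.
    """
    xo, yo = own
    xd, yd = dest
    if (xd, yd) == (xo, yo):
        return True  # addressed to this element
    # Upper-right quadrant, read here as both conditions holding (assumption).
    return xd >= xo and yd <= yo

print(a_accepts_from_left((3, 3), (5, 1)))  # True: up and to the right
print(a_accepts_from_left((3, 3), (1, 5)))  # False: down and to the left
```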

[0070] Note that there are four different algorithms, depending on whether the element is in position A, B, C, or D.

[0071] Binary Hypercube

[0072] The next topology described uses H-DOT to implement a binary hypercube.

[0073] FIG. 5 shows a typical four-dimensional binary hypercube. Each element of the cube has one connection for each dimension. Because each dimension has only two values, an array of useful size has many dimensions. The four-dimensional cube has 16 elements, each with four ports. An array of 1024 elements is a 10-dimensional binary cube with 10 ports on each element.
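The defining rule of the binary hypercube is that an element connects to every element whose address differs from its own in exactly one bit. This can be sketched with integer addresses and XOR. The function name is illustrative.

```python
def hypercube_neighbors(addr, dims):
    """Addresses that differ from `addr` in exactly one of `dims` bits."""
    return [addr ^ (1 << bit) for bit in range(dims)]

print([format(a, "04b") for a in hypercube_neighbors(0b0000, 4)])
# ['0001', '0010', '0100', '1000']
print(2 ** 10, len(hypercube_neighbors(0, 10)))  # 1024 10
```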

[0074] When the binary n-cube is implemented with H-DOT, the four-dimensional version looks like FIG. 6. Each element has two ports. These ports connect to dot-OR buses in the characteristic (slightly offset) H-DOT configuration with respect to the element's communication partners.

[0075] H-DOT limits the number of ports on each element to two. Compared with the conventional implementation, this greatly reduces the pin count while increasing the number of destinations a message can reach in each move. For example, in the conventional n-cube, element 0000 can communicate in one move over four links with the processing elements addressed 0001, 0010, 0100, and 1000. With the H-DOT approach, element 0000 can communicate in one move over two links with the processing elements addressed 0001, 0100, 0101, 0010, 1000, and 1010. With these additional connections, a four-dimensional binary cube implemented with H-DOT needs 25% fewer moves per message on average than a conventional binary cube at low message density. Implementing a binary cube with H-DOT therefore yields something that is no longer just a binary n-cube. By definition, a binary n-cube connects each processing element to the processing elements whose addresses differ from its own in one bit. In any case, however, the result still provides every connection of a binary n-cube.
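The 25% figure can be checked under a hypothetical model suggested by the FIG. 6 link names, such as x0x0 and 0x0x for element 0000:

- The address bits are split into two interleaved groups.
- Each net lets a message change any bits in one group in a single move.
- So the number of moves equals the number of groups in which the source and destination differ.

The sketch assumes uniform traffic between distinct elements and no contention, matching the "low message density" condition. The function names are illustrative.

```python
def group_masks(dims):
    """Split address bits into two interleaved groups (hypothetical reading of FIG. 6)."""
    odd = sum(1 << b for b in range(1, dims, 2))   # 0b1010 when dims == 4
    even = sum(1 << b for b in range(0, dims, 2))  # 0b0101 when dims == 4
    return odd, even

def hdot_moves(a, b, dims):
    """One move per bit group in which a and b differ."""
    return sum(1 for mask in group_masks(dims) if (a ^ b) & mask)

def cube_moves(a, b):
    """Conventional binary n-cube: one move per differing bit."""
    return bin(a ^ b).count("1")

def average_moves(dims, moves):
    """Mean moves over all ordered pairs of distinct elements."""
    n = 1 << dims
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    return sum(moves(a, b) for a, b in pairs) / len(pairs)

one_move = [format(b, "04b") for b in range(16) if hdot_moves(0, b, 4) == 1]
print(one_move)  # ['0001', '0010', '0100', '0101', '1000', '1010']
conventional = average_moves(4, cube_moves)
hdot = average_moves(4, lambda a, b: hdot_moves(a, b, 4))
print(round(1 - hdot / conventional, 2))  # 0.25
print(max(hdot_moves(a, b, 6) for a in range(64) for b in range(64)))  # 2
```

The last line applies the same model to the six-dimensional cube of FIG. 7. Each net there fixes one three-bit group and so has eight taps, and no message needs more than two moves.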

[0076] FIG. 7 shows a six-dimensional binary cube implemented with H-DOT using two-port processing elements. Note that at most two moves are needed to send a message from any of the 64 elements to any other element. Note also that each double H-DOT connects eight processing elements.

[0077] At some point, H-DOT stops being effective. Large wired-dot nets become a problem, and collisions on the nets become possible at high message density. This may limit binary n-cubes to relatively small arrays, with about 32 processing elements per H-DOT. The sinking current (current on the receiving side) in a large dot-OR network becomes a problem. This applies particularly to binary cubes. At some point, however, inserting transceivers in the net eliminates some of these problems. The NEWS network does not have such problems. Random communication in it must move through more intermediate nodes up, down, left, and right, but however far the network is extended, it remains a two-dimensional array.

[0078] Routing Algorithm for the Binary n-Cube

[0079] Note that in FIGS. 6 and 7, each element and each H-DOT is labeled. The upper-left element is numbered 0000, and one of its links is labeled x0x0. A link name refers to all the elements connected to that link. The acceptance algorithm for mechanizing routed transmission is based on these link names.

[0080] When an element sees an active message on one of its links, one and only one of the elements on that link accepts the message. Our algorithm is remarkably simple: "Accept the message when the destination matches the back-side link name." The back-side link is the element's other link, not the one carrying the message. Four destinations match link x0x0: 0000, 1000, 0010, and 1010. Note that the front and back links are orthogonal to each other, so a destination matches only one link.

[0081] After an element of the array chooses to accept a message, it keeps the message if the destination address matches its own address. Otherwise ('else'), it passes the message on.

[0082] When an element of the array needs to send a message, the destination address matches only one of its two connected H-DOT net names. That net's port is the one on which the message is started.
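The acceptance, keep/pass, and origination rules can be traced using link names written as strings, where 'x' marks a wildcard position as in 'x0x0'. In this hypothetical sketch:

- An element's two link names are formed by replacing its even or its odd character positions with 'x'.
- The back-side link of an element on a given net is simply its other link.
- When the destination matches neither of the sender's link names (a two-move destination), the sketch starts on the first link. That tie-break is an assumption; in this model either link reaches the destination in two moves.

All function names are illustrative.

```python
from itertools import product

def link_names(addr):
    """The two nets an element sits on, e.g. '0000' -> ('x0x0', '0x0x') (hypothetical rule)."""
    even_wild = "".join("x" if i % 2 == 0 else c for i, c in enumerate(addr))
    odd_wild = "".join("x" if i % 2 == 1 else c for i, c in enumerate(addr))
    return even_wild, odd_wild

def matches(dest, link):
    """True if dest agrees with every non-wildcard position of the link name."""
    return all(l == "x" or l == d for d, l in zip(dest, link))

def members(link):
    """Every address on a net: fill each 'x' with 0 or 1."""
    slots = [i for i, c in enumerate(link) if c == "x"]
    result = []
    for bits in product("01", repeat=len(slots)):
        addr = list(link)
        for i, bit in zip(slots, bits):
            addr[i] = bit
        result.append("".join(addr))
    return result

def other_link(addr, link):
    """The element's back-side link relative to the net the message is on."""
    return next(name for name in link_names(addr) if name != link)

def accepter(link, dest):
    """The single element on `link` whose back-side link name matches dest."""
    hits = [m for m in members(link) if matches(dest, other_link(m, link))]
    assert len(hits) == 1, hits
    return hits[0]

def start_link(src, dest):
    """Port to originate on; a two-move destination matches neither name (tie-break assumed)."""
    names = link_names(src)
    return next((name for name in names if matches(dest, name)), names[0])

def route(src, dest):
    """Elements that hold the message on its way from src to dest."""
    path, link = [src], start_link(src, dest)
    while path[-1] != dest:
        holder = accepter(link, dest)    # ACCEPT
        path.append(holder)
        link = other_link(holder, link)  # KEEP ends the loop; otherwise pass out the other port
    return path

print(route("0000", "0101"))  # ['0000', '0101']
print(route("0000", "1111"))  # ['0000', '1010', '1111']
```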

[0083] The following is a summary of the H-DOT implementation of a binary n-cube.

[0084]

[Table 1]

[0085] In Table 1, the column labeled 'DIM' gives the dimension of the binary n-cube. 'Element' is the number of PEs (processing elements) that make up a binary n-cube of that dimension. 'Links' is the total number of separate nets, or groups of interconnected ports, needed to implement a binary n-cube of the given dimension with the H-DOT interconnection technique. 'Taps/Links' gives the number of taps, or ports, on each link. Finally, the last two columns give the maximum and minimum numbers of cycles required to move data simultaneously from every element to the element in one particular direction. The term 'Link' refers to one of the DOTted networks that join the processing elements (PEs).

[0086] Base-4 n-Cube

[0087] Interest in implementing higher-order n-cubes is growing, driven by the economics of interconnection. Our H-DOT concept meshes well with higher-order n-cubes. For simplicity, we first describe a full implementation of the base-4 n-cube, then a sparse implementation, and finally a sparse base-4 n-cube implemented with H-DOT.

[0088] FIG. 8 shows a standard base-4 n-cube. Each processing element connects to another processing element if their addresses match in all but one digit.

[0089] The two-dimensional base-4 n-cube has the following 16 elements: 00, 01, 02, 03, 10, 11, 12, 13, 20, 21, 22, 23, 30, 31, 32, and 33. In the base-4 n-cube, element 00 connects directly with 01, 02, 03, 10, 20, and 30. The same rule applies to the other processing elements, so each processing element has six ports.
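The full base-4 connection rule can be sketched by listing every address that differs from a given address in exactly one digit. The function name is illustrative. Digits are written as single characters, so the sketch assumes a radix of at most 10.

```python
def base_b_neighbors(addr, radix):
    """All addresses differing from `addr` in exactly one digit (full base-b n-cube)."""
    out = []
    for i, digit in enumerate(addr):
        for v in range(radix):
            if v != int(digit):
                out.append(addr[:i] + str(v) + addr[i + 1:])
    return sorted(out)

print(base_b_neighbors("00", 4))  # ['01', '02', '03', '10', '20', '30']
```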

[0090] Sparse Base-4 n-Cube

[0091] We introduce the idea of sparsity for the two-dimensional base-4 n-cube. To reduce the number of links and make the array easier to implement, one designer proposed simply removing some of the links. This is done by restricting connections to the elements that match in all but one digit and that are adjacent in that digit, that is, the two neighboring values in that dimension. In the example above, PE 00 is therefore connected only to 01, 03, 10, and 30. FIG. 9 illustrates this.
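The sparse rule can be sketched by letting each digit connect only to its two adjacent values. The sketch assumes digit values wrap around modulo the radix, as the link between 00 and 03 implies. The function name is illustrative. The same function also covers the octal case described later.

```python
def sparse_neighbors(addr, radix):
    """Addresses that differ in one digit by +1 or -1 (mod radix): the sparse n-cube."""
    out = set()
    for i, digit in enumerate(addr):
        for step in (-1, 1):
            v = (int(digit) + step) % radix
            out.add(addr[:i] + str(v) + addr[i + 1:])
    return sorted(out)

print(sparse_neighbors("00", 4))  # ['01', '03', '10', '30']
print(sparse_neighbors("00", 8))  # ['01', '07', '10', '70']
```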

[0092] H-DOT Implementation of the Base-4 n-Cube

[0093] We implement the sparse two-dimensional base-4 n-cube in an H-DOT mechanization using two-port PEs. The result is shown in FIG. 10. Compared with the implementation of FIG. 8, this greatly reduces the number of ports and networks. Note, of course, that paths are shared and more cycles are needed. As shown for the base-2 n-cube, this array can easily be extended to more dimensions.

[0094] H-DOT Octal n-Cube

[0095] A close relative of the binary and base-4 n-cubes is the octal n-cube, which has eight elements rather than two or four in each strip of the array. FIG. 11 shows a sparse two-dimensional octal n-cube connected in the conventional manner. Each element of a two-dimensional octal array has a pair of connections for each dimension, one in the plus direction and one in the minus direction. As in the binary cube, another pair is added for each additional dimension.

[0096] FIG. 12 shows the H-DOT equivalent of the sparse two-dimensional octal n-cube. All the advantages of this technique described so far apply, including reduced pin count, increased connectivity, and scalability.

[0097] The following is a summary of the octal H-DOT implementation.

[0098]

[Table 2]

[0099] The H-DOT interconnection topology is particularly well suited to n-cubes with higher radices (bases). With an octal radix, an array of 4096 elements can be reached in four dimensions. Despite path collisions, distances remain short. In a two-dimensional octal n-cube, the worst-case distance between two elements over the best route is 8 for the conventional topology and 4 for the H-DOT topology, an improvement by a factor of 2.
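The conventional figure can be checked by breadth-first search. The sketch reads 8 as the largest shortest-path distance in a sparse two-dimensional octal n-cube whose digits wrap around modulo 8. That wraparound is an assumption, consistent with the plus and minus connections of FIG. 11. The H-DOT figure of 4 depends on the wiring of FIG. 12 and is not modeled here. The function names are illustrative.

```python
from collections import deque
from itertools import product

def sparse_neighbors(node, radix):
    """Sparse n-cube neighbors: step one digit by +1 or -1, wrapping mod radix."""
    for axis in range(len(node)):
        for step in (-1, 1):
            nb = list(node)
            nb[axis] = (nb[axis] + step) % radix
            yield tuple(nb)

def eccentricity(start, radix):
    """Largest breadth-first distance from `start` to any element."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in sparse_neighbors(node, radix):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return max(dist.values())

def diameter(radix, dims):
    """Worst-case shortest-path distance over all elements."""
    return max(eccentricity(n, radix) for n in product(range(radix), repeat=dims))

print(8 ** 4)          # 4096 elements in four octal dimensions
print(diameter(8, 2))  # 8
```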

[0100] H-DOT Octal n-Cube Routing Algorithm

[0101] n-cube topologies generally use routed message transmission rather than synchronous transmission. We have discussed n-cubes of different radices, and introduced sparsity and our H-DOT implementation of the sparse octal n-cube.

[0102] We now describe a routing algorithm for multidimensional octal n-cubes. The routing algorithm has three parts: initiating (and transmitting) a message, accepting a message, and keeping a message. Note that sparsity has a cost. It is the cost of the additional transfers needed to reach the target processing element after the address digit to be matched has been selected.

[0103] 1. INITIATE: The PE initiates the message on the port closest to the destination in a given unmatched digit. Referring to FIG. 12, processing element (22) holds a message for processing element (34) and sends it from the port facing processing element (23). Accordingly, the right-hand port of processing element (42) is used if DestX is greater than the element's X, or if DX = EX and DY > EY. (D denotes the destination and E denotes this element.)
2. ACCEPT: A message is accepted if its address matches the processing element, or if the processing element can move the message closer to its destination. Processing element (42) therefore accepts a message from its left side if DX ≥ EX and DY ≤ EY.
3. KEEP: If the message is for this processing element, keep it; otherwise, pass it to the other port.
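The three steps above can be written as predicates over (X, Y) digit pairs, with D for the destination and E for this element. This is a hypothetical sketch with four assumptions:

- It covers only the element position used in the example.
- It reads an octal address such as 34 as X = 3, Y = 4.
- It takes the ACCEPT comparison as DY ≤ EY.
- It uses the left port whenever the INITIATE condition for the right port is false.

The function names are illustrative.

```python
def initiate_port(dest, own):
    """INITIATE: right-hand port if DX > EX, or DX == EX and DY > EY; else left (assumed)."""
    dx, dy = dest
    ex, ey = own
    return "right" if dx > ex or (dx == ex and dy > ey) else "left"

def accepts_from_left(dest, own):
    """ACCEPT: the address matches, or DX >= EX and DY <= EY."""
    dx, dy = dest
    ex, ey = own
    return dest == own or (dx >= ex and dy <= ey)

def keep_or_pass(dest, own, arrived_on):
    """KEEP: hold a message addressed here, otherwise pass it to the other port."""
    if dest == own:
        return "keep"
    return "right" if arrived_on == "left" else "left"

# Octal addresses as (X, Y) digit pairs: element 22 holding a message for 34.
print(initiate_port((3, 4), (2, 2)))         # right
print(accepts_from_left((3, 4), (4, 2)))     # False
print(keep_or_pass((3, 4), (3, 4), "left"))  # keep
```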

[0104] As with mesh routing, there are four different sets of routing criteria, depending on the position of the processing element on the H-DOT.

[0105] FIG. 13 is a block diagram of the parallel array processor we described previously, shown with a two-dimensional H-DOT array. FIG. 14 shows the picket (PME) we described previously. The related applications provide additional details, and the material incorporated by reference, already described with the figures above, need not be repeated. The system is a parallel array processor that provides SIMD (single instruction, multiple data) and MIMD (multiple instruction, multiple data) operating characteristics. A picket PME (processor memory element) need not contain a complete operating system. It can nevertheless function independently in MIMD mode, and through dynamic switching it can also function in SIMD mode. Each picket has the configuration described previously, as well as other configurations.

[0106]

[Effects of the Invention] The present invention provides an H-DOT approach for reducing the scale of network implementations.

[Brief description of drawings]

FIG. 1 shows a conventional linear array that typically uses two-port processing elements (PEs).

FIG. 2 shows a NEWS network conventionally connected to each of four adjacent processing elements (PEs) by point-to-point links.

FIG. 3 shows an implementation of a NEWS network using the H-DOT connection technique. (Note that the processing elements (PEs) have two ports rather than four.)

FIG. 4 shows an implementation of a three-dimensional mesh using the H-DOT connection technique. (Note that the processing elements have two ports rather than the six ports of the processing elements used in conventional 3-D mesh arrays.)

FIG. 5 shows a conventional four-dimensional binary n-cube implemented with four-port processing elements.

FIG. 6 shows an implementation of a four-dimensional binary n-cube using the H-DOT connection technique. (Note that the processing elements have two ports rather than the four ports of the processing elements in FIG. 5.)

FIG. 7 shows an implementation of a six-dimensional binary n-cube using the H-DOT connection technique. (Note that the processing elements have two ports rather than the six ports of the processing elements used in a conventional n-cube implementation.)

FIG. 8 shows a conventional base-4 n-cube. (There is a path to every processing element whose address differs from that of the first processing element in one base-4 digit.)

FIG. 9 shows a sparse base-4 n-cube. (There is a path only to processing elements whose address differs from that of the first processing element in one base-4 digit, and by only one in that digit.)

FIG. 10 shows a sparse base-4 n-cube implemented with the H-DOT connection technique. (Each processing element has two ports.)

FIG. 11 shows the sparse n-cube idea extended to base 8, together with a topology of this kind.

FIG. 12 shows an implementation of a sparse octal n-cube using the H-DOT connection technique. (The processing elements have two ports.)

FIG. 13 is a block diagram of the parallel array processor described above with a 2-D H-DOT array.

FIG. 14 shows the picket processing element described above.

Continuation of front page: (72) Inventor: Peter Michael Kogge, 7 Dorchester Drive, Endicott, New York 13760, United States

Claims

1. A parallel SIMD or MIMD array processor communication network comprising a plurality of processing elements interconnected in an array topology, in which an interconnection network configuration mechanizes the interconnections between processor array elements, wherein the interconnection network configuration allows each processing element to be interconnected with the next processing element in each of a plurality of directions by an apparently H-shaped connection, namely a link that provides two paths to the next processing element in each dimension.

2. The array processor communication network of claim 1, wherein the processing elements are connected to neighboring processing elements via dot-OR network connections.

3. The array processor communication network of claim 1, wherein communication control for interconnection within the array processor may be of the synchronous type or the routed type.

4. The array processor communication network of claim 1, wherein the array can be expanded in area or size by adding to an existing interconnection network or by adding additional networks.