JPH0675931A

JPH0675931A - Computer system

Info

Publication number: JPH0675931A
Application number: JP5111877A
Authority: JP
Inventors: Thomas Norman Barker; Clive A Collins; Michael C Dapp; James W Dieffenderfer; Donald G Grice; Billy J Knowles; Donald Michael Lesmeister; Richard Edward Nier; Elic Eugene Retter; David B Rolfe
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1992-05-22
Filing date: 1993-05-13
Publication date: 1994-03-18
Anticipated expiration: 2011-10-30
Also published as: JP2549241B2

Abstract

PURPOSE: To provide high-speed input/output for a multi-PME computer system by a method that breaks into a network connection and switches it. CONSTITUTION: This system connection (the zipper) is used in a system that moves data into and out of a network of interconnected nodes, where the nodes are interconnected as a mesh, ring, or folded torus and the network therefore has no edge. The zipper mechanism logically breaks the rings along a dimension orthogonal to them so that an edge to the network is established. The network can thus be dynamically toggled between a network without an edge and a network with an edge.

Description

Detailed Description of the Invention

[0001]

FIELD OF THE INVENTION: This invention relates to densely parallel processors and architectures, and more particularly to moving data into and out of an array of processing elements.

[0002]

DESCRIPTION OF THE RELATED ART: First, the terms used in this specification are explained.

[0003] ALU: The ALU is the arithmetic logic circuit portion of a processor.

[0004] Array: An array is an arrangement of elements in one or more dimensions. An array can contain an ordered set of data items (array elements), which in languages such as FORTRAN are identified by a single name. In other languages, the name of an ordered set of data items refers to an ordered set of data elements that all have the same attributes. In a program array, dimensions are generally specified by a number or by a dimension attribute. In some languages the array declarator specifies the size of each dimension of the array, and in some languages an array is an arrangement of elements in a table. In the hardware sense, an array is a collection of structures (functional elements) that are generally identical in a massively parallel architecture. An array element in data-parallel computing is an element that can be assigned operations and that, when in the parallel state, can execute the required operations independently and in parallel. Generally, an array can be thought of as a grid of processing elements. By assigning partitioned data to the sections of the array, the partitioned data can be moved in a regular grid pattern. However, data can also be indexed or assigned to an arbitrary location in the array.

[0005] Array director: An array director is a unit programmed as the control program for an array. It serves as the master control program for a group of functional elements arranged as an array.

[0006] Array processor: There are two principal types of array processor: multiple instruction multiple data (MIMD) and single instruction multiple data (SIMD). In a MIMD array processor, each processing element in the array executes its own unique instruction stream on its own data. In a SIMD array processor, each processing element in the array is restricted to the same instruction through a common instruction stream, although the data associated with each processing element is unique. The preferred array processor of this invention has further features. This specification refers to it as the APAP and uses that abbreviation throughout.
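
The distinction between the two array processor types can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names `simd_step` and `mimd_step` and the sample operations are assumptions chosen for illustration, and ordinary Python lists stand in for the processing elements.

```python
from typing import Callable, List

Op = Callable[[int], int]


def simd_step(op: Op, data: List[int]) -> List[int]:
    """SIMD: one common instruction is applied to each element's own data."""
    return [op(x) for x in data]


def mimd_step(ops: List[Op], data: List[int]) -> List[int]:
    """MIMD: each processing element applies its own instruction to its own data."""
    if len(ops) != len(data):
        raise ValueError("one instruction stream per processing element is required")
    return [op(x) for op, x in zip(ops, data)]


# Hypothetical local data, one value per processing element.
local_data = [1, 2, 3, 4]

# Every element doubles its value (same instruction, different data).
simd_result = simd_step(lambda x: x * 2, local_data)

# Each element runs a different instruction on its own value.
mimd_result = mimd_step(
    [lambda x: x + 1, lambda x: x * x, lambda x: -x, lambda x: x],
    local_data,
)
```

In both cases the data is private to each element; only the source of the instruction differs.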

[0007] Asynchronous: Asynchronous means without a regular time relationship. That is, the relationship between the executions of functions is unpredictable, and there is no regular or predictable timing between them. In control situations, the control program addresses the location to which control is passed when data is waiting for an idle element that is being addressed. This allows operations to remain in sequence even though they are not timed to coincide with any event.

[0008] BOPS / GOPS: BOPS and GOPS are abbreviations with the same meaning: one billion operations per second. See GOPS.

[0009] Circuit switching / store-and-forward: These terms refer to two mechanisms for moving data packets through a network of nodes. Store-and-forward is a mechanism in which each intermediate node receives the data packet, stores it in its memory, and then forwards it toward its destination. Circuit switching is a mechanism in which an intermediate node is commanded to logically connect its input port to an output port, so that data packets pass directly through the node toward their destination without entering the intermediate node's memory.
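
A minimal Python sketch can make the difference between the two mechanisms concrete. It is an illustration only: the `Node` class, its counters, and the `send` function are assumptions introduced here and do not describe the hardware of the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    stored: int = 0      # packets written into this node's memory in transit
    switched: int = 0    # packets passed straight from input port to output port
    received: List[str] = field(default_factory=list)  # packets delivered here


def send(path: List[Node], packet: str, circuit_switched: bool) -> None:
    """Deliver a packet from path[0] to path[-1] via the intermediate nodes."""
    for node in path[1:-1]:
        if circuit_switched:
            # Input port is logically connected to the output port;
            # the packet never enters the node's memory.
            node.switched += 1
        else:
            # Store-and-forward: the packet is stored, then forwarded.
            node.stored += 1
    path[-1].received.append(packet)


a, b, c, d = Node("A"), Node("B"), Node("C"), Node("D")
send([a, b, c, d], "packet-1", circuit_switched=False)
send([a, b, c, d], "packet-2", circuit_switched=True)
```

After both transfers, each intermediate node has stored one packet and switched one packet, and the destination holds both.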

[0010] Cluster: A cluster is a station (or functional unit) consisting of a control unit (cluster controller) and the hardware connected to it (terminals, functional units, or virtual components). In this specification, a cluster includes an array of processor memory elements (PMEs), also called a node array. A cluster typically has 512 PMEs.

[0011] The entire PME node array of the present invention consists of a set of clusters, each supported by one cluster controller (CC).

[0012] Cluster controller: A cluster controller is a device that controls the input/output operations of multiple devices or functional units connected to it. A cluster controller is usually controlled by a program stored and executed in the unit, as in the IBM 3601 financial institution communication controller, but it can also be controlled entirely by hardware, as in the IBM 3272 controller.

[0013] Cluster synchronizer: A cluster synchronizer is a functional unit that manages the operation of all or part of a cluster to keep its elements operating synchronously, so that each functional unit maintains a specific time relationship with the execution of a program.

[0014] Controller: A controller is a device that directs the transmission of data and instructions over the links of an interconnection network. Its operation is controlled by a program executed by a processor to which the controller is connected, or by a program executed within the controller itself.

[0015] CMOS: CMOS is an abbreviation for complementary metal oxide semiconductor technology. It is widely used to manufacture dynamic random access memory (DRAM). NMOS is another technology used to manufacture dynamic random access memory. The present invention uses CMOS, but the technology used to manufacture the Advanced Parallel Array Processor (APAP) does not limit the range of semiconductor technologies that may be used.

[0016] Dotting: Dotting refers to joining three or more leads by physically connecting them together. Most backpanel buses use this connection method. The term is related to the OR-DOTs of times past, but is used here to identify multiple data sources that can be combined onto a bus by a very simple protocol.

[0017] The input/output zipper concept of the present invention can be used to implement the idea that an input port of a node can be driven by the output port of a node or by data coming from the system bus. Conversely, data output from a node can be used as input to another node and to the system bus. Note that data output to the system bus and data output to another node do not occur simultaneously but in different cycles.

[0018] Dotting is used in the H-DOT discussions, where taking advantage of it allows two-ported PEs, PMEs, or pickets to be used in arrays of various organizations. Several topologies are discussed, including 2D and 3D meshes, the base-2 N-cube, the sparse base-4 N-cube, and the sparse base-8 N-cube.

[0019] DRAM: DRAM is an abbreviation for dynamic random access memory, the common storage that computers use as main memory. However, the term DRAM also applies to use as a cache or as memory that is not main memory.

[0020] Floating point: A floating-point number is expressed in two parts: a fixed-point or fraction part, and an exponent part relative to an agreed radix or base. The exponent indicates the actual position of the decimal point. In a typical floating-point notation, the real number 0.0001234 is written as 0.1234-3, where 0.1234 is the fraction and -3 is the exponent. In this example the floating-point radix or base is 10, an implicit fixed positive integer base greater than one. The real number represented is obtained by raising this base to the power given by the exponent, whether it is shown explicitly in the floating-point representation or carried in its exponent field, and then multiplying by the fraction. Numeric literals can be expressed either in floating-point notation or as real numbers.
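
The worked example above (0.0001234 written as fraction 0.1234 with exponent -3, base 10) can be reproduced with a short Python sketch. The helper names `to_fraction_exponent` and `from_fraction_exponent` are assumptions for illustration; the standard `decimal` module is used so that base-10 digits are handled exactly.

```python
from decimal import Decimal
from typing import Tuple


def to_fraction_exponent(value: Decimal) -> Tuple[Decimal, int]:
    """Split value into (fraction, exponent) with value == fraction * 10**exponent
    and 0.1 <= |fraction| < 1 for nonzero values."""
    if value == 0:
        return Decimal(0), 0
    # adjusted() gives the exponent of the most significant digit
    # (as in scientific notation), so add 1 to move the point left of it.
    exponent = value.adjusted() + 1
    fraction = value.scaleb(-exponent)
    return fraction, exponent


def from_fraction_exponent(fraction: Decimal, exponent: int) -> Decimal:
    """Raise the base (10) to the exponent, then multiply by the fraction."""
    return fraction * Decimal(10) ** exponent


fraction, exponent = to_fraction_exponent(Decimal("0.0001234"))
restored = from_fraction_exponent(fraction, exponent)
```

Here `fraction` is 0.1234 and `exponent` is -3, matching the notation 0.1234-3 used in the definition, and `restored` equals the original value.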

[0021] FLOPS: This term refers to floating-point instructions per second. Floating-point operations include ADD (addition), SUB (subtraction), MPY (multiplication), DIV (division), and often many others. The floating-point-instructions-per-second figure is often calculated using add or multiply instructions and can generally be regarded as a 50/50 mix. An operation includes generating the exponent and fraction and any required normalization of the fraction. The present invention can handle 32-bit or 48-bit floating-point formats (longer formats are possible, but they were not counted in the mix). Implementing a floating-point operation with fixed-point instructions (normal or RISC) requires multiple instructions. Some use a ratio of 10 to 1 when calculating performance, while some studies indicate that a ratio of 6.25 is more appropriate. The ratio differs from architecture to architecture.

[0022] Functional unit: A functional unit is an entity of hardware, software, or both that is capable of accomplishing a purpose.

[0023] Gbyte: A Gbyte is one billion bytes. Gbytes/sec is therefore one billion bytes per second.

[0024] GIGAFLOPS: 10⁹ floating-point instructions per second.

[0025] GOPS and PETAOPS: GOPS and BOPS have the same meaning: one billion operations per second. PETAOPS means one trillion operations per second, a potential of current machines. In the APAP machine of the present invention, these terms are roughly equivalent to BIP/GIP, meaning one billion instructions per second. In some machines a single instruction can perform multiple operations (that is, both an add and a multiply), but the present invention does not do so. Conversely, a single operation may require many instructions. For example, the present invention uses multiple instructions to perform 64-bit arithmetic. In counting operations, however, logarithmic operations were not counted. GOPS is the preferred way to describe performance, but it has not been used consistently. MIP/MOP, then BIP/BOP as the next unit up, and MegaFLOPS/GigaFLOPS/TeraFLOPS/PetaFLOPS are also used.

[0026] ISA: ISA means Instruction Set Architecture.

[0027] Link: A link is an element that may be physical or logical. A physical link is the physical connection joining elements or units, while in computer programming a link is an instruction or address that passes control and parameters between separate portions of a program. In multisystems, a link is the connection between two systems, specified by program code that identifies the link by a real or virtual address. A link therefore generally includes the physical medium, any protocol, and the associated devices and programming; it is both logical and physical.

[0028] MFLOPS: MFLOPS means 10⁶ floating-point instructions per second.

[0029] MIMD: MIMD refers to a processor array architecture in which each processor in the array has its own instruction stream, giving multiple instruction streams, and executes multiple data streams, one per processing element.

[0030] Module: A module is a discrete and identifiable program unit, or a functional unit of hardware designed for use with other components. A collection of PEs contained on a single electronic chip is also called a module.

[0031] Node: Generally, a node is the junction of links. In a generic array of PEs, one PE can be a node. A node can also contain a collection of PEs called a module. In the present invention, a node is formed from an array of PMEs, and this set of PMEs is referred to as a node. A node preferably consists of 8 PMEs.

[0032] Node array: A collection of modules made up of PMEs is sometimes referred to as a node array; it is an array of nodes made up of modules. A node array usually contains more than a few PMEs, but the term encompasses any plurality.

[0033] PDE: PDE stands for partial differential equation.

[0034] PDE relaxation solution process: The PDE relaxation solution process is a way to solve a PDE (partial differential equation). Solving PDEs accounts for most of the supercomputer computing power used in known fields, and it is therefore a good example of the relaxation process. There are many ways to solve a PDE, and more than one of the numerical methods includes the relaxation process. For example, when a PDE is solved by the finite element method, the relaxation calculation consumes the bulk of the time. Consider an example from the field of heat transfer. Given hot gas inside a chimney and a cold wind blowing outside, how will the temperature gradient within the chimney bricks develop? Treating the bricks as small segments and writing an equation for how heat flows between segments as a function of their temperature difference converts the heat transfer PDE into a finite element problem. If all elements except those on the inside and outside are then set to room temperature, while the boundary segments are at the temperatures of the hot gas and the cold wind, the problem is set up to begin relaxation. The computer program then models time by updating the temperature variable in each segment based on the amount of heat flowing into or out of the segment. It takes many cycles of processing all the segments in the model before the set of temperature variables across the chimney relaxes to represent the actual temperature distribution that would occur in the physical chimney. If the objective is to model gas cooling in the chimney, the elements must be extended to the gas equations, the boundary conditions on the inside are then linked to another finite element model, and the process continues. Note that heat flow depends on the temperature difference between adjacent segments. The PE-to-PE communication paths are therefore used to distribute the temperature variables. It is this near-neighbor communication pattern that makes PDE relaxation well suited to parallel computing.
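
The chimney example can be sketched as a one-dimensional relaxation in Python. This is a simplified illustration under stated assumptions, not the method of the specification: the wall is reduced to a single row of segments, the function name `relax_wall`, the coupling constant `k`, and the sample temperatures are all hypothetical, and each sweep is written sequentially although every segment's update depends only on its two neighbors and could run on its own processing element.

```python
from typing import List


def relax_wall(hot: float, cold: float, room: float,
               segments: int, sweeps: int, k: float = 0.25) -> List[float]:
    """Relax the temperatures of a row of brick segments between a hot
    inner boundary and a cold outer boundary.

    k scales how much of the neighbor temperature difference flows per sweep
    (values below 0.5 keep this explicit update stable).
    """
    # Boundary segments hold the gas and wind temperatures; the rest start at room temperature.
    temps = [hot] + [room] * segments + [cold]
    for _ in range(sweeps):
        previous = temps[:]  # every segment reads its neighbors' old values
        for i in range(1, len(temps) - 1):
            # Net heat flow depends on the temperature difference to each neighbor.
            heat_in = (previous[i - 1] - previous[i]) + (previous[i + 1] - previous[i])
            temps[i] = previous[i] + k * heat_in
    return temps


profile = relax_wall(hot=400.0, cold=0.0, room=20.0, segments=3, sweeps=500)
```

After enough sweeps the interior temperatures settle into a straight-line gradient between the two boundary values, which is the steady state for this simplified model.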

[0035] Picket: A picket is an element within the array of elements that make up an array processor. It consists of a data flow (ALU and registers), memory, control, and the portion of the communication matrix associated with the element. The unit refers to 1/n of an array processor, made up of a parallel processor element and memory element together with their control and a portion of the array intercommunication mechanism. A picket is one form of processor memory element (PME). The PME chip design processor logic of the present invention can implement the picket logic described in the related applications, or can have the logic for an array of processors formed as a node. The term picket is similar to the commonly used array term PE, for processing element. A picket is an element of the processing array, preferably consisting of a combined processing element and local memory for processing bit-parallel bytes of information in a clock cycle. The preferred embodiment consists of a byte-wide data flow processor, 32 bytes or more of memory, primitive controls, and a mechanism for communicating with other pickets.

[0036] The term "picket" comes from Tom Sawyer and his white fence, although it will also be understood that, functionally, it resembles a military picket line.

[0037] Picket chip: A picket chip contains a plurality of pickets on a single silicon chip.

[0038] Picket processor system (or subsystem): A picket processor is a total system consisting of an array of pickets, a communication network, an input/output system, and a SIMD controller made up of a microprocessor, a canned-routine processor, and a microcontroller that runs the array.

[0039] Picket architecture: The picket architecture is the preferred embodiment of the SIMD architecture, with features that accommodate several diverse kinds of problems, including:
- set associative processing
- parallel numerically intensive processing
- physical array processing similar to images

[0040] Picket array: A picket array is a collection of pickets arranged in a geometric order, that is, a regular array.

[0041] PME, or processor memory element: PME stands for processor memory element. In this specification, the term PME refers to a single processor, memory, and I/O-capable system element or unit that forms one of the parallel array processors of the present invention. PME is a term that encompasses a picket. A PME is 1/n of a processor array, consisting of a processor, its associated memory, a control interface, and a portion of the array communication network mechanism. This element can be a PME with regular array connectivity, as in a picket processor, or a PME that is part of a subarray, as in the multi-PME node described above.

[0042] Routing: Routing is the assignment of a physical path by which a message will reach its destination. Assigning a route requires a source and a destination. These elements or addresses have a temporary relationship or affinity. Message routing is often based on a key obtained by reference to a table of assignments. In a network, a destination is any station or network-addressable unit addressed as the destination of transmitted information by a routing address that identifies the link. The destination field identifies the destination with a message header destination code.

[0043] SIMD: A processor array architecture in which all processors in the array are commanded from a single instruction stream to execute multiple data streams, one per processing element.

[0044] SIMDMIMD or SIMD/MIMD: SIMDMIMD or SIMD/MIMD is a term for a machine with a dual function that can switch from MIMD to SIMD for a period of time to handle a complex instruction, and that therefore has two modes. When the Thinking Machines, Inc. Connection Machine model CM-2 was placed as a front end or back end to a MIMD machine, programmers could operate multiple modes, also referred to as dual modes, to execute different parts of a problem. Such machines have existed since ILLIAC and use a bus that interconnects a master CPU with other processors. The master control processor can interrupt the processing of the other CPUs. The other CPUs can run independent program code. During an interruption, some processing is needed for the checkpoint function (closing and saving the current status of the controlled processors).

[0045] SIMIMD: SIMIMD is a processor array architecture in which all processors in the array are commanded from a single instruction stream to execute multiple data streams, one per processing element. Within this construct, data-dependent operations within each picket that mimic instruction execution are controlled by the SIMD instruction stream.

[0046] This is a single instruction stream machine that can use the SIMD instruction stream to sequence multiple instruction streams (one per picket) and execute multiple data streams (one per picket). SIMIMD can be executed by a PME system.

[0047] SISD: SISD is an abbreviation for single instruction single data.

[0048] Swapping: Swapping means interchanging the data content of one storage area with that of another storage area.

[0049] Synchronous operation: Synchronous operation in a MIMD machine is a mode of operation in which each action is related to an event (usually a clock). The event can be a specified event that occurs regularly in a program sequence. An operation is dispatched to a number of processing elements, each of which executes its function independently. Control is not returned to the controller until the operation is completed.

[0050] If the request is to an array of functional units, the controller issues the request to the elements in the array, and those elements must complete their operation before control is returned to the controller.
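
The dispatch-and-wait behavior described for synchronous operation can be sketched in Python. This is an analogy under stated assumptions rather than the hardware mechanism: threads from the standard `concurrent.futures` module stand in for processing elements, and the name `dispatch_and_wait` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def dispatch_and_wait(operation: Callable[[int], int],
                      element_data: List[int]) -> List[int]:
    """Dispatch one operation to every element and return only when all have finished."""
    if not element_data:
        return []
    # One worker per element; each executes the operation independently.
    with ThreadPoolExecutor(max_workers=len(element_data)) as pool:
        # list() consumes every result, so control does not return to the
        # caller (the "controller") until the whole operation is complete.
        return list(pool.map(operation, element_data))


results = dispatch_and_wait(lambda x: x * x, [1, 2, 3, 4])
```

The results come back in element order, because `map` preserves the order of its inputs regardless of which worker finishes first.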

[0051] TERAFLOPS: TERAFLOPS means 10¹² floating-point instructions per second.

[0052] VLSI: VLSI is an abbreviation for very large scale integration (as applied to integrated circuits).

[0053] Zipper: A zipper is a newly provided function for establishing a link from devices that are external to the normal interconnection of an array configuration.

[0054] Circuit-switched mode: A method of data transfer between PMEs in the array in which intermediate PMEs logically connect an input port to an output port, so that a message passes through the intermediate PMEs toward its final destination without additional handling by them.

[0055] Input transfer complete interrupt: A request for a program context switch, made when an I/O message word accompanied by a transfer complete tag is received.

[0056] Break-in: A mechanism by which an I/O port causes a processor-transparent context switch and uses the processor data flow and control paths to self-manage a data transfer.

[0057] Runtime software: Software that executes on the processing elements. It includes operating systems, executive programs, application programs, service programs, and the like.

[0058] Memory refresh: A function required by dynamic RAM technology, in which memory use is interrupted while the current information is rewritten.

[0059] Zipper: A dynamic break of a group of network rings. When "zipped," data can travel around the rings without entering or leaving the network. When "unzipped," the rings are broken to form an edge to the network, through which data traveling around the rings enters or leaves the network.
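
The zipped and unzipped states can be illustrated with a small Python model of a single ring. It is only a sketch: the `Ring` class, its `step` method, and the sample values are assumptions made for illustration, and one ring stands in for the group of rings that the zipper breaks.

```python
from typing import List, Optional


class Ring:
    """One ring of nodes, each holding a single value that shifts one node per step."""

    def __init__(self, values: List[int]) -> None:
        self.values: List[Optional[int]] = list(values)
        self.zipped = True  # True: closed ring, no edge to the network

    def step(self, external_in: Optional[int] = None) -> Optional[int]:
        """Shift every value one node along the ring.

        Zipped: the last node feeds the first node and nothing leaves.
        Unzipped: the ring is broken between the last and first nodes; the
        last node's value leaves through the edge and external_in enters.
        """
        leaving = self.values[-1]
        if self.zipped:
            self.values = [leaving] + self.values[:-1]
            return None
        self.values = [external_in] + self.values[:-1]
        return leaving


ring = Ring([1, 2, 3])
ring.step()                        # zipped: data circulates inside the ring
zipped_state = list(ring.values)

ring.zipped = False                # unzip: an edge to the network now exists
output = ring.step(external_in=9)  # one value leaves, one value enters
unzipped_state = list(ring.values)
```

Toggling the `zipped` flag changes only where the wraparound connection leads, which mirrors the logical (rather than physical) nature of the break.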

【００６０】[0060]

【発明が解決しようとする課題】本発明の背景として、
メッシュ、トーラスその他の次元ネットワーク内での高
速入出力は、より高速の入出力によって強化される。従
来のシステムは、ネットワークに関して本発明の機能を
有さない。リングをブレイクしてネットワークへのエッ
ジを形成し、リングを回るデータがそこを通ってネット
ワークに出入りできるようにする機能を提供することは
重要であると考える。As background to the present invention, I/O performance within meshes, tori, and other dimensional networks is improved by a faster input/output path. Conventional systems do not provide the network capability of the present invention. We consider it important to provide the ability to break a ring so as to form an edge to the network, through which data traveling around the ring can enter and leave the network.

【００６１】[0061]

【課題を解決するための手段】多重ＰＭＥコンピュータ
・システム用の高速入出力は、ネットワーク結合にブレ
イクインしてネットワーク結合を切り換える方法を提供
する。このシステム結合をジッパと称する。High-speed I/O for a multi-PME computer system is provided by a method that breaks in on a network connection and switches it. This system connection is called a zipper.

【００６２】本発明の入出力ジッパの概念を用いて、あ
るノードに入るポートをあるノードから出るポートまた
はシステム・バスから来るデータで駆動することができ
るという概念を実施することができる。逆に、あるノー
ドから出されたデータが、別のノード及びシステム・バ
スへの入出力にとって使用可能になる。システム・バス
へのデータ出力と別のノードへのデータ出力は、同時で
はなく異なるサイクルに行われる。ジッパは、相互接続
されたノードのネットワークにデータを出し入れし、ノ
ードをメッシュ、リングまたは折返しトーラスとして相
互接続するシステム中で使用され、したがってネットワ
ークへのエッジはなく、ジッパ機構はリングをリングに
直交する次元に沿って論理的にブレイクして、ネットワ
ークへのエッジを確立させる。結合は、エッジのないネ
ットワークとエッジをもつネットワークの間でネットワ
ークを論理的にトグルする。エッジが活動状態のとき、
データはエッジを通ってネットワークに出入りし、この
結合により、ネットワークに入るデータの分散またはネ
ットワークから出るデータの収集が可能となり、その結
果、エッジを通るデータ速度が、ネットワークの外部の
システムの持続データ速度及びピーク・データ速度に一
致するようになる。The I/O zipper concept of the present invention implements the idea that a port entering a node can be driven either by the port leaving another node or by data arriving from the system bus. Conversely, data leaving a node is available both to another node and to the system bus. Data output to the system bus and data output to another node do not occur simultaneously but in different cycles. The zipper is used in a system that moves data into and out of a network of interconnected nodes, where the nodes are interconnected as a mesh, rings, or a wrapped torus, so that the network has no edge. The zipper mechanism logically breaks the rings along a dimension orthogonal to the rings, establishing an edge to the network. The connection thus logically toggles the network between an edgeless network and a network with an edge. When the edge is active, data passes into or out of the network through the edge. The connection allows data entering the network to be distributed, or data leaving the network to be collected, so that the data rate through the edge matches both the sustained and the peak data rates of the system external to the network.

【００６３】ジッパは、一群のネットワーク・リングの
動的ブレイクを可能にする。「ジップ」されると、デー
タはネットワークに出入りせずにリングを回ることがで
きる。「ジップ解除」されると、リングはブレイクされ
てネットワークへのエッジを形成し、リングを回るデー
タがそこを通ってネットワークに出入りする。The zipper allows a dynamic breaking of a group of network rings. When "zipped," data can travel around the rings without entering or leaving the network. When "unzipped," the rings are broken to form an edge to the network, through which data traveling around the rings enters or leaves the network.

【００６４】上記その他の改良点は、下記の詳細な説明
に記載されている。本発明と、その利点及び特徴をより
よく理解するため、下記の説明及び図面を参照された
い。The above and other improvements are described in the detailed description below. For a better understanding of the present invention and its advantages and features, refer to the description and to the drawings below.

【００６５】下記の詳しい説明では、図面を参照しなが
ら、本発明の好ましい実施例とその利点及び特徴を例に
よって説明する。The following detailed description illustrates, by way of example, preferred embodiments of the invention and its advantages and features, with reference to the drawings.

【００６６】[0066]

【実施例】以前の米国特許出願第６１１５９４号は、単
一チップ内にメモリと制御論理機構を組み込み、その組
合せをチップ内で複製し、単一チップの複製からプロセ
ッサ・システムを構築する構想を記載している。この手
段で、ただ一種類の単一チップを開発し製作するだけ
で、チップ境界の交差と線長を減らすことによって性能
能力を高めながら、大規模並列処理能力をもたらすシス
テムが得られる。DETAILED DESCRIPTION OF THE INVENTION The earlier U.S. patent application No. 611594 describes the concept of incorporating memory and control logic within a single chip, replicating that combination within the chip, and building a processor system from replications of the single chip. With this approach, developing and manufacturing just one type of chip yields a system with massively parallel processing capability, while performance improves because chip boundary crossings and wire lengths are reduced.

【００６７】原特許は、１次元入出力構造を利用して、
チップ内でその構造に多数のＳＩＭＤ処理メモリ要素
（ＰＭＥ）を付加することを記載している。本発明及び
引用した関連出願では、その概念を２次元以上に拡張
し、データ転送とプログラム割込みとを備えた完全な入
出力システムを含める。以下の記述は、１チップ当り８
個のＳＩＭＤ／ＭＩＭＤＰＭＥを有する４次元入出力
構造について行うが、前掲の米国特許出願第６１１５９
４号に記載されているように、より高次元にまたは１次
元当りさらに多くのＰＭＥに拡張することもできる。The original patent describes using a one-dimensional I/O structure and attaching multiple SIMD processing memory elements (PMEs) to that structure within a chip. The present invention and the related applications cited extend that concept to two or more dimensions and include a complete I/O system with data transfer and program interrupts. The following description covers a four-dimensional I/O structure with eight SIMD/MIMD PMEs per chip, but, as described in the above U.S. patent application No. 611594, it can be extended to higher dimensions or to more PMEs per dimension.

【００６８】本発明及びその関連出願では、これらの概
念をプロセッサ間通信から外部入出力機構に拡張する。
さらに、処理アレイの制御に必要なインターフェース及
び要素をも記述する。要約すると、入出力のタイプは次
の３種ある。（ａ）プロセッサ間、（ｂ）プロセッサと
外部の間、（ｃ）同報通信／制御。大規模並列処理シス
テムでは、これらすべてのタイプの入出力帯域幅をプロ
セッサの計算能力と釣り合わせる必要がある。アレイ内
で、これらの要件は、非常に高速の割込み状態スワップ
能力を付加された１６ビット命令セット・アーキテクチ
ャ・コンピュータ（以下ではＰＭＥと称する）を複製す
ることによって満足される。ＰＭＥの特徴は、他の大規
模並列マシンの処理要素と比較すると独特である。それ
は、処理、経路指定、記憶及び入出力を完全に分散させ
ることができる。この特徴は他のどの設計にもない。The present invention and its related applications extend these concepts from interprocessor communication to an external I/O facility, and also describe the interfaces and elements needed to control the processing array. In summary, there are three types of I/O: (a) interprocessor, (b) between processors and the outside world, and (c) broadcast/control. In a massively parallel processing system, the bandwidth of all three types must be balanced against the computational power of the processors. Within the array, these requirements are met by replicating a 16-bit instruction set architecture computer, augmented with a very fast interrupt state-swapping capability and referred to below as the PME. The PME is unique among the processing elements of massively parallel machines in that it permits processing, routing, storage, and I/O to be fully distributed. No other design has this characteristic.

【００６９】関連出願に詳細に開示されている「拡張並
列アレイ・プロセッサ（ＡＰＡＰ）」のブロック図を図
１に示す。ＡＰＡＰは、ホスト・プロセッサ１の付属物
である。ホスト・プロセッサ上で実行されるプログラム
によってデータとコマンドが発行される。これらのデー
タとコマンドを、アレイ・ディレクタのアプリケーショ
ン・インターフェース（ＡＰＩ）３で受け取って変換す
る。次いでＡＰＩからデータとコマンドが、クラスタ同
期装置４とクラスタ制御装置５を経てクラスタ６に渡さ
れる。これらのクラスタは、ＡＰＡＰのメモリを提供し
並列処理を行う。クラスタ同期装置４とクラスタ制御装
置５が提供する機能は、データとコマンドを適切なクラ
スタに経路指定し、クラスタ間の負荷の均衡をはかるこ
とである。制御装置の詳細は、"Advanced Parallel Pro
cessor Array Director"と題する米国特許出願に記載さ
れている。FIG. 1 shows a block diagram of the Advanced Parallel Array Processor (APAP), which is disclosed in detail in the related applications. The APAP is an adjunct to a host processor 1. Data and commands are issued by programs running on the host processor. They are received and translated by the application interface (API) 3 of the array director. The API then passes the data and commands through the cluster synchronizer 4 and the cluster controller 5 to the clusters 6. The clusters provide the memory and the parallel processing of the APAP. The cluster synchronizer 4 and the cluster controller 5 route the data and commands to the proper cluster and balance the load among the clusters. Details of the controller are described in the U.S. patent application entitled "Advanced Parallel Processor Array Director".

【００７０】クラスタは、修正ハイパーキューブとして
相互接続されたいくつかのＰＭＥから構成される。ハイ
パーキューブ内では、各セルが、アドレスが１ビット位
置だけ異なるどのセルをも隣接セルとしてアドレスする
ことができる。リング内ではどのセルも、アドレスが±
１だけ異なる２つのセルを隣接セルとしてアドレスする
ことができる。ＡＰＡＰ用に使用される修正ハイパーキ
ューブは、この両方の手法を組み合わせて、リングから
ハイパーキューブを構築する。リングの交差部をノード
と定義する。本発明の好ましい実施例では、ノードは２
ｎ個のＰＭＥ２０と同報通信／制御インターフェース
（ＢＣＩ）部２１を含む。ＰＭＥはノード内で２×ｎア
レイとして構成される。ここで、ｎはアレイを特徴づけ
る次元またはリングの数であり、物理的チップ・パッケ
ージの制限を受ける。好ましい実施例ではｎ＝４であ
る。チップ技術が向上するにつれて、"ｎ"が大きくなる
と、アレイ内で可能な次元が高くなる。A cluster consists of a number of PMEs interconnected as a modified hypercube. In a hypercube, each cell can address as a neighbor any cell whose address differs from its own in exactly one bit position. In a ring, each cell can address as neighbors the two cells whose addresses differ from its own by ±1. The modified hypercube used for the APAP combines both approaches by building a hypercube out of rings. The intersection of rings is defined as a node. In the preferred embodiment of the invention, a node contains 2n PMEs 20 and a broadcast/control interface (BCI) section 21. The PMEs are organized within the node as a 2 × n array, where n is the number of dimensions, or rings, that characterize the array, and is limited by the physical chip package. In the preferred embodiment, n = 4. As chip technology improves, a larger n will allow arrays of higher dimensionality.
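
The two neighbor rules described above, and their combination, can be sketched in a few lines of Python. This is an illustrative model only: the function names, the use of a coordinate tuple per node, and the default ring size of 8 are assumptions made for the example and are not taken from the specification.

```python
def hypercube_neighbors(addr: int, bits: int) -> list[int]:
    """Cells whose address differs from `addr` in exactly one bit position."""
    return [addr ^ (1 << b) for b in range(bits)]


def ring_neighbors(addr: int, size: int) -> list[int]:
    """The two cells whose address differs from `addr` by -1 and +1 (wrapping)."""
    return [(addr - 1) % size, (addr + 1) % size]


def modified_hypercube_neighbors(coord: tuple, ring_size: int = 8) -> list[tuple]:
    """Neighbors of a node that sits at the intersection of one ring per dimension.

    `coord` holds one ring position per dimension, e.g. (w, x, y, z) for n = 4.
    Each dimension contributes the two ring neighbors at -1 and +1.
    """
    neighbors = []
    for dim in range(len(coord)):
        for step in (-1, 1):
            c = list(coord)
            c[dim] = (c[dim] + step) % ring_size
            neighbors.append(tuple(c))
    return neighbors
```

With n = 4 the last function returns eight neighbors per node, one for each of the +W, -W, +X, -X, +Y, -Y, +Z, and -Z ports.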

【００７１】図３及び４に、ＰＭＥからのアレイの構築
を示す。８個のＰＭＥが相互接続されてノード１５１を
形成している。８個のノードからなるグループがＸ次元
リング（１６ＰＭＥ）として相互接続され、それとオー
バー折返しする８個のノードのグループがＹ次元リング
１５２として相互接続される。これによって、ノードの
８×８アレイ（５１２ＰＭＥ）を含む単一の２次元クラ
スタが得られる。クラスタは最大で８×８アレイに組み
合わされて、４次元アレイ要素１５３を形成する。この
アレイ要素を横切る８個のノードの各グループが、Ｗ次
元とＺ次元で組み合わされる。４つの次元すべてにおけ
る単一ノードの相互接続経路が１５４に示されている。
アレイが正規形または直交形である必要はないことに留
意されたい。特定のアプリケーションまたは構成で、任
意のまたはすべての次元でのノードの数を定義し直すこ
とができる。FIGS. 3 and 4 show the construction of the array from PMEs. Eight PMEs are interconnected to form a node 151. A group of eight nodes is interconnected as an X-dimension ring (16 PMEs), and an overlapping group of eight nodes is interconnected as a Y-dimension ring 152. This yields a single two-dimensional cluster containing an 8 × 8 array of nodes (512 PMEs). Clusters are combined in up to an 8 × 8 array to form a four-dimensional array element 153. Each group of eight nodes across the array element is interconnected in the W and Z dimensions. The interconnection paths of a single node in all four dimensions are shown at 154. Note that the array need not be regular or orthogonal; a particular application or configuration can redefine the number of nodes in any or all dimensions.
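
The PME counts quoted in this paragraph follow from simple multiplication, sketched below. The constant and function names are illustrative assumptions; the figure of two PMEs per node on each ring reflects the +X/-X pairing described for the node and is an interpretation of the "(16 PMEs)" remark.

```python
PMES_PER_NODE = 8        # eight PMEs form one node
NODES_PER_RING = 8       # eight nodes form one ring
PMES_PER_NODE_PER_RING = 2   # assumed: the +X and -X PMEs of each node


def pmes_on_ring() -> int:
    """PMEs that sit on a single ring of nodes."""
    return NODES_PER_RING * PMES_PER_NODE_PER_RING


def pmes_in_cluster(nodes_x: int = 8, nodes_y: int = 8) -> int:
    """PMEs in one two-dimensional cluster of nodes."""
    return nodes_x * nodes_y * PMES_PER_NODE


def pmes_in_array(clusters_w: int = 8, clusters_z: int = 8) -> int:
    """PMEs in a four-dimensional array element built from full clusters."""
    return clusters_w * clusters_z * pmes_in_cluster()
```

Under these assumptions a fully populated 8 × 8 arrangement of clusters holds 32,768 PMEs; smaller or irregular configurations simply use other arguments.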

【００７２】各ＰＭＥは、１つのノード・リング２６内
にしか存在できない（図２）。リングをＷ、Ｘ、Ｙ、Ｚ
と呼ぶ。１チップ内のＰＭＥ２０は対になっており（す
なわち、＋Ｗ、−Ｗ）、一方のＰＭＥはデータを時計回
りにノード・リングに沿って外部へ移動し、他方のＰＭ
Ｅは反時計回りにノード・リング２３、２６に沿って外
部を移動し、したがって１つのＰＭＥが各ノードの外部
ポート専用となる。各リング内の２個のＰＭＥに、その
外部入出力ポートに因んだ名前を付ける（＋Ｗ、−Ｗ、
＋Ｘ、−Ｘ、＋Ｙ、−Ｙ、＋Ｚ、−Ｚ）。ノード内にも
２個のリング２２があり、４＋ｎ個及び４−ｎ個のＰＭ
Ｅを相互接続する。こうした内部リングは、メッセージ
が外部リング間を移動するための経路を提供する。ＡＰ
ＡＰは４次元直交アレイ１５１〜１５４と見なすことが
できるので、内部リングにより、メッセージがアレイ中
をすべての次元で移動できるようになる。このため、そ
れ自体のノード・リング内のＰＭＥまたはそのノード内
の隣接ＰＭＥをアドレスすることにより、どのＰＭＥも
目的に向けてメッセージをステップできる、アドレス指
定構造が得られる。Each PME exists in only one ring of nodes 26 (FIG. 2). The rings are denoted W, X, Y, and Z. The PMEs 20 within a chip are paired (for example, +W and -W): one PME of a pair moves data externally clockwise along a ring of nodes, and the other moves data externally counterclockwise along the ring of nodes 23, 26, so that one PME is dedicated to each of the node's external ports. The two PMEs in each ring are named after their external I/O ports (+W, -W, +X, -X, +Y, -Y, +Z, -Z). Within the node there are also two rings 22, which interconnect the four +n PMEs and the four -n PMEs. These internal rings provide the path for messages to move between the external rings. Because the APAP can be regarded as a four-dimensional orthogonal array 151-154, the internal rings allow messages to move through the array in all dimensions. The result is an addressing structure in which any PME can step a message toward its destination by addressing either a PME in its own ring of nodes or an adjacent PME within its node.

【００７３】各ＰＭＥは、図５では４個の入力ポートと
４個の出力ポート（左８５、９２、右８６、９５、縦９
３、９４、外部８０、８１）をもつ。入力ポートのうち
の３個と出力ポートのうちの３個は、チップ上の他のＰ
ＭＥへの全２重２点間接続である。第４のポートは、オ
フチップＰＭＥへの全２重２点間接続である。好ましい
実施例では物理的パッケージにおけるピン及び電力上の
拘束のために、実際の入出力インターフェースは４ビッ
ト幅の経路９７、９８、９９であり、これらは図１５に
示すＰＭＥ間データ・ワード９６、１００の４個のニッ
ブルを多重化するために使用される。As shown in FIG. 5, each PME has four input ports and four output ports (left 85, 92; right 86, 95; vertical 93, 94; external 80, 81). Three of the input ports and three of the output ports are full-duplex point-to-point connections to other PMEs on the chip. The fourth port is a full-duplex point-to-point connection to an off-chip PME. Because of pin and power constraints in the physical package of the preferred embodiment, the actual I/O interfaces are 4-bit-wide paths 97, 98, 99, which are used to multiplex the four nibbles of the inter-PME data word 96, 100 shown in FIG. 15.

【００７４】好ましい実施例では、このＰＭＥの入出力
設計は、３種の入出力動作モードを提供する。In the preferred embodiment, this PME I / O design provides three I / O modes of operation.

【００７５】・通常モード隣接する２つのＰＭＥ間で
のデータ転送に使用される。データ転送はＰＭＥソフト
ウェアによって開始される。隣接するＰＭＥより遠くに
あるＰＭＥ宛のデータは、隣接するＰＭＥが受け取っ
て、それをその隣接ＰＭＥから発するかのように転送す
る。通常モードは、"PME Store and Forward/Circuit S
witched Modes"と題する関連米国特許出願に詳細に開示
されている。Normal mode: Used to transfer data between two adjacent PMEs. The transfer is initiated by PME software. Data destined for a PME farther away than the adjacent PME is received by the adjacent PME and then forwarded as if it had originated there. Normal mode is disclosed in detail in the related U.S. patent application entitled "PME Store and Forward/Circuit Switched Modes".

【００７６】・回線交換モードデータ及び制御がＰＭ
Ｅ中を通過できるようにする。このモードを使うと、直
接隣接していないＰＭＥ間での高速通信が可能になる。
回線交換モードは、前記の関連特許出願に詳細に開示さ
れている。Circuit switched mode: Allows data and control to pass through a PME. This mode permits fast communication between PMEs that are not immediately adjacent. Circuit switched mode is disclosed in detail in the related patent application cited above.

【００７７】・ジッパ・モードクラスタ内のノードに
データをロードし、またはそこからデータを読み取るた
めに、アレイ制御装置が使用する。ジッパ・モードは、
通常モード及び回線交換モードの諸特徴を使って、クラ
スタ・カード上のＰＭＥのアレイとの間でデータを高速
で転送する。Zipper mode: Used by the array controller to load data into, or read data from, the nodes of a cluster. Zipper mode uses features of the normal and circuit switched modes to transfer data rapidly into and out of the array of PMEs on a cluster card.

【００７８】アレイＷ、Ｘ、Ｙ、Ｚ内の各リングは連続
的であり、アレイへのエッジはない。概念上は、ジッパ
は、２つのノード間のインターフェースでリングを論理
的にブレイクして、一時エッジを形成するものである。
ジッパが非活動状態の場合、アレイはエッジをもたな
い。ジッパが活動化されると、２つのノード列間のすべ
てのインターフェースがブレイクされ、得られる「エッ
ジ」がアレイとアレイ制御装置の間でのデータ転送に使
用される。たとえば、図６を参照すると、ジッパ接続が
Ｘ＝０のノード行に沿った−Ｘインターフェース上に置
かれる場合、Ｘ＝８（ＰＭＥ×１５）２５０のノード行
とＸ＝０（ＰＭＥ×０）２５３のノード行の間のインタ
ーフェースは、もはや２点間ではなく、第３の（ホス
ト）インターフェース２５１が付加される。通常、デー
タは、ＰＭＥ×０２５３とＰＭＥ×１５２５０の間
を、そこにホスト・インターフェースがないかのように
通過する。しかし、ＰＭＥ実行時ソフトウェアの制御下
では、ジッパが活動化された場合、アレイの一時エッジ
を介してアレイ２５０、２５３とホスト・インターフェ
ース２５１の間をデータが通過する。単一クラスタの行
に沿ったジッパは８個のノードでリングをブレイクす
る。今日の技術に基づけば、好ましい実施例では、単一
のジッパを介して単一クラスタとの間で毎秒約５７メガ
バイトをパスすることができる。光接続など将来技術が
発展すれば、このデータ速度は大幅に増加すると期待さ
れる。Each ring within the array (W, X, Y, Z) is continuous; the array has no edge. Conceptually, the zipper is a logical breaking of a ring at the interface between two nodes to form a temporary edge. When the zipper is inactive, the array has no edge. When the zipper is activated, all the interfaces between two columns of nodes are broken, and the resulting "edge" is used for data transfer between the array and the array controller. For example, referring to FIG. 6, if a zipper connection is placed on the -X interface along the X=0 row of nodes, the interface between the X=8 (PME ×15) 250 row of nodes and the X=0 (PME ×0) 253 row of nodes is no longer point to point; a third (host) interface 251 is added. Normally, data passes between PME ×0 253 and PME ×15 250 as if the host interface were not there. Under control of PME run-time software, however, when the zipper is activated, data passes between the array 250, 253 and the host interface 251 through the temporary edge of the array. A zipper along a row of a single cluster breaks the rings at eight nodes. Based on today's technology, the preferred embodiment can pass approximately 57 megabytes per second through a single zipper into or out of a single cluster. Future technology enhancements, such as optical connections, are expected to increase this data rate significantly.
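
The toggling between an edgeless ring and a ring with a temporary edge can be modeled as follows. This is a minimal sketch under stated assumptions: the class name, the `HOST` placeholder, and the choice to put the zipper on the link between the last node and node 0 are illustrative and do not describe the actual hardware.

```python
HOST = "host"  # placeholder standing in for the host interface 251


class ZipperRing:
    """A ring of `size` nodes with a zipper on the link between node size-1 and node 0."""

    def __init__(self, size: int = 8) -> None:
        self.size = size
        self.zipped = True  # zipped: the ring is continuous and has no edge

    def unzip(self) -> None:
        """Activate the zipper: break the ring and expose a temporary edge."""
        self.zipped = False

    def zip_up(self) -> None:
        """Deactivate the zipper: the ring is continuous again."""
        self.zipped = True

    def next_hop(self, node: int, step: int):
        """Return where data leaving `node` in direction `step` (+1 or -1) arrives."""
        target = node + step
        crosses_zipper_link = target < 0 or target >= self.size
        if crosses_zipper_link and not self.zipped:
            return HOST  # data leaves the network through the edge
        return target % self.size  # data stays on the ring
```

While the ring is zipped, a hop from node 7 in the + direction wraps to node 0. After `unzip()`, the same hop is delivered to the host interface instead, and every other link behaves as before.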

【００７９】図７は、この概念をどのように拡張すれ
ば、クラスタの２つの「エッジ」２５５、２５６上にジ
ッパを置くことができるかを示している。この手法で
は、異なるデータが各ジッパ内に渡される場合は、入出
力帯域幅が毎秒約１１４メガバイトに増加し、同一のデ
ータが各ジッパ内に渡される場合は、毎秒約５７メガバ
イトの直交データ移動をサポートする。直交データ移動
は、アレイ内での高速の転置操作及び行列乗算操作をサ
ポートする。理論上は各ノード間インターフェース上に
ジッパが存在し得るが、実際にはジッパ・インターフェ
ースを持つ各ＰＭＥは、そのメモリが満杯になってそれ
以上データを受け入れることができないようになるのを
避けるために、アレイ入出力データを他のＰＭＥに移す
ことができなければならない。ジッパの数は、各ＰＭＥ
でどれだけのメモリが使用できるかを決定する技術と、
ジッパ上のＰＭＥとアレイ内の別のＰＭＥの間でジッパ
・データを移動できる速度によって制限される。FIG. 7 shows how this concept can be extended to place zippers on two "edges" 255, 256 of a cluster. This approach increases the I/O bandwidth to approximately 114 megabytes per second when different data is passed into each zipper, or supports approximately 57 megabytes per second of orthogonal data movement when identical data is passed into each zipper. Orthogonal data movement supports fast transpose and matrix multiply operations within the array. In theory a zipper could exist on every inter-node interface. In practice, each PME with a zipper interface must be able to move array I/O data on to other PMEs, so that its memory does not fill up and leave it unable to accept more data. The number of zippers is therefore limited by the technology, which determines how much memory is available at each PME, and by the rate at which zipper data can be moved between the PMEs on the zipper and other PMEs in the array.

【００８０】図１は、ｎ個のクラスタからなるアレイを
示している。好ましい実施例では、各クラスタが２個の
直交ジッパをサポートする。このアレイの最大アレイ入
出力速度は、毎秒２ｎ×５７メガバイトである。このア
レイの最大直交アレイ入出力速度は、毎秒ｎ×１５７メ
ガバイトである。FIG. 1 shows an array of n clusters. In the preferred embodiment, each cluster supports two orthogonal zippers. The maximum array I / O rate for this array is 2n x 57 megabytes per second. The maximum orthogonal array I / O rate for this array is n × 157 megabytes per second.
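
The peak rates for one and two zippers per cluster, and for an array of n clusters, can be reproduced with the short sketch below. It assumes only the figure of approximately 57 megabytes per second per zipper stated above; the function names are illustrative, and the sketch does not attempt to reproduce the orthogonal array rate.

```python
ZIPPER_MB_PER_S = 57  # approximate rate of a single zipper, as stated in the text


def cluster_io_rate(zippers: int, distinct_data: bool = True) -> int:
    """Approximate I/O rate of one cluster in megabytes per second.

    With distinct data on each zipper the rates add up. With identical data
    on each zipper (orthogonal data movement) the rate stays at one zipper's.
    """
    return zippers * ZIPPER_MB_PER_S if distinct_data else ZIPPER_MB_PER_S


def max_array_io_rate(n_clusters: int, zippers_per_cluster: int = 2) -> int:
    """Approximate maximum array I/O rate: 2n x 57 MB/s for two zippers per cluster."""
    return n_clusters * cluster_io_rate(zippers_per_cluster)
```

For example, an array of four clusters with two zippers each would peak at roughly 456 megabytes per second under these assumptions.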

【００８１】ジッパの好ましい実施例では、ジッパ入力
とジッパ出力の２つの動作モードがある。ジッパ入力動
作はアレイ制御装置からクラスタ上の選択されたＰＭＥ
のグループにデータを転送する。ジッパ入力動作は、ア
レイ制御装置実行時ソフトウェアによって開始される。
アレイ制御装置実行時ソフトウェアは、まずＰＭＥＳＩ
ＭＤモード同報通信コマンド（"SIMD/MIMD Processing
Memory Element"と題する関連米国特許出願参照）を使
って、ＰＭＥをジッパ通常（ＺＮ）モードとジッパ回線
交換（ＺＣ）モードのどちらかでジッパ・インターフェ
ースに沿って置く。次いで、アレイ制御装置実行時ソフ
トウェアが、ＺＮモードのＳＩＭＤＰＭＥソフトウェア
に受け取るべきワードのカウントを与える。ＺＮモード
では、ＰＭＥはＸインターフェース８０（図５）からデ
ータを受け取ることができるが、まずそのインターフェ
ースのためにメモリ内の入力バッファをセットアップし
なければならない。メモリ内の２つの位置が、各入力デ
ータ・バッファ２３２の開始アドレスと、バッファ２３
３に格納されたワード数とを格納するために予約されて
いる。さらに、ＰＭＥ制御レジスタ２（図９）が、入力
インターフェース１７３を使用可能にしかつ入力割込み
１７２を可能にする。ＳＩＭＤＰＭＥ同報通信ソフト
ウェアは、予約されたメモリ位置に出力データ・ブロッ
クを定義するためのロードを行い、ＰＭＥ制御レジスタ
２に入力データの転送を可能にするためのロードを行
う。ＺＮモードでは、ＰＭＥは遊休状態にあり、入出力
割込みまたはＺＣモードへのトグルを待つ。In the preferred embodiment of the zipper, there are two modes of operation: zipper input and zipper output. A zipper input operation transfers data from the array controller to a selected group of PMEs on the cluster. The operation is initiated by the array controller run-time software, which first uses PME SIMD-mode broadcast commands (see the related U.S. patent application entitled "SIMD/MIMD Processing Memory Element") to place the PMEs along the zipper interface in one of two modes: zipper normal (ZN) or zipper circuit switched (ZC). The array controller run-time software then gives the SIMD PME software in ZN mode a count of the words to receive. In ZN mode a PME can receive data from the X interface 80 (FIG. 5), but it must first set up an input buffer in memory for that interface. Two locations in memory are reserved to hold the starting address of each input data buffer 232 and the number of words contained in the buffer 233. In addition, PME control register 2 (FIG. 9) enables the input interface 173 and enables the input interrupt 172. The broadcast SIMD PME software loads the reserved memory locations to define the output data block and loads PME control register 2 to enable input data transfers. In ZN mode the PME is idle, awaiting an I/O interrupt or a toggle to ZC mode.

【００８２】ＰＭＥが可能な１つの構成をとる場合のジ
ッパ入力動作を図１０に示す。この図には、８ワードを
異なる３つのＰＭＥに転送する例が示してある。データ
・インターフェース（ジッパ）がデータをＰＭＥ２６０
に転送し、このアレイを介してＰＭＥからＰＭＥに移動
される。FIG. 10 shows the zipper input operation for one possible configuration of PMEs. The figure shows an example of a transfer of 8 words to three different PMEs. The data interface (zipper) transfers the data to PME 260, and the data is then moved from PME to PME through the array.

【００８３】本発明の好ましい実施例では、アレイ制御
装置は最初にＰＥＡ２６０、ＰＥＢ２６１、ＰＥＤ
２６３をＺＮモードに設定し、ＰＥＣ２６２をＺＣモ
ードに設定する。ジッパ入力動作では、ＰＭＥ制御レジ
スタ１の"Ｚ"ビット１６３と"ＣＳ"ビット１７０をセッ
トすると、ＰＭＥがＺＣモードになる。"Ｚ"ビット１６
３をセットし、"ＣＳ"ビット１７０をリセットすると、
ＰＥがＺＮモードになる。ＰＥＡ、Ｂ、Ｄには、それ
ぞれ初期受信カウント３、４、１が割り当てられる。Ｐ
ＥＡは、通常の受信シーケンスを使ってその３データ
・ワードを受け取る。ワード・カウントが０になると、
ＰＥＡのハードウェアがＰＭＥ制御レジスタ１の"Ｃ
Ｓ"ビット１７０をリセットし、ＰＥＡ２６４をＺＣ
モードに入らせる。ＰＥＢ２６９とＰＥＤ２７１で
も同じシーケンスが実行される。（ＰＥＤへの）最終
ワード転送時２７１に、アレイ制御装置は、転送完了
（ＴＣ）タグ・ビット２２４を挿入することができる。
ＴＣビットがセットされると、ＰＥＡ〜Ｄはそのビッ
トを検出し、入出力割込み要求１７１を生成する。ＴＣ
ビット２２４がセットされていない場合、ＰＥＡ〜Ｄ
は転送終了時にＺＣモード２７２〜２７５に留まる。In the preferred embodiment of the present invention, the array controller first sets PE A 260, PE B 261, and PE D 263 to ZN mode and PE C 262 to ZC mode. For a zipper input operation, setting the "Z" bit 163 and the "CS" bit 170 of PME control register 1 places a PME in ZC mode; setting the "Z" bit 163 and resetting the "CS" bit 170 places it in ZN mode. PE A, B, and D are assigned initial receive counts of 3, 4, and 1, respectively. PE A receives its three data words using the normal receive sequence. When its word count reaches 0, the hardware in PE A resets the "CS" bit 170 of PME control register 1, causing PE A 264 to enter ZC mode. The same sequence occurs in PE B 269 and PE D 271. On the last word transfer (to PE D) 271, the array controller may insert the transfer complete (TC) tag bit 224. If the TC bit is set, PE A-D detect the bit and generate an I/O interrupt request 171. If the TC bit 224 is not set, PE A-D remain in ZC mode 272-275 at the end of the transfer.
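
The 8-word example can be traced with a small simulation. This is a sketch under stated assumptions: the chain order A, B, C, D, the `PE` class, and the `zipper_input` function are hypothetical, TC tag handling is omitted, and the control-register bits are reduced to a mode string.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    name: str
    mode: str                  # "ZN" (receiving) or "ZC" (circuit switched pass-through)
    count: int = 0             # words this PE still has to receive
    received: list = field(default_factory=list)


def zipper_input(chain: list, words: list) -> None:
    """Feed `words` into the first PE of `chain`, one word at a time.

    A PE in ZC mode passes the word on to the next PE. A PE in ZN mode with a
    nonzero count keeps the word, and toggles to ZC when its count reaches 0.
    """
    for word in words:
        for pe in chain:
            if pe.mode == "ZN" and pe.count > 0:
                pe.received.append(word)
                pe.count -= 1
                if pe.count == 0:
                    pe.mode = "ZC"   # later words now pass through this PE
                break
        else:
            raise RuntimeError("no PE left in ZN mode to accept the word")


chain = [PE("A", "ZN", 3), PE("B", "ZN", 4), PE("C", "ZC"), PE("D", "ZN", 1)]
zipper_input(chain, list(range(8)))
```

After the call, PE A holds words 0 to 2, PE B holds words 3 to 6, PE C holds nothing, PE D holds word 7, and all four PEs are in ZC mode, which matches the end state described when no TC tag is sent.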

【００８４】ジッパ・インターフェース上で要求２４０
が検出されると、受信側ＰＭＥは肯定応答２４１を送出
し、データを入力レジスタ８７にロードする。次いで受
信シーケンスが開始し、カウント２３３を取り出して減
分し、入力バッファ・アドレス２３２を取り出して増分
し、データ・ワードをＰＭＥメモリ４１に格納する。受
信シーケンスは送信シーケンスと類似している。このシ
ーケンスは、遊休ＰＭＥにブレイクインし、メモリ４１
及びＡＬＵ４２へのアクセスをサイクル・スチールする
ことにより、入出力アドレスとカウント・フィールドを
更新させ、入力データ・ワードをメモリ４１にロードさ
せる。カウントが０に達してモードがＺＣに切り換わる
か、あるいはＴＣタグを受け取って対応する入力割込み
レジスタ・ビット１７１がセットされ、割込みコード１
９０が「転送完了」を示すようになるかするまで、この
シーケンスが続く。When a request 240 is detected on the zipper interface, the receiving PME sends out an acknowledge 241 and loads the data into the input register 87. A receive sequence is then initiated, which fetches and decrements the count 233, fetches and increments the input buffer address 232, and stores the data word in PME memory 41. The receive sequence is analogous to the transmit sequence. It breaks into the idle PME, cycle-stealing access to memory 41 and the ALU 42 to update the I/O address and count fields and to load the input data word into memory 41. The sequence continues until either the count reaches 0, which switches the mode to ZC, or a TC tag is received, which sets the corresponding input interrupt register bit 171 with an interrupt code 190 indicating "transfer complete".
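
One step of the receive sequence can be written out as follows. The memory layout is hypothetical: `ADDR_SLOT` and `COUNT_SLOT` stand in for the two reserved locations 232 and 233, and the sketch assumes the word is stored at the fetched address before that address is incremented, which the text does not specify.

```python
ADDR_SLOT = 0    # hypothetical reserved location holding the buffer address (232)
COUNT_SLOT = 1   # hypothetical reserved location holding the word count (233)


def receive_sequence(memory: list, word: int) -> bool:
    """Store one incoming word and update the reserved address and count fields.

    Returns True when the count has reached 0, the point at which the text
    says the PME toggles to ZC mode.
    """
    count = memory[COUNT_SLOT] - 1      # fetch and decrement the count
    memory[COUNT_SLOT] = count
    addr = memory[ADDR_SLOT]            # fetch the input buffer address
    memory[addr] = word                 # store the data word in PME memory
    memory[ADDR_SLOT] = addr + 1        # increment the address for the next word
    return count == 0
```

Calling the function once per acknowledged word reproduces the cycle-stealing update of the address and count fields. The interrupt path taken when a TC tag arrives is left out of the sketch.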

【００８５】ＰＭＥは、下記の条件が満たされる場合、
要求に応答して肯定応答を生成する。・入力レジスタ８７、１００が空・要求が抑制されていない１７４・割込み１８２がその要求入力上で保留中ではない。・要求入力が回線交換されていない。・要求がすべての現要求のうちで最高の優先順位をも
つ。The PME generates an acknowledge in response to a request if all of the following conditions are met:
・The input register 87, 100 is empty.
・The request is not inhibited 174.
・No interrupt 182 is pending on that request input.
・The request input is not circuit switched.
・The request has the highest priority of all current requests.
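
The five conditions combine into a single predicate, sketched below. The `RequestState` record and its field names are assumptions introduced for the example; in the hardware these are latch and register states rather than software flags.

```python
from dataclasses import dataclass


@dataclass
class RequestState:
    input_register_empty: bool   # input register 87, 100 holds no unread word
    request_inhibited: bool      # inhibit 174 is set for this request
    interrupt_pending: bool      # interrupt 182 is pending on this request input
    circuit_switched: bool       # this request input is circuit switched
    highest_priority: bool       # request outranks all other current requests


def should_acknowledge(s: RequestState) -> bool:
    """True only when every acknowledge condition listed above holds."""
    return (
        s.input_register_empty
        and not s.request_inhibited
        and not s.interrupt_pending
        and not s.circuit_switched
        and s.highest_priority
    )
```

Any single failing condition, such as a busy input register, is enough to withhold the acknowledge.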

【００８６】入力レジスタ８７、１００は、肯定応答２
２６が生成されてから受信シーケンスがデータ・ワード
をメモリに格納するまで、ビジー状態になる。入力レジ
スタがビジー状態になると、肯定応答は抑制される。ビ
ジー状態のとき、入力レジスタは受信シーケンスが実行
される前に重ね書きされるのを防止される（受信シーケ
ンスはメモリ・リフレッシュのために遅延される可能性
があるので）。The input register 87, 100 becomes busy from the time an acknowledge 226 is generated until the receive sequence has stored the data word in memory. While the input register is busy, acknowledges are inhibited. The busy condition prevents the input register from being overwritten before the receive sequence runs (the receive sequence may be delayed by a memory refresh).

【００８７】ＴＣタグ・ビット２２４が送信側ジッパか
ら送られた場合、そのインターフェース用の入力割込み
ラッチ１７１がセットされる。ＰＭＥ実行時ソフトウェ
アによって割込みラッチがリセットされるまで、そのイ
ンターフェース上でそれ以上肯定応答２２６は生成され
ない。たとえば、ＴＣタグ・ビット２２４がＸインター
フェース８２からのデータ転送時にセットされた場合、
Ｌ割込みがとられＬ割込みラッチがリセットされるま
で、Ｘからのそれ以上の要求は抑制される。If the TC tag bit 224 is sent from the transmitting zipper, the input interrupt latch 171 for that interface is set. No further acknowledges 226 are generated on that interface until the interrupt latch is reset by PME run-time software. For example, if the TC tag bit 224 is set on a data transfer from the X interface 82, further requests from X are inhibited until the L interrupt is taken and the L interrupt latch is reset.

【００８８】ＴＣタグ・ビット２２４がセットされてデ
ータ・ワードが転送され、受信側ＰＭＥがＺＮモードで
ある場合、外部インターフェース用の入出力割込み１７
１が生成され、割込みコード１９０がＴＣを反映するよ
うにセットされる。さらに、送信側ジッパからＴＣタグ
が送られないうちにバッファ・カウントが０になる場
合、ＰＭＥはＺＣモードにトグルする。If a data word is transferred with the TC tag bit 224 set and the receiving PME is in ZN mode, an I/O interrupt 171 is generated for the external interface and the interrupt code 190 is set to reflect TC. In addition, if the buffer count reaches 0 before a TC tag is sent from the transmitting zipper, the PME toggles to ZC mode.

【００８９】ＰＭＥは、ＺＮ受信モードのとき、メモリ
・リフレッシュ・シーケンスとジッパ入力のための受信
シーケンスしか実行できない。これが必要なのは、ジッ
パのデータ転送が最大ＰＭＥクロック速度で起こり得る
からである。ＰＭＥ命令の実行や非ジッパ入力用の受信
シーケンスのための時間はない。ＺＮモードの間、ＰＥ
ハードウェアは、ジッパ入力要求を除くすべての入力要
求を抑制する。ＺＣモードのＰＭＥは、前掲の関連特許
出願"PME Store and Forward/Circuit Switched Modes"
で説明されているように、回線交換モードでのＰＭＥ用
の全範囲の動作を行うことができる。それには、ジッパ
・データに対してスプリッタ・サブモードを使用できる
能力も含まれる。While in ZN receive mode, a PME can execute only memory refresh sequences and receive sequences for the zipper input. This is necessary because zipper data transfers can occur at the maximum PME clock rate; no time can be allowed for PME instruction execution or for receive sequences on non-zipper inputs. While in ZN mode, the PE hardware inhibits all input requests except the zipper input request. A PME in ZC mode is capable of the full range of circuit switched operations described in the related patent application "PME Store and Forward/Circuit Switched Modes" cited above, including the ability to use splitter submode on zipper data.

【００９０】ジッパ出力動作で、データがクラスタ内の
選択されたＰＭＥグループからアレイ制御装置に転送さ
れる。ジッパ出力動作は、アレイ制御装置実行時ソフト
ウェアによって開始され、このソフトウェアはまずＳＩ
ＭＤモード同報通信コマンドを使って、ジッパ・インタ
ーフェースの周りのＰＭＥをジッパ通常（ＺＮ）モード
とジッパ回線交換（ＺＣ）モードのどちらかに置く。次
いでアレイ制御装置は、ＺＮモードのＰＭＥＳＩＭＤ
ソフトウェアに、送信すべきワード数を与える。A zipper output operation transfers data from a selected group of PMEs in a cluster to the array controller. The operation is initiated by the array controller run-time software, which first uses SIMD-mode broadcast commands to place the PMEs around the zipper interface in one of two modes: zipper normal (ZN) or zipper circuit switched (ZC). The array controller then gives the PME SIMD software in ZN mode a count of the words to send.

【００９１】概念的には、データは発信側ＰＭＥのメイ
ン・メモリからホスト・コンピュータのメイン・メモリ
に転送される。好ましい実施例では、各インターフェー
スごとに、出力データ・ブロック２３０の開始アドレス
とブロック２３１に格納されているワードの数を格納す
るための記憶位置が、メモリ内に２個ずつ予約されてい
る。さらに、ＰＭＥ制御レジスタ１（図８参照）がデー
タ出力の宛先とモードを制御する。同報通信ＳＩＭＤ
ＰＭＥソフトウェアは、転送モードを定義するため、Ｐ
ＭＥ制御レジスタ１へのロードを行う。同報通信ＳＩＭ
ＤＰＭＥソフトウェアとＰＭＥ実行時ソフトウェアの
どちらかが、ホストに転送すべきデータを指定されたメ
モリ位置にロードする。次いで同報通信ＳＩＭＤＰＭ
Ｅソフトウェアがアドレスとカウントを指定のメモリ位
置にロードする。次にそのソフトウェアはＰＭＥ制御レ
ジスタ１へのロードを行い、最後にＯＵＴ命令を実行し
て、データ送信シーケンスを開始させる。Conceptually, data is transferred from the main memory of the originating PME to the main memory of the host computer. In the preferred embodiment, two locations in memory are reserved for each interface to hold the starting address of the output data block 230 and the number of words contained in the block 231. In addition, PME control register 1 (see FIG. 8) controls the destination and mode of the data output. The broadcast SIMD PME software loads PME control register 1 to define the transfer mode. Either the broadcast SIMD PME software or the PME run-time software loads the data to be transferred to the host into the designated memory locations. The broadcast SIMD PME software then loads the address and count into the designated memory locations. Next it loads PME control register 1, and finally it executes an OUT instruction, which initiates the data transmit sequence.

【００９２】ＰＭＥの可能な１つの構成でのジッパ出力
動作を図１４に示す。この図では、８ワードを異なる３
つのＰＭＥに転送する例が示されている。データ・イン
ターフェース（ジッパ）はデータをＰＭＥ２８０から転
送し、アレイを介してＰＭＥからＰＭＥへ移される。FIG. 14 shows the zipper output operation for one possible configuration of PMEs. The figure shows an example of an 8-word transfer involving three different PMEs. The data interface (zipper) transfers the data out of PME 280, and the data is moved from PME to PME through the array.

【００９３】この例では、アレイ制御装置が最初にＰＥ
Ａ２８０、ＰＥＢ２８１、ＰＥＤ２８３をＺＮモー
ドに設定し、ＰＥＣ２８２をＺＣモードに設定する。
ジッパ出力動作では、ＰＭＥ制御レジスタ１の"Ｚ"ビッ
ト１６３と"ＣＳ"ビット１７０をセットすると、ＰＭＥ
がＺＣモードになる。"Ｚ"ビット１６３をセットし、"
ＣＳ"ビット１７０をリセットすると、ＰＭＥはＺＮモ
ードになる。ＰＥＡ、ＰＥＢ、ＰＥＤにはそれぞ
れ３、４、１のカウントが割り当てられている。ＰＭＥ
Ａは通常送信シーケンスを使ってその３データ・ワー
ドを送信する。ワード・カウントが０になると、ＰＭＥ
Ａ内のハードウェアがＰＭＥ制御レジスタ１の"ＣＳ"
ビット１７０をリセットして、ＰＭＥＡ２８４をＺＣ
モードに入らせる。ＰＭＥＢ２８９及びＰＭＥＤ２
９５内でも同じシーケンスが起こる。ＰＭＥＤのＰＭ
Ｅ制御レジスタ"ＴＣ"１６４がセットされている場合、
（ＰＭＥＤからの）最後のワード転送時にＰＭＥＤ
は転送完了（ＴＣ）タグ・ビット２２４を挿入する。Ｔ
Ｃタグがセットされている場合、ＰＭＥＡ〜Ｄはその
ビットを検出し、入出力割込み要求１７１を生成するこ
とになる。ＴＣタグがセットされていない場合は、ＰＭ
ＥＡ〜Ｄは転送終了時にＺＣモードに留まる。In this example, the array controller first sets PE A 280, PE B 281, and PE D 283 to ZN mode and PE C 282 to ZC mode. For a zipper output operation, setting the "Z" bit 163 and the "CS" bit 170 of PME control register 1 places a PME in ZC mode; setting the "Z" bit 163 and resetting the "CS" bit 170 places it in ZN mode. PE A, PE B, and PE D are assigned counts of 3, 4, and 1, respectively. PME A sends its three data words using the normal transmit sequence. When its word count reaches 0, the hardware in PME A resets the "CS" bit 170 of PME control register 1, causing PME A 284 to enter ZC mode. The same sequence occurs in PME B 289 and PME D 295. If the "TC" bit 164 of PME D's PME control register is set, PME D inserts the transfer complete (TC) tag bit 224 on the last word transfer (from PME D). If the TC tag is set, PMEs A-D detect the bit and generate an I/O interrupt request 171. If the TC tag is not set, PMEs A-D remain in ZC mode at the end of the transfer.

【００９４】送信シーケンスでデータ・ワードが送信さ
れるごとに、カウント２３１が減分され、開始アドレス
２３０が増分され、メモリ４１から１データ・ワードが
読み取られる。そのデータ・ワードは送信レジスタ４
７、９６にロードされ、選択されたＰＭＥ９７、１６１
インターフェースに送られる。送信シーケンスは遊休Ｐ
ＭＥにブレイクインして、メモリ４１及びＡＬＵ４２へ
のアクセスをサイクル・スチールすることにより、入出
力アドレス及びカウント・フィールドを更新させ、かつ
送信レジスタ４７、９６へのロードを行わせる。ジッパ
転送では、ＰＭＥ制御レジスタ１のＣＸビット１６５が
セットされ、その結果、送信シーケンスが完了するまで
ＰＭＥプロセッサは遊休状態になる。このシーケンス
は、カウントが０に達するまで続く。For each data word sent, the transmit sequence decrements the count 231, increments the starting address 230, and reads one data word from memory 41. The data word is loaded into the transmit register 47, 96 and sent to the selected PME 97, 161 interface. The transmit sequence breaks into the idle PME, cycle-stealing access to memory 41 and the ALU 42 to update the I/O address and count fields and to load the transmit register 47, 96. For zipper transfers, the CX bit 165 of PME control register 1 is set, so that the PME processor idles until the transmit sequence completes. The sequence continues until the count reaches 0.

【００９５】データ転送インターフェースは４ビット幅
９７である。したがって、各１６ビット・データ・ワー
ド２２０は、４つの４ビット切片（ニッブル）として送
られる。データと一緒にタグ・ニッブル２２１とパリテ
ィ・ニッブル２２２も送られる。転送フォーマットは２
２３に示してある。The data transfer interface is four bits wide 97. Each 16-bit data word 220 is therefore sent as four 4-bit pieces (nibbles). A tag nibble 221 and a parity nibble 222 are sent along with the data. The transfer format is shown at 223.
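
The six-nibble transfer can be illustrated with the sketch below. Several details are assumptions made only for the example and are not given in the text: the most-significant-nibble-first order, the placement of the tag and parity nibbles after the data, the position of the TC bit within the tag nibble (`TC_BIT`), and the use of a bitwise XOR of the other five nibbles as the parity nibble.

```python
TC_BIT = 0x1  # hypothetical position of the transfer complete bit in the tag nibble


def to_nibbles(word: int) -> list:
    """Split a 16-bit word into four 4-bit nibbles, most significant first."""
    return [(word >> shift) & 0xF for shift in (12, 8, 4, 0)]


def frame(word: int, tc: bool = False) -> list:
    """Build the six nibbles sent for one word: four data, one tag, one parity."""
    data = to_nibbles(word & 0xFFFF)
    tag = TC_BIT if tc else 0
    parity = 0
    for nibble in data + [tag]:
        parity ^= nibble            # assumed parity scheme: XOR of the other nibbles
    return data + [tag, parity]


def unframe(nibbles: list) -> tuple:
    """Recover (word, tc) from six nibbles, checking the assumed parity."""
    *body, parity = nibbles
    check = 0
    for nibble in body:
        check ^= nibble
    if check != parity:
        raise ValueError("parity mismatch")
    word = 0
    for nibble in body[:4]:
        word = (word << 4) | nibble
    return word, bool(body[4] & TC_BIT)
```

Whatever the real tag and parity encodings are, the point of the format is that a 4-bit-wide path carries each 16-bit word, plus its tag and parity, in six successive nibble transfers.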

【００９６】送信シーケンスを図１６に示す。インター
フェース上で送信側ＰＭＥが受信側ジッパ・インターフ
ェースに要求２２５を発生する。肯定応答２２６を受け
取ると、送信側ＰＭＥはデータ転送を開始し、次の送信
シーケンスが起こることができる。肯定応答を受け取る
まで、次の送信シーケンスは起こらない。The transmission sequence is shown in FIG. On the interface, the sending PME issues a request 225 to the receiving zipper interface. Upon receipt of the acknowledgment 226, the sending PME may initiate the data transfer and the next transmission sequence may occur. The next transmission sequence does not occur until an acknowledgment is received.

[0097] If TC bit 164 of PME control register 1 is set, TC bit 224 will be set in the tag field of the last data word transferred. This bit informs the receiving zipper that the data transfer has ended.
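The end-of-transfer marking can be illustrated as follows. The bit position chosen for the TC flag within the tag nibble (`TC_TAG_BIT`) is a hypothetical value for this sketch; the text does not state where TC bit 224 sits in the tag field.

```python
from typing import List, Tuple

TC_TAG_BIT = 0x8  # hypothetical position of the TC flag inside the 4-bit tag


def tag_words(words: List[int], tc_enabled: bool) -> List[Tuple[int, int]]:
    """Pair each word with a tag; only the last word carries TC when enabled."""
    last_index = len(words) - 1
    tagged = []
    for index, word in enumerate(words):
        is_last = index == last_index
        tag = TC_TAG_BIT if (tc_enabled and is_last) else 0
        tagged.append((tag, word))
    return tagged


def transfer_complete(tag: int) -> bool:
    """Receiver-side check: does this tag signal the end of the transfer?"""
    return bool(tag & TC_TAG_BIT)


tagged = tag_words([5, 6, 7], tc_enabled=True)
```

A receiver that applies `transfer_complete` to each incoming tag sees the flag only on the final word.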

[0098] While in ZN send mode, a PME can execute only send sequences and memory refresh sequences. This is necessary because zipper data transfers can occur at the maximum PME clock rate, leaving no time for executing PME instructions or for receive sequences for non-zipper input. During ZN send mode, the PME hardware inhibits all input requests. A PME in ZC mode can perform the full range of operations of a circuit-switched PME, as described in the related patent application "PME Store and Forward/Circuit Switched Modes" cited above. This includes the ability to use the splitter submode on zipper data.

[0099] The zipper interface connects the array controller to nodes on a cluster, as shown at the top and bottom of FIG. 16. The normal interface consists of two nibble-wide (4-bit) unidirectional point-to-point interfaces, which together provide full-duplex transfer between two PMEs. It is preferable to use the process described earlier for transferring six nibbles from one PME to the other, as set out in the related patent application "PME Store and Forward/Circuit Switched Modes" cited above. Essentially, information is transferred from the PME on the left 200 using data path 202, request line 203, and acknowledge line 204. At the same time, information can be transferred from the PME on the right 201 using data path 210, request line 211, and acknowledge line 212. When a zipper is installed on the interface, data path 214, request line 215, and acknowledge line 216 are added to move data into the array, and data path 217, request line 218, and acknowledge line 216 are added to move data out of the array. When the array controller runtime software wants to execute a zipper send sequence to PME 201, it causes the runtime software of PME 200 to disable 202, 203, and 204. Similarly, when the array controller runtime software wants to execute a zipper receive sequence to PME 200, it causes the runtime software of PME 201 to disable 210, 211, and 212. Note that the placement of the zipper logic is entirely arbitrary. It could just as easily be placed on the +X and -X interfaces of the same node, or on any or all of the W, Y, and Z node interfaces.

[0100] Although a preferred embodiment of this invention has been described, those skilled in the art will recognize that various modifications and improvements, both now and in the future, fall within the scope of the following claims. The claims should be construed to maintain the proper protection for the invention first disclosed.

[Brief description of drawings]

FIG. 1 is a functional block diagram illustrating a typical advanced parallel array processor (APAP), showing in particular the major elements of the APAP and the APAP interface to a host processor or other data source/destination.

FIG. 2 is a schematic diagram illustrating an embodiment of a processor memory element (PME) node, showing in particular the interconnection of the various elements that make up the node.

FIG. 3 is a schematic diagram showing a modified binary hypercube.

FIG. 4 is a schematic diagram showing a modified binary hypercube.

FIG. 5 is a schematic diagram showing a circuit-switched path.

FIG. 6 is a schematic diagram showing a zipper connection on a single PME-PME interface.

FIG. 7 is a schematic diagram showing a zipper connection on two orthogonal connections to a cluster.

FIG. 8 is a schematic diagram showing the reserved storage locations for interrupt and I/O processing. The actual memory location is found by adding an offset to the starting storage address of the level range. For example, the right input data buffer count is at 00C0 + 003D, that is, X'00FD'.

FIG. 9 is a schematic diagram showing the PME control registers and the interconnection network that support the interrupt implementation.

FIG. 10 is a schematic diagram showing a zipper receive sequence.

FIG. 11 is a data flow diagram illustrating an embodiment of a processor memory element. The principal sections of the data flow are main storage, the general-purpose registers, the ALU and its registers, and a portion of the interconnection mesh.

FIG. 12 is a schematic diagram showing the tag, parity, and data words transferred between PME I/Os.

FIG. 13 is a schematic diagram showing a zipper send sequence.

FIG. 14 is a schematic diagram showing the PE I/O data flow.

FIG. 15 is a schematic diagram showing the sequencing of the output interface between PME I/Os.

FIG. 16 is a diagram showing a transfer sequence.

FIG. 17 is a schematic diagram showing the physical zipper interface.

[Explanation of symbols]

1 Host processor
2 Host memory
3 Application program interface (API)
4 Cluster synchronizer
5 Cluster controller
6 Cluster
20 Processor memory element (PME)
21 Broadcast/control interface (BCI) section
22 Ring
23 Node ring
26 Node ring
151 Node
152 Y-dimension ring
153 Four-dimensional array element
154 Interconnection path

Continuation of front page
(72) Inventor: Clive Alan Collins, 9 Monroe Drive, Poughkeepsie, New York 12601, USA
(72) Inventor: Michael Charles Dapp, 1130 Ivon Avenue, Endwell, New York 13760, USA
(72) Inventor: James Warren Dieffenderfer, 396 Front Street, Owego, New York 13827, USA
(72) Inventor: Donald George Grice, 2179 Sawkill-Ruby Road, Kingston, New York 12401, USA
(72) Inventor: Billy Jack Knowles, 72 Hurley Avenue, Kingston, New York 12401, USA
(72) Inventor: Donald Michael Lesmeister, 108A Collins Hill Road, Vestal, New York 13850, USA
(72) Inventor: Richard Edward Nier, 109 Forest Hill Road, Apalachin, New York 13732, USA
(72) Inventor: Elic Eugene Retter, H.C.R. 34, Box 29B, Warren Center, Pennsylvania 18851, USA
(72) Inventor: David Bruce Rolfe, 24 Pine Tree Road, West Hurley, New York 12491, USA
(72) Inventor: Vincent John Smoral, 812 Skyline Terrace, Endwell, New York 13760, USA

Claims

[Claims]

1. An apparatus for moving data into and out of a network of interconnected nodes, comprising: means for interconnecting the nodes as rings of a folded torus having no edge to the network; means for dynamically toggling the network between an edgeless network and a network with an edge by logically breaking a ring along a dimension orthogonal to the ring so that an edge to the network is established; means for moving data into and out of the network through the edge when the edge is active; and means for distributing data into or out of the network so that the data rate across the edge matches both the sustained and the peak data rates of a system external to the network.

2. The apparatus according to claim 1, wherein, when the logical break of rings is performed n times, the resulting peak and sustained data rates through the n edges approach n times the data rate through a single edge.

3. The apparatus according to claim [number missing in source], wherein the rings connect the nodes in an m-dimensional orthogonal arrangement, the n logical breaks orthogonal to the rings are performed in a plurality of orthogonal dimensions, and the multidimensional orthogonal breaks of the rings provide a secondary path for the same data moved into and out of the network, thereby simplifying and accelerating processing of functions such as transposition and matrix multiplication within the network.

4. A computer system comprising nodes, each node having a node identification and a plurality of processor memory elements, each processor memory element having a communication path to other processor memory elements, wherein the nodes are interconnected as an n-dimensional network with parallel communication paths between processor memory elements along the communication paths, providing a processing array, and wherein the system further has a communication ring that, when broken, can provide an effective interface to an I/O bus path external to the nodes.

5. The computer system according to claim 4, arranged as a massively parallel machine, wherein the nodes are interconnected as an n-dimensional network with parallel communication paths between processor memory elements along internal and external communication paths providing a processing array, the system further having a communication ring that, when broken, can provide an effective interface to an external I/O bus path, and an interface for connecting an external storage device or an additional display device to a subset of the processor memory elements of the processing array of the nodes.

6. The computer system according to claim 4, wherein each node is coupled to other nodes by a four-dimensional modified hypercube supporting eight external ports, the nodes are connected through the ports to form rings, and interconnecting matching ports allows the rings to be interconnected into rings of rings, so that a folded cluster structure can be formed whose edges are logically zipped and connected to an external system bus for I/O to an external system.

7. The computer system according to claim 4, including a modified hypercube interconnection of rings within rings, wherein any or all of the rings can be logically broken to interface to external I/O, the two ends of a broken ring are connected to an external I/O bus as a logical operation, and the resulting topology permits normal communication between processor memory elements within the nodes at one time interval and I/O at another time interval, and wherein the break process at the level of a ring within the modified hypercube effectively "unzips" the ring for I/O purposes, provides independent interfaces into the array that can reside on 1 to n edges of the modified hypercube, and supports parallel input to multiple dimensions of the array or broadcast to multiple dimensions of the array.

8. The computer system according to claim 4, wherein zippers of different sizes meet the application requirements specific to a configuration, and the node array alternates data transfers into and out of the array between two nodes directly connected to the zipper.

9. The computer system according to claim 4, wherein an application processor interface is provided for managing a path to the I/O zipper, data received from a host system in attached mode or from a device in independent mode is transferred to the array, and the processor memory elements in the array that manage the I/O are started before this type of operation begins.

10. A multiprocessor memory computer system comprising a plurality of processor memory elements forming nodes, each processor memory element having communication paths inside and outside its node that provide a processing array, and each node having a node identification, wherein the nodes are interconnected as a network with parallel communication paths between nodes and processor memory elements along the communication paths providing the processing array, and wherein, when a communication path of the system is broken, each node can be provided with an effective interface from an external I/O bus path to one or more processor memory elements of the processing array of the nodes.

11. A computer system comprising nodes, each node having a node identification and a plurality of processor memory elements, each processor memory element having a communication path to other processor memory elements, wherein the nodes are interconnected as an n-dimensional network with parallel communication paths between processor memory elements along the communication paths providing a processing array, the system having a communication ring that, when broken, provides an effective interface to an external I/O bus path, and an interface for connecting an external storage device or an additional display device to a subset of the processor memory elements of the processing array of the nodes.