JPH07287700A

JPH07287700A - Computer system

Info

Publication number: JPH07287700A
Application number: JP5118660A
Authority: JP
Inventors: Paul Amba Wilkinson; Thomas Norman Barker; James Warren Dieffenderfer; Peter Michael Kogge
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1992-05-22
Filing date: 1993-05-20
Publication date: 1995-10-31

Abstract

PURPOSE: To allow a computer to act as a SIMD-mode processor with high floating-point precision by having each picket decode SIMD instructions and by performing SIMD processing with an array of processors that execute data in parallel. CONSTITUTION: A plurality of array processing elements 406, coupled as pickets 100, are provided so that data and instructions can be communicated among them. Each picket 100 has a plurality of units that provide different execution capabilities, and, depending on those capabilities, each picket 100 can adopt different modes for executing data within the picket. Each picket 100 decodes SIMD instructions, so that the array 406 of processors, executing data in parallel, performs SIMD processing. The computer thus acts as a SIMD-mode processor with high floating-point precision.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to computers and computer systems, and in particular to an array of processors that can function as a machine, referred to herein as a SIMIMD machine, having an array of processors and an array formed from pickets with local autonomy, and that can execute an array of data.

[0002]

2. Description of the Related Art: First, the terms used in this specification are explained.

[0003] ALU: The ALU is the arithmetic logic unit of a processor.

[0004] Array: An array refers to an arrangement of elements in one or more dimensions. An array can contain an ordered set of data items (array elements), which in languages such as FORTRAN are identified by a single name. In other languages, the name of an ordered set of data items refers to an ordered set of data elements that all have the same attributes. In program arrays, dimensions are generally specified by a number or by dimension attributes. In some languages the array declarator specifies the size of each dimension of the array, and in others an array is an arrangement of elements in a table. In a hardware sense, an array is a collection of structures (functional elements) that are generally identical, in a massively parallel architecture. An array element in data-parallel computing is an element to which an operation can be assigned and which, in the parallel state, can perform the required operation independently and in parallel with the others. In general, an array can be thought of as a grid of processing elements. By assigning partitioned data to each section of the array, the partitioned data can be moved in a regular grid pattern. However, data can also be indexed or assigned to any location in the array.
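As a minimal illustration of the grid view described above (a sketch under assumptions, not the mechanism of this specification), the Python code below assigns one data item to each processing element of a small 2-D grid and then moves every item one step east in a single regular-grid step, wrapping at the edge. The names `make_grid` and `shift_east` are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def make_grid(rows: int, cols: int) -> Grid:
    """Assign one data item to the processing element (PE) at grid position (r, c)."""
    return [[r * cols + c for c in range(cols)] for r in range(rows)]


def shift_east(grid: Grid) -> Grid:
    """Every PE passes its item to its east neighbour at the same time;
    items in the last column wrap around to column 0."""
    return [[row[(c - 1) % len(row)] for c in range(len(row))] for row in grid]


grid = make_grid(2, 3)       # [[0, 1, 2], [3, 4, 5]]
print(shift_east(grid))      # [[2, 0, 1], [5, 3, 4]]
```

Because every PE performs the same move at once, a regular pattern like this needs no per-item addressing; arbitrary indexing, which the definition also allows, would instead require routing each item individually.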

[0005] Array Director: An array director is a unit programmed as the control program of an array. It serves as the master control program for a group of functional elements arranged as an array.

[0006] Array Processor: There are two main types of array processor: multiple instruction, multiple data (MIMD) and single instruction, multiple data (SIMD). In a MIMD array processor, each processing element in the array executes its own unique instruction stream using its own data. In a SIMD array processor, each processing element in the array is restricted to the same instruction, supplied through a common instruction stream; however, the data associated with each processing element is unique. The preferred array processor of the present invention has additional features. In this specification it is referred to as the Advanced Parallel Array Processor, abbreviated APAP.
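The contrast between the two array-processor types can be sketched in a few lines of Python. This is an illustrative model only: each list entry stands for one processing element's datum, and the helper names (`simd_step`, `mimd_step`, `double`, `inc`) are hypothetical.

```python
from typing import Callable, List

Op = Callable[[int], int]


def simd_step(instruction: Op, data: List[int]) -> List[int]:
    """SIMD: one instruction from the common stream is applied by every PE to its own datum."""
    return [instruction(x) for x in data]


def mimd_step(programs: List[List[Op]], data: List[int]) -> List[int]:
    """MIMD: each PE runs its own instruction stream on its own datum."""
    results = []
    for program, x in zip(programs, data):
        for instruction in program:
            x = instruction(x)
        results.append(x)
    return results


def double(x: int) -> int:
    return 2 * x


def inc(x: int) -> int:
    return x + 1


print(simd_step(double, [1, 2, 3]))                                # [2, 4, 6]
print(mimd_step([[double], [inc, inc], [double, inc]], [1, 2, 3]))  # [2, 4, 7]
```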

[0007] Asynchronous: Asynchronous means without a regular time relationship; that is, the relationship between the executions of the various functions is unpredictable, and there is no regular or predictable time relationship between them. In a control context, when data is waiting for an idle element that is being addressed, the control program addresses the location to which control is to be passed. As a result, operations remain in sequence even though they do not coincide in time with any event.

[0008] BOPS/GOPS: BOPS and GOPS are abbreviations with the same meaning: one billion operations per second. See GOPS.

[0009] Circuit Switching / Store-and-Forward: These terms refer to two mechanisms for moving data packets through a network of nodes. Store-and-forward is a mechanism in which a data packet is received at each intermediate node, stored in that node's memory, and then forwarded toward its destination. Circuit switching is a mechanism in which an intermediate node is directed to logically connect its input port to an output port, so that data packets can pass directly through the node toward their destination without entering the intermediate node's memory.
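A simplified latency model shows why the difference matters. The sketch below assumes (these are not figures from this specification) that every link has the same bandwidth, that propagation and contention delays are negligible, and that circuit setup is a single fixed cost; all numbers in the usage lines are hypothetical.

```python
def store_and_forward_latency(hops: int, packet_bits: float, link_bps: float) -> float:
    """Each intermediate node receives the whole packet into memory before
    forwarding it, so the full transmission time is paid once per hop."""
    return hops * packet_bits / link_bps


def circuit_switched_latency(setup_s: float, packet_bits: float, link_bps: float) -> float:
    """After a one-time setup that connects input ports to output ports along the
    path, the packet streams straight through without being stored at any node."""
    return setup_s + packet_bits / link_bps


# Hypothetical numbers: 1024-bit packet, 1 Gbit/s links, 4 hops, 2 microsecond setup.
print(store_and_forward_latency(4, 1024, 1e9))    # about 4.1e-06 s
print(circuit_switched_latency(2e-6, 1024, 1e9))  # about 3.0e-06 s
```

Under this model store-and-forward latency grows with the hop count, while circuit switching pays the setup cost once and then transmits at link speed.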

[0010] Cluster: A cluster is a station (or functional unit) consisting of a control unit (cluster controller) and the hardware attached to it (terminals, functional units, or virtual components). As used herein, a cluster includes an array of processor memory elements (PMEs), also referred to as a node array. A cluster normally has 512 PME elements.

[0011] The entire PME node array of the present invention consists of a set of clusters, each supported by one cluster controller (CC).

[0012] Cluster Controller: A cluster controller is a device that controls the input/output operations of the multiple devices or functional units attached to it. A cluster controller is usually controlled by a program stored in and executed by the unit, as in the IBM 3601 finance communication controller, but it can also be fully controlled by hardware, as in the IBM 3272 control unit.

[0013] Cluster Synchronizer: A cluster synchronizer is a functional unit that manages the operation of all or part of a cluster to maintain synchronous operation of its elements, so that each functional unit can maintain a specific time relationship with the execution of the program.

[0014] Controller: A controller is a device that directs the transmission of data and instructions over the links of an interconnection network. Its operation is controlled either by a program executed by the processor to which the controller is attached or by a program executed within the controller itself.

[0015] CMOS: CMOS is an abbreviation for complementary metal-oxide semiconductor technology. It is widely used in the manufacture of dynamic random access memory (DRAM). NMOS is another technology used in the manufacture of dynamic random access memory. The present invention uses CMOS, but the technology used to manufacture the Advanced Parallel Array Processor (APAP) does not limit the range of semiconductor technologies that can be used.

[0016] Dotting: Dotting refers to joining three or more leads by a physical connection. Most backpanel buses use this connection method. The term is related to the OR-DOTs of the past, but here it is used to identify multiple data sources that can be combined onto a bus by a very simple protocol.

[0017] The I/O zipper concept of the present invention can be used to implement the idea that an input port entering a node can be driven either by a node's output port or by data coming from the system bus. Conversely, data output from a node can be used as input to another node and to the system bus. Note that data output to the system bus and data output to another node are not transferred at the same time but in different cycles.
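The dotting and zipper descriptions above can be illustrated with a wired-OR model. This sketch assumes that a dotted line carries the bitwise OR of every enabled driver and that the "very simple protocol" amounts to enabling one source per cycle; the source names `neighbor_output` and `system_bus` and the helper `dotted_line` are hypothetical.

```python
from typing import Dict, Set


def dotted_line(drivers: Dict[str, int], enabled: Set[str]) -> int:
    """Wired-OR ('dotted') connection: the shared line carries the bitwise OR
    of the values driven by every enabled source."""
    value = 0
    for name, data in drivers.items():
        if name in enabled:
            value |= data
    return value


# Hypothetical sources that can drive one node's input port.
drivers = {"neighbor_output": 0b1010_0000, "system_bus": 0b0000_0101}

# Simple protocol: one source enabled per cycle, so its data arrives intact.
print(bin(dotted_line(drivers, {"neighbor_output"})))                # 0b10100000
print(bin(dotted_line(drivers, {"system_bus"})))                     # 0b101
# Enabling both in the same cycle merges the two values and corrupts the data.
print(bin(dotted_line(drivers, {"neighbor_output", "system_bus"})))  # 0b10100101
```

The last line shows why the system bus and the neighboring node are served in different cycles rather than simultaneously.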

[0018] Dotting is used in the discussion of H-DOT, which allows two-port PEs, PMEs, or pickets to be used in arrays of various organizations. Several topologies are discussed, including 2-D and 3-D meshes, base-2 N-cubes, sparse base-4 N-cubes, and sparse base-8 N-cubes.

[0019] DRAM: DRAM is an abbreviation for dynamic random access memory, the common storage that computers use as main memory. However, the term DRAM also applies to memory used as a cache or as memory other than main memory.

[0020] Floating Point: A floating-point number is represented in two parts: a fixed-point part, or fraction, and an exponent relative to an implied radix or base. The exponent indicates the actual position of the decimal point. In a typical floating-point notation, the real number 0.0001234 is represented as 0.1234-3, that is, 0.1234 × 10⁻³. Here 0.1234 is the fraction and -3 is the exponent. In this example the floating-point radix, or base, is 10, and it represents an implied positive fixed integer base greater than 1. Raising this base to the power of the exponent, whether written explicitly or carried in the exponent part of the floating-point representation, and then multiplying by the fraction yields the real number represented. Numeric literals can be written either in floating-point notation or as real numbers.
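The decomposition in the example above can be checked with Python's standard `decimal` module, which keeps base-10 digits exactly. The helper name `to_fraction_exponent` is hypothetical; it normalizes a nonzero number so that the fraction's magnitude lies in [0.1, 1), matching the 0.1234-3 form used in the text.

```python
from decimal import Decimal
from typing import Tuple


def to_fraction_exponent(value: str) -> Tuple[Decimal, int]:
    """Split a nonzero decimal number into a fraction f with 0.1 <= |f| < 1
    and an exponent e such that value == f * 10**e."""
    d = Decimal(value)
    exponent = d.adjusted() + 1      # adjusted() gives the exponent of the leading digit
    fraction = d.scaleb(-exponent)   # move the decimal point by -exponent places
    return fraction, exponent


f, e = to_fraction_exponent("0.0001234")
print(f, e)                          # 0.1234 -3
print(f * Decimal(10) ** e)          # 0.0001234
```

The second print reverses the process described in the text: the base 10 is raised to the exponent and multiplied by the fraction to recover the represented number.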

[0021] FLOPS: This term refers to the number of floating-point instructions per second. Floating-point operations include ADD (addition), SUB (subtraction), MPY (multiplication), DIV (division), and often many other operations. The floating-point instructions per second parameter is often calculated using add or multiply instructions and can generally be regarded as a 50/50 mix. An operation includes generating the exponent and the fraction and any required normalization of the fraction. The present invention can handle 32-bit or 48-bit floating-point formats (longer formats are possible, but they were not counted in the mix). When floating-point operations are implemented with fixed-point instructions (conventional or RISC), multiple instructions are required. Some people use a 10:1 ratio when calculating performance, while some studies show that a ratio of 6.25:1 is more appropriate. Each architecture has its own ratio.
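The ratios above convert an instruction rate into an estimated floating-point rate. The sketch below assumes a hypothetical rate of one billion fixed-point instructions per second, and the per-operation costs passed to `mix_ratio` are likewise hypothetical, chosen only to show how a 50/50 add/multiply mix averages to a single ratio.

```python
def effective_flops(instructions_per_second: float, instructions_per_flop: float) -> float:
    """Estimate the floating-point rate when each floating-point operation is
    built from several fixed-point instructions."""
    return instructions_per_second / instructions_per_flop


def mix_ratio(add_cost: float, mul_cost: float, add_share: float = 0.5) -> float:
    """Average instructions per floating-point operation for an add/multiply mix
    (50/50 by default)."""
    return add_share * add_cost + (1.0 - add_share) * mul_cost


rate = 1e9  # hypothetical: one billion fixed-point instructions per second
print(effective_flops(rate, 10.0))          # 100000000.0 with the 10:1 ratio
print(effective_flops(rate, 6.25))          # 160000000.0 with the 6.25:1 ratio
print(mix_ratio(add_cost=5, mul_cost=7.5))  # 6.25 with these hypothetical costs
```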

[0022] Functional Unit: A functional unit is an entity of hardware, software, or both that can accomplish a specified purpose.

[0023] Gbyte: A Gbyte is one billion bytes; thus 1 Gbyte/s is one billion bytes per second.

[0024] GIGAFLOPS: 10⁹ floating-point instructions per second.

[0025] GOPS and PETAOPS: GOPS and BOPS have the same meaning: one billion operations per second. PETAOPS means 10¹⁵ operations per second (a thousand trillion), the potential of current machines. In the APAP machine of the present invention, these terms are nearly equivalent to BIP/GIP, meaning one billion instructions per second. Some machines can perform multiple operations (for example, both an add and a multiply) with a single instruction, but the present invention does not. Conversely, one operation may require many instructions; for example, the present invention uses multiple instructions to perform 64-bit operations. However, logarithmic operations were not included when operations were counted. GOPS is the preferred unit for describing performance, but it was not used consistently; MIP/MOP, with BIP/BOP as the next larger units, and MegaFLOPS/GigaFLOPS/TeraFLOPS/PetaFLOPS are also used.

[0026] ISA: ISA stands for Instruction Set Architecture.

[0027] Link: A link is a physical or logical element. A physical link is a physical connection joining elements or units, whereas a link in computer programming is an instruction or address that passes control and parameters between separate parts of a program. In a multisystem environment, a link is the connection between two systems, specified by program code that identifies the link by a real or virtual address. A link therefore generally includes the physical medium, any protocol, and the associated devices and programming; that is, a link is both logical and physical.

[0028] MFLOPS: MFLOPS means 10⁶ floating-point instructions per second.

[0029] MIMD: MIMD refers to a processor array architecture in which each processor in the array has its own instruction stream, so that there are multiple instruction streams executing multiple data streams, one per processing element.

[0030] Module: A module is a discrete, identifiable program unit, or a functional unit of hardware designed for use with other components. A collection of PEs contained in a single electronic chip is also called a module.

[0031] Node: In general, a node is a junction of links. In a generic array of PEs, a single PE can be a node. A node can also contain a collection of PEs called a module. In the present invention, a node is formed from an array of PMEs, and this set of PMEs is called a node. A node preferably consists of eight PMEs.

[0032] Node Array: A collection of modules made up of PMEs is sometimes called a node array; it is an array of nodes built from modules. A node array usually contains more than a few PMEs, but the term encompasses any plurality.

[0033] PDE: A PDE is a partial differential equation.

[0034] PDE Relaxation Solution Process: The PDE relaxation solution process is a method of solving PDEs (partial differential equations). Solving PDEs consumes most of the computing power of supercomputers in the known art, so it is a good example of a relaxation process. There are many ways to solve PDEs, and several numerical methods include a relaxation process. For example, when a PDE is solved by the finite element method, most of the time is spent on relaxation computations. Consider an example from the field of heat transfer. Given hot gas inside a chimney and a cold wind blowing outside, what is the temperature gradient within the chimney bricks? Treating the bricks as small segments and writing an equation that describes how heat flows between segments as a function of their temperature difference converts the heat transfer PDE into a finite element problem. If all elements except the inner and outer ones are then assumed to be at room temperature, and the boundary segments are set to the temperatures of the hot gas and the cold wind, the problem is ready for relaxation to begin.
The computer program then models the passage of time by updating the temperature variable in each segment according to the amount of heat flowing into or out of that segment. Relaxing the set of temperature variables until they represent the actual temperature distribution in a physical chimney requires many cycles, each of which processes every segment in the model. If the goal is to model the cooling of the gas in the chimney, the elements must be extended to the gas equations; the inner boundary conditions are then linked to another finite element model, and the process continues. Note that heat flow depends on the temperature difference between adjacent segments, so the temperature variables are distributed over the inter-PE communication paths. It is this nearest-neighbor communication pattern that makes PDE relaxation well suited to parallel computation.
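The chimney example can be sketched as a one-dimensional Jacobi relaxation, a standard relaxation scheme used here as an illustrative assumption rather than the method of this specification. The wall is modeled as a row of segments with fixed boundary temperatures; in each cycle every interior segment takes the average of its two neighbors' previous values, which is the nearest-neighbor exchange that maps onto inter-PE communication. The temperatures and the helper name `relax` are hypothetical.

```python
from typing import List


def relax(temps: List[float], iterations: int) -> List[float]:
    """Jacobi relaxation on a 1-D row of wall segments.

    temps[0] (hot-gas side) and temps[-1] (cold-wind side) are fixed boundaries.
    Each interior segment reads only its neighbours' previous values, so all
    updates in one cycle could run in parallel, one segment per PE.
    """
    t = list(temps)
    for _ in range(iterations):
        t = [t[0]] + [(t[i - 1] + t[i + 1]) / 2 for i in range(1, len(t) - 1)] + [t[-1]]
    return t


hot_gas, cold_wind, room = 300.0, 0.0, 20.0
wall = [hot_gas] + [room] * 5 + [cold_wind]
print([round(x, 1) for x in relax(wall, 500)])
```

After enough cycles the interior converges to the steady-state linear profile 300, 250, 200, 150, 100, 50, 0; more segments need more cycles to converge, which is why the text stresses the many passes over the model.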

[0035] Picket: A picket is an element in the array of elements that makes up an array processor. It consists of the data flow (ALU, REGS), memory, control, and the portion of the communication matrix associated with the element. The unit refers to 1/n of an array processor, consisting of a parallel processor element and memory element together with its control and its part of the array intercommunication mechanism. A picket is one form of processor memory element (PME). The PME chip design processor logic of the present invention can implement the picket logic described in the related application, or it can have logic for a processor array formed as nodes. The term picket is similar to PE, the commonly used array term for a processing element. A picket is an element of a processing array, preferably consisting of a processing element combined with local memory, that processes bytes of information bit-parallel in one clock cycle. The preferred embodiment consists of a byte-wide data flow processor, 32 bytes or more of memory, primitive controls, and a mechanism for communicating with other pickets.

[0036] The term "picket" comes from Tom Sawyer and his white fence. It will also be understood, however, that functionally it resembles a military picket line.

[0037] Picket Chip: A picket chip contains multiple pickets on a single silicon chip.

[0038] Picket Processor System (or Subsystem): A picket processor is a total system consisting of an array of pickets, a communication network, an input/output system, and a SIMD controller made up of a microprocessor, a canned-routine processor, and a microcontroller that runs the array.

[0039] Picket Architecture: The picket architecture is the preferred embodiment of the SIMD architecture and can address a number of diverse problems, including:
- set-associative processing
- parallel numerically intensive processing
- physical array processing similar to images

[0040] Picket Array: A picket array is a collection of pickets arranged in a geometric order, that is, a regular array.

[0041] PME or Processor Memory Element: PME stands for processor memory element. In this specification, the term PME refers to a single processor, memory, and I/O-capable system element or unit that forms one part of the parallel array processor of the present invention. PME is a term that encompasses pickets. A PME is 1/n of a processor array, consisting of a processor, its associated memory, a control interface, and part of the array communication network mechanism. This element can be a PME with regular array connectivity, as in a picket processor, or a PME that is part of a subarray, as in the multi-PME node described above.

[0042] Routing: Routing is the assignment of a physical path by which a message reaches its destination. Assigning a route requires a source and a destination; these elements or addresses have a temporary relationship or affinity. Message routing is often based on a key obtained by referencing a table of assignments. Within a network, a destination is any station or network-addressable unit addressed as the destination of transmitted information by a path-control address that identifies the link. The destination field identifies the destination by a message header destination code.
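Table-driven routing, as described above, can be sketched as a lookup keyed by the header's destination code. The `Message` type, the `route` helper, and the table contents are hypothetical; a real network would store richer path information than a single outgoing link name.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Message:
    dest_code: str   # destination code carried in the message header
    payload: bytes


def route(msg: Message, assignments: Dict[str, str]) -> str:
    """Return the outgoing link for a message, using its header destination
    code as the key into a table of route assignments."""
    if msg.dest_code not in assignments:
        raise ValueError(f"no route assigned for destination {msg.dest_code!r}")
    return assignments[msg.dest_code]


# Hypothetical assignment table: destination code -> outgoing link.
table = {"PME-7": "link_north", "PME-12": "link_east"}
print(route(Message("PME-12", b"data"), table))   # link_east
```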

[0043] SIMD: A processor array architecture in which all processors in the array are commanded from a single instruction stream to execute multiple data streams, one per processing element.

[0044] SIMDMIMD or SIMD/MIMD: SIMDMIMD or SIMD/MIMD refers to a machine with a dual function that can switch from MIMD to SIMD for a period of time to handle complex instructions, and that therefore has two modes. By placing a Thinking Machines, Inc. Connection Machine Model CM-2 as the front end or back end of a MIMD machine, programmers were able to operate in multiple modes, also called dual mode, to execute separate parts of a problem. Such machines have existed since the ILLIAC and use a bus to interconnect a master CPU with other processors. The master control processor can interrupt the processing of the other CPUs, which can execute independent program code. During an interrupt, some processing is required for checkpointing (closing and saving the current state of the controlled processor).

[0045] SIMIMD: SIMIMD is a processor array architecture in which all processors in the array are commanded from a single instruction stream to execute multiple data streams, one per processing element. Within this configuration, data-dependent operations within each picket that mimic instruction execution are controlled by the SIMD instruction stream.

[0046] It is a single-instruction-stream machine that can use the SIMD instruction stream to sequence multiple instruction streams (one per picket) and to execute multiple data streams (one per picket). SIMIMD can be implemented by a PME system.
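One way a single SIMD stream can sequence per-picket instruction streams is sketched below. This is an assumption for illustration, not necessarily the mechanism of the present invention: each picket holds its own program and program counter in local memory, the controller repeatedly broadcasts every opcode of a small hypothetical opcode set, and a picket acts only when the broadcast opcode matches its own next instruction (a data-dependent enable). The `Picket` class, `OPCODES`, and `simimd_run` are hypothetical names.

```python
from typing import List, Tuple

Instr = Tuple[str, int]    # (opcode, operand) held in a picket's local memory
OPCODES = ("ADD", "MUL")   # hypothetical opcode set the SIMD controller cycles through


class Picket:
    """Hypothetical picket model: private program, local program counter, local accumulator."""

    def __init__(self, program: List[Instr], acc: int = 0) -> None:
        for op, _ in program:
            if op not in OPCODES:          # an unknown opcode would never be broadcast
                raise ValueError(f"unknown opcode {op!r}")
        self.program = program
        self.pc = 0
        self.acc = acc

    def done(self) -> bool:
        return self.pc >= len(self.program)


def simimd_run(pickets: List[Picket]) -> None:
    """Sequence per-picket instruction streams from one SIMD stream: a picket
    executes a broadcast opcode only if it matches its own next local instruction."""
    while not all(p.done() for p in pickets):
        for op in OPCODES:                 # one broadcast SIMD step per opcode
            for p in pickets:              # conceptually, all pickets in parallel
                if not p.done() and p.program[p.pc][0] == op:
                    operand = p.program[p.pc][1]
                    p.acc = p.acc + operand if op == "ADD" else p.acc * operand
                    p.pc += 1


pickets = [Picket([("ADD", 3), ("MUL", 2)], acc=1),
           Picket([("MUL", 5)], acc=2),
           Picket([("MUL", 2), ("ADD", 1)], acc=4)]
simimd_run(pickets)
print([p.acc for p in pickets])   # [8, 10, 9]
```

The cost of this emulation is that every picket sits idle during broadcasts that do not match its next instruction, so throughput drops as the opcode set grows.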

[0047] SISD: SISD is an abbreviation for single instruction, single data.

[0048] Swapping: Swapping refers to exchanging the data contents of one storage area with those of another storage area.

[0049] Synchronous Operation: Synchronous operation in a MIMD machine is a mode of operation in which each action is related to an event (usually a clock). This event can be a specified event that occurs regularly in the program sequence. An operation is dispatched to many processing elements, each of which performs its function independently. Control is not returned to the controller until the operation is complete.

[0050] If the request is to an array of functional units, the controller issues the request to the elements in the array, and each element must complete its operation before control is returned to the controller.
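The dispatch-and-wait behavior of synchronous operation can be modeled with Python's standard `concurrent.futures` module. Here threads stand in for processing elements, which is an illustrative assumption only; the helper names `synchronous_dispatch` and `square` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, List


def synchronous_dispatch(op: Callable[[int], int], data: List[int]) -> List[int]:
    """Dispatch one operation to every processing element (modelled as a thread)
    and return control to the caller only after all of them have completed."""
    with ThreadPoolExecutor(max_workers=max(1, len(data))) as pool:
        futures = [pool.submit(op, x) for x in data]   # one element per datum
        wait(futures)                                  # block until every element is done
    return [f.result() for f in futures]


def square(x: int) -> int:
    return x * x


print(synchronous_dispatch(square, [1, 2, 3, 4]))      # [1, 4, 9, 16]
```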

[0051] TERAFLOPS: TERAFLOPS means 10¹² floating-point instructions per second.

[0052] VLSI: VLSI is an abbreviation for very large scale integration (as applied to integrated circuits).

[0053] Zipper: A zipper is a newly provided function for establishing a link from a device outside the normal interconnection of an array configuration.

[0054] The prior art forming the background of the present invention is described below. In their relentless pursuit of faster computers, engineers are linking hundreds, and sometimes thousands, of low-cost microprocessors in parallel to build super-supercomputers that can solve complex problems beyond the reach of today's machines. Such machines are called massively parallel machines. The inventors have developed a new way to build massively parallel systems, and the many improvements they have made should be considered against the background of extensive work by others. Summaries of the art are given in the other referenced patent applications; in this regard, see the inventors' related U.S. Patent Application No. 601,594 on a parallel associative processor system and the related U.S. patent application on the Advanced Parallel Array Processor (APAP). Choosing the architecture best suited to a particular application requires system trade-offs, and no satisfactory solution has been available so far. The concepts of the present invention make such a solution easier to achieve.

[0055] The picket was first described in related U.S. Patent Application No. 611,594. That application and the related patent applications mentioned above discuss numerous references that are regarded as background art. The concept of the present invention is that it is desirable to provide, within an array of processors, pickets that each include a data flow (ALU and REG), memory, control logic, and the portion of the communication matrix associated with the picket that enables intercommunication among the pickets of the processor array. For convenience, reference is made to the "Prior Art" sections of the related patent applications.

[0056] This specification is directed to the use of processing elements in the conventional sense, or pickets, within the array of a SIMD machine. SIMD machines have existed for many years; one such prior-art machine is shown in FIG. 1 of this specification. The prior art described in related U.S. Patent Application No. 07/519,332, a continuation of U.S. Patent Application No. 07/250,595, describes a multidimensional array of processing elements with increased flexibility, so that parallel processing power can be better exploited without resorting to expensive and complex MIMD processors. That application was first published on May 3, 1989, as European Patent Application No. 88/307885/88-A. The system it describes sends global instructions for local bit-serial execution along a bus connecting the control logic of the various parallel processing elements, and programmatically modifies selected bits of the global instruction so that the modified bits can be used on the local bit lines where they are decoded.

[0057] Other array schemes in the art that can be considered background art are described next.

[0058] U.S. Pat. No. 4,736,291 discusses an array transform processor that performs high-speed processing of arrays of data. It is specifically optimized for executing FFT algorithms in the field of seismic analysis. The focus of the patent is a bulk memory and a system control bus shared by as many as 15 different units. Each unit has a writable control store, a program memory, a control unit, and a device-dependent unit that gives each of the 15 units its own characteristics. Although the patent concerns an array transform processor, such a processor does not itself necessarily use a parallel array of processors, and the patent says nothing to that effect. Note that array transforms can be performed by the systems described in this specification and in the related patent applications. The processor described in that patent, however, instead has multiple subunits (or stages) that process an array of data in a complex, iterative sequence. This differs from the SIMD array of processors of the present invention, in which each subunit takes one element of the data array and the data are processed in parallel across many processors.

[0059] U.S. Pat. No. 4,831,519 discusses a SIMD array processor with left and right interprocessor connections, so that 16-bit processor elements (PEs) can be linked side by side to adapt effectively to a variety of data formats. For example, a 64-bit floating-point word is processed by consecutive PEs: the high-order PE processes the exponent, and the other three are linked together to process the 48-bit fraction. This is achieved by linking control and carry/borrow between the PEs. One chip can contain sixteen 16-bit PEs for data, two PEs for address generation, and two spare PEs. For input/output, the patent uses a four-level signaling technique that combines two logic signals on one pin, generating one of four different voltages depending on the logic states of the two signal lines.
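The linked carry described above can be pictured with a small Python model. It is a sketch only. The helper names are hypothetical, the three fraction PEs are simulated one after another rather than as hardware, and the exponent PE is omitted.

```python
# Illustrative model (hypothetical helpers): three 16-bit "PEs" linked by a
# carry line add a 48-bit fraction; the exponent PE is not modelled.
MASK16 = 0xFFFF


def split48(value: int) -> list[int]:
    """Split a 48-bit fraction into three 16-bit limbs, least significant first."""
    return [(value >> (16 * i)) & MASK16 for i in range(3)]


def join48(limbs: list[int]) -> int:
    """Reassemble limbs (least significant first) into one integer."""
    return sum(limb << (16 * i) for i, limb in enumerate(limbs))


def linked_add(a_limbs: list[int], b_limbs: list[int]) -> tuple[list[int], int]:
    """Each PE adds its own limbs plus the carry-in from its lower neighbour
    and passes its carry-out to the next-higher PE (simulated sequentially)."""
    carry = 0
    result = []
    for a, b in zip(a_limbs, b_limbs):
        total = a + b + carry
        result.append(total & MASK16)  # 16-bit result kept in this PE
        carry = total >> 16            # carry-out handed to the next PE
    return result, carry
```

For example, `linked_add(split48(0x0000FFFFFFFF), split48(1))` ripples a carry through the two lower PEs into the third.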

[0060] This patent must provide control or controller functions, but it says little about them. The array is, however, supplied with a global MASK, which defines how PEs are grouped to operate on data of various sizes, and with a local NEST control, which propagates from the master PE of a nest to the slave PEs on its right. Clearly, a PE cannot determine which group it belongs to. Apart from the NEST control, no processing has local autonomy in the sense of being based on the data the PE is processing. The patent describes a possible design for a SIMD chip that can be extended horizontally within the chip to handle 16-, 32-, and 64-bit data widths by linking adjacent PEs to operate as a single processor. It could therefore be regarded as applicable to some concept implemented after the development of the picket machine of the present invention, which has memory and logic on the same chip, and as thereby extending the inventive concept. Parallel processing elements can be provided on one chip. The patent does not, however, show how the processing elements could be combined so that a processing element can know or determine whether it is a participating element; grouping is dictated by the global control MASK. Reviewing this patent after understanding the present invention makes its lack of local autonomy clear.

[0061] U.S. Pat. No. 4,748,585 discusses a mechanism for assigning the elements of a parallel processor to segments to accommodate data of varying lengths. It is similar to U.S. Pat. No. 4,831,519, but it concerns groups of single processors linked together to process longer word lengths. Each single processor is complete, in that it has a microprocessor, an ALU, registers (REG), and so on. The evident teaching of this patent is that multiple single processors can be lock-stepped together to operate on a wide data word. Segmentation appears to be controlled globally, using combination codes and global condition codes. As a MIMD array, this patent cannot provide the capabilities of a SIMD array. The improvement provided by the present invention concerns the concatenation of pickets, which has not previously been conceived. It does not concern the control of a MIMD array to join arrays together into a wider processor, which is the central subject of that patent.

[0062] U.S. Pat. No. 4,825,359 discusses a processor for processing arrays of data, where the computation appears to be a fast Fourier transform (FFT). The processor contains multiple processing operators, each of which can be programmed to perform one step of the computation. It could be categorized as a complex single processor whose multiple processing operators act as a pipeline when performing complex processing. The patent discusses neither grouping nor autonomy; rather, its aim appears to be some improvement for accommodating a wide range of operators.

[0063] U.S. Pat. No. 4,905,143 discusses an array processor that computes all combinations of two types of variables and uses the results to compute recurrence equations with local data dependence. According to the patent, it features matching computations based on dynamic time warping or dynamic programming, as used in pattern matching for speech recognition. Although the patent is difficult to understand, the processor appears to be intended to function as a kind of systolic MIMD tube. The PEs appear to be arranged in a ring, with each passing intermediate results to the next PE in the ring. Each PE has its own instruction memory and other features. The patent does not discuss the autonomy of PEs within a SIMD array. Moreover, the PEs do not appear to be grouped, except by their physical arrangement in the ring.

[0064] U.S. Pat. No. 4,910,665 discusses a two-dimensional SIMD array processor interconnection scheme in which each PE has direct access to its eight neighboring PEs. The communication medium is a dotted network that interconnects the four neighboring PEs at each corner. Although a SIMD machine is disclosed, there is again no idea that local autonomy or grouping of PEs can or should be provided, let alone in the manner described in this specification.

[0065] U.S. Pat. No. 4,925,311 discloses a multiprocessor system in which processors can be assigned to work together as a group on a problem. Besides passing messages, semaphores, and other controls among processors, each processor can add itself to a group and remove itself from a group. Because the patent is essentially MIMD, however, it contains no suggestion of giving the PEs of a SIMD array local autonomy as described here. Instead, each processor in the multiprocessor system has a network interface controller that includes RAM, a microprocessor, and some functional unit (a disk controller). Grouping is controlled by each processor's network interface. The present invention needs no such elaborate task partitioning, and grouping is controlled within each picket.

[0066] U.S. Pat. No. 4,943,912 discusses a MIMD array connected by a NEWS network. An array controller loads a program into the memory of each PE in the array and then issues to the PEs a start-procedure command identifying where they should begin execution. Each PE contains a register that stores a task pattern and a comparator that matches the PE's task pattern against a global task-pattern command. The result of this comparison is used either to select a program starting point within the PE or to put the PE into the idle state. This MIMD patent does not suggest autonomy for PEs in a SIMD array as described here, although the comparator and the results it produces can be used to classify or group PEs for different concurrent tasks. This specification details how a PE can classify or group itself in a SIMD environment. As a result, during execution of part of the SIMD code, some of the pickets of the present invention are separated out and become active. The machine of the present invention can then proceed and activate another group for processing.
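The task-pattern comparison attributed to U.S. Pat. No. 4,943,912 can be modelled roughly as follows. This is a sketch, not that patent's circuit. The names `PE`, `start_address`, and `start_procedure` are hypothetical, and a plain equality test stands in for the comparator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PE:
    task_pattern: int   # contents of the PE's task-pattern register
    start_address: int  # hypothetical: where this PE's locally loaded program begins


def start_procedure(pes: list[PE], global_pattern: int) -> list[Optional[int]]:
    """Compare each PE's task pattern with the broadcast pattern (assumed to be
    an equality test). A match yields the PE's program start address;
    a mismatch puts the PE into the idle state, shown here as None."""
    return [pe.start_address if pe.task_pattern == global_pattern else None
            for pe in pes]
```

Here `start_procedure([PE(1, 100), PE(2, 200), PE(1, 300)], 1)` starts the first and third PEs and idles the second. The grouping is decided by comparison with a pattern broadcast from outside the PE.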

[0067] U.S. Pat. No. 4,967,340 discusses a systolic array processor in which each element consists of two registers, one adder, one multiplier, and three programmable switches. The array is configured by a controller that sets the switches of each stage or element, and data are then fed through these stages to obtain the desired result. Such a processor does not suggest the system of the present invention.

[0068] U.S. Pat. No. 5,005,120 concerns arrays only in the sense that multiple processors are present. The patent discusses a time-compensation circuit used for data alignment within a bit-serial signal processor array. Each element of this array consists of four bit-serial registers feeding an ALU, and the time-compensation circuit is placed ahead of the first register. This array bears no resemblance to a SIMD machine.

[0069] U.S. Pat. No. 5,020,059 concerns a generalized interconnection scheme for use in an array of processors. With this scheme, the array can be reconfigured around a failed PE, and trees and other topologies can be realized within a basic two-dimensional mesh. The patent neither mentions nor suggests array control architecture, grouping, or any aspect of the PE itself, and hence contains no discussion of autonomy within the PE.

[0070] In SIMD machines of the kind addressed by the present invention, however, the processing elements or pickets in the array have conventionally performed exactly the same operation in every picket. The inventors recognized the need to selectively disable one or more pickets and to enable local autonomy, and developed the means to do so.

[0071] U.S. Pat. No. 4,783,782 discusses testing of SIMD array processors and accommodation of faults by bypassing them. It is unrelated to the subject of this discussion. The patent describes a chip configuration used during manufacturing test to isolate up to two defective PEs in the SIMD array discussed above in connection with U.S. Pat. No. 4,831,519. The chip includes a PRO section in which defect data are stored; this section can be read by the controller and used for dynamic allocation of on-chip resources. The chip therefore has only limited processor autonomy.

[0072] Despite all the prior efforts described above, there appears to be nothing comparable to the parallel associative processor system or the dynamic multimode parallel array architecture mentioned above for performing SIMD, MIMD, and SIMIMD processing on a single machine. Indeed, the mechanisms needed for SIMIMD processing, floating point, and the like have not been adequately developed in the prior art.

[0073] U.S. Pat. No. 4,783,738 is one of the patents directed to the aspect of autonomy. In the system it describes, a SIMD controller can issue commands to all elements (PEs) in the array, and each PE can change or modify one bit in a command as a result of position-dependent (spatial) or data-dependent properties. Examples include ADD/SUB (add/subtract), SEND/RECEIVE, and the generalization to OPA/OPB. This function would be used for line image processing and for processing image boundaries. It resembles one of the autonomous functions described here, in particular having the ALU perform a data-dependent operation. However, the present invention does not focus specifically on changing one operation bit, nor is it concerned with ALU functions that are specific to a position in the array (spatial). The patent resembles some mechanisms of the present invention in the area of data-dependent functions. The present invention, however, implements a DWIM (Do What I Mean) function, in which a data-dependent part of the instruction sequence (such as a sign or condition code) simply forces the ALU to do something else. In the area of data-dependent autonomous functions, the patent evidently relies on modifying or inserting one bit of the operation.

[0074] U.S. Pat. No. 5,045,995 discusses a mechanism for enabling or disabling each PE of a SIMD array based on data conditions within that PE. One global instruction directs all PEs to extract a state within the PE and, accordingly, to enable or disable themselves by means of a status register bit. Another global instruction effectively exchanges the PE states. These functions can be used to implement IF/THEN/ELSE and WHILE/DO constructs. In addition, the status can be stacked to support nested enable conditions. This patent therefore relates to the enable/disable function described in this specification.

The apparatus of U.S. Pat. No. 5,045,995 is, however, unnecessarily complex, because it requires three things:
(1) an initial test, loading of a status bit, and enabling or disabling based on that bit;
(2) an instruction that inverts all the disable/enable bits to enable the other set of PEs; and
(3) storage to implement nesting.

It will become clear later that the present invention provides a mechanism for enabling and disabling functions without tying conditions (1), (2), and (3) of that patent together. The mechanism of the present invention does not require the inversion function of U.S. Pat. No. 5,045,995, yet it provides broader system capabilities than the apparatus of that patent. In the SIMD machine of the present invention, this function need not be tied to IF and ELSE instructions or to other features related to video processing.
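As a rough illustration of the three elements listed for U.S. Pat. No. 5,045,995, the Python sketch below keeps a per-PE enable mask, inverts it for ELSE, and saves it on a stack for nesting. It is a generic model under stated assumptions, not that patent's circuitry and not the mechanism of the present invention, which the text says avoids the inversion step. The class and method names are hypothetical.

```python
class EnableStack:
    """Hypothetical model: per-PE enable bits with a save stack for nested
    IF/ELSE/ENDIF (test + load status, invert for ELSE, stack for nesting)."""

    def __init__(self, n_pes: int):
        self.enable = [True] * n_pes
        self.saved: list[list[bool]] = []

    def if_(self, cond: list[bool]) -> None:
        # Save the enclosing mask, then keep only PEs whose local test passed.
        self.saved.append(self.enable[:])
        self.enable = [e and c for e, c in zip(self.enable, cond)]

    def else_(self) -> None:
        # Invert within the enclosing mask: PEs active outside the IF
        # that failed its test become enabled, and the rest are disabled.
        parent = self.saved[-1]
        self.enable = [p and not e for p, e in zip(parent, self.enable)]

    def endif(self) -> None:
        # Restore the mask that was in force before the matching IF.
        self.enable = self.saved.pop()

    def apply(self, regs, op):
        """Broadcast op to all PEs; disabled PEs keep their register unchanged."""
        return [op(r) if e else r for r, e in zip(regs, self.enable)]
```

A usage example that computes "IF x < 0 THEN x = -x ELSE x = 2x" on four PEs:

```python
regs = [-3, 5, -1, 7]
m = EnableStack(len(regs))
m.if_([r < 0 for r in regs])
regs = m.apply(regs, lambda r: -r)
m.else_()
regs = m.apply(regs, lambda r: 2 * r)
m.endif()  # regs is now [3, 10, 1, 14]
```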

[0075] With respect to the floating-point aspect of the present invention, other patents are referenced in the related patent applications. One of them is U.S. Pat. No. 4,933,895, which concerns a two-dimensional cellular array of SIMD-controlled bit-serial processing elements. With the processing elements of that disclosure, as would normally be expected of a bit-serial machine, the alignment step can shift one of the fractions one bit at a time until a condition is met. This feature relies on the data itself and on neighboring processing elements, and would apply to the operation of arrays designed for image processing. U.S. Pat. No. 4,933,895 discusses bit-serial floating-point operations in detail, but it contains no discussion of what happens, could happen, or should happen under byte-wide conditions. As one would expect for a serial machine, the patent describes the alignment part of floating-point addition as follows. First, the difference of the two exponents is taken. Then the selected fraction is shifted until the exponent difference becomes zero.
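The alignment step just described can be sketched in Python on integer (exponent, fraction) pairs, with value = fraction × 2^exponent. This is a hedged, simplified model. Normalization, rounding, and signs are deliberately omitted, and the function names are hypothetical. The returned step count shows that a one-bit-per-step shifter needs as many steps as the exponent difference.

```python
def align_bit_serial(exp_a: int, frac_a: int, exp_b: int, frac_b: int):
    """Align two (exponent, fraction) operands for addition: take the exponent
    difference, then shift the smaller operand's fraction right one bit per
    step until the difference reaches zero.
    Returns (exponent, frac_a, frac_b, steps)."""
    if exp_a < exp_b:  # make operand a the one with the larger exponent
        exp_a, frac_a, exp_b, frac_b = exp_b, frac_b, exp_a, frac_a
    diff = exp_a - exp_b
    steps = 0
    while diff > 0:
        frac_b >>= 1   # one bit per step; low-order bits shifted out are lost
        diff -= 1
        steps += 1
    return exp_a, frac_a, frac_b, steps


def add_aligned(exp_a: int, frac_a: int, exp_b: int, frac_b: int):
    """Unnormalized sum after alignment: (exponent, fraction, shift steps)."""
    exp, fa, fb, steps = align_bit_serial(exp_a, frac_a, exp_b, frac_b)
    return exp, fa + fb, steps
```

For example, `add_aligned(3, 8, 1, 8)` adds 8·2³ and 8·2¹ and returns `(3, 10, 2)`, which is 10·2³ = 80 after two single-bit shifts.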

[0076] This review shows that various kinds of parallel array machines and SIMD machines have existed, and that floating-point machines have been invented. None, however, has succeeded in providing a floating-point implementation on a SIMD machine that processes data byte-parallel on an array of processors, and such floating-point computing capability is needed. In processors such as the inventors' "parallel associative processor system," and in other parallel array machines that can execute data within an array of processors, floating-point arithmetic contains all the ingredients of a SIMD nightmare. Each processing element (PE), or picket, can have different data, different operations, and different levels of complexity. All of these differences add to the difficulty of performing floating point on a SIMD machine, in which every element does the same thing; floating point is one example of the problem. By using the floating-point computation described in the related patent entitled "Floating Point Implementation on a SIMD Machine," the local autonomy of the present invention can be exploited.

[0077] Floating-point execution is important to the system of the present invention, but several other important computational aids can also be listed. Implementing these computational aids in an array processor would benefit from the local-autonomy functions that are the object of the present invention. Picket grouping, local table lookup, and local execution of instructions are three important areas.

[0078] Suppose that pickets can evaluate themselves and place themselves in various groups, each of which can in turn be selected for processing. Subtasks within a task running on an array of processors can then be partitioned, with considerably more capability than is currently available. This capability is discussed in the related patent entitled "Grouping of SIMD Pickets."
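The self-grouping idea can be illustrated with a short Python model. It is only a sketch under assumptions. The classification rule, the per-group operations, and the names `self_group` and `run_by_group` are hypothetical; the related "Grouping of SIMD Pickets" patent is not reproduced here. Each picket derives its own group from local data, and the controller then selects one group at a time while the others doze.

```python
def self_group(local_data, classify):
    """Each picket evaluates its own data and records which group it joins."""
    return [classify(x) for x in local_data]


def run_by_group(local_data, groups, programs):
    """The controller selects one group at a time; only pickets in the
    selected group execute that group's broadcast operation, while the
    remaining pickets doze and keep their data unchanged."""
    data = list(local_data)
    for group, op in programs.items():
        data = [op(x) if g == group else x for x, g in zip(data, groups)]
    return data
```

With data `[1, 2, 3, 4, 5, 6]` and a hypothetical rule `x % 3`, three groups form. Each group then runs a different operation in its own turn.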

[0079] Transcendental functions and many other important functions are most efficiently evaluated by using tables. A significant performance benefit would therefore result if each processor in an array of processors could independently look up a value in its own memory based on its data (for example, an angle value).
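Independent table lookup can be sketched as follows. This is a minimal model built on several assumptions: a 256-entry sine table, angles given as integer table indices (1/256 of a turn), and hypothetical names. Each picket's local memory is represented by one shared Python list. The point is that every picket computes its own address from its own data, instead of all pickets reading one broadcast address.

```python
import math

TABLE_SIZE = 256  # assumed resolution: one entry per 1/256 of a full turn

# In the array, each picket would hold its own copy of this table in local
# memory; the model uses one shared list for simplicity.
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def local_lookup(angles, table=SINE_TABLE):
    """Each picket forms an index from its own angle (in table units, wrapped
    modulo the table size) and reads the value from its local table."""
    return [table[a % len(table)] for a in angles]
```

For example, `local_lookup([0, 64, 128])` returns approximately `[0.0, 1.0, 0.0]`, one result per picket, from three different table addresses in the same step.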

[0080] Executing instructions from the local memory of each processor in an array of processors with a SIMD implementation should increase the flexibility of such arrays. This is the basis of the SIMIMD mode described in related patent application No. 798,788.
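Executing instructions from local memory can be pictured with a toy Python model. It is not the SIMIMD design of the related application. The opcode set, the per-picket program lists, and the function name `simimd_step` are all hypothetical. On one broadcast step, each picket fetches the opcode at its own program counter from its own memory, so different pickets may execute different operations.

```python
# Hypothetical opcode set used only for this illustration.
OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MAX": max,
}


def simimd_step(local_programs, pcs, accs, operand):
    """One broadcast step: every picket fetches the opcode at its own program
    counter from its own local memory, applies it to its accumulator and the
    broadcast operand, and advances its program counter."""
    new_accs, new_pcs = [], []
    for prog, pc, acc in zip(local_programs, pcs, accs):
        op = prog[pc]                       # locally fetched opcode
        new_accs.append(OPS[op](acc, operand))
        new_pcs.append(pc + 1)
    return new_accs, new_pcs
```

With programs `[["ADD", "SUB"], ["SUB", "MAX"], ["MAX", "ADD"]]`, three pickets starting from accumulator 10 with operand 4 perform ADD, SUB, and MAX respectively in the same step.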

[0081] There is therefore a need for the present invention.

[0082]

PROBLEMS TO BE SOLVED BY THE INVENTION

The purpose of picket autonomy is to provide a parallel array machine that can operate in various modes, including SIMD and SIMIMD. Each element in the SIMD array of processors provided by the present invention therefore receives a stream of commands from an array controller. Several mechanisms allow the individual processing elements of the array machine, called pickets, to interpret part of a SIMD command in their own way, giving each picket a degree of local autonomy. The resulting capability allows the pickets to execute instructions in a mode called SIMIMD. Another resulting capability greatly improves the performance of floating-point instruction execution. Yet another capability groups pickets in various useful ways, so that one group can execute SIMD instructions while another group dozes. Other capabilities include table lookup, locally supplied instruction codes, and execution defined by data content. The SIMD and SIMIMD modes make possible a floating-point representation of data and computation using that data within an array of processors that execute data in a multi-byte-wide data flow.

[0083]

SUMMARY OF THE INVENTION

It should be understood that, in a preferred embodiment of the present invention, the implementations developed by the present invention for parallel array processors can be used within a parallel array processor that has a plurality of interacting picket units, or PMEs (pickets), on the same chip. Such a system has multiple picket units (processing elements with memory). These units are arranged as an array, and the array of pickets with memory can be used as a set-associative memory. Processing in such systems with multiple data streams may in some cases perform better when executed on a single-instruction, multiple-data (SIMD) system. The floating-point processing system becomes part of a system that integrates computer memory and control logic on a single chip. The combination can therefore be replicated within the chip, and a processor system can be built from replications of that single chip. Within this chip, each processing memory element (PME) can operate as an autonomous processor, and control by an external control processor is provided. When implemented with the preferred embodiment of the present invention, the processor of the present invention can operate as a SIMD-mode processor with higher floating-point precision.

[0084] Using the single-chip set-associative parallel processing system of the present invention, it is possible to retrieve a small set of "data" from a large set held in a memory on which associative operations can be performed. This associative operation, which is typically an exact comparison, is performed in parallel on the entire set of data, utilizing the memory and execution units of the pickets. Within the picket array, each picket holds a portion of the data from the large set. In addition, each picket selects one piece of data from its portion. Thus, the pieces of data held in each of a set of pickets together constitute a set of data on which all the pickets perform the associative operation in parallel.
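As an illustration only, the parallel associative retrieval described above can be modeled in Python. The sketch assumes that the large set is spread round-robin across the pickets' local memories and that the associative operation is an exact-match compare; the function name and data layout are hypothetical, and the inner loop over pickets stands in for work the hardware performs simultaneously.

```python
from typing import List


def associative_search(data: List[int], n_pickets: int, key: int) -> List[int]:
    """Return the positions in `data` whose value exactly equals `key`.

    Hypothetical model: picket p holds data[p], data[p + n], data[p + 2n], ...
    in its local memory. On each step the same key is broadcast, every
    picket selects the word at the current local index, and all pickets
    compare it with the key at once (modeled here by the inner loop).
    """
    memories = [data[p::n_pickets] for p in range(n_pickets)]
    depth = max((len(m) for m in memories), default=0)
    matches = []
    for index in range(depth):                       # one broadcast compare per step
        for picket, memory in enumerate(memories):   # simultaneous in hardware
            if index < len(memory) and memory[index] == key:
                matches.append(index * n_pickets + picket)  # original position
    return sorted(matches)


if __name__ == "__main__":
    print(associative_search([5, 3, 7, 3, 9, 3], n_pickets=4, key=3))  # [1, 3, 5]
```

In this model the number of compare steps is the local memory depth, not the size of the set: with 16 pickets, a 1,024-entry set needs 64 broadcast steps.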

[0085] This specification discusses a number of implementation techniques which, when applied to the design of a picket in a SIMD array of pickets, make it simple to perform tasks that were previously difficult or impossible to perform. These techniques give each picket a degree of local autonomy.

[0086] All of the local autonomous features described herein are implemented as SIMD commands, or as local variants of SIMD commands, that are presented simultaneously to all participating pickets. Some of these commands provide local autonomy directly. The command "LOAD OP FROM MEMORY BUS" (load instruction code from the memory bus) has no associated local variant, but it ensures that a locally selected command is invoked. The command "STORE TO AREG PER STAT" (store to register A according to status), on the other hand, makes storage into register A dependent on a local status bit.
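The status-dependent store can be sketched in a few lines of Python. This is a hypothetical model, not the hardware design: `Picket`, `a_reg`, `status`, and `store_to_areg_per_stat` are illustrative names, and the per-picket `values` list stands in for data that each picket would supply from its own data flow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Picket:
    a_reg: int = 0          # register A
    status: bool = False    # local status bit set by an earlier operation


def store_to_areg_per_stat(pickets: List[Picket], values: List[int]) -> None:
    """One broadcast command; each picket stores only if its status bit is set."""
    for picket, value in zip(pickets, values):
        if picket.status:
            picket.a_reg = value


if __name__ == "__main__":
    array = [Picket(status=True), Picket(status=False), Picket(status=True)]
    store_to_areg_per_stat(array, [7, 8, 9])
    print([p.a_reg for p in array])  # [7, 0, 9]
```

Every picket receives the same command in the same cycle; only the local status bit decides whether register A changes, which is what makes the command a source of local autonomy.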

[0087] In the preferred system of the present invention, each picket is enabled by one or more groups of mechanisms that allow each picket to have various execution capabilities. These capabilities allow each picket to acquire various modes for executing data within the picket, and to interpret SIMD commands within the picket rather than merely receiving them from an external SIMD controller. This capability extends to several modes in which each processor of the SIMD array can, and actually does, perform different operations based on local conditions.

[0088] Combining several of these local autonomous functions can give a picket the ability to execute its own local program segment for a short time under the control of the SIMD command stream. This provides MIMD capability within the SIMD array. This architecture is referred to as SIMIMD; the mechanisms that support SIMIMD and contribute additional capabilities are described later. Some of these functions participate in forming the mechanisms for picket grouping, supporting fairly complex tools for selecting pickets and grouping them for separate computations. This function is referred to as grouping; the mechanisms that support adding grouping to the system of the present invention are discussed later. For a more detailed description of SIMIMD and grouping, refer to the related patent specifications cited above.

[0089] Some of these functions participate in forming the mechanisms for the efficient floating-point techniques discussed in the related patent specification directed to floating-point implementation on SIMD machines. Mechanisms that support floating point are added to the system described herein.

[0090] The functions for local autonomy provided by the present invention fall into the following three categories:
1. Status-controlled local operations
2. Data-controlled local operations
3. Status distribution from the processors to the microcontroller
Each of these provides additional capabilities to the system described herein.

[0091] A picket can turn itself on or off based on a previous status. When given a command from the SIMD controller, a picket can put itself into doze mode by loading an appropriate value from picket memory into the doze latch within the picket. This provides "local participation" autonomy, which supplies the function of internal control of individual pickets within the picket chip.
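A minimal Python sketch of local participation through a doze latch follows. The class, method names, and command encoding are hypothetical; in particular, it assumes that the doze-latch load is honored even by a dozing picket, so that a picket can also wake itself. The source does not state how a dozing picket is re-enabled.

```python
from typing import List


class Picket:
    """Hypothetical picket with local memory, a doze latch, and an accumulator."""

    def __init__(self, memory: List[int]) -> None:
        self.memory = memory
        self.doze = False
        self.acc = 0

    def load_doze_latch(self, address: int) -> None:
        # Assumption: this command is honored even by dozing pickets,
        # so a picket can also wake itself up.
        self.doze = self.memory[address] != 0

    def add(self, value: int) -> None:
        if not self.doze:          # a dozing picket ignores ordinary commands
            self.acc += value


def broadcast(pickets: List[Picket], command: str, operand: int) -> None:
    """The SIMD controller sends the same command to every picket."""
    for p in pickets:
        getattr(p, command)(operand)


if __name__ == "__main__":
    array = [Picket([0, 1]), Picket([1, 0]), Picket([0, 0])]
    broadcast(array, "load_doze_latch", 0)   # picket 1 goes into doze mode
    broadcast(array, "add", 5)
    print([p.acc for p in array])            # [5, 0, 5]
```

The value that decides participation comes from each picket's own memory, so one broadcast command can switch a data-dependent subset of the array off.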

[0092] As a result of the development of the present invention, described in detail below, a SIMD array of processors in a computer system can execute data in parallel. The computer system has a plurality of array processing elements coupled as pickets for intercommunicating data and instructions. Each picket has a plurality of mechanisms that allow it to have various execution capabilities, enabling it to acquire various modes for executing data within the picket and to interpret SIMD commands within the picket.

[0093] Furthermore, each picket has a plurality of mechanisms that allow it to operate in multiple modes in which each processor of the SIMD array can, and actually does, perform different operations based on local conditions.

[0094] As a result of the development of the present invention, each array processor can operate as being in SIMIMD array processor mode, so that at least one of the plurality of array processors can operate in SIMIMD mode. An element in the processor array receives and executes a command from the controller on each clock cycle. Some of these commands can be interpreted within each picket to produce different operations.
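To show how one broadcast command can produce different operations in different pickets, here is a hypothetical Python sketch. The command names "ADD" and "LOCAL OP", the `mode` field, and the operation table are assumptions made for illustration; the source does not define a specific instruction set here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical table used by a locally interpreted command: each picket
# selects the operation named by its own mode register.
LOCAL_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda acc, x: acc + x,
    "sub": lambda acc, x: acc - x,
    "max": lambda acc, x: max(acc, x),
}


@dataclass
class Picket:
    acc: int
    mode: str  # locally held selector, e.g. loaded earlier from picket memory


def clock(pickets: List[Picket], command: str, operand: int) -> None:
    """One clock cycle: every picket receives the same broadcast command."""
    for p in pickets:
        if command == "ADD":          # ordinary SIMD command: identical action
            p.acc += operand
        elif command == "LOCAL OP":   # decoded inside each picket
            p.acc = LOCAL_OPS[p.mode](p.acc, operand)
        else:
            raise ValueError(f"unknown command: {command!r}")


if __name__ == "__main__":
    array = [Picket(10, "add"), Picket(10, "sub"), Picket(10, "max")]
    clock(array, "LOCAL OP", 4)
    print([p.acc for p in array])  # [14, 6, 10]
```

The controller still issues exactly one command per cycle; the divergence comes entirely from state held inside each picket.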

[0095] By creating new "chips" and systems designed using this new concept, the inventors have developed a new method of creating massively parallel processors and other computer systems. The methods of the present invention and of the related applications can be regarded as representing the various concepts taught in this specification and in the related applications. The components described in each application can be combined in this system to create new systems. They can also be combined with existing technology.

[0096] This specification and the related applications describe in detail the picket processor and the so-called enhanced parallel array processor (APAP). Note that the picket processor can use the PME. The picket processor is particularly useful in military applications, where a very small array processor is preferred. In this respect, the picket processor differs somewhat from the preferred embodiment associated with the APAP, the enhanced parallel array processor of the present invention. However, the two have much in common, and the aspects and features provided by the present invention can be used in different machines.

[0097] The term picket refers to the 1/nth element of an array processor, consisting of a processor and memory together with the communication elements incorporated in them that serve array intercommunication.

[0098] The picket concept also applies to the 1/nth of an APAP processing array.

[0099] A picket may differ from the APAP in data width, memory size, and number of registers. In the massively parallel embodiment that is an alternative to the APAP, a picket is configured to have the connectivity of 1/nth of a regular array, whereas an APAP PME differs in that it is part of a subarray. Both systems can execute SIMIMD. However, because the picket processor is configured as a SIMD machine with MIMD within its PEs, it can execute SIMIMD directly, whereas the MIMD APAP configuration executes SIMIMD by using MIMD PEs controlled so as to emulate SIMD. Both machines use PMEs.

[0100] Either system can be configured as a parallel array processor comprising an array processing unit with an array of N elements interconnected by an array communication network. Here, 1/Nth of the processor array is a processing element together with its associated memory, a control bus interface, and a portion of the array communication network.

[0101] The parallel array processor has a dual operating mode capability: it can direct the processing units to operate in either or both of two modes, and the processing units can move freely between these two modes for SIMD operation and MIMD operation. When SIMD is the mode of organization, the processing unit can direct each element to execute its own instructions in SIMIMD mode; when MIMD is the implementation mode of the processing unit organization, the processing unit can synchronize selected elements of the array to simulate MIMD execution. This is referred to herein as MIMD-SIMD.

[0102] In both systems, the parallel array processor provides an array communication network with paths for exchanging information between the elements of the array. The movement of information can be directed in either of two ways. In the first, the array controller directs all messages to move in the same direction at the same time, so the data being moved does not define its own destination. In the second, each message routes itself, and a header at the beginning of the message defines its destination.
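The two ways of directing information movement can be contrasted with a small Python sketch. It assumes a simple ring of n elements and one-direction forwarding for the self-routed case, and it does not model link contention; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shift_all(values: Sequence[T], step: int) -> List[T]:
    """Scheme 1: the controller moves every message `step` positions on a ring.

    The data carries no destination; element i receives what was at i - step.
    """
    n = len(values)
    return [values[(i - step) % n] for i in range(n)]


def self_route(messages: List[Tuple[int, int, str]], n: int) -> List[Tuple[int, str, int]]:
    """Scheme 2: each message is (source, destination header, payload).

    The header alone decides where the message stops; here it is forwarded
    one hop at a time in one direction around an n-element ring.
    Returns (destination, payload, hops) for each message.
    """
    delivered = []
    for source, dest, payload in messages:
        if not 0 <= dest < n:
            raise ValueError(f"destination {dest} is outside the ring")
        position, hops = source, 0
        while position != dest:
            position = (position + 1) % n
            hops += 1
        delivered.append((dest, payload, hops))
    return delivered


if __name__ == "__main__":
    print(shift_all(["a", "b", "c", "d"], 1))           # ['d', 'a', 'b', 'c']
    print(self_route([(0, 2, "x"), (3, 1, "y")], n=4))  # [(2, 'x', 2), (1, 'y', 2)]
```

In the first scheme every element moves in lockstep and no element needs to know where its data is going; in the second, each message may stop at a different place because the header, not the controller, decides.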

[0103] A segment of the parallel array processor array has multiple copies of the processing unit provided on a single semiconductor chip. Each copy of a segment of the array includes the portion of the array communication network associated with that segment, together with buffers, drivers, multiplexers, and control mechanisms for connecting that segment seamlessly to the other segments of the array so that the array communication network can be extended.

[0104] A control bus or control path from the controller is provided for each processing unit, so that the control bus extends to each element of the array to control its activity.

[0105] Each processing element segment of the parallel array contains multiple copies of the processor memory element. The processor memory elements are contained within a single semiconductor chip, which holds one segment of the array and includes a portion of the array control bus and register buffers to support communication of control to the array segment contained within the chip.

[0106] Both systems can implement mesh moves and routed moves. Typically, the APAP implements a dual interconnect structure in which the eight elements on a chip are interrelated in one way and the chips are interrelated in another. As described above, links between PMEs on a chip are generally established by programmable routing, but nodes can be, and usually are, related in a different way. In general, the regular APAP structure on a chip is a 2×4 mesh, and the node interconnection can be a routed sparse octal N-cube. Both systems have PE-to-PE intercommunication paths between PEs and can form a matrix from point-to-point paths.
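The on-chip 2×4 mesh can be illustrated with a short neighbor function. This sketch assumes row-major numbering of the eight PMEs and no wraparound at the mesh edges; both are illustrative choices, and the chip-to-chip sparse octal N-cube links are not modeled.

```python
from typing import List


def mesh_neighbors(pme: int, rows: int = 2, cols: int = 4) -> List[int]:
    """Return the mesh neighbors of a PME numbered row-major in a rows x cols mesh.

    Assumption: edge PMEs have no wraparound links inside the chip.
    """
    if not 0 <= pme < rows * cols:
        raise ValueError("PME number out of range")
    r, c = divmod(pme, cols)
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return sorted(rr * cols + cc for rr, cc in candidates
                  if 0 <= rr < rows and 0 <= cc < cols)


if __name__ == "__main__":
    for pme in range(8):
        print(pme, mesh_neighbors(pme))
```

Corner PMEs such as 0 have two on-chip neighbors and interior ones such as 5 have three; the remaining ports in the real design would be available for the separate chip-to-chip interconnect.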

[0107] With this background and perspective in mind, the features and aspects of the present invention relating to its preferred embodiments will now be described in detail with reference to the drawings. Additional details of this system can be found in the following description and in the related patents.

[0108]

DESCRIPTION OF THE PREFERRED EMBODIMENTS The preferred embodiments of the present invention are described below, beginning with a general discussion of the preferred floating-point embodiment. This floating-point embodiment is preferably used with a parallel array of processing elements (formed as pickets), but it can also be used with prior-art parallel array processors. To provide better orientation, it is therefore appropriate to consider such a system first.

[0109] Referring to the drawings, it should be understood that FIG. 1 represents a typical prior-art SIMD system of the type generally described in European Patent Application No. 88307855/88-A and British Patent Application No. 1445714. In such prior-art devices, the SIMD computer is a single-instruction, multiple-data computer having a parallel array processor comprising a plurality of parallel-linked bit-serial processors, each associated with one of a plurality of SIMD memory devices. The input/output system acts as a staging system for the SIMD unit and provides temporary storage for bidirectional two-dimensional data transfer between the host computer (which may be a mainframe or a microprocessor) and the SIMD computer. The input/output system includes input/output processing means for controlling the flow of data between the host computer and the temporary storage means, and between the temporary storage and the plurality of SIMD memory devices, which are usually organized as buffer sections or as partitions of a large memory. Thus, an input operation of the input/output system involves transferring data from the host computer memory to the temporary storage and then, in a second step, from the temporary storage to the SIMD memory devices. Output likewise uses a two-step process to transfer data between the host computer and the SIMD computer via the two-dimensional bus. The input/output system for I/O transfers can be a separate unit or a subunit within the host, but it is often a unit within the SIMD computer, with the SIMD controller serving as the control mechanism for the temporary I/O buffer storage.

[0110] The SIMD computer itself includes a processor array having a plurality of processing elements and a network connecting the individual processing elements to a plurality of conventional, separate SIMD memory devices. The SIMD computer is a parallel array processor having many individual processing elements that are linked and operate in parallel. It includes a control unit that generates the instruction stream for the processing elements and supplies the timing signals the computer requires. The network interconnecting the processing elements includes some form of interconnection scheme for the individual processing elements; this interconnection can adopt many topologies, such as a mesh, a polymorphic torus, or a hypercube. The plurality of memory devices provide immediate storage of bit data for the individual processing elements. There is a one-to-one correspondence between the number of processing elements and the number of memory devices, which can be the aforementioned buffer partitions of a large memory.

[0111] For example, as shown in FIG. 1, a host processor 28 is provided. This processor is used to load microcode programs into the array controller 14 (which includes a temporary storage buffer), to exchange data with the array controller 14, and to monitor the status of the array controller 14 via the data bus 30 and the address/control bus 31 between the host and the controller. The host processor in this example may be any suitable general-purpose computer, such as a mainframe or a personal computer. In this prior-art example, the array processors of the array are shown in two dimensions, but the array can be organized differently, for example as a three-dimensional or four-dimensional cluster arrangement. The SIMD array processor comprises an array 12 of processing elements P(i, j) and an array controller 14 for issuing a stream of global instructions to the processing elements P(i, j). Although not shown in FIG. 1, this prior-art example has processing elements that operate on one bit at a time, each associated with a block of storage that is a partition of the memory associated with that processing element. The processing elements are connected to their respective neighboring processing elements by bidirectional bit lines through a NEWS (North, East, West, South) network. That is, processing element P(i, j) is connected to processing elements P(i−1, j), P(i, j+1), P(i, j−1), and P(i+1, j) in the north, east, west, and south directions, respectively.

[0112] In this typical example, the NEWS network is toroidally connected at its edges: the north and south edges are bidirectionally interconnected, and the west and east edges are likewise interconnected. To allow data to be input to and output from the array of processors, a data bus 26 between the controller and the array is connected to the NEWS network. As shown, this data bus 26 is connected to the east and west boundaries of the array. By means of bidirectional three-state drivers connected to the toroidal east-west NEWS connections, the bus can instead be connected to the north-south boundaries, or to both the east-west and north-south boundaries. If the array in this example were 32 × 32 instead of 16 × 16, the prior art would also realize 1024 processing elements, as in the preferred embodiment described later. In the figures, single lines denote single-bit lines, and double lines connecting functional elements represent multiple connection lines or buses.
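The NEWS connections and toroidal edge wrapping described in the two paragraphs above reduce to modular index arithmetic. The following Python sketch is illustrative only; the function name is hypothetical, and it models topology, not the bit lines or the three-state drivers.

```python
from typing import Dict, Tuple


def news_neighbors(i: int, j: int, rows: int, cols: int) -> Dict[str, Tuple[int, int]]:
    """Return the NEWS neighbors of P(i, j) on a toroidally wrapped array.

    North is P(i-1, j), east P(i, j+1), west P(i, j-1), south P(i+1, j);
    the modulo wraps the north/south and east/west edges onto each other.
    """
    return {
        "N": ((i - 1) % rows, j),
        "E": (i, (j + 1) % cols),
        "W": (i, (j - 1) % cols),
        "S": ((i + 1) % rows, j),
    }


if __name__ == "__main__":
    print(news_neighbors(0, 0, 16, 16))
    # {'N': (15, 0), 'E': (0, 1), 'W': (0, 15), 'S': (1, 0)}
```

Because of the wraparound, every element, including those on the boundary, has exactly four neighbors.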

[0113] In this prior-art example, the array controller issues instructions in parallel to the processing elements via the instruction bus 18, and issues row select signals and column select signals via the row select lines 20 and the column select lines 22, respectively. These instructions cause the processing elements to load data from storage, process it, and then store it back into storage. For this purpose, each processing element can access a bit slice (section or buffer) of the main memory. Logically, therefore, the main memory of the array processor is divided into 1024 partition slices for the array of 1024 processing elements. This means that up to thirty-two 32-bit words can be transferred to or from storage in one transfer step. To perform a read or write operation, the memory is addressed by an index address supplied to the memory address lines via the address bus 24, and a read or write instruction is supplied in parallel to each processing element. During a read operation, the row select signal on the row select line and the column select signal on the column select line identify which processing elements are to perform the operation. Thus, in this example, when the array is 32 × 32, a single 32-bit word can be read from memory into the 32 processing elements of the selected row. Each processing element is associated with a slice, or 1-bit-wide memory block, (i, j). This slice or block memory is logically associated one-to-one with its individual processing element, but it can be, and usually is, physically separated onto another chip. With this prior-art architecture, it is unclear how the above array processor could be manufactured on a single chip. The pickets of the present invention, by contrast, can be manufactured with an array of processors and suitable memory on a single chip of the type described below.
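The row-select read in this prior-art example can be sketched as assembling one bit from each 1-bit-wide slice in the selected row. This is a hypothetical model: it assumes that the element in column j supplies bit j of the word, an ordering the source does not specify, and the names `read_row_word` and `make_slices` are illustrative.

```python
from typing import List

# bit_slices[i][j] models the 1-bit-wide memory slice of P(i, j)
# as a list of bits indexed by the broadcast address.
BitSlices = List[List[List[int]]]


def read_row_word(bit_slices: BitSlices, row: int, address: int) -> int:
    """Assemble one word from the slices of the selected row at `address`.

    Assumption: the processing element in column j supplies bit j of the word.
    """
    word = 0
    for j, bits in enumerate(bit_slices[row]):
        word |= (bits[address] & 1) << j
    return word


def make_slices(n: int, depth: int) -> BitSlices:
    """Create an n x n array of zeroed bit slices, each `depth` bits deep."""
    return [[[0] * depth for _ in range(n)] for _ in range(n)]


if __name__ == "__main__":
    slices = make_slices(32, depth=4)
    for j in (0, 5, 31):
        slices[3][j][2] = 1
    print(hex(read_row_word(slices, row=3, address=2)))  # 0x80000021
```

Reading all 32 rows at the same address would yield the thirty-two 32-bit words (1024 bits) that the text gives as the maximum for one transfer step.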

[0114] It should be understood that the processing element P(i, j) of this prior-art example itself comprises an ALU and input and output registers, including a carry, each capable of storing a single bit of information. A multiplexer is connected to the inputs and outputs of the ALU and to the bidirectional data port of the memory slice (i, j) associated with the individual processing element P(i, j).

[0115] With separate instruction and data buses, the array controller has a microcode store into which the host 28, using the data bus 30 and the address/control bus 31, loads the microcode that defines the processing to be performed by the array. After operation of the array controller has been initiated by the host 28, the microcode sequence is controlled by a microcode controller connected to the microcode store within the array controller 14. The ALU and register bank of the array controller are used to generate the array memory addresses output on the address bus, count loops, calculate jump addresses, and perform general-purpose register operations. The array controller also has mask registers for decoding row mask codes and column mask codes. Specific instruction codes are passed to the processing elements via the instruction bus. In this example, the array controller can have a data buffer that resides within the controller but is functionally located between the host-to-controller bus and the controller-to-array data bus. Under the control of microcode in the control store, data is loaded from this buffer into the array of processors, and vice versa. For this purpose, the buffer is arranged as a bidirectional FIFO buffer under the control of the microcode controller in the array controller. For details of such prior-art systems, refer to the examples cited above, in particular European Patent Application No. 88307855/88-A.

[0116] These earlier attempts can now be compared with the preferred embodiment of the present invention. FIG. 2 shows a basic picket unit 100. The basic picket unit 100 comprises a processing element ALU 101 combined with a local memory 102 coupled to the processing element, for processing one byte of information in one clock cycle. As shown, picket units are formed on a silicon base chip, or picket chip, as a linear array of pickets with adjacent pickets on both sides (right and left in the figure), so that a picket processing array having a plurality of local memories is formed on the silicon base chip. There is one local memory for each byte-wide processing data flow, and they are arranged as a logical row or linear array with adjacent-picket communication buses for passing data in both the left and right directions. The set of pickets in a picket chip is arranged in a geometric order, preferably horizontally on the chip. FIG. 2 shows a typical implementation of two pickets of a picket array on a picket chip having multiple memories and data flows, with a communication path between the processing element and the memory of each picket. In the preferred embodiment of the present invention, the data communication path between each memory and its processing element, which are in a one-to-one relationship in the array, is byte wide and extends left and right to meet the "slide" used for communication with adjacent pickets or with more remote picket processors.

[0117] A "slide" can be defined as a means of transferring information in a single cycle to a non-adjacent location, through picket address locations that would normally be able to receive the information were they not transparent to the message being sent, until the message arrives at, and is received by, the nearest active neighboring picket. The slide thus works by sending information to a non-adjacent location across pickets that have been "turned off". Suppose picket "A" wants to transfer information to a remote picket "G". Before that cycle, the intervening pickets "B" through "F" are made transparent by turning them off. In the next single cycle, "A" sends its message to the right, passing through "B" through "F", which are transparent because they are off. "G" remains on and therefore receives the message. In normal use of the slide, information is transferred linearly across the grid, but the slide technique can also work well with two-dimensional meshes or multidimensional arrays.
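The slide can be modeled as a search for the nearest active picket in the direction of travel. This Python sketch covers only the linear case and assumes no wraparound at the ends of the row; the function name and the list-of-flags representation are illustrative.

```python
from typing import List, Optional


def slide(active: List[bool], source: int, direction: int = 1) -> Optional[int]:
    """Return the index of the picket that receives a slid message, or None.

    Turned-off pickets are transparent; the message passes through them in
    one cycle and is latched by the nearest active picket.
    Assumption: a linear row with no wraparound at either end.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 (right) or -1 (left)")
    i = source + direction
    while 0 <= i < len(active):
        if active[i]:          # nearest active picket receives the message
            return i
        i += direction         # transparent (off) picket: keep going
    return None


if __name__ == "__main__":
    names = "ABCDEFG"
    active = [True, False, False, False, False, False, True]  # B..F turned off
    print(names[slide(active, source=0)])  # G
```

The example reproduces the A-to-G transfer in the text: turning off B through F is the only preparation, and the receiver is determined by which pickets remain on.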

[0118] In the preferred embodiment of the present invention, processing-element access is byte serial rather than bit serial. Each processor accesses the memory coupled to it, rather than a block of local memory or its associated partition or page. A character-wide or character-multiple-wide bus is provided instead of a 1-bit bus. In one clock cycle, one byte of information is processed instead of one bit (future systems are planned to multiply the character-byte performance to several bytes). Thus 8, 16, or 32 bits, matched to the width of the associated memory, can flow between the picket processing elements. In the preferred embodiment, each picket chip preferably has 16 pickets, each with 32 Kbytes of 8 (9)-bit-wide memory, that is, one 32-Kbyte store per picket node of the linear array. In the embodiment of the invention, each associated memory is implemented in CMOS as DRAM, and the character byte is 9 bits (it functions as an 8-bit character with a self-checking function).
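
The 9-bit character can be illustrated with a small sketch. The text says only that the ninth bit gives the 8-bit character a self-checking function; the use of odd parity below, and the helper names `add_check_bit`, `check_character`, and `data_bits`, are assumptions made for illustration.

```python
def add_check_bit(byte: int) -> int:
    """Return a 9-bit character: 8 data bits plus an (assumed) odd-parity bit in bit 8."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    ones = bin(byte).count("1")
    parity = 0 if ones % 2 == 1 else 1  # make the total number of 1s odd
    return (parity << 8) | byte


def check_character(char9: int) -> bool:
    """True if the 9-bit character still has odd parity (no single-bit error)."""
    return bin(char9 & 0x1FF).count("1") % 2 == 1


def data_bits(char9: int) -> int:
    """Strip the check bit and return the 8-bit character."""
    return char9 & 0xFF


stored = add_check_bit(0x41)      # 0b01000001 has two 1s, so the check bit is 1
assert check_character(stored)
corrupted = stored ^ 0b00000100   # flip one data bit
assert not check_character(corrupted)
```

Parity detects any single-bit error in the stored character but cannot locate or correct it.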

[0119] The parallel-path, byte-wide bus data flow between pickets, and between each processing element and its memory, is a substantial improvement over the bit-serial structure of prior-art systems. In appreciating this achievement, however, it will be recognized that increased parallelism raises other problems that must be solved. As the implications of the newly realized architecture become better understood, the value of the other important solutions described below will also be appreciated.

[0120] In addition to the transfer to the left and right adjacent pickets and the slide mechanism described with reference to the drawings, another valuable feature is a double-byte-wide broadcast bus, which allows all pickets to see the same data at the same time. Picket control and address propagation are also transferred on this broadcast bus. It is this bus that supplies the comparison data when set-associative operations and other comparison or synchronous arithmetic operations are performed.

[0121] Tasks with highly parallel data structures that lend themselves to processing in the picket data processing elements under the control of a single instruction stream include artificial-intelligence pattern matching, sensor and track fusion in multisensor optimal assignment, context search, and image processing applications. Many of these applications, which are now possible, were not previously used with SIMD processes, because SIMD processing was bit serial, one bit per clock time. For example, the conventional serial processing element of a SIMD machine performs a 1-bit ADD in each processor cycle, whereas a 32-bit parallel machine can perform a 32-bit ADD in one cycle.
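
The cycle-count difference between bit-serial, byte-serial, and fully parallel addition can be sketched by adding a word one chunk per clock cycle and propagating the carry between chunks. The function `serial_add` and its one-chunk-per-cycle model are illustrative assumptions, not a description of the picket ALU.

```python
from typing import Tuple


def serial_add(a: int, b: int, word_bits: int = 32,
               chunk_bits: int = 1) -> Tuple[int, int]:
    """Add two unsigned integers one chunk per clock cycle.

    chunk_bits=1 models a bit-serial processing element, chunk_bits=8 a
    byte-serial one, and chunk_bits=32 a fully parallel 32-bit adder.
    Returns (sum modulo 2**word_bits, cycles used).
    """
    if word_bits % chunk_bits:
        raise ValueError("word size must be a multiple of the chunk size")
    mask = (1 << chunk_bits) - 1
    carry, result, cycles = 0, 0, 0
    for shift in range(0, word_bits, chunk_bits):
        s = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (s & mask) << shift   # keep this chunk of the sum
        carry = s >> chunk_bits         # carry into the next chunk
        cycles += 1                     # one chunk per clock cycle
    return result, cycles


a, b = 123_456_789, 987_654_321
for width in (1, 8, 32):
    total, cycles = serial_add(a, b, chunk_bits=width)
    print(f"{width:2d}-bit datapath: {cycles:2d} cycles, sum = {total}")
```

All three widths produce the same sum; only the cycle count changes, from 32 (bit serial) to 4 (byte serial) to 1 (fully parallel).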

[0122] With 32 Kbytes per processing element, the amount of memory logically available to each processing element is far larger than in a conventional SIMD machine.

[0123] Because the data moved onto and off the chip is kept to a minimum, the chip has a small pin count. The DRAM memory is a conventional CMOS memory array. It supports "row/column" access by eliminating the row demultiplexing behind the memory array and by providing a row address that reads a row of the memory array into the data flow in parallel.

[0124] Because this memory holds "tribits," or "trits," in addition to data, the logic recognizes three states instead of conventional binary: logic 1, logic 0, and "don't care." A "don't care" in a match field matches either a logic 1 or a logic 0. Trits are stored in successive storage locations within the storage array. Masks are another form of data stored in memory; they are sent to the mask register of the picket processing element.
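
A trit comparison can be sketched by encoding each pattern as a value plus a care mask, in which a 0 marks a don't-care position. This encoding and the names `parse_trits` and `trit_match` are assumptions; the text says only that trits occupy successive storage locations and that masks are sent to the mask register.

```python
from typing import Tuple


def parse_trits(pattern: str) -> Tuple[int, int]:
    """Turn a string of '0', '1', and 'X' (don't care) into (value, care_mask)."""
    value = care = 0
    for ch in pattern:
        value <<= 1
        care <<= 1
        if ch == "1":
            value |= 1
            care |= 1
        elif ch == "0":
            care |= 1
        elif ch.upper() != "X":
            raise ValueError(f"invalid trit {ch!r}")
    return value, care


def trit_match(data: int, value: int, care: int) -> bool:
    """A don't-care position (care bit 0) matches either a 1 or a 0."""
    return ((data ^ value) & care) == 0


value, care = parse_trits("1X0X")
print([format(d, "04b") for d in range(16) if trit_match(d, value, care)])
# ['1000', '1001', '1100', '1101']
```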

[0125] Because the storage array can hold commands, one picket can perform a different operation from another. On-chip control of individual pickets during operations that involve most, but not necessarily all, pickets makes possible implementations peculiar to SIMD operation. One simple control function provided is the ability to suspend operation in any picket whose status output meets a particular condition. That is, a nonzero condition means "doze": a command condition that suspends operation and puts the picket into an inactive but aware state. Another command provided inhibits or enables writing to memory, based either on a condition within the picket or on a command supplied on the bus before a slide operation.

[0126] Placing 16 powerful pickets, each with 32 Kbytes of memory, on a picket chip provides 1024 processors and 32768 Kbytes of memory with only 64 chips. This array of pickets forms a set-associative memory. The invention is useful for image analysis and vector processing, which are computation-intensive. This powerful picket processing array can now be packaged on only two small cards. It will be appreciated that thousands of pickets can be suitably packaged in low-power packages to run image processing applications with minimal delay, or within a video frame time. For example, applications can be run aboard an aircraft in flight with little concern about payload.
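
The capacity figures in this paragraph follow from the per-chip figures given earlier (16 pickets per chip, 32 Kbytes per picket):

$$
64\ \text{chips} \times 16\ \frac{\text{pickets}}{\text{chip}} = 1024\ \text{pickets},
\qquad
1024 \times 32\ \text{KB} = 32768\ \text{KB} = 32\ \text{MB}.
$$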

[0127] The power of these pickets makes it possible to use large associative memory systems packed into a small area and, once system designers have become accustomed to the new system, to apply that processing power in a variety of applications.

[0128] FIG. 3 shows a mechanism that can be called a fully associative memory: when an association is requested, a comparison value is presented to all memory locations, and all memory locations respond simultaneously on their match lines. Associative memory itself is well known in the art. The system of the present invention uses parallel pickets, each consisting of memory and a processing element with byte transfers for performing the search; the inputs are the data and a search mask for finding word K among the N words in memory. After all matching pickets have raised their status lines, a separate operation reads or selects the first matching K. This operation is usually called set associative, and it can be repeated for successive words, working up through picket memory. Similarly, writing is done through a broadcast operation in which a raised select line indicates participation, and the broadcast data is copied into all selected pickets.
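
The associative search, first-responder selection, and broadcast write described above can be sketched as three steps over a list of picket words. This is a minimal software model under stated assumptions: one word per picket, a care mask in which 0 bits are ignored, and the hypothetical function names `associative_search`, `first_responder`, and `broadcast_write`.

```python
from typing import List, Optional


def associative_search(memory: List[int], key: int, care: int) -> List[bool]:
    """Broadcast (key, care mask); every picket compares at once and raises a match line."""
    return [((word ^ key) & care) == 0 for word in memory]


def first_responder(match_lines: List[bool]) -> Optional[int]:
    """Separate step: select the first picket whose match line is high."""
    for index, hit in enumerate(match_lines):
        if hit:
            return index
    return None


def broadcast_write(memory: List[int], select_lines: List[bool], data: int) -> None:
    """Copy broadcast data into every picket whose select line is high."""
    for index, selected in enumerate(select_lines):
        if selected:
            memory[index] = data


memory = [0x12, 0x34, 0x13, 0x12]                          # one word per picket
lines = associative_search(memory, key=0x12, care=0xFE)   # ignore the low bit
print(lines, first_responder(lines))       # [True, False, True, True] 0
broadcast_write(memory, lines, 0x99)
print([hex(w) for w in memory])            # ['0x99', '0x34', '0x99', '0x99']
```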

[0129] In another, though not preferred, embodiment, the amount of available DRAM memory is reduced so that each picket can include a section of fully associative memory of the kind shown in FIG. 3. If, for example, 512 bytes of fully associative memory are included, every picket can store a set of search indexes, and a single operation over 1024 pickets times 512 yields 512K comparisons per operation; at a rate of 1 microsecond per operation, that is 512 giga-comparisons per second. With expansion, the concept can be extended to the range of several tera-comparisons. This embodiment gives associative tasks that require extensive searching of information a capability far beyond the computing power currently available.
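
The throughput figure follows from simple arithmetic. Note that 512K here is the binary 512 × 1024, so the exact rate is slightly above 512 × 10^9:

$$
1024 \times 512 = 524\,288 = 512\text{K comparisons per operation},
\qquad
\frac{524\,288}{1\ \mu\text{s}} \approx 5.24 \times 10^{11}\ \text{comparisons/s}.
$$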

[0130] As shown in FIG. 2, when the memory and byte-wide coupled processing elements are used in associative operations, many applications are now available to a machine with the chip configuration described above in a SIMD environment, in addition to individual algorithms and arithmetic, artificial intelligence, and the parallel programming attempted in SIMD settings. Examples follow.

[0131] - Simple parallelizable arithmetic tasks, including matrix multiplication and other tasks that can be performed on dedicated-memory machines.

[0132] - Image matching and image processing tasks that can run on von Neumann machines but can be sped up considerably by applications adapted to extreme parallelism; for example, pattern matching of three-dimensional images.

[0133] - Data-based query functions

[0134] - Pattern matching in the field of artificial intelligence

[0135] - Network control at bridges, to identify rapidly the messages destined for users on the other side of a network bridge

[0136] - Gate-level simulation

[0137] - Checking programs that test for violations of VLSI ground rules

[0138] Application programmers developing applications that exploit the capabilities of this new system architecture will devise processing tasks that make use of the banks of memory and associated processing elements.

[0139] When the new architecture of the present invention is used, changes to the process enhance ordinary applications. A process that maintains a description of a digital system can be enhanced by using this array to describe one gate or logic element per picket 100. In such a system, processing begins by assigning each gate's description as a list of the signals the gate accepts as inputs and by naming the signal it produces. With the present invention, each time a signal changes, its name is broadcast over bus 103 to all pickets and compared in parallel with the names of the expected input signals. If a match is found, a subsequent step records the new value of the signal in a dataflow register bit within that picket. When all signal changes have been recorded, the enhanced process has all pickets read, in parallel, a control word that tells the data flow how to compute the output from the current set of inputs. These operations are then executed in parallel, and the results are compared with the old values from the local gates. The improved process then records, as dataflow status bits, every picket gate whose output has changed. Next, an external controller queries all pickets for the next changed gate. The system then broadcasts the appropriate signal name and value from that picket to all other pickets, as originally described, and repeats this cycle until no further signal changes occur or the process is stopped.
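
The event-driven loop described above can be sketched in software. This is a simplified model under assumptions: one Python object stands in for each picket, each gate's function is a Python callable, signals are single bits that start at 0, and all gates whose outputs changed are broadcast together in the next round rather than one at a time as the external controller does in the text. `GatePicket` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class GatePicket:
    """One picket holding the description of one gate."""
    output: str                      # name of the signal this gate produces
    inputs: List[str]                # names of the signals it accepts
    fn: Callable[..., int]           # how to compute the output from the inputs
    values: Dict[str, int] = field(default_factory=dict)  # latched input values
    out_value: int = 0
    changed: bool = False            # stands in for the dataflow status bit

    def receive(self, name: str, value: int) -> None:
        # Compare the broadcast signal name with every expected input name.
        if name in self.inputs:
            self.values[name] = value

    def evaluate(self) -> None:
        new = self.fn(*(self.values.get(n, 0) for n in self.inputs))
        self.changed = new != self.out_value
        self.out_value = new


def simulate(pickets: List[GatePicket], stimulus: List[Tuple[str, int]],
             max_rounds: int = 1000) -> Dict[str, int]:
    pending = list(stimulus)         # signal changes still to be broadcast
    for _ in range(max_rounds):
        if not pending:
            break                    # no more signal changes: stop
        for name, value in pending:  # broadcast each change to all pickets
            for picket in pickets:
                picket.receive(name, value)
        for picket in pickets:       # every picket evaluates "in parallel"
            picket.evaluate()
        # The controller polls the status bits for gates whose output changed.
        pending = [(p.output, p.out_value) for p in pickets if p.changed]
    return {p.output: p.out_value for p in pickets}


half_adder = [
    GatePicket("sum", ["a", "b"], lambda a, b: a ^ b),
    GatePicket("carry", ["a", "b"], lambda a, b: a & b),
]
print(simulate(half_adder, [("a", 1), ("b", 1)]))  # {'sum': 0, 'carry': 1}
```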

[0140] Another process that can be developed to use this system is dictionary name searching. For a dictionary name search, the names are stored in picket memory 102 so that the first letter of every name can be compared with the first letter of the desired name broadcast on broadcast data/address bus 103. All pickets that do not match are turned off by the control characteristics provided by the present invention. The second letter is then compared, and this compare-then-turn-off procedure is repeated for successive letters (characters) until no picket units remain active or the end of the word is reached. At that point, the remaining picket units are queried, and the sequencer reads the index of the desired data.
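
The letter-by-letter search can be sketched as follows. The function name `dictionary_search` and the NUL terminator used to mark the end of a word are assumptions; the terminator makes a stored name that is longer than the query drop out at the final comparison.

```python
from typing import List

END = "\0"  # assumed end-of-name marker, broadcast after the last letter


def dictionary_search(names: List[str], query: str) -> List[int]:
    """Return the indices of pickets still active after the search.

    Each picket holds one stored name. One character is broadcast per step;
    every active picket compares it with its own character at that position
    and turns itself off on a mismatch.
    """
    stored = [name + END for name in names]
    active = [True] * len(stored)            # all pickets start switched on
    for pos, letter in enumerate(query + END):
        for i, name in enumerate(stored):
            if active[i] and (pos >= len(name) or name[pos] != letter):
                active[i] = False            # mismatch: picket goes off
        if not any(active):
            break                            # no active picket units remain
    # Query the remaining pickets; the sequencer would read their indexes.
    return [i for i, on in enumerate(active) if on]


words = ["CAT", "CATS", "DOG", "CAR"]
print(dictionary_search(words, "CAT"), dictionary_search(words, "CA"))  # [0] []
```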

[0141] FIG. 4 shows the basic picket configuration of a plurality of parallel processors and memories, or picket units, arranged in a row on a single silicon chip as part of a parallel array that can be configured as a SIMD subsystem, and shows the control structure of such a system. FIG. 4 also shows the control processor and the supervisory microprocessor. In FIG. 4, the memory and parallel processing element logic on the same chip are shown in the section labeled "Array of Pickets." Each memory is n bits wide, preferably one character wide, that is, 8 (9) bits, as described above, although conceptually it can also have the word width of a multibyte-wide memory. Thus the memory portion of a parallel picket processing element is preferably 8 (9) bits wide, or otherwise 16 or 32 bits wide. With current CMOS foundry technology, it is preferable to use an 8-bit, or one-character-wide, associative memory (a 9-bit-wide byte including self-checking) with each picket processing element. The memories are in direct one-to-one correspondence with their coupled processing elements, which include an ALU, a mask register (used for question-and-answer or masking operations) and latch 104 (SR in FIG. 4), as well as status register 107 and dataflow registers A 105 and Q 106 (DF in FIG. 4). These elements are shown in detail in the picket diagram of FIG. 2. The DRAM and logic of each picket processor do not bear the burden of contending for an interconnection network, because each multibit-wide DRAM memory and its processing element correspond directly, one to one, on the chip itself.

[0142] Note that in FIG. 4, the slide B register latch (SR) 104 is logically placed between the memory and the associated ALU logic of the processing element, so that the latch essentially serves as the coupling port of each processing element along the picket array. Each picket chip has a plurality of parallel picket processing elements arranged in a line (shown as a straight bus) for communication with the picket control mechanism. The vector address bus is common to the memories, and the data vector address register controls which data is passed to each memory.

[0143] FIG. 4 also shows the interconnection between the main processor card or microprocessor card and the subsystem controller. In the preferred embodiment of the invention, the main processor card or microprocessor card is a 386 microprocessor configured as a PS/2 system. Global instructions are sent through the subsystem controller to the canned routine processor (CRP). The canned routine processor, which is provided by the present invention, supplies instructions to instruction sequencer 402 and to execution control 403, which executes the particular microcode requested by the instruction sequencer. The instruction sequencer may be functionally similar to a controller. In the present invention, however, a local register 405 is also provided within the canned routine processor. Local register 405, together with a local register ALU (not shown), provides the basis for all addressing that is broadcast to all pickets in picket array 406. In this way, address calculations for all pickets are performed in one ALU without using picket resources and possibly without using picket execution cycles. This important addition gives the picket array control flexibility: it allows doze, inhibit, and other control functions for performing special tasks to be carried out, and it allows pickets to be separated from any broadcast instruction or data function.
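
The benefit of computing addresses in the canned routine processor's local register, rather than in each picket, can be illustrated with a toy model. `PicketArray`, `broadcast_load_add`, and `sum_block` are hypothetical names; the point is only that the address arithmetic runs once per step in the controller while every picket applies the same broadcast address to its own memory.

```python
from typing import List


class PicketArray:
    """Toy SIMD array: each picket owns a private memory and one accumulator."""

    def __init__(self, memories: List[List[int]]):
        self.memories = memories
        self.acc = [0] * len(memories)

    def broadcast_load_add(self, address: int) -> None:
        # Every picket applies the same broadcast address to its own memory.
        for p, memory in enumerate(self.memories):
            self.acc[p] += memory[address]


def sum_block(array: PicketArray, base: int, length: int, stride: int = 1) -> List[int]:
    """Controller-side loop: the address lives in a local register and is
    updated once per step by the controller's ALU, not by each picket."""
    local_register = base
    for _ in range(length):
        array.broadcast_load_add(local_register)
        local_register += stride
    return array.acc


array = PicketArray([[1, 2, 3, 4], [10, 20, 30, 40], [5, 5, 5, 5]])
print(sum_block(array, base=0, length=4))  # [10, 100, 20]
```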

[0144] Instruction sequencer 402, loaded with microcode 407, broadcasts to the array of pickets, directing it to execute under the SIMD instruction sequence determined by the main program microprocessor (MP) and by the canned routines of canned routine run-time library 408, and thereby enables SIMD processing of the data contained in the array of pickets.

[0145] The instructions provided to the microprocessor (MP) through the subsystem interface are considered high-level process commands, including StartProcess, WriteObser., and ReadResult, which are passed to the microprocessor from the subsystem controller of the MP. This microprocessor can be regarded as the main system or control processor of the subsystem arrays shown in FIGS. 4, 5, 6, and 7. It should be understood that this unit may be a stand-alone unit with added peripheral input devices (not shown) such as a keyboard and display. In this stand-alone configuration, the system MP can be regarded as a commercial PS/2 into which cards, including the sequencer card (which constitutes the canned routine processor) and the processor array cards, are inserted along the lines shown in FIG. 7. Routine library 411 can contain routine sequences for overall control of the process, such as CALL(,), Kalman, Convolve, and Nav.Update. These routines are selected through the user program, so the entire process can run under the control of an external host or under the control of user program 412 residing in the MP. A data buffer 413 is provided in MP memory for data transfers to and from the parallel picket processor system. Instruction sequencer 402 is configured to execute the control stream from the MP and the canned routines resident in canned routine run-time library memory 408. These include the canned routines CALL(,), LoadBlock, Sin, Cos, Find, Min, and RangeComp provided by canned routine run-time library 408.

[0146] The CRP also contains microcode 407 for controlling the execution of low-level functions such as Load, Read, Add, Multiply, and Match.

[0147] External FOR/NEXT control is preferably provided for each processing unit, and in practice it is provided. Deterministic floating-point byte normalization is also implemented.

[0148] The deterministic approach to system macro development provided by the present invention enables picket grouping and group control. A local doze function is provided to handle variations in the processing of individual pickets.
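
Deterministic byte normalization with local doze can be sketched as a fixed number of lockstep steps in which pickets that are already normalized simply doze. The format (one exponent byte and a four-byte mantissa, matching the "1 + 4 bytes" floating-point note in the matrix-multiplication example later in this section), the base-256 exponent, and the names below are assumptions, not the patented encoding.

```python
from fractions import Fraction
from typing import List, Tuple

MANTISSA_BYTES = 4                      # assumed: 1 exponent byte + 4 mantissa bytes
TOP_SHIFT = 8 * (MANTISSA_BYTES - 1)    # bit position of the leading mantissa byte


def value(exponent: int, mantissa: int) -> Fraction:
    """Assumed encoding: mantissa * 256**exponent (base-256 exponent)."""
    return mantissa * Fraction(256) ** exponent


def normalize_all(numbers: List[Tuple[int, int]]) -> Tuple[List[Tuple[int, int]], int]:
    """Byte-normalize every picket's (exponent, mantissa) pair in lockstep.

    All pickets take the same fixed number of steps, so the timing does not
    depend on the data; a picket whose leading byte is already nonzero (or
    whose mantissa is zero) dozes through the remaining steps.
    """
    exps = [e for e, _ in numbers]
    mants = [m for _, m in numbers]
    steps = 0
    for _ in range(MANTISSA_BYTES - 1):  # worst case: three one-byte shifts
        steps += 1
        for p, m in enumerate(mants):
            dozing = m == 0 or (m >> TOP_SHIFT) != 0
            if not dozing:
                mants[p] = m << 8        # shift the mantissa left one byte
                exps[p] -= 1             # and compensate in the exponent
    return list(zip(exps, mants)), steps


inputs = [(0, 0x00000012), (5, 0x12345678), (2, 0x00001234), (1, 0)]
normalized, steps = normalize_all(inputs)
print(steps, [(e, hex(m)) for e, m in normalized])
```

Because every picket runs the same three steps, the controller can issue one fixed instruction sequence regardless of the data, which is the property a lockstep SIMD array needs.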

[0149] When a user program is to be executed by the processor array, primitive commands, addresses, and broadcast data are supplied to the array of picket processors.

[0150] Which functions each part of the system uses depends on the task to be performed and is assigned when the user program is compiled.

[0151] The flexibility of this subsystem can be illustrated with a fairly common problem. Take as an example the matrix multiplication problem |x| * |y| = |z|.

[0152] This is described as the following problem:

[Equation 1]

[0153] This can be solved, for example, by the following statements. Next to each statement are shown the number of passes and the number of clock cycles per pass.

[0154]
No.  Statement                                       Passes    Cycles/pass
01   Call Matrix Mult Fx (R,M,C,Xaddr,Yaddr,Zaddr)   1         c
02   xSUB = ySUB = zSUB = 1                          1         3
03   DO I = 1 to C                                   1         3
04   DO J = 1 to R                                   C         3
05   z = 0                                           C×R       5/6*
06   DO K = 1 to M                                   C×R       3
07   *** Assign to associative parallel processor ***
08   Zz = Xx × Yy + Zz                               C×R×M     204/345*
09   *** Return result ***
10   xSUB = xSUB + R                                 C×R×M     2
11   ySUB = ySUB + 1                                 C×R×M     2
12   NEXT K                                          C×R×M     3
13   xSUB = xSUB − M×R + 1                           C×R       2
14   ySUB = ySUB − M                                 C×R       2
15   zSUB = zSUB + 1                                 C×R       2
16   NEXT J                                          C×R       3
17   xSUB = 1                                        C         2
18   NEXT I                                          C         3
19   END Call                                        1         1
* Fixed point (4 bytes) / floating point (1 + 4 bytes); see below.

[0155] From the example above, it will be seen that the task identified in statement 08 takes about 98% of the cycle time. This task is therefore assigned to the SIMD organization of the parallel picket processors. The other processes take only 2% of the cycle time, and they are kept in the architecture within the microprocessor.
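
The 98% figure can be checked by summing passes × cycles per pass from the table. This sketch takes the counts directly from the listing; the matrix dimensions C, R, and M, and the treatment of the unspecified "c" cycles of statement 01 as 0, are assumptions. Statements 07 and 09 are comments and are omitted.

```python
from typing import Dict, Tuple


def cycle_budget(C: int, R: int, M: int, floating: bool = True,
                 call_cycles: int = 0) -> Tuple[int, float]:
    """Return (total cycles, share of cycles spent in statement 08).

    Each entry is (passes, cycles per pass) from the listing. `call_cycles`
    stands in for the unspecified "c" of statement 01 (assumed 0 here).
    """
    crm, cr = C * R * M, C * R
    table: Dict[str, Tuple[int, int]] = {
        "01": (1, call_cycles),
        "02": (1, 3), "03": (1, 3), "04": (C, 3),
        "05": (cr, 6 if floating else 5), "06": (cr, 3),
        "08": (crm, 345 if floating else 204),
        "10": (crm, 2), "11": (crm, 2), "12": (crm, 3),
        "13": (cr, 2), "14": (cr, 2), "15": (cr, 2), "16": (cr, 3),
        "17": (C, 2), "18": (C, 3), "19": (1, 1),
    }
    total = sum(passes * cycles for passes, cycles in table.values())
    passes_08, cycles_08 = table["08"]
    return total, passes_08 * cycles_08 / total


for fmt, flag in (("floating point", True), ("fixed point", False)):
    total, share = cycle_budget(32, 32, 32, floating=flag)
    print(f"{fmt}: {total} cycles, statement 08 = {share:.1%}")
```

For C = R = M = 32, this model gives statement 08 about 97.9% of the cycles with floating-point operands and about 96.4% with fixed-point operands, in line with the roughly 98% stated above; the share grows with M because statement 08 is the only costly statement in the innermost loop.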

[0156] Thus, in this matrix multiplication example, each statement is assigned for execution to the MP, the CRP, the local register (LR), or the picket array (each statement, when compiled, causes execution at a particular system location).

[0157] In the matrix multiplication example above, statement 01 is assigned to the main processor MP, and statements 02, 05, 10, 11, 13, 14, 15, and 17 are assigned to the local register (LR). Statements 03, 04, 06, 12, 16, 18, and 19 are assigned to execute in the canned routine processor. The matrix processing, which would normally be time-consuming, is assigned to the array of pickets for execution under a single instruction; that is, statement 08 is assigned to the array of pickets.

[0158] FIG. 5 shows a multiple parallel picket processor system 510 with a plurality of parallel picket processors. For applications such as multiple-target tracking, sensor and data fusion, signal processing, artificial intelligence, satellite image processing, pattern/target recognition, and Reed-Solomon encoding/decoding, a system has been built that can be configured as a SIMD system with 1024 parallel processors, using, in the preferred embodiment, two to four SEM E cards 511 per 1024 parallel processors (shown here as four cards per system). Individual cards 512 can be inserted into a rack-mounted system enclosure 513 with wedge-lock slides 514. The cards have insertion/extraction levers 516, and when cover 517 is closed, a mountable system with 32 to 64 Mbytes of storage and a performance of about 2 billion operations per second is effectively housed in the rack. The system is compact: the arrays of pickets are inserted into a backpanel board that contains logic and allows multiple cards to be interconnected.

[0159] A processor with 32 Mbytes of storage is formed on four SEM E cards, and the system weighs only about 13.6 kg (30 lb). Power is supplied by the power supply 519 shown. The power required by such an air-cooled processor is estimated at only about 280 W. Each SIMD system has two I/O ports 520 for channel-adapter communication with an associated mainframe or the rest of the world. In the illustrated multiple parallel picket processor, which consists of four logical pages each and uses standard modular avionics packaging and bus structures for connection to external memory (for example, the PI bus, TM bus, and IEEE 488 bus), the processor can be connected through the I/O ports to the memory bus of a mission processor and can be regarded as an extension of the mission processor's memory space.

[0160] In the illustrated multiple parallel picket processor, which contains 1024 parallel processing elements, each processor has 32 Kbytes of local memory, and the associated path to the picket parallel processor is 8 bits wide in parallel, or one character (9 bits) wide.

[0161] The processors in each picket exchange data with other neighboring processors, and between pages, through a backplane interconnection network. The network is preferably a crossbar, but a slide crossbar, a shuffle network, a base-3 N-cube, or a base-8 N-cube may also be used.

[0162] The individual processors of the system are housed in two of the four cards of the pack, and one card houses the PS/2 microprocessor. The remaining card of the four that make up the system outlined in FIGS. 6 and 7 houses the canned routine processor sequencer. Individual pickets 100, or picket cards 512, can be configured on the fly together with the canned routine processor, and put into or taken out of operation, based on data conditions controlled by the latch 104 architecture and by the local register coupled to the CRP execution control of sequencer card 703. The picket processors can therefore perform the alignment and normalization operations associated with floating-point arithmetic independently.

[0163] The processors are controlled in parallel by a common sequencer. The sequencer card 703 contains the picket processor controller CRP. It can have the pickets execute a single thread of instructions, coded to run on the array of SIMD processors in a byte-sequential manner similar to conventional bit-serial processing.

The controller has three layers. Picket micro-control is microcoded, as in current processors, and transferred to all pickets in parallel. The micro-control and the pickets are synchronized to the same clock system CLK, so functions controlled by the sequencer execute in the same clock time. Supplying commands to the micro-control sequencer is the job of the canned routine processor. The sequencer card 703 is a hardwired controller. During execution of most functions, it executes loop control commands and repeatedly starts new micro-control sequences. Through the canned routine runtime library 408 and its looping function, the controller keeps the pickets well fed and not command-bound.

The canned routine processor controller contains a large set of macros that are called by the main system. It acts as the main supervisory picket controller within the subsystem and is the top-level control system of the picket array. The activity of the picket array is managed by the 386 microprocessor. At any given moment, all pickets in the array can execute the same instruction. However, some of the processors can respond individually to the control flow.

[0164] There are several variants of individual response. Each picket has byte control functions (doze, inhibit, and so on), and these provide local autonomy. This local autonomy is available to programming: it can be provided at program compile time and placed under system control.

[0165] In addition, as mentioned above, there is also local memory addressing autonomy. The SIMD controller sequencer supplies a common address used by all pickets. Each picket can locally augment that address, which strengthens its ability to make data-dependent memory accesses.

[0166] Furthermore, a picket may or may not participate in array activities, depending on local conditions.

[0167] This property makes it possible to introduce the concept of groups into SIMD processing, by giving each picket a means of assigning itself to one or several of a plurality of groups. Processing can then proceed on the basis of these groupings, whose configuration can be changed essentially on the fly. In one embodiment, only one group, or one combination of groups, can be active at a time, and each executes the same SIMD instruction stream. Some operations require work by only a subset or group of pickets, and programming can take advantage of this capability. Local participation autonomy is tailored to make such work possible. Clearly, the more pickets that perform a computation, the greater the benefit of local participation autonomy.
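The following is a minimal software model of grouping, not the patent's hardware mechanism. It assumes a hypothetical per-picket group bitmask and a made-up local rule for joining groups; the text does not say how membership is encoded. The controller names the active group (or combination of groups) as a mask, and only member pickets execute the broadcast operation.

```python
from dataclasses import dataclass


@dataclass
class Picket:
    value: int
    groups: int = 0  # hypothetical bitmask: bit g set means "member of group g"

    def join_groups_from_data(self) -> None:
        # Hypothetical local rule: even values join group 0, odd values join
        # group 1, and values >= 100 additionally join group 2.
        self.groups = 1 << (self.value & 1)
        if self.value >= 100:
            self.groups |= 1 << 2


def broadcast(pickets, active_groups: int, op) -> None:
    """Apply one SIMD operation to every picket that belongs to an active group."""
    for p in pickets:
        if p.groups & active_groups:  # local participation decision
            p.value = op(p.value)


pickets = [Picket(v) for v in (4, 7, 150, 151)]
for p in pickets:
    p.join_groups_from_data()

# Activate only group 1 (odd values) and add 1000 to its members.
broadcast(pickets, active_groups=1 << 1, op=lambda v: v + 1000)
print([p.value for p in pickets])  # [4, 1007, 150, 1151]
```

Passing a mask such as `(1 << 1) | (1 << 2)` would activate a combination of groups, matching the "one group or one combination of groups" wording.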

[0168] One way to increase the number of participating pickets is to allow each picket to execute its own instruction stream. This is essentially MIMD within SIMD. It is now essentially possible to configure the same SIMD machine as a MIMD system, or as a machine of another configuration, because the pickets can be programmed to operate on their own sequences of instructions.

[0169] Because each picket can have its own sequence, a very simple set of instructions can be decoded at the picket level, which allows more extensive local processing. The first area where this capability is likely to be applied is complex decision making. However, simple fixed-point processing will also be an area of interest to programmers.

[0170] Such a simple program would, for example, load a block of picket program no larger than 2 Kbytes into picket memory. These blocks can be executed when the SIMD controller card 703, through execution control, starts local execution at a specified xyz address. Execution of the block continues while the controller either counts a large number of clocks or tests for a task-complete signal by monitoring the status funnel (SF) register shown in FIG. 4.

[0171] The status funnel (SF in FIG. 4) uses a latch 104 for each picket. Each picket has a latch 104 that can be loaded to reflect the picket's status condition. The SIMD controller can test the aggregate value of these latches (one per picket) by monitoring the array status line. This array status line is a logical combination of the values in the individual picket status latches.
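The sketch below models how the controller can detect completion of locally executed blocks through the status funnel, as described in [0170] and [0171]. It assumes the funnel's "logical combination" is an OR of the latches, as [0218] states. Each picket's local work is a hypothetical countdown, and `max_clocks` stands in for the controller's alternative of simply counting clocks.

```python
def status_funnel(latches) -> bool:
    """Array status line modeled as the OR of the per-picket status latches (latch 104)."""
    return any(latches)


def run_local_blocks(counters, max_clocks=100) -> int:
    """Each picket counts its own value down to 0; the controller polls the funnel.

    Returns the number of clocks that elapsed before the funnel reported that no
    picket was still busy (or max_clocks, if the controller gave up counting).
    """
    clocks = 0
    while clocks < max_clocks:
        busy = [c > 0 for c in counters]   # each picket loads its status latch
        if not status_funnel(busy):        # task complete: no latch is set
            return clocks
        counters[:] = [c - 1 if c > 0 else c for c in counters]  # one local step
        clocks += 1
    return clocks


print(run_local_blocks([3, 0, 5]))  # 5: the slowest picket finishes after 5 steps
```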

[0172] In the following example, values greater than 250 are to be adjusted into the range 500 > x ≥ 250. The routine below uses the status funnel to detect when this task has been completed.

[0173]
    If VALUE < 500 then TURN YOUR PICKET OFF
    STAT <- PICKET OFF CONDITION
    IF STAT FUNNEL = OFF then finished
    ....
    VALUE <- VALUE - 250
    Repeat
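To make the routine in [0173] concrete, here is a sequential Python simulation of all pickets running it in lockstep. It interprets the STAT latch as "this picket is still on", so the funnel (an OR over pickets) reads OFF once every picket has turned itself off. That reading is an assumption. Real pickets work on bytes; Python integers are used here for clarity.

```python
def adjust_into_range(values):
    """Simulate the [0173] routine across pickets; returns (values, passes)."""
    values = list(values)
    active = [True] * len(values)
    passes = 0
    while True:
        # If VALUE < 500 then TURN YOUR PICKET OFF
        for i, v in enumerate(values):
            if active[i] and v < 500:
                active[i] = False
        # STAT <- picket condition; the funnel ORs the latches of all pickets
        if not any(active):  # IF STAT FUNNEL = OFF then finished
            return values, passes
        # VALUE <- VALUE - 250, executed only by pickets that are still on
        for i in range(len(values)):
            if active[i]:
                values[i] -= 250
        passes += 1  # Repeat


print(adjust_into_range([260, 499, 500, 1234, 100]))
# ([260, 499, 250, 484, 100], 3)
```

Every value that started above 250 ends up in 250 ≤ x < 500. The controller needs only the single funnel bit per pass, rather than reading each picket's value.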

[0174] This multiple parallel picket processor can therefore be configured as a SIMD processor in a variety of ways. In the preferred embodiment, such a SIMD machine runs under the overall control of a SIMD controller or SIMD sequencer. It is programmed to execute a single thread of instructions in the conventional manner, with the instructions coded to execute sequentially on the array of SIMD processors, much as on a conventional processor. At the application level, this is accomplished with vector instructions and vector-type instructions, and vectors can be processed within and between processors. A vector instruction typically expands into 6 to 10 microinstructions.

[0175] In such a preferred embodiment, the system appears schematically as shown in the functional block diagram of the parallel processor subsystem in FIG. 6. The subsystem sequencer works through the system I/O port, which is controlled by the host interface control mechanism of FIG. 4. It functions like a SIMD program with high-level macros that control the functions of the processing elements. Memory addressing provides an 8-bit, byte-wide data flow, and modulo-8 arithmetic logic is used for each function (logic, add, multiply, and divide). A floating-point format is provided. Autonomous picket operation is possible, with individual sleep and doze modes and separate addressing.

[0176] The arrangement of the subsystem controller is shown in FIG. 7. Each processor array card 512 is coupled to the sequencer CRP 703, which in turn is coupled to the subsystem controller 702. Four such cards appear in this subsystem diagram, but they can be reduced to two SEM E cards. The subsystem controller 702 is ported to the main memory system, or is coupled to another subsystem in the configuration, through the interface of chip 705 to the associated Micro Channel bus 706. In the preferred embodiment, the subsystem controller is an IBM PS/2 (a trademark of IBM) general-purpose microprocessor unit that uses an Intel 386 processor chip and 4 Mbytes of memory. The personal computer microprocessor 702 is ported to the sequencer card through chip 705 in the subsystem and the Micro Channel-type bus 706.

[0177] Preferred Embodiment of Local Autonomy for SIMD
All of the local autonomy functions described here are implemented as SIMD commands, or as local variants of SIMD commands, that are presented simultaneously to all participating pickets. Some of these commands provide local autonomy functions directly. The command "LOAD OP FROM MEMORY BUS" has no associated local variant, but it ensures that a locally selected command is invoked. By contrast, the command "STORE TO A REG PER STAT" makes storage into register A depend on the local status bit. Each feature of local autonomy is examined in turn below.

[0178] FIG. 8 shows, for a preferred embodiment of the present invention, how the array controller disables pickets and how a picket activates and deactivates itself to provide local autonomy. Selectively disabling one or more pickets gives more freedom in executing a problem, and this is classified here as a local autonomy function. The ability to issue SIMD commands to all pickets, and to have pickets perform different operations depending on their own data, can be extended in several ways:
- Each picket can execute instructions in a mode identified as SIMIMD mode.
- Pickets can be grouped dynamically for separate execution.
- Floating-point adjustment and normalization operations can be performed efficiently within the picket.

[0179] For example, picket selection is needed to perform a common, simple task such as ABSOLUTE VALUE, where some pickets hold data that is already positive and needs no change while others hold data that must be complemented. The task proceeds as follows:
- Complement the data into a temporary location while setting a mask from the sign of the result.
- With the next instruction, suspend the pickets that originally held positive values.
- With the following instruction, move the complemented (now positive) values from the temporary location back to the original location. All pickets now hold positive values.
- Reactivate the suspended pickets.

[0180] The applicable local autonomy function is STORE per STAT (store depending on status), in which the store is performed only when the STAT latch is set. In the operation described above, this function saves two steps.
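The two approaches to ABSOLUTE VALUE in [0179] and [0180] can be contrasted with a small array model. Each list position stands for one picket, and the numbered comments follow the steps described in the text. The step grouping and the use of unbounded Python integers instead of 8-bit bytes are simplifying assumptions.

```python
def abs_with_suspend(values):
    """Absolute value with the suspend/reactivate approach of [0179]."""
    n = len(values)
    temp = [-v for v in values]       # 1. complement into a temporary location...
    mask = [t < 0 for t in temp]      #    ...setting a mask from the sign of the result
    active = [not m for m in mask]    # 2. suspend pickets that were originally positive
    values = [temp[i] if active[i] else values[i] for i in range(n)]  # 3. move temp back
    # 4. reactivate the suspended pickets (all pickets participate again)
    return values


def abs_store_per_stat(values):
    """Same task with STORE per STAT ([0180]): store only where STAT is set."""
    temp = [-v for v in values]       # 1. complement; STAT <- "original value was negative"
    stat = [v < 0 for v in values]
    return [t if s else v for v, t, s in zip(values, temp, stat)]  # 2. STORE per STAT


print(abs_with_suspend([5, -3, 0, -128]))    # [5, 3, 0, 128]
print(abs_store_per_stat([5, -3, 0, -128]))  # [5, 3, 0, 128]
```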

[0181] The suspend/reactivate approach raises several problems:

[0182] 1. Machine efficiency drops because processors sit idle.

[0183] 2. Setting and resetting masks can make up a significant part of a complex task and its program.

[0184] 3. Tasks such as data transfer between processors require individual unit operations that differ entirely from simple run/stop mask operations.

[0185] The preferred embodiment of the present invention, using the structure described in connection with FIGS. 2 through 8, addresses these local autonomy problems.

[0186] General Considerations for SIMIMD
The present invention provides several implementation techniques for the design of pickets within a SIMD array of pickets. These techniques make it easy to perform operations that were previously difficult or impossible. As a result, each picket gains some degree of local autonomy.

[0187] These concepts extend the run/stop selection described in the prior art, such as U.S. Pat. Nos. 4,783,738 and 5,045,995. In the preferred system of the present invention, each picket can be enabled by a set of mechanisms that give it various execution capabilities. With these capabilities, each picket can acquire various modes for executing data within the picket. SIMD commands can therefore be interpreted within the picket rather than being sent from an external SIMD controller. This capability spans multiple modes, in which each processor of the SIMD array can, and actually does, perform different operations based on local conditions.

[0188] Combining several of these local autonomy functions can give a picket the ability to execute its own local program segment for a short period. This provides MIMD capability within the SIMD array. This architecture is called SIMIMD, and the mechanisms that support it make possible the SIMIMD array processing system of the present invention.

[0189] Some of these functions take part in forming the picket-grouping mechanism. They support sophisticated tools for selecting pickets to form groups for separate computations. This function is called grouping, and the system of the present invention includes mechanisms that support it. Grouping is also described in a related application entitled "Grouping of SIMD Pickets".

[0190] Some of these functions take part in forming the mechanisms for the efficient floating-point technique described in a related application entitled "Floating-Point Implementation on a SIMD Machine". These mechanisms preferably support floating point.

[0191] System Mechanism Considerations
The "local autonomy" items discussed below fall into three categories:
1. Status-controlled local operations
2. Data-controlled local operations
3. Distribution of status from the processors to the microcontroller

[0192] Status Generation Considerations
Status is set based on previous activity in the instruction stream, and loading of the status register is controlled by the instruction stream. The status register is loaded with Zero Detect, Sign, Equal, Greater Than, Less Than, and Carry Out.

[0193] Add with carry is a common example of using status to extend the precision of operations. More subtle uses of status are also included.

[0194] Status can be stored in memory as a data word for later use. Status can also be used in logical and arithmetic operations to form fairly complex combinations with other stored values. These computed values can change the value of the picket's doze latch, suspending the picket's operation until other computed information enables it again.

[0195] Status associated with subroutine calls or context switches executed within the array controller can be saved. These values can later be recalled and used by a "pop" or "return". A selected group or subset of pickets can, of course, participate in the subroutine operation, and more pickets may be active again on return from the subroutine.

[0196] Status-Controlled Local Operations
Each of the following "local autonomy" functions is controlled by status when a SIMD instruction from the SIMD subsystem controller selects it. Status can be updated by a command that transfers one or all of the status conditions to the status latch of FIG. 8, or the status latch can be loaded from a memory location. The latter method is used in the picket grouping described in the patent application entitled "Grouping of SIMD Pickets". The status of every local picket can be combined and sent as a whole through the status funnel SF to the SIMD controller. This lets the controller respond explicitly to picket activity without reading a status word from each picket individually.

[0197] Local Participation
A picket can turn itself on or off based on the immediately preceding status. It enters doze mode by loading an appropriate value into its doze latch (FIG. 8). This can be called "local participation" autonomy, and it provides a mechanism for the internal control of individual pickets within the picket chip. A picket in doze mode does not write to its own memory and does not change the contents of some of its registers, so its state does not change. It still monitors all test operations, however, and can turn itself back on based on selected results.

FIG. 8 shows the relationship between the two "off" modes, doze and disable. In disable mode, the subsystem controller can disable individual pickets that are not actually participating in the process. A disable instruction in a SIMD controller context, as in U.S. Pat. No. 5,045,995, can work effectively to disable individual processing elements. In the present invention, however, the controller enables selected pickets when it needs to activate another set of data in the process.

[0198] Example: All pickets holding a value greater than 0 sit out the next set of operations.

[0199] FIG. 8 shows the relationship between doze and disable in the preferred embodiment of the present invention.
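The difference between doze and disable ([0197] to [0199]) can be sketched as a tiny state model. A dozing picket still evaluates test instructions and may wake itself, but it makes no register or memory writes. A disabled picket ignores everything until the controller re-enables it. The instruction names (`doze_if_gt`, `load_imm`, `store_acc`) are hypothetical. The model also blocks all register writes while dozing, whereas the text says only "some" registers are protected.

```python
from dataclasses import dataclass, field


@dataclass
class Picket:
    memory: list = field(default_factory=lambda: [0] * 4)
    acc: int = 0
    dozing: bool = False    # doze latch: loaded by the picket itself
    disabled: bool = False  # set and cleared only by the subsystem controller

    def execute(self, op: str, addr: int, operand: int) -> None:
        """Apply one broadcast instruction to this picket."""
        if self.disabled:
            return  # a disabled picket ignores the instruction stream entirely
        if op == "doze_if_gt":
            # Tests are still evaluated while dozing, so the picket can wake itself.
            self.dozing = self.memory[addr] > operand
        elif self.dozing:
            return  # dozing: no register or memory writes, state unchanged
        elif op == "load_imm":
            self.acc = operand
        elif op == "store_acc":
            self.memory[addr] = self.acc


def broadcast(pickets, op, addr=0, operand=0):
    for p in pickets:
        p.execute(op, addr, operand)


pickets = [Picket(memory=[v, 0, 0, 0]) for v in (5, -2, 0)]
broadcast(pickets, "doze_if_gt", addr=0, operand=0)  # [0198]: values > 0 sit out
broadcast(pickets, "load_imm", operand=99)
broadcast(pickets, "store_acc", addr=1)
print([p.memory[1] for p in pickets])  # [0, 99, 99]

pickets[2].disabled = True                            # controller disables picket 2
broadcast(pickets, "doze_if_gt", addr=0, operand=10)  # picket 0 wakes itself
broadcast(pickets, "load_imm", operand=7)
broadcast(pickets, "store_acc", addr=2)
print([p.memory[2] for p in pickets])  # [7, 7, 0]
```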

[0200] Carry-In by Status
The present invention also allows each picket to add a carry-in to the current operation based on the immediately preceding status. This is a common operation in von Neumann machines whose data length is a multiple of the hardware register length. However, when many pickets perform the same operation on different data, each picket must generate its carry-in from the data of its own preceding operation. This operation is unique to each picket and is therefore an important local autonomous operation.
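The sketch below shows why carry-in must be per-picket. It does a byte-serial, multi-byte add, least-significant byte first, using the 8-bit data path mentioned in [0175]. The carry-out latched by one step is the carry-in of the next step. Three pickets running the same instruction sequence on different data end up with different carries. The pickets are modeled one after another here, although the hardware runs them in lockstep.

```python
def add_multibyte(a_bytes, b_bytes):
    """Byte-serial add inside one picket, least-significant byte first.

    Each step uses the carry-out status latched by the previous step as its carry-in.
    """
    result, carry = [], 0
    for a, b in zip(a_bytes, b_bytes):
        s = a + b + carry        # add with carry-in from this picket's own status
        result.append(s & 0xFF)  # 8-bit data path
        carry = s >> 8           # carry-out status for the next byte
    return result, carry


# Three pickets, same instruction stream, different data: different carries.
pairs = [(0x00FF, 0x0001), (0x1234, 0x0101), (0xFFFF, 0x0001)]
for a, b in pairs:
    res, carry = add_multibyte(list(a.to_bytes(2, "little")), list(b.to_bytes(2, "little")))
    print(hex(int.from_bytes(bytes(res), "little")), carry)
# 0x100 0
# 0x1335 0
# 0x0 1
```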

[0201] Example: Searching for leading zeros in a floating-point normalization operation.

[0202] Store Inhibit by Status
According to the present invention, a picket can also inhibit a store to memory based on the immediately preceding status. With this selective write operation, picket status controls whether data is stored back into picket memory. It serves as an effective inhibit mechanism when an operation performed on all pickets does not produce a useful result for some individual pickets.

[0203] Example: When computing an absolute value, store the complement back only if the preceding status was negative.

[0204] Register Source Selection by Status
Based on status, a picket can select one of two data sources from which to load a hardware register. The status switches the multiplexer that chooses between the two sources. The data sources can be two hardware registers, or the broadcast bus and one hardware register.

[0205] Example: If the data on the broadcast bus is useful to the picket, read it from the bus; otherwise, use the data in the internal register. This example represents a dictionary function: streams of letters and words move over the broadcast bus, and pickets holding potential matches capture the data.

[0206] Example: Perform Load Reg from the register if its contents are smaller than the data read from memory.

[0207] Read from Alternate Memory Location by Status
Based on status, a picket can read data from one of two memory locations (the stride is 2^n). A record usually contains more than one byte of data, and a data set usually contains several records. This selection function lets local picket status choose one of two records. Suppose you are looking for a maximum value and have run a test to find it. This function can then capture the selected record for future use.

[0208] Example: In a floating-point adjust operation, two values are paired and one of them is scaled so that its exponent matches the other's. Which value is scaled is determined by comparing the exponents and loading a status value.
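Here is a minimal sketch of the floating-point adjust step in [0208]. It assumes a simple binary format in which value = mantissa × 2^exponent, with non-negative integer mantissas. The patent's actual format is in the related application and is not reproduced here. The exponent comparison loads the status, and the status selects which operand is read and scaled.

```python
def align(a, b):
    """Align two (mantissa, exponent) pairs so both share the larger exponent.

    value = mantissa * 2**exponent; the status selects which operand gets scaled.
    Right-shifting the mantissa truncates low-order bits.
    """
    ea, eb = a[1], b[1]
    status = ea < eb  # compare exponents, load the status latch
    # Read by status: fetch the operand to scale from one of two locations.
    small, large = (a, b) if status else (b, a)
    shift = large[1] - small[1]
    scaled = (small[0] >> shift, large[1])
    return (scaled, large) if status else (large, scaled)


print(align((0b1100, 2), (0b1010, 4)))  # ((3, 4), (10, 4)): 12*2**2 == 3*2**4
print(align((5, 3), (7, 1)))            # ((5, 3), (1, 3)): 7*2 truncates to 1*2**3
```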

[0209] Store to Alternate Memory Location by Status
Based on status, a picket can store data into one of two memory locations (the stride is 2^n). This is similar to the read operation.

[0210] Selection of Source for Transfer to an Adjacent Picket by Status
A picket can transfer data to an adjacent picket through the slide operation of the present invention. Depending on status, the source data for this operation can be either the A register or the B register. A good example is comparing data across all pickets to identify the maximum value. All odd-numbered pickets move their data to the right, where it is stored in a register. The receiving picket compares the new data word with its own, saves the status, and forwards the larger of the two by selecting the appropriate source.
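One slide step from [0210] can be modeled as shown below, with pickets numbered from 1 so that odd pickets send to the even picket on their right. The text describes a single step. Repeating it with the slide distance doubled each time, so that the maximum collects in the last picket, is an extension assumed here, and it requires the number of pickets to be a power of two.

```python
def slide_max_step(data, d):
    """One slide step: each sender shifts its word d pickets to the right.

    Receivers compare, save the status, and keep the larger word (register
    source selected by status). With d == 1, odd-numbered pickets (1-based)
    send and even-numbered pickets receive.
    """
    out = list(data)
    for r in range(2 * d - 1, len(data), 2 * d):  # 0-based receiver indices
        incoming = data[r - d]                     # word slid in from the sender
        status = incoming > data[r]                # compare and save status
        out[r] = incoming if status else data[r]   # select source by status
    return out


def array_max(data):
    """Repeat the slide step with doubling distance; the max ends in the last picket."""
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "this sketch assumes a power-of-two picket count"
    d = 1
    while d < n:
        data = slide_max_step(data, d)
        d *= 2
    return data[-1]


print(slide_max_step([5, 1, 2, 6], 1))       # [5, 5, 2, 6]
print(array_max([3, 9, 1, 4, 8, 2, 7, 5]))   # 9
```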

[0211] Data-Controlled Local Operations
Several local autonomy functions are based on the data within each picket. These data-dependent functions are enabled by SIMD controller microwords that let the contents of a data register determine independent activity within the picket. Data-controlled local operations are described below with several examples.

[0212] Read from Alternate Memory Location by Data
Each picket can address its own memory using data held in one of its registers as part of the memory address. The addressable field of memory is a power of 2 in size, ranging from 2 words up to the picket's entire memory; a practical design point at present is 4K. Another feature introduces a stride into addressing, so that the controller can have the picket address every other word, every fourth word, and so on. The base of this memory field (the remaining bits of the picket address) is supplied by the SIMD controller.

[0213] Example: The main application of these ideas is table lookup. A picket retrieves one of a set of values, such as a trigonometric or other transcendental function, from a table containing, for instance, the sin values for angles from 0 to 90 degrees.
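The table lookup in [0212] and [0213] can be sketched as follows. The controller broadcasts one base address (the high-order bits), and each picket supplies the low-order bits from its own register, optionally scaled by a stride. The 4K memory size comes from the text. The base `0x400`, the 7-bit field (128 words, enough for 0 to 90 degrees), and one table entry per word are hypothetical choices for illustration. The base must be aligned to the field size for the OR to work.

```python
import math

MEMORY_SIZE = 4096  # 4K-word picket memory, the design point cited in [0212]
FIELD_BITS = 7      # hypothetical 2**7 = 128-word field, enough for 0..90 degrees
BASE = 0x400        # hypothetical base from the SIMD controller (128-word aligned)


def make_picket_memory():
    mem = [0.0] * MEMORY_SIZE
    for deg in range(91):  # sin table for 0..90 degrees, one entry per word
        mem[BASE + deg] = math.sin(math.radians(deg))
    return mem


def local_address(base, reg, field_bits, stride=1):
    """Controller supplies the high bits (base); the picket supplies the low bits."""
    offset = (reg * stride) & ((1 << field_bits) - 1)
    return base | offset


angles = [0, 30, 45, 90]  # one angle per picket, held in a local register
memories = [make_picket_memory() for _ in angles]
results = [mem[local_address(BASE, a, FIELD_BITS)] for mem, a in zip(memories, angles)]
print([round(r, 4) for r in results])  # [0.0, 0.5, 0.7071, 1.0]
```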

[0214] Conditional Indexing into a Memory Field
This idea combines the two previous ideas: the indexing is conditioned on status, and the depth of the index is determined by the data in a picket register.

[0215] Adder Operation Based on Data from Memory
Picket memory can hold "instructions" specific to the picket's task. These instructions can be read from memory, stored in the picket's registers, and executed by the picket's dataflow registers, multiplexers, and ALU.

[0216] Example: When the SIMD array is simulating a logic design, each picket can be assigned a logic function. Once that function is stored in a register, it can be used to command the corresponding operation in the picket's ALU.
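The logic-simulation example in [0216] can be modeled in a few lines. Each picket keeps a code for its assigned gate in a register, and a single broadcast "evaluate" step makes every picket apply its own function. The numeric gate encoding below is hypothetical; the text does not define one.

```python
from operator import and_, or_, xor

# Hypothetical encoding of the logic function each picket stores in a register.
GATE_OPS = {
    0: and_,
    1: or_,
    2: xor,
    3: lambda a, b: 1 - (a & b),  # NAND on single-bit values
}


def evaluate(gate_codes, inputs_a, inputs_b):
    """One broadcast 'evaluate' step: each picket applies the function in its register."""
    return [GATE_OPS[g](a, b) for g, a, b in zip(gate_codes, inputs_a, inputs_b)]


print(evaluate([0, 1, 2, 3], [1, 1, 1, 1], [0, 0, 0, 0]))  # [0, 1, 1, 1]
```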

[0217] Mask Operation Based on Data from Memory
Each picket can achieve local autonomy by mixing data and masks in memory, so that the logical combination of data and mask produces a different result in each picket. A data value paired with its associated mask forms a unit called a trit.
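As an illustration of the trit in [0217], the sketch below uses each picket's (data, mask) pair for a ternary compare against a broadcast key. The text defines a trit only as data paired with its mask. The convention here, that a mask bit of 1 means "compare this bit" and 0 means "don't care", is an assumption.

```python
def trit_match(data: int, mask: int, key: int) -> bool:
    """Ternary compare: bits where mask is 0 are treated as 'don't care'."""
    return ((data ^ key) & mask) == 0


# Each picket holds its own (data, mask) trit; the key is broadcast to all of them.
trits = [
    (0b1010_0000, 0b1111_0000),  # matches 1010xxxx
    (0b1010_0101, 0b1111_1111),  # matches exactly 10100101
    (0b0000_0000, 0b0000_0000),  # matches anything
]
key = 0b1010_0110
print([trit_match(d, m, key) for d, m in trits])  # [True, False, True]
```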

[0218] Distribution of status from the pickets to the microcontroller. The third category of local autonomy concerns pickets that determine a condition associated with some event and report that condition to the SIMD controller through the status funnel. The status funnel is essentially the global OR of the conditions supplied to the controller by each active picket; it indicates whether at least one picket contains the requested condition. The idea can be replicated for other conditions so that the controller can be notified of perhaps four or more conditions at the same time. Control decisions can therefore be based on the individual tests performed within the pickets and communicated to the array controller through the STAT reporting funnel.
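A minimal model of the status funnel as described: a global OR over active pickets, replicated per condition by giving each condition its own bit. Representing a picket's conditions as bits of an integer is an assumption for illustration.

```python
def status_funnel(conditions, active):
    """OR together the condition bits of all active pickets.

    `conditions[i]` is an int whose bits are picket i's condition flags;
    bit k of the result tells the controller whether at least one active
    picket reported condition k. Inactive pickets do not contribute.
    """
    result = 0
    for cond, is_active in zip(conditions, active):
        if is_active:
            result |= cond
    return result
```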

[0219] Example: The controller can use this function to determine whether an operation is complete, or whether any active pickets remain that are participating in the operation.

[0220] Example: The picket array can perform a search by a mask operation on all active pickets participating at the same time. In a mask operation, a matching picket drives the match line high to indicate a match. The match condition is sent to the controller through the status funnel for later decisions in the active process.

[0221] Example: A picket array can hold a collection of information and search it for specific information by comparing one character at a time and turning off each picket as soon as it no longer matches. This provides a powerful parallel associative capability that processes all active pickets with a single simultaneous broadcast comparison.
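A hedged sketch of the character-at-a-time associative search: the controller broadcasts one character per step, each active picket compares it with the character at the same position of its own record and turns itself off on a mismatch, and the OR of the active flags (the status funnel) lets the controller stop early. This sketch does prefix matching; the record layout is an assumption.

```python
def associative_search(records, pattern):
    """Return the indices of pickets still active after the broadcast search."""
    active = [True] * len(records)
    for pos, ch in enumerate(pattern):                  # one broadcast character per step
        for i, rec in enumerate(records):
            if active[i] and (pos >= len(rec) or rec[pos] != ch):
                active[i] = False                       # picket turns itself off
        if not any(active):                             # funnel: no participating pickets
            break
    return [i for i, a in enumerate(active) if a]
```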

[0222] Multilevel status and status saving. Two capabilities are described for managing the repeated generation and control of status. As a program enters different levels, the status may become more complex, the processor may become more constrained, or both. The ability to save and restore the complete picket status under software control allows levels of software to be cascaded using CALL/RETURN sequences. When commanded by the array controller, each picket saves its status in its individual memory.
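A minimal sketch of status save and restore across nested CALL/RETURN levels, modeling the picket's individual memory as a stack. The class and method names are hypothetical.

```python
class PicketStatus:
    """Hypothetical model of one picket's status with save/restore, so each
    software level starts from a fresh status and returns to its caller's."""

    def __init__(self):
        self.status = 0
        self._saved = []                  # stands in for the picket's own memory

    def call(self, new_status=0):
        self._saved.append(self.status)   # save on CALL, as commanded by the controller
        self.status = new_status

    def ret(self):
        self.status = self._saved.pop()   # restore on RETURN
```

Because every picket saves and restores on the same broadcast command, the nesting depth stays identical across the array even though the saved values differ.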

[0223] Status generation. Status is generated as a result of ALU operations. The instruction stream specifies the condition to be tested and how that condition is to be combined with the existing status. The options are: ignore the test result; set an entirely new status; or use the OR or XOR of the new and existing status. Conditions that can be used to generate status include Zero result, Equal, Greater than, Less than, Not equal, Equal or greater than, Equal or less than, Overflow, and Underflow.
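A hedged sketch of status generation: evaluate one of the listed conditions, then merge it with the existing status using one of the four combination options. The short condition names and the split into two functions are illustrative assumptions.

```python
def make_status(cond: str, a: int, b: int, result: int,
                overflow: bool = False, underflow: bool = False) -> bool:
    """Evaluate one of the conditions listed in the text for an ALU operation
    on operands a and b that produced `result`."""
    tests = {
        "zero": result == 0,
        "eq": a == b, "ne": a != b,
        "gt": a > b, "lt": a < b,
        "ge": a >= b, "le": a <= b,
        "overflow": overflow, "underflow": underflow,
    }
    return tests[cond]


def combine(existing: bool, new: bool, mode: str) -> bool:
    """Merge a new test result into the existing status."""
    if mode == "ignore":
        return existing          # ignore the test result
    if mode == "set":
        return new               # set an entirely new status
    if mode == "or":
        return existing or new
    if mode == "xor":
        return existing != new
    raise ValueError(f"unknown mode: {mode}")
```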

[0224] Instruction-controlled use of status. Each instruction gives each picket a command specifying how to collect and save status. All pickets are also given commands that specify how status is to be used locally in the picket. These usage commands specify several things, all related to the idea of local autonomy, including picket participation, store inhibit, status-based data source selection, and control of inter-picket communication. The controller of the picket array is also given commands that direct how the status of the individual pickets is to be collected into the controller and how that status is to be used to manage global operations. Essentially, the status of each picket is ORed or ANDed with the status of every other picket, and the result is presented to the array controller for its use.
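A hedged sketch of one broadcast ADD instruction showing three of the local uses of status named above (participation, status-based data source selection, store inhibit), plus the global OR/AND reduction presented to the controller. The dictionary layout of a picket and the function names are hypothetical.

```python
def broadcast_add(pickets, bus_value):
    """Each picket is a dict with keys 'acc', 'reg', 'status', 'active', 'inhibit'."""
    for p in pickets:
        if not p["active"]:                                 # picket participation
            continue
        operand = bus_value if p["status"] else p["reg"]    # status selects the data source
        result = p["acc"] + operand
        if not p["inhibit"]:                                # store inhibit
            p["acc"] = result


def global_status(pickets, mode="or"):
    """OR or AND of the status of all active pickets, as seen by the controller."""
    flags = [p["status"] for p in pickets if p["active"]]
    return any(flags) if mode == "or" else all(flags)
```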

[0225] General considerations for the preferred floating-point embodiment. Providing local autonomy in a SIMD machine is especially preferable for floating-point applications. The merit of the present invention can be appreciated by adding a description of a new method for giving a SIMD array machine floating-point capabilities, which is detailed in the related application referenced above, entitled "Floating-Point for SIMD Array Machine".

[0226] The floating-point format for the system described above allows an array SIMD machine to perform floating-point operations. For convenience of explanation, FIG. 9 shows the floating-point format of the present invention. The fraction shifts by one byte for each count of one in the exponent. FIG. 10 shows the steps of a floating-point adjustment made by shifting the fraction by one byte and incrementing the floating-point exponent by one. With this particular array, memory can be used to perform the adjustment shift. FIG. 11 shows the use of the adjustment shift, in which a byte-wide memory is used to advantageously perform the shift that forms part of the floating-point adjust operation.

[0227] Floating-point format. The key to implementing floating point successfully on a SIMD machine is an appropriate format, shown in FIG. 9. An appropriate format here means one that is compatible with some aspect of the architecture, and not necessarily one that is compatible with any of the existing floating-point standards.

[0228] The present invention provides a format that runs on a byte-wide dataflow and delivers several times the performance of conventional implementations.

[0229] This format produces answers at least as accurate as the IEEE 32-bit floating-point standard; on average, it has about 2 bits more precision.

[0230] For the user's convenience, this format can easily be converted to and from existing standards.

[0231] From the outside, therefore, the format can be regarded as standard. Floating point can be executed efficiently in the proposed format inside each picket, and the data can be converted between formats as needed when it is loaded into or unloaded from the parallel processor.

[0232] This floating-point format was chosen for implementation efficiency on multibyte-wide dataflows. It provides that efficiency while also delivering greater computational accuracy than the IEEE 32-bit floating-point format.

[0233] The format shown in FIG. 9 is representative of formats suited to obtaining higher precision than the IEEE 32-bit floating-point format, and is intended for implementation on machines with byte-wide (8-bit) data streams. The preferred format consists of a 1-bit sign, a 7-bit exponent, and a 4-byte (4 × 8-bit) fraction, for a total of 40 bits.
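A minimal sketch of packing and unpacking the 40-bit layout (1-bit sign, 7-bit exponent, 32-bit fraction), where each exponent count scales the value by 256. The text does not specify an exponent bias, a rounding rule, or zero handling; the excess-64 bias, truncation, and the all-zero word for 0.0 used here are assumptions.

```python
import math

BIAS = 64  # ASSUMPTION: excess-64 exponent bias; the text does not give one


def encode(x: float) -> int:
    """Pack a float as sign(1) | exponent(7) | fraction(32).

    Represented value: (-1)**sign * (fraction / 2**32) * 256**(exponent - BIAS).
    """
    if x == 0.0:
        return 0
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))          # abs(x) = m * 2**e, with 0.5 <= m < 1
    E = -((-e) // 8)                   # ceil(e / 8): base-256 exponent
    f = m * 2.0 ** (e - 8 * E)         # 1/256 <= f < 1, so the top fraction byte is nonzero
    frac = int(f * 2**32)              # keep 32 fraction bits (truncating)
    exp = E + BIAS
    if not 0 <= exp < 128:
        raise OverflowError("exponent does not fit in 7 bits")
    return (sign << 39) | (exp << 32) | frac


def decode(word: int) -> float:
    sign = (word >> 39) & 1
    exp = (word >> 32) & 0x7F
    frac = word & 0xFFFFFFFF
    value = (frac / 2**32) * 256.0 ** (exp - BIAS)
    return -value if sign else value
```

Since a normalized fraction has at most 7 leading zero bits, at least 25 of its 32 bits are significant.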

[0234] Note that this same style of computation can be extended to increase the precision of floating-point calculations by lengthening the fraction by an integral number of bytes.

[0235] Because each count of the exponent represents an 8-bit shift of the fraction, a normalized number can have up to seven leading zeros. If there are eight or more leading zeros, the value must be adjusted to normalize it.
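The normalization rule above can be checked with a few lines, assuming the 32-bit fraction is held as an unsigned integer. The function names are hypothetical.

```python
def leading_zero_bits(frac: int, width: int = 32) -> int:
    """Number of leading zero bits in a `width`-bit fraction."""
    return width - frac.bit_length()


def needs_adjust(frac: int) -> bool:
    """One exponent count equals an 8-bit shift, so a nonzero fraction with
    up to 7 leading zeros is normalized; 8 or more (an all-zero top byte)
    means it must be adjusted."""
    return frac != 0 and leading_zero_bits(frac) >= 8
```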

[0236] FIG. 10 shows how a floating-point adjustment is made using an adjustment shift. In this system, data words are organized in the format of FIG. 9. To make the adjustment, the fraction is shifted left by 8 bits and the exponent is decremented by 1. In a byte-wide dataflow, such as that of the system described above, this shift can be performed simply by making the next byte the first byte and discarding the all-zero first byte. The shift can be implemented in various ways.
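Why this byte shift preserves the value can be checked with a short derivation. Assume (the text does not spell this out) that the magnitude is $f \cdot 256^{E}$, where $E$ is the unbiased exponent and $f = \sum_{i=1}^{4} b_i\,256^{-i}$ is the fraction held as bytes $b_1, \ldots, b_4$, most significant first; sign and bias are omitted.

$$
f \cdot 256^{E} = (256\,f) \cdot 256^{E-1},
\qquad
b_1 = 0 \;\Rightarrow\; 256\,f = \sum_{i=1}^{3} b_{i+1}\,256^{-i}
$$

The new fraction is therefore the byte sequence $(b_2, b_3, b_4, 0)$: the all-zero first byte is discarded, a zero byte is appended, and the exponent decreases by one.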

[0237] In a byte-wide dataflow, the data is shifted by one byte width per clock cycle. At most three shifts are needed, and the shifting can be combined with the test for leading zeros.
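A hedged sketch of this iterative approach: shift the 4-byte fraction left one byte per clock while its top byte is zero, decrementing the exponent each time. The loop condition plays the role of the leading-zero test; the byte-list representation and names are assumptions. Note that the cycle count depends on the data, which is the drawback discussed next.

```python
def normalize_by_shifting(frac_bytes, exponent):
    """Return (normalized fraction bytes, new exponent, clocks used).

    `frac_bytes` holds the 4-byte fraction, most significant byte first.
    A nonzero fraction needs at most 3 one-byte shifts.
    """
    frac = list(frac_bytes)
    cycles = 0
    while any(frac) and frac[0] == 0:   # leading-zero test on the top byte
        frac = frac[1:] + [0]           # shift left one byte, zero-fill the low byte
        exponent -= 1
        cycles += 1                     # one byte shift per clock
    return frac, exponent, cycles
```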

[0238] Another approach, shown in FIG. 11, can be implemented on parallel arrays and is particularly advantageous for machines in which each processing element of the array has its own memory. Here the data is assumed to reside in memory. The machine first determines the number of leading zero bytes. It then uses that count to adjust the fetch address used to obtain the data. The normalization can then be performed in four clocks as part of the data move operation.

[0239] In a SIMD machine, it is preferable for every operation to execute in a fixed, predetermined number of cycles. With conventional floating point, where one count of the exponent represents one bit of the fraction, one shift-and-count step is performed for each leading zero of the fraction, so normalization takes 0 to 32 cycles. Each picket would run its hardware for the number of cycles its own fraction requires, and that number varies from picket to picket. To make the process deterministic, the whole SIMD machine must run for the maximum number of cycles (32), which leaves many pickets idle for a large amount of time.
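To make the idle-time argument concrete, a small sketch computes, for conventional bit-by-bit normalization, the cycles each picket actually needs (one per leading zero bit) and the total idle picket-cycles when every picket runs the fixed worst case of 32. The names are hypothetical.

```python
WORST_CASE = 32  # bit-serial normalization of a 32-bit fraction needs 0..32 shifts


def bitwise_normalize_cost(fractions):
    """Return (cycles each picket needs, total idle picket-cycles) when a
    deterministic lockstep schedule runs WORST_CASE cycles on every picket."""
    needed = [WORST_CASE - f.bit_length() for f in fractions]  # leading zeros of a 32-bit value
    idle = sum(WORST_CASE - n for n in needed)
    return needed, idle
```

In this example, three pickets need 0, 3, and 31 cycles, yet all three must run for 32, wasting 62 picket-cycles.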

[0240] The approach described here is deterministic and completes in 12 cycles, as follows.

[0241] 1. Move the fraction from memory to a register (in memory) one byte at a time, starting from the most significant byte, recording whether each byte is all zeros and counting the number of such bytes. This takes 4 cycles.

[0242] 2. Assume a pointer contains the "address" of the most significant byte of the fraction. Add the number of zero bytes (0 to 4) to the pointer so that it points to the most significant nonzero byte of the fraction. This takes 1 cycle.

[0243] 3. Using the pointer, store the fraction back into memory starting from the most significant nonzero byte, filling the least significant end with zero bytes. This takes 4 cycles.

[0244] 4. Decrement the exponent by the count and store it back into memory. This takes 3 cycles.
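The four steps can be sketched as follows, modeling one picket's local memory as a `bytearray`. The memory layout (4-byte fraction stored most significant byte first, exponent in its own byte) is an assumption, and zero fractions and exponent underflow are not handled. The point of the sketch is that every picket uses the same 4 + 1 + 4 + 3 = 12 cycles regardless of its data.

```python
def normalize_via_pointer(memory: bytearray, frac_addr: int, exp_addr: int) -> int:
    """Hypothetical model of the fixed 12-cycle normalization on one picket.
    Returns the number of cycles used."""
    cycles = 0

    # Step 1 (4 cycles): read the fraction MSB-first, counting leading all-zero bytes.
    count, leading = 0, True
    for i in range(4):
        if leading and memory[frac_addr + i] == 0:
            count += 1
        else:
            leading = False
        cycles += 1

    # Step 2 (1 cycle): advance the pointer to the most significant nonzero byte.
    ptr = frac_addr + count
    cycles += 1

    # Step 3 (4 cycles): copy back starting at the pointer, zero-filling the low end.
    # The source index is never below the destination, so each byte is read
    # before it can be overwritten.
    end = frac_addr + 4
    for i in range(4):
        src = ptr + i
        memory[frac_addr + i] = memory[src] if src < end else 0
        cycles += 1

    # Step 4 (3 cycles): read the exponent, subtract the count, write it back.
    memory[exp_addr] = memory[exp_addr] - count
    cycles += 3

    return cycles
```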

[0245] The figure shows this operation as it is performed within a picket.

[0246] Normalizing by the method above requires giving some degree of local autonomy to several functions of the picket design. The functions that provide limited PME or picket autonomy are:

[0247] 1. Test for zero and place the result in the status.

[0248] 2. If the status is set, increment the counter.

[0249] 3. Use the pointer value in a register as a memory index to provide data-dependent access to memory.

While the preferred embodiment of the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements that fall within the scope of the claims that follow. These claims should be construed to maintain the proper protection for the invention first disclosed.

[Brief description of drawings]

FIG. 1 is a schematic diagram of a recent SIMD processor that can be regarded as representative of the prior art.

FIG. 2 shows a pair of basic picket units configured on a silicon base, each with a processor, memory, control logic, and associative memory, capable of byte communication with the other pickets in the array.

FIG. 3 shows the processing of the associative memory.

FIG. 4 shows a basic 16 (n) picket configuration for a SIMD subsystem that uses a microprocessor controller, a hardwired sequencing controller for canned routines, and a picket array. The basic parallel picket processor system formed by the picket array may be a standalone unit.

FIG. 5 shows a multiple picket processor system incorporating a plurality of the picket processors of FIG. 4.

FIG. 6 is a functional block diagram of a subsystem.

FIG. 7 shows the layout of a subsystem controller that includes the cards of FIG. 5.

FIG. 8 shows the relationship between the doze instruction and the disable instruction for local autonomy of the pickets in a SIMD machine, according to the preferred embodiment of the present invention.

FIG. 9 shows a particular floating-point format of the preferred embodiment of the present invention.

FIG. 10 shows the steps for making a floating-point adjustment.

FIG. 11 shows how memory can be used to make shift adjustments in accordance with the present invention.

Continued from front page. (72) Inventor: Thomas Norman Barker, 136 Sunset Avenue, Vestal, New York 13850, United States. (72) Inventor: James Warren Dieffenderfer, 396 Front Street, Owego, New York 13827, United States. (72) Inventor: Peter Michael Kogge, 7 Dorchester Drive, Endicott, New York 13760, United States.

Claims


1. A computer system comprising a plurality of array processing elements coupled as pickets that communicate data and instructions with one another, wherein each picket has a plurality of mechanisms that give it different execution capabilities, enabling each picket to acquire different modes for executing data within the picket so that SIMD commands can be interpreted within the picket, the computer system being capable of SIMD processing by an array of processors capable of executing data in parallel.

2. The computer system according to claim 1, wherein each picket has the plurality of mechanisms that allow it to operate in multiple modes, in which each processor of the SIMD array is capable of performing, and actually performs, different operations based on local conditions.

3. The computer system according to claim 1, wherein a SIMIMD array processor mode is provided that allows at least one of the plurality of array processors to operate in SIMIMD mode, in which the elements of the processor array receive and execute commands from a controller in each clock cycle, some of those commands being interpretable within each picket so as to result in different actions.

4. The computer system of claim 2, wherein the interpretation of SIMD commands within each picket can be controlled by one or more status latches or register bits within each picket.

5. The computer system of claim 2 wherein the interpretation of SIMD commands within each picket can be controlled by the data in one or more registers within each picket.

6. The computer system of claim 2, wherein the result status can be collected and sent to a controller of an array of processors.

7. The computer system according to claim 4, wherein interpretation of SIMD commands based on picket status allows the picket to select one of two data sources for part of its operation, the data source being a picket register or one or more data buses external to the picket.

8. The computer system according to claim 5, wherein interpretation of SIMD commands based on picket status allows the picket to select one of two picket memory locations for storing the result of an operation within the picket.

9. The computer system according to claim 4, wherein interpretation of SIMD commands based on picket status allows the picket to select one of two data sources within the picket for transfer to an adjacent picket.

10. The computer system according to claim 3, wherein the array of pickets can perform floating-point calculations to handle a range of possible floating-point numbers, the floating-point processing array including a plurality of picket units, each picket unit comprising a bit-parallel processing element combined with a local memory coupled to the processing element, for parallel processing of information in all picket units such that each unit is adapted to perform one element of the processing using its memory.