JP2869100B2

JP2869100B2 - Element processor of parallel computer

Info

Publication number: JP2869100B2
Application number: JP1266625A
Authority: JP
Inventors: 宏喜三浦
Original assignee: Sanyo Denki Co Ltd
Current assignee: Sanyo Denki Co Ltd
Priority date: 1989-10-13
Filing date: 1989-10-13
Publication date: 1999-03-10
Anticipated expiration: 2014-03-10
Also published as: JPH03127250A

Description

[Detailed description of the invention]
(A) Field of industrial application
The present invention relates to an element processor that serves as a constituent unit of a parallel computer, and more particularly to a high-function element processor that combines a data processing function with an inter-processor communication control function.

(B) Prior art
In recent years, research has been advancing toward practical parallel processing computers. In particular, with the progress of semiconductor technology, many studies implement a combined communication control unit and data processing unit as a one-chip element processor LSI and connect a large number of these element processor LSIs to build a parallel processor.

As disclosed in paper 2T-2 of the proceedings of the 38th meeting (first half of 1989) of the Information Processing Society of Japan, the inventor of the present application is developing EDDEN (Enhanced Data Driven ENgine), a large-scale parallel data-driven computer that connects up to 1024 one-chip element processor LSIs. In addition, the present applicant has already filed Japanese Patent Application No. 1-153276, "Data communication system for parallel computers", which improves the method of data communication between the parallel processing device in EDDEN and the outside.

(C) Problems to be solved by the invention
However, a data communication system for a parallel computer such as the one described above requires not only the multiple element processors that make up the parallel processing device but also an input/output interface that mediates the exchange of data with the outside of the parallel processing device. Moreover, as the parallel processing device grows in scale, input and output with the outside must also be performed in parallel at high speed, so multiple input/output interfaces become necessary.

Therefore, to reduce the size of the whole system, it is desirable to implement the input/output interface as an LSI as well as the element processor. However, developing two kinds of LSI in this way takes a great deal of labor and time and ultimately raises the cost of the whole system.

Accordingly, an object of the present invention is to provide a high-function element processor that can operate both as a processor that is a constituent unit of a parallel computer and as an input/output interface between that group of processors and the outside. A further object is to realize, through the development of a single kind of hardware or LSI, both the function of a unit processor for parallel processing and the function of an input/output interface, thereby greatly shortening the development period of a parallel processing system, lowering development and manufacturing costs, and reducing the size of the system.

(D) Means for solving the problems
The element processor of the present invention comprises at least a communication control unit and a data processing unit. The communication control unit has four adjacent communication ports for communicating with the neighboring processors on its four sides in the row and column directions, and an internal communication port for communicating with the data processing unit.

Each element processor operates in one of two operation modes, processor mode or interface mode. Element processors in processor mode are arranged in rows and columns and form a parallel processing device based on a so-called torus network, in which the processors in each row and the processors in each column are cyclically connected through the four adjacent communication ports. An element processor in interface mode is inserted into one of these row-direction or column-direction connection lines using its two adjacent communication ports in the row or column direction, and its remaining two adjacent communication ports are used for connection to the outside of the parallel processing system.

Communication data exchanged between element processors, or between the outside of the system and a processor, carries a destination processor number and an external flag that indicates whether the data is to be processed inside the parallel processing system or output to the outside of the system.

When communication data arrives at the communication control unit of an element processor through any of these communication ports and the element processor is in processor mode, the unit compares the destination processor number carried by the data with its own processor number. If the two do not match, it selectively outputs the data to whichever of the four adjacent communication ports gives the shortest distance to the destination processor. If they match and the external flag indicates that the data is to be processed inside the parallel processing device, it passes the data through the internal communication port to its own data processing unit for processing. If they match and the external flag indicates that the data is to be output to the outside of the parallel processing device, it outputs the data to a predetermined one of the four adjacent communication ports without processing it in its own processor.

When the element processor is in interface mode and the communication data was sent from a processor of the parallel processing device, the data is passed straight through and returned to the inside of the parallel processing device if its external flag indicates that it is to be processed inside the device, and is branched toward the outside of the parallel processing device if the flag indicates that it is to be output to the outside of the parallel computer.

When the element processor is in interface mode and the communication data was sent from outside the parallel processing device, the data is branched toward the inside of the parallel processing device if the processor number it carries falls within a predetermined range, and is otherwise passed straight through in the same direction.

(E) Operation
When the element processor of the present invention is used both as the unit processor and as the input/output interface device of a parallel computer, the following functions can all be realized: a so-called self-routing function that performs communication between processors inside the parallel processing device over the shortest distance on a lattice-shaped processor network; a function that branches result packets holding computation results toward the outside of the parallel processor (a host computer, for example) and outputs them; and a function that branches data input from outside the parallel processor toward a predetermined group of processors.

(F) Embodiment
FIG. 1 shows a highly parallel data-driven computer system as an embodiment of the present invention, and FIG. 2 shows the configuration of the element processor.

The element processor (PE) of FIG. 2 basically consists of a program storage unit (PS), a firing control and color management unit (FCCM), an instruction execution unit (EXE), and a queue memory (Q) connected in a circular pipeline (ring) structure.

The program storage unit (PS) updates node numbers, attaches constants, and copies results. The firing control and color management unit (FCCM) matches left and right operands and manages the acquisition and release of colors. The instruction execution unit (EXE) executes instructions such as floating-point and integer arithmetic, condition tests, and branches. The queue (Q) is a buffer memory that absorbs all fluctuations in data flow on the ring.

The vector operation control unit (VC) controls the execution of vector-operation instructions and external data memory access instructions. The external data memory (EDM) stores structures, vector data, and the like.

The communication control unit (NC) has four communication ports, east, west, south, and north, and performs routing control based on a torus network of up to 1024 processors (PE). The input control unit (IC) handles the input of data packets from the communication control unit to the ring. The output control unit (OC) handles the output of data packets from the ring to the communication control unit. Bypass lines for structure (vector) data communication are provided between the vector operation control unit (VC) and the input control unit (IC) and output control unit (OC).

The basic configuration of EDDEN, which uses many such element processors (PE), connects n × n element processors in a torus network as shown in FIG. 1. In this torus network, a large number of processors are arranged in rows and columns, and a plurality of vertical communication lines that cyclically connect the processors in each column, together with a plurality of horizontal communication lines that cyclically connect the processors in each row, enable data communication between any pair of processors.
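The cyclic connection can be illustrated by listing the four neighbours of one processor. The following is a minimal Python sketch, not part of the patent: the function name `torus_neighbours` is hypothetical, processors are identified by (row, column), and the orientation (rows numbered from north to south, columns from west to east) is taken from the numbering convention given later in the routing description.

```python
def torus_neighbours(y, x, p, q):
    """Return the processors adjacent to (y, x) on a p x q torus, keyed by port."""
    return {
        "N": ((y - 1) % p, x),   # wraps from the top row to the bottom row
        "S": ((y + 1) % p, x),
        "W": (y, (x - 1) % q),   # wraps from the first column to the last
        "E": (y, (x + 1) % q),
    }


corner = torus_neighbours(0, 0, 4, 4)
```

For the corner processor (0, 0) of a 4 × 4 array, the north and west neighbours are found by wrapping around to row 3 and column 3, which is what distinguishes a torus from a plain mesh.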

In the system of this embodiment, data is exchanged between the network and the outside by inserting a network interface (NIF). An element processor set to interface mode is used as this NIF. The element processor has two operation modes, processor mode and interface mode; below, an element processor set to processor mode is called a PE, and one set to interface mode is called an NIF.

The data packets used in the data-driven computer configured as above fall into two broad classes: execution packets, used for program execution, and non-execution packets, used for purposes other than program execution. FIGS. 4(a) to 4(e) show examples. Except for packets holding structure data, packets have a fixed length: 33 bits × 2 words on the pipeline ring inside a processor (PE), and 18 bits × 4 words on the network (in the communication control unit). The fields of the packet formats in FIG. 4 are described below.

HD (1 bit): Identifier that distinguishes the first word (header) from the second word (tail) of a two-word packet. "1" for the header.

EX (1 bit): Flag marking a packet that is to be output from the pipeline ring to the communication control unit.

MODE (2 bits): Identification code for the kind of packet, such as an execution packet or a non-execution packet.

S-CODE: Identification code that, together with MODE, specifies how the packet is to be processed.

OPCODE-M (5 bits) and OPCODE-S (6 bits): Instruction codes that identify the kind of instruction.

NODE# (11 bits): Node number in the data flow graph.

COLOR (4 bits): The color, an identification number that distinguishes execution environments when the same data flow graph is executed multiple times, for example when a program is shared through subroutine calls.

DATA (32 bits): Numeric data such as an integer or a floating-point number.

HT (1 bit): Flag that distinguishes the header and tail words of a packet on the network from the words in between.

RQ (1 bit): Flag attached to packets transferred on the network. Its value inverts each time one word is transferred on the network, which makes the presence of a word detectable. The inversion also serves as the transfer request signal that moves the packet forward. Together with the HT flag, it distinguishes the header from the tail.

ADDRESS (16 bits): Holds a memory address, for example when loading or dumping a memory.
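The RQ convention can be read as transition signalling: a new word is announced by a change in RQ, not by a particular level. The following Python sketch illustrates only that idea and is not the actual circuit. The names `send_words` and `receive_words`, the idle value 0, and the modelling of the link as a sequence of (RQ, word) samples are all assumptions made for the example, as are the word values.

```python
def send_words(words, rq_start=0):
    """Yield (rq, word) pairs, inverting RQ for every word transferred."""
    rq = rq_start
    for word in words:
        rq ^= 1                      # RQ inverts once per transferred word
        yield rq, word


def receive_words(samples, rq_idle=0):
    """Collect a word whenever the sampled RQ differs from the last value seen."""
    last_rq = rq_idle
    received = []
    for rq, word in samples:
        if rq != last_rq:            # a transition on RQ signals a new word
            received.append(word)
            last_rq = rq
    return received


# A four-word network packet of 18-bit words (arbitrary example values).
# Each word is sampled twice to show that a repeated sample with an
# unchanged RQ is not counted as a new word.
packet = [0x2A001, 0x00123, 0x00456, 0x10789]
samples = [s for s in send_words(packet) for _ in range(2)]
recovered = receive_words(samples)
```

Because the receiver compares against the last RQ value instead of waiting for a fixed level, no separate return-to-zero phase is needed between words.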

The input control unit (IC) on the pipeline ring has a processor number register that stores the processor's own processor number. FIG. 6 shows the structure of this register. PE number X is the PE number in the horizontal (east-west) direction, that is, the column number, and PE number Y is the PE number in the vertical (north-south) direction, that is, the row number. Together they form the processor number that uniquely identifies each processor.

The flag bit called PEACT in FIG. 6 indicates whether the processor number has already been set: it is "0" while the number is unset and becomes "1" once the number is set.

The communication control unit (NC) receives packets such as those in FIGS. 4(c) and 4(e) through its communication ports.

In the special operation mode (PEACT = 0), the communication control unit treats every data packet arriving on any of the east, west, south, and north ports as a packet addressed to itself, inputs it to the pipeline ring, and has the processing specified by the packet's identification code carried out. If a non-execution packet whose identification code indicates a load of the processor number register of FIG. 6 arrives on any of the four ports, the communication control unit passes it to the input control unit (IC) on the pipeline ring, where the specified processor number is loaded into the processor number register and the PEACT flag is set to "1". Once the PEACT flag is set to "1", the communication control unit (NC) of that processor operates in the normal operation mode.

In the normal operation mode (PEACT = 1), the communication control unit compares PE# (the destination processor number of the packet) in the first word of an arriving packet with its own processor number held in the processor number register. Only when the two match does it input the packet to the pipeline ring; otherwise it outputs the packet to one of the east, west, south, or north ports according to a predetermined routing algorithm and forwards it to the adjacent processor.
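The switch from the special mode to the normal mode can be sketched as follows. This is a simplified Python model under stated assumptions: the class `ProcessorNumberRegister` and the function `accepts` are hypothetical names, processor numbers are (Y, X) pairs, and the sketch covers only the PEACT behaviour described in the two paragraphs above, leaving out the MODE and INS conditions that the text introduces afterwards.

```python
from dataclasses import dataclass


@dataclass
class ProcessorNumberRegister:
    """Model of the register of FIG. 6: PE number Y (row), X (column), PEACT."""
    y: int = 0
    x: int = 0
    peact: int = 0       # 0: number not yet set (special mode), 1: set (normal mode)

    def load(self, y, x):
        """Load the processor number; loading also sets PEACT to 1."""
        self.y, self.x, self.peact = y, x, 1


def accepts(reg, dest):
    """Return True if an arriving packet should enter the pipeline ring.

    dest is the packet's destination processor number as (Y, X).
    """
    if reg.peact == 0:
        return True                   # special mode: every packet is taken in
    return dest == (reg.y, reg.x)     # normal mode: only an exact match


reg = ProcessorNumberRegister()
before_load = accepts(reg, (2, 3))    # special mode
reg.load(1, 1)                        # effect of the register-load packet
other_dest = accepts(reg, (2, 3))     # normal mode, addressed elsewhere
own_dest = accepts(reg, (1, 1))       # normal mode, addressed to this PE
```

Accepting everything while PEACT = 0 is what allows an unnumbered processor to receive the very packet that assigns its number.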

FIG. 5 shows the kinds of packet identified by MODE. As shown there, a packet with MODE = 00 is identified as a result packet to be output to the host computer.

Although not shown in the figures, the element processor of this embodiment receives an external signal INS that specifies whether it operates in processor mode or in interface mode. When INS is 0, the element processor operates as a unit processor PE of the parallel computer; when INS is 1, it operates as an input/output interface NIF.

The main feature of the element processor of the present invention is that, as described above, it can operate either as a PE or as an NIF according to the INS signal.

To explain this, the operation of the communication control unit is described in more detail. FIG. 3 schematically shows the configuration of the communication control unit (NC). In the figure, (RWI) and (RWO) are the self-synchronous input shift register and output shift register that form the west (W) input/output port, each consisting of four stages of 18-bit registers. Likewise, (REI) and (REO) form the east (E) input/output port, (RNI) and (RNO) the north (N) input/output port, and (RSI) and (RSO) the south (S) input/output port. In the figure, ○ denotes a merge circuit and ◎ denotes a branch circuit.

The routing algorithm in the communication control unit is described with reference to FIG. 3. M1 to M5 are packet merge circuits; each merges arriving packets with priorities in the order of the numbers shown in the figure (number 1 has the highest priority).

R1 to R5 are packet branch circuits, which operate according to the following algorithm.

I. Let the processor's own processor number (row number, column number) be (y, x), the array size of the network be p × q (p: vertical, q: horizontal), and the destination processor number of the packet be (Y, X), and define
Δx ≡ (X − x) mod q, |Δx| ≤ q/2
Δy ≡ (Y − y) mod p, |Δy| ≤ p/2
(mod denotes the modulo operation.)

II. Processor numbers run y = 0, 1, 2, ..., p in order from N to S, and x = 0, 1, 2, ..., q in order from W to E.

III. MODE denotes the value of the packet's MODE field; MODE = 00 means that the packet is bound for the host computer.
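The signed offsets Δx and Δy of definition I can be computed as in the following Python sketch. The function names `signed_offset` and `deltas` are hypothetical. One point is an assumption: when the array size is even and the destination is exactly halfway round, both +size/2 and −size/2 satisfy the definition, and this sketch arbitrarily picks the positive value.

```python
def signed_offset(dest, own, size):
    """Return d with d ≡ (dest - own) mod size and |d| <= size / 2."""
    d = (dest - own) % size      # Python's % yields a value in 0 .. size-1
    if 2 * d > size:             # more than halfway round: go the other way
        d -= size
    return d


def deltas(dest, own, p, q):
    """Return (Δy, Δx) for destination (Y, X) and own number (y, x) on a p x q array."""
    (Y, X), (y, x) = dest, own
    return signed_offset(Y, y, p), signed_offset(X, x, q)


# From processor (3, 0) to processor (0, 3) on a 4 x 4 array.
example = deltas((0, 3), (3, 0), 4, 4)
```

In this example both offsets have magnitude 1 even though the row and column numbers differ by 3, because the cyclic connection makes the wrap-around direction the shorter one.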

The processing in each branch circuit is described below.

(1) R1:
When MODE ≠ 00 and (PEACT = 0 or (Δy = 0 and INS = 0)), output the packet to P.

When (MODE = 00 and PEACT = 1) and (Δy = 0 or INS = 1), output the packet to E.

Otherwise, output the packet to S.

(2) R2:
When MODE ≠ 00 and (PEACT = 0 or (Δx = 0 and Δy = 0 and INS = 0)), output the packet to P.

When INS = 0 and PEACT = 1 and Δx = 0 and Δy > 0, output the packet to S.

When INS = 0 and PEACT = 1 and Δx = 0 and Δy < 0, output the packet to N.

Otherwise, output the packet to W.

(3) R3:
When MODE ≠ 00 and (PEACT = 0 or (Δx = 0 and Δy = 0 and INS = 0)), output the packet to P.

When (INS = 0 and PEACT = 1 and Δx = 0 and Δy > 0) or (INS = 1 and PEACT = 1 and the destination processor number is within the predetermined range), output the packet to S.

Otherwise, output the packet to E.

(4) R4:
When MODE ≠ 00 and (PEACT = 0 or (Δy = 0 and INS = 0)), output the packet to P.

When (MODE = 00 and PEACT = 1) and (Δy = 0 or INS = 1), output the packet to E.

Otherwise, output the packet to N.

(5) R5:
When Δx = 0 and Δy > 0, output the packet to S.

When Δx = 0 and Δy < 0, output the packet to N.

When Δx < 0, output the packet to W.
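The rules for R1 to R4 translate almost directly into predicates. The following Python sketch is one possible transcription, not the circuit itself: the function names are hypothetical, MODE is passed as an integer (0 standing for the code 00), `dx` and `dy` stand for Δx and Δy, and `in_range` stands for the test "destination processor number is within the predetermined range". R5 is left out because the text lists only three of its cases.

```python
def r1(mode, peact, ins, dy):
    """Branch circuit R1: returns 'P', 'E' or 'S'."""
    if mode != 0 and (peact == 0 or (dy == 0 and ins == 0)):
        return "P"
    if (mode == 0 and peact == 1) and (dy == 0 or ins == 1):
        return "E"
    return "S"


def r2(mode, peact, ins, dx, dy):
    """Branch circuit R2: returns 'P', 'S', 'N' or 'W'."""
    if mode != 0 and (peact == 0 or (dx == 0 and dy == 0 and ins == 0)):
        return "P"
    if ins == 0 and peact == 1 and dx == 0 and dy > 0:
        return "S"
    if ins == 0 and peact == 1 and dx == 0 and dy < 0:
        return "N"
    return "W"


def r3(mode, peact, ins, dx, dy, in_range):
    """Branch circuit R3: returns 'P', 'S' or 'E'."""
    if mode != 0 and (peact == 0 or (dx == 0 and dy == 0 and ins == 0)):
        return "P"
    if (ins == 0 and peact == 1 and dx == 0 and dy > 0) or (
        ins == 1 and peact == 1 and in_range
    ):
        return "S"
    return "E"


def r4(mode, peact, ins, dy):
    """Branch circuit R4: same tests as R1, but the default output is 'N'."""
    if mode != 0 and (peact == 0 or (dy == 0 and ins == 0)):
        return "P"
    if (mode == 0 and peact == 1) and (dy == 0 or ins == 1):
        return "E"
    return "N"
```

With INS = 1 the first test of every circuit can only succeed while PEACT = 0, so an interface-mode processor whose number has been set never delivers a packet to P.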

As the above shows, the element processor operates in processor mode when PEACT = 1 and INS = 0. In processor mode, with the packet destination (Y, X) and the processor's own processor number (y, x), each communication control unit forwards the packet from W to E or from E to W as long as X ≠ x. Once X = x, it forwards the packet from N to S or from S to N as long as Y ≠ y. Furthermore, when a packet is transferred from the W or E port to the N or S port, or from inside the pipeline ring to any of the W, E, N, and S ports, the modulo operation selects the direction that reduces the inter-processor distance, so a packet communication control function that always uses the shortest distance (the self-routing function) is realized.
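The resulting path can be traced hop by hop. The Python sketch below is an illustration under assumptions, not the hardware behaviour: `route` and `signed_offset` are hypothetical names, processors are (row, column) pairs with rows increasing toward S and columns toward E, and the direction is recomputed at every hop for simplicity, whereas in the described circuit it is chosen when the packet leaves the pipeline ring or turns from the row direction to the column direction.

```python
def signed_offset(dest, own, size):
    """Offset d ≡ (dest - own) mod size with |d| <= size / 2 (tie resolved as +)."""
    d = (dest - own) % size
    return d - size if 2 * d > size else d


def route(src, dst, p, q):
    """Return the list of output ports used from src to dst on a p x q torus."""
    (y, x), (Y, X) = src, dst
    hops = []
    while (y, x) != (Y, X):
        dx = signed_offset(X, x, q)
        dy = signed_offset(Y, y, p)
        if dx != 0:                      # first correct the column, moving E or W
            step = 1 if dx > 0 else -1
            x = (x + step) % q
            hops.append("E" if step > 0 else "W")
        else:                            # then correct the row, moving S or N
            step = 1 if dy > 0 else -1
            y = (y + step) % p
            hops.append("S" if step > 0 else "N")
    return hops


wrap_path = route((0, 0), (3, 3), 4, 4)
plain_path = route((0, 0), (1, 2), 4, 4)
```

The number of hops equals |Δx| + |Δy| measured at the source, which is the shortest distance available on the torus.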

When the packet reaches its destination processor and Δx = Δy = 0 is detected, it is input to the pipeline ring of the destination processor and processed if MODE ≠ 00. If it is a host-bound packet with MODE = 00, it is not input to the pipeline ring but is output to a specific communication port (the E port for every packet except one that arrived at E).

When PEACT = 1 and INS = 1, the element processor operates in interface mode. Packets arriving at the N port and the S port are branched to the S port and the N port respectively, except for host-bound packets, which are branched to the E port. A packet arriving at the W port is branched in the S direction only when its destination processor number falls within the predetermined range; otherwise it is passed through toward the E port.

The above is one example of a routing algorithm; the invention is not limited to it.

Next, FIG. 7 shows an example of a parallel processing computer system built from element processors of the present invention, each having the communication control unit described above, and the embodiment is described in further detail.

In FIG. 7, PE00 to PE33 are element processors set to processor mode, and NIF is an element processor set to interface mode. Data output from the host computer is converted by the host interface into the format of FIG. 4(c) or 4(e) and input through the input line (IN) to the W port of the NIF. Only when the destination processor number falls within the range 00 to 33 is the data branched to the S port, from which it reaches the W port of PE00. Within the 4 × 4 group of processors set to processor mode, a program in parallel-processing form is executed while packets are communicated over the shortest distance by the self-routing function described above. When execution finishes, one of the PEs generates a host-bound packet with MODE = 00. This packet is forwarded toward PE03 and reaches the N port or S port of the NIF. The NIF branches the MODE = 00 packet to its E port and outputs it through the output line (OUT) to the host interface.

Next, FIG. 8 shows another example of a parallel processing computer system built from element processors of the present invention, each having the communication control unit described above, and the embodiment is described in further detail.

In FIG. 8, PE00 to PE73 are element processors set to processor mode, and NIF13, NIF33, NIF53, and NIF73 are element processors set to interface mode. Data output from the host computer is converted by the host interface into the format of FIG. 4(c) or 4(e) and input through the input line (IN) to the W port of NIF73. Only when the destination processor number falls within the range 60 to 73 is the data branched to the S port, from which it reaches the W port of PE60. Otherwise it is passed on toward the W port of NIF53. Similarly, at the W port of NIF53 only data whose destination processor number is in the range 40 to 53 is branched to the S port, at the W port of NIF33 only data in the range 20 to 33, and at the W port of NIF13 only data in the range 00 to 13. The operation within the group of element processors set to processor mode is almost the same as described for FIG. 7.
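The chain of range tests in FIG. 8 can be sketched as a simple lookup. This Python example rests on assumptions that the text does not state: the two digits of a processor number are read as (row, column), a range such as "60 to 73" is treated as a lexicographic comparison on that pair, and the names `NIF_RANGES` and `entry_nif` are hypothetical.

```python
# Each NIF, in the order a host packet meets it, with the lowest and highest
# (row, column) destination that it branches toward its S port.
NIF_RANGES = [
    ("NIF73", (6, 0), (7, 3)),
    ("NIF53", (4, 0), (5, 3)),
    ("NIF33", (2, 0), (3, 3)),
    ("NIF13", (0, 0), (1, 3)),
]


def entry_nif(dest):
    """Return the NIF that branches a host packet for dest = (Y, X), or None."""
    for name, low, high in NIF_RANGES:
        if low <= dest <= high:      # tuple comparison: row first, then column
            return name              # branched to the S port at this NIF
        # otherwise the packet passes through to the E port and the next NIF
    return None


first = entry_nif((6, 2))
last = entry_nif((0, 1))
```

Each NIF therefore serves two rows of the 8 × 4 array, so host input is spread over four entry points instead of one.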

From the above description it can be seen that the element processor of the present invention realizes all of the following: a so-called self-routing function that performs communication between processors inside the parallel processing device over the shortest distance on a lattice-shaped processor network; a function that branches result packets holding computation results toward the outside of the parallel processor (the host computer) and outputs them; and a function that branches data input from outside the parallel processor toward a predetermined group of processors.

(G) Effects of the invention
As is clear from the above description, the present invention provides a high-function element processor that can operate both as a processor that is a constituent unit of a parallel computer and as an input/output interface between that group of processors and the outside. Furthermore, by realizing both the function of a unit processor for parallel processing and the function of an input/output interface through the development of a single kind of hardware or LSI, the invention makes it possible to greatly shorten the development period of a parallel processing system, lower development and manufacturing costs, and reduce the size of the system.

[Brief description of the drawings]

FIG. 1 is a system diagram showing a parallel computer constituted by element processors of the present invention; FIG. 2 is a block diagram showing the schematic configuration of the processor of the present invention; FIG. 3 is a schematic diagram of the main part of the processor of the present invention; FIGS. 4(a) to 4(e) are diagrams showing the structure of packets; FIG. 5 is a correspondence diagram showing part of the packet identification codes; FIG. 6 is a diagram showing the structure of the processor number register inside the processor of the present invention; FIG. 7 is another configuration diagram of a parallel computer configured using the element processor of the present invention; and FIG. 8 is yet another configuration diagram of a parallel computer configured using the element processor of the present invention.

(PE): processor, (PS): program storage unit, (EXE): instruction execution unit, (NC): communication control unit, (NIF): network interface.

Claims

(57) [Claims]

1. An element processor of a parallel computer, serving as a constituent unit of a parallel computer system that comprises: a parallel processing device in which a plurality of processors are arranged in rows and columns and perform data communication with one another over a plurality of vertical communication lines, each cyclically connecting the processors of one column, and a plurality of horizontal communication lines, each cyclically connecting the processors of one row; and at least one input/output interface that is inserted into at least one of the plurality of vertical communication lines or the plurality of horizontal communication lines and mediates the transmission and reception of data with the outside of the parallel processing device,

wherein the element processor has at least two operation modes, a processor mode and an interface mode, operating in the processor mode as a processor that is a constituent unit of the parallel processing device arranged in rows and columns, and in the interface mode as an input/output interface that mediates the transmission and reception of data with the outside of the parallel processing device;

the element processor comprises a data processing unit and a communication control unit having four adjacent communication ports and an internal communication port for communication with the data processing unit;

when communication data arrives through one of the communication ports while the element processor is in the processor mode, the communication control unit compares the destination processor number held by the communication data with its own processor number; if the two do not match, the data is selectively output to whichever of the four adjacent communication ports gives the shortest distance to the destination processor; if the two match and the external flag held by the communication data indicates that the data is to be processed inside the parallel processing device, the data is input to the data processing unit of its own processor through the internal communication port and is processed; and if the two match and the external flag held by the communication data indicates that the data is to be output to the outside of the parallel processing device, the data is output to a specified communication port without being processed by its own processor;

when the element processor is in the interface mode and communication data arrives from a processor of the parallel processing device, the data is passed through as it is and returned to the parallel processing device if the external flag held by the data indicates that the data is to be processed inside the parallel processing device, and the data is branched to the outside of the parallel computer if the external flag indicates that the data is to be output to the outside of the parallel computer; and

when the element processor is in the interface mode, with its two adjacent communication ports in the column direction used for coupling to the parallel processing device and its two adjacent communication ports in the row direction used for connection with the outside of the parallel processing device, and communication data from outside the parallel processing device arrives through one communication port in the row direction, the data is branched to a communication port in the column direction and transferred to the parallel processing device if the value of the destination processor number held by the data falls within a predetermined range, and the data is passed to the other communication port in the row direction if the value of the destination processor number does not fall within the predetermined range.

2. An element processor of a parallel computer, serving as a constituent unit of a parallel computer system that comprises: a parallel processing device in which a plurality of processors are arranged in rows and columns and perform data communication with one another over a plurality of vertical communication lines, each cyclically connecting the processors of one column, and a plurality of horizontal communication lines, each cyclically connecting the processors of one row; and at least one input/output interface that is inserted into at least one of the plurality of vertical communication lines or the plurality of horizontal communication lines and mediates the transmission and reception of data with the outside of the parallel processing device,

wherein the element processor has at least two operation modes, a processor mode and an interface mode, operating in the processor mode as a processor that is a constituent unit of the parallel processing device arranged in rows and columns, and in the interface mode as an input/output interface that mediates the transmission and reception of data with the outside of the parallel processing device; and

when the element processor is in the interface mode, with its two adjacent communication ports in the column direction connected to the parallel processing device and its two adjacent communication ports in the row direction used for connection with the outside of the parallel processing device, and communication data from outside the parallel processing device arrives through one communication port in the row direction, the data is branched to a communication port in the column direction and transferred to the parallel processing device if the value of the destination processor number held by the data falls within a predetermined range, and the data is passed to the other communication port in the row direction if the value of the destination processor number does not fall within the predetermined range.