JPH01194055A - Parallel computer

Info

Publication number: JPH01194055A
Application number: JP63017070A
Authority: JP
Inventors: Naoki Hamanaka; 濱中　直樹; Teruo Tanaka; 輝雄田中
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1988-01-29
Filing date: 1988-01-29
Publication date: 1989-08-04

Abstract

PURPOSE: To reduce the overhead generated when transferring data between processor elements by inspecting the contents of a tag added to a word and repeating the inspection until the tag expresses validity. CONSTITUTION: When one of processor elements 1-1 to 1-n writes data into a word of the local memory 6 of another processor element, a validating means sets the tag added to that word so that it expresses 'data are valid'. The processor element that reads the data first waits, using a tag access means, until the tag added to the word expresses 'data are valid'; it then reads the data stored in the word, and an invalidating means 33 sets the tag so that it expresses 'data are invalid'. Consequently, the overhead generated when transferring data between processor elements can be reduced.

Description

[Detailed description of the invention]

[Industrial application field]

The present invention relates to a method for configuring a parallel computer.

[Conventional technology]

Conventionally, in a parallel computer that is composed of multiple processor elements each having a local memory, and in which a processor element can access the local memories of the other processor elements, the order of data references when one processor element passes data to another via a local memory has been guaranteed as follows: the passing processor element writes the data to the local memory and then interrupts the processor element that is to read the data. A related device of this kind is discussed, for example, in IEEE, Proceedings of the 1985 International Conference on Parallel Processing, pp. 782-788.

On the other hand, among parallel computers consisting of multiple processor elements coupled by a shared memory, there is a device in which each word of the shared memory is given a tag for use when one processor element passes data to another via the shared memory. The tag indicates whether the contents of the word have been written (data is valid) or not yet written (data is invalid). The device discussed in Real-Time Signal Processing IV, Vol. 298 (Aug. 1981), pp. 241-248 is of this kind.

[Problem to be solved by the invention]

In the former of the above conventional technologies, computation is performed using local memory. Even when the number of processor elements making up the parallel computer increases, the capacity to supply data from local memory to the arithmetic units grows in proportion to the number of processor elements, so high performance can be obtained. However, when data is passed from one processor element to another via local memory, interrupt processing intervenes, and its overhead may significantly degrade the performance of the parallel computer.

In the latter of the above conventional technologies, there is no interrupt processing overhead. However, because the processor elements are coupled through a shared memory, it is difficult to install many processor elements to improve the performance of the parallel computer, and it is therefore extremely difficult to realize a high-performance parallel computer.

An object of the present invention is to reduce the overhead that occurs when data is passed between processor elements in a parallel computer that is composed of a plurality of processor elements each having a local memory and in which data can be written to a local memory from other processor elements.

[Means to solve the problem]

The above object is achieved as follows, in a parallel computer that consists of a plurality of independently operable processor elements each having a local memory and a network connecting them, and in which each processor element can write data into the local memory of another processor element. A tag is provided for each word of the local memory of each processor element, and the tag attached to a word indicates whether the data held by that word is valid or invalid. The computer is further provided with: validating means that sets the tag attached to a word to indicate "data is valid" when any processor element writes data to that word; tag access means that continues checking until the tag indicates "data is valid"; and invalidating means that sets the tag attached to a word to indicate "data is invalid" when the processor element holding the word reads it.

The above object can also be achieved in another embodiment. In a parallel computer consisting of a plurality of independently operable processor elements each having a local memory, each processor element is provided with a receive memory that any processor element can write to. A tag is provided for each word of the receive memory and is given the same meaning as the tag in the embodiment described above, and the validating means, tag access means, and invalidating means of that embodiment are provided for the receive memory.

[Effect]

When a processor element writes data to a word in the local memory of another processor element (in the first embodiment) or to a word in the receive memory of another processor element (in the second embodiment), the validating means sets the tag attached to that word to indicate "data is valid." The processor element that reads the data, before reading the word, waits by means of the tag access means until the tag attached to the word indicates "data is valid." It then reads the data stored in the word, and the invalidating means sets the tag to indicate "data is invalid."
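The handshake described above can be sketched in a few lines of Python. This is only an illustrative model, not the patented circuit: the class name `TaggedWord` and the method names `remote_write` and `try_receive` are hypothetical, and the tag encoding (1 for valid, 0 for invalid) follows the values used later in the embodiments.

```python
from dataclasses import dataclass

VALID, INVALID = 1, 0  # tag encoding used in the embodiments


@dataclass
class TaggedWord:
    """One memory word: a data part plus a 1-bit tag (hypothetical model)."""
    data: int = 0
    tag: int = INVALID

    def remote_write(self, value):
        # Validating means: a write from a processor element marks the word valid.
        self.data = value
        self.tag = VALID

    def try_receive(self):
        # Tag access means: inspect the tag before using the data.
        if self.tag != VALID:
            return None  # nothing written yet; the reader must check again
        value = self.data
        # Invalidating means: reading the word marks it invalid again.
        self.tag = INVALID
        return value


word = TaggedWord()
first_try = word.try_receive()   # no data has been written yet
word.remote_write(42)
second_try = word.try_receive()  # data is now valid and is consumed
third_try = word.try_receive()   # the word was invalidated by the previous read
```

In this sketch a failed check simply returns `None`; the hardware described below instead repeats the check until the tag becomes valid.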

In this way, data transfer between processor elements can be performed correctly and efficiently.

[Example]

The present invention is described in detail below.

<First Example>

The first embodiment is described with reference to the drawings. First, an overview of the parallel computer according to this embodiment is given with reference to FIG. 1. In FIG. 1, 1-1 to 1-n are n independently operable processor elements.

Reference numeral 2 denotes a network, which receives a data transmission request issued by any of the processor elements 1-1 to 1-n and transfers the data to whichever processor element is specified.

Next, the configuration of the processor elements 1-1 to 1-n is described. The processor elements 1-1 to 1-n all have the same configuration, but for simplicity FIG. 1 shows the interior of 1-1 only. The processor element 1-1 is composed of a processor 3, a receiver 4, a transmitter 5, and a local memory 6. Details of the processor 3 are shown in FIG. 2. In FIG. 2, 30 is an instruction fetch circuit, 32 is an ALU, 33 is an invalidation circuit, 34 is a general-purpose register group, and 35 is a program counter.

Reference numeral 36 denotes an instruction register, which is divided into a field 36-1 that stores the instruction code and fields 36-2, 36-3, and 36-4 that store operands. Reference numeral 37 denotes an instruction decoding control, which decodes instructions and controls their execution. Reference numeral 38 denotes a memory access circuit, which is used when executing the RECEIVE instruction described later. In addition, 31 is an OR circuit and 300 is an AND circuit. The processor 3 is a so-called von Neumann computer, but for this embodiment it can execute two newly added instructions in addition to the ordinary von Neumann instruction set (memory read, memory write, arithmetic instructions, and so on). The new instructions are described later.

Each word of the local memory 6 in FIG. 1 is assigned an address, with the word as the unit of addressing. In addition to a data part that stores data, each word has a tag part that stores a 1-bit tag; its use is explained later. The data part of the local memory 6 stores the program executed by the processor 3 and the data used by that program. The local memory 6 has four ports: a first port that uses line L3 for address input, line L4 for data input and the data write request signal, and line L5 for tag input and the tag write request signal; a second port that uses line L6 for address input and line L7 for tag input and the tag write request signal; a third port that uses line L8 for address input and the tag and data read request signal, line L9 for tag output, and line L10 for data input/output and the data write request signal; and a fourth port that uses line L12 for address input and the data read request signal and line L13 for the data output signal. When two or more requests arrive simultaneously at the first, second, and third ports, the local memory 6 arbitrates among them appropriately and responds to the requests.

The transmitter 5 is the device used when the processor element 1-1 writes data to the local memory of another of the processor elements 1-1 to 1-n, and it contains a register 50. The register 50 is divided into fields 50-1, 50-2, and 50-3, which store the destination, the address, and the data, respectively.

The receiver 4 is the device used when any of the processor elements 1-1 to 1-n writes data to the local memory 6 of the processor element 1-1, and it contains a register 40. The register 40 is divided into fields 40-1 and 40-2, which store the address and the data, respectively. Reference numeral 41 denotes a write control: when data is written into the register 40 from line L1, it outputs the tag value 1 and a tag write request signal on line L5.

The operation of the processor element 1-1 is described with reference to FIG. 1 and FIG. 2. First, the instruction fetch circuit 30 in the processor 3 outputs the contents of the program counter 35 and a data read request signal on line L12 to the fourth port of the local memory 6. The local memory 6 is then read and its contents are output on line L13, and these are set in the instruction register 36. The instruction decoding control 37 decodes the instruction code stored in field 36-1 of the instruction set in the instruction register 36, distributes the signals needed to carry out the operation specified by that instruction within the processor 3, and operates the ALU 32, the general-purpose register group 34, and so on. When the operation specified by the instruction is complete, the instruction decoding control 37 updates the value of the program counter 35 via line L302, and the above operation is repeated.

The operation of the network 2 is as follows. When a value is set in the register 50 in the transmitter 5, the network 2 transfers the values of fields 50-2 and 50-3 to fields 40-1 and 40-2, respectively, of the register 40 in the receiver 4 of the processor element 1-j (j = 1, 2, ..., n) indicated by field 50-1.

Next, the newly added instructions are described, beginning with the SEND instruction. The SEND instruction writes data held by the processor element that executes it into the local memory of another processor element. FIG. 4 shows the format of the SEND instruction. The SEND instruction has three operands.

1. Destination
2. Address
3. Data

The operands are stored in the general-purpose registers specified by the R1, R2, and R3 fields of the instruction format, respectively. The instruction writes the data specified by the third operand to the address specified by the second operand in the local memory of the processor element specified by the first operand. When this instruction is executed, the parallel computer of this embodiment operates as follows.

First, the instruction decoding control 37 of the processor 3 passes the values (register numbers) stored in fields 36-2, 36-3, and 36-4 of the instruction register 36 to the general-purpose register group 34 via lines L305, L306, and L307, respectively. As a result, the first, second, and third operands pass through line L315 and are output on line L11, together with the write request signal generated by the instruction decoding control 37 and carried on line L309, and are set in fields 50-1, 50-2, and 50-3 of the register 50, respectively. This completes the SEND instruction, but the following operations then take place.

When a value is set in the register 50, the operation of the network 2 described above sets the contents of fields 50-2 and 50-3 in fields 40-1 and 40-2, respectively, of the register 40 of the processor element specified by field 50-1. The receiver 4 of the processor element in which the data has been set then writes, through the first port of the local memory 6, the contents of 40-2 into the data part of the word specified by 40-1 and the value 1 (the value indicating that the contents of the word are valid) into the tag part of that word.
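The SEND path described above (general-purpose registers, then register 50, then the network, then register 40, then the destination memory with its tag set to 1) can be modelled roughly as follows. This is a software sketch under assumptions: the names `ProcessorElement`, `send`, `receiver_store`, and `network_deliver`, the eight-entry register file, and the dictionary used for memory are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    data: int = 0
    tag: int = 0  # 1 = data valid, 0 = data invalid


@dataclass
class ProcessorElement:
    pe_id: int
    regs: list = field(default_factory=lambda: [0] * 8)  # general-purpose registers
    memory: dict = field(default_factory=dict)           # address -> Word
    reg50: tuple = None  # transmitter register 50: (destination, address, data)
    reg40: tuple = None  # receiver register 40: (address, data)

    def send(self, r1, r2, r3):
        # SEND R1, R2, R3: the operands are read from the named registers
        # and latched into the transmitter register. The instruction ends here.
        self.reg50 = (self.regs[r1], self.regs[r2], self.regs[r3])

    def receiver_store(self):
        # The receiver writes the data part and sets the tag part to 1.
        addr, data = self.reg40
        word = self.memory.setdefault(addr, Word())
        word.data = data
        word.tag = 1
        self.reg40 = None


def network_deliver(pes, source):
    # The network copies (address, data) to the register 40 of the destination.
    dest, addr, data = source.reg50
    target = pes[dest]
    target.reg40 = (addr, data)
    source.reg50 = None
    target.receiver_store()


pes = {1: ProcessorElement(1), 2: ProcessorElement(2)}
pes[1].regs[0:3] = [2, 0x100, 7]   # destination PE 2, address 0x100, data 7
pes[1].send(0, 1, 2)
network_deliver(pes, pes[1])
```

Note that `send` returns as soon as register 50 is loaded, mirroring the statement that the instruction completes before the network and receiver finish the transfer.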

Next, the newly added RECEIVE instruction is described. The RECEIVE instruction reads data from the local memory of the processor element that executes it. FIG. 3 shows the format of the RECEIVE instruction.

The RECEIVE instruction has two operands.

1. Address
2. Register number

The first operand is stored in the general-purpose register specified by the R1 field of the instruction format, and the second operand is stored in the R2 field. The instruction reads valid data from the local memory address specified by the first operand and stores that data in the general-purpose register specified by the second operand. When this instruction is executed, the parallel computer of this embodiment operates as follows.

First, the instruction decoding control 37 of the processor 3 outputs the contents (register numbers) of fields 36-2 and 36-3 of the instruction register 36 to the general-purpose register group 34 via lines L305 and L306, respectively. It also outputs on line L303 a signal indicating that a RECEIVE instruction is being executed, which activates the memory access circuit 38. As a result, the first operand, serving as the memory address, is conveyed on line L8 to the third port of the local memory 6 together with a memory read request signal that is generated by the memory access circuit 38 and passes through line L301 and the OR circuit 31.

The local memory 6 then outputs the tag value on line L9 and the data on line L10. The data output on line L10 is set in the general-purpose register specified by the second operand. The tag value output on line L9 is input to the memory access circuit 38. When this value is 0, the memory access circuit 38 again generates a memory read request signal on line L301 and repeats the memory access described above. When the value output on line L9 is 1, the output of the AND circuit 300 in the invalidation circuit 33 becomes 1. Consequently, with the first operand, which is being output on line L6, as the address, the value 0 and the output of the AND circuit 300 are output together on line L7 to the second port of the local memory 6 as the tag write data and the write request, respectively, so the tag of the word specified by the first operand becomes 0 (the value indicating that the data is invalid). In parallel with this, when line L9 is 1 the memory access circuit 38 sends a signal on line L302 to the program counter 35 and updates the contents of the program counter. This completes the RECEIVE instruction.
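The retry behaviour of RECEIVE can be illustrated with a small simulation. It is a sketch under assumptions: `LocalMemory`, `schedule_remote_write`, and `receive` are hypothetical names, the "in-flight" remote write that lands just before the n-th poll stands in for another processor element's SEND, and `max_polls` is a guard added only so the simulation cannot spin forever (the hardware described above has no such limit).

```python
class LocalMemory:
    """Word-addressed memory with a data part and a 1-bit tag part (model)."""

    def __init__(self):
        self.data = {}
        self.tag = {}
        self.pending = []  # [polls_remaining, addr, value]: simulated remote writes

    def schedule_remote_write(self, after_polls, addr, value):
        # The write lands just before the `after_polls`-th read request.
        self.pending.append([after_polls, addr, value])

    def _advance(self):
        still_pending = []
        for entry in self.pending:
            entry[0] -= 1
            if entry[0] <= 0:
                _, addr, value = entry
                self.data[addr] = value
                self.tag[addr] = 1      # receiver sets the tag to "valid"
            else:
                still_pending.append(entry)
        self.pending = still_pending

    def read_tag_and_data(self, addr):
        # Third port: one request returns both the tag and the data.
        self._advance()
        return self.tag.get(addr, 0), self.data.get(addr, 0)

    def clear_tag(self, addr):
        # Second port: tag write of 0 issued by the invalidation circuit.
        self.tag[addr] = 0


def receive(memory, addr, max_polls=1000):
    """Repeat the read until the tag is 1, then invalidate and return the data."""
    polls = 0
    while polls < max_polls:
        polls += 1
        tag, data = memory.read_tag_and_data(addr)
        if tag == 1:
            memory.clear_tag(addr)
            return data, polls
    raise TimeoutError("tag never became valid in this simulation")


mem = LocalMemory()
mem.schedule_remote_write(3, 0x40, 99)
value, polls = receive(mem, 0x40)   # value == 99 after 3 polls
```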

Next, the operation of the parallel computer according to this embodiment is described.

Each processor element making up the parallel computer of this embodiment behaves like an ordinary computer when it does not need to exchange data with other processor elements. When data must be passed between processor elements, a word in the local memory of the receiving processor element is used for the exchange (this word is decided in advance by the programmer or the compiler). The program is written so that the sending processor element writes the data to that word with the SEND instruction described above and the receiving processor element reads the data with the RECEIVE instruction. In this way, the receiving processor element never reads the data before the sending processor element has written it, and the order of data references is guaranteed. This concludes the first embodiment of the present invention.
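The ordering guarantee can be demonstrated with two threads standing in for the sending and receiving processor elements. This is an analogy only, with assumptions stated plainly: Python threads and a lock replace the independent hardware and its memory arbitration, and the names `TaggedWord`, `send`, `receive`, and `run_exchange` are hypothetical.

```python
import threading
import time


class TaggedWord:
    """A shared word with a valid/invalid tag, guarded by a lock (model)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.data = None
        self.tag = 0

    def send(self, value):
        # Sender side: write the data and mark it valid.
        with self._lock:
            self.data = value
            self.tag = 1

    def receive(self):
        # Receiver side: keep checking the tag; consume the data once valid.
        while True:
            with self._lock:
                if self.tag == 1:
                    self.tag = 0
                    return self.data
            time.sleep(0)  # give other threads a chance to run


def run_exchange():
    word = TaggedWord()
    result = []
    consumer = threading.Thread(target=lambda: result.append(word.receive()))
    consumer.start()      # the receiver deliberately starts first
    time.sleep(0.01)      # the sender is late
    word.send(123)
    consumer.join(timeout=5)
    return result


received = run_exchange()
```

Even though the receiver begins before the sender has written anything, it cannot return stale contents, because it only proceeds once the tag reads 1. No interrupt or separate signalling step is involved.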

In the above, when the RECEIVE instruction is executed and the tag value is 0, the memory access circuit repeatedly accesses the memory to check the tag value. Alternatively, the RECEIVE instruction may read the tag value only once and then finish, setting that value in a flag register, condition code register, or the like that reflects the result of executing the instruction. A conditional branch instruction or the like following the RECEIVE instruction then re-executes the RECEIVE instruction until the tag becomes 1.
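The single-read variant can be sketched as a one-shot operation that returns a condition code, wrapped in a software loop that plays the role of the conditional branch. Several details are assumptions, since the text does not specify them: that the destination register is loaded and the tag cleared only when the tag was 1, and the names `Memory`, `receive_once`, `software_receive`, and the `between_polls` callback (used here only to simulate the arrival of data).

```python
class Memory:
    def __init__(self):
        self.data = {}
        self.tag = {}


def receive_once(memory, addr, regs, dest_reg):
    """One-shot RECEIVE: read the tag once and return it as a condition code."""
    tag = memory.tag.get(addr, 0)
    if tag == 1:
        # Assumption: load the register and invalidate only on success.
        regs[dest_reg] = memory.data[addr]
        memory.tag[addr] = 0
    return tag


def software_receive(memory, addr, regs, dest_reg, between_polls=None):
    """Loop formed by RECEIVE plus a conditional branch on the condition code."""
    attempts = 0
    while True:
        attempts += 1
        condition_code = receive_once(memory, addr, regs, dest_reg)
        if condition_code == 1:
            return attempts
        if between_polls is not None:
            between_polls()  # other work could be done here instead of spinning


mem = Memory()
regs = [0] * 4


def deliver():
    # Simulates a remote SEND landing after the first failed attempt.
    mem.data[0x10] = 5
    mem.tag[0x10] = 1


attempts = software_receive(mem, 0x10, regs, 2, between_polls=deliver)
```

One possible motivation for this variant is that the program regains control between attempts, so it can interleave other work with the polling.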

<Second Example>

The second embodiment is described with reference to the drawings. This embodiment is a modification of the first embodiment. The configuration of the parallel computer of this embodiment is described with reference to FIG. 5. Components given the same reference numerals in FIG. 5 and FIG. 1 are identical.

In FIG. 5, 8 is a receive memory, which, like the local memory 6 in FIG. 1, is addressed in units of words. In addition to a data part that stores data, each word has a tag part that stores a 1-bit tag. The receive memory 8 has three ports: a first port that uses line L3 for address input, line L4 for data input and the data write request signal, and line L5 for tag input and the tag write request signal; a second port that uses line L6 for address input and line L7 for tag input and the tag write request signal; and a third port that uses line L8 for address input and the tag and data read request signal, line L9 for tag output, and line L10 for the data output signal. When two or more requests arrive simultaneously at the ports, the receive memory 8 arbitrates among them appropriately and responds to the requests.

Reference numeral 9 denotes a local memory, which is addressed in units of words. Each word has a data part that stores data. The local memory 9 has two ports: a first port that uses line L15 for address input and the data read request signal and line L14 for data input/output and the data write request signal; and a second port that uses line L12 for address input and the data read request signal and line L13 for data output. When two or more inputs arrive simultaneously at the ports, the local memory 9 arbitrates among them appropriately and responds to the requests.

The receiver 4 operates in the same way as in the first embodiment, except that in FIG. 5 it writes to the receive memory 8, whereas in the first embodiment it writes to the local memory 6.

The transmitter 5 operates in the same way as the transmitter 5 of the first embodiment.

The operation of the processor element 1-1 is described with reference to FIG. 5 and FIG. 6. First, the instruction fetch circuit 30 in the processor 7 outputs the contents of the program counter 35 and a data read request signal on line L12 to the second port of the local memory 9. The local memory 9 is then read and its contents are output on line L13, and these are set in the instruction register 36. The instruction decoding control 37 decodes the instruction code stored in field 36-1 of the instruction set in the instruction register 36, distributes the signals needed to carry out the operation specified by that instruction within the processor 7, and operates the ALU 32, the general-purpose register group 34, and so on. When the operation specified by the instruction is complete, the instruction decoding control 37 updates the value of the program counter 35 via line L302, and the above operation is repeated.

The operation of the network 2 is the same as in the first embodiment.

Next, the newly added instructions are described, beginning with the SEND instruction. The SEND instruction has the same format as the SEND instruction of the first embodiment, and its operands have the same meaning. When this instruction is executed, the parallel computer of this embodiment operates as follows.

First, the instruction decoding control 37 of the processor 7 passes the values (register numbers) stored in fields 36-2, 36-3, and 36-4 of the instruction register 36 to the general-purpose register group 34 via lines L305, L306, and L307, respectively. As a result, the first, second, and third operands pass through line L315 and are output on line L11, together with the write request signal generated by the instruction decoding control 37 and carried on line L309, and are set in fields 50-1, 50-2, and 50-3 of the register 50, respectively. This completes the SEND instruction, but the following operations then take place.

When a value is set in the register 50, the operation of the network 2 described above sets the contents of fields 50-2 and 50-3 in fields 40-1 and 40-2, respectively, of the register 40 of the processor element specified by field 50-1. The receiver 4 of the processor element in which the data has been set then writes, through the first port of the receive memory 8, the contents of 40-2 into the data part of the word specified by 40-1 and the value 1 (the value indicating that the contents of the word are valid) into the tag part of that word.

Next, the newly added RECEIVE instruction is described. The RECEIVE instruction has the same format as the RECEIVE instruction of the first embodiment, and its operands are likewise the same. When this instruction is executed, the parallel computer of this embodiment operates as follows.

First, the instruction decoding control 37 of the processor 7 outputs the contents (register numbers) of fields 36-2 and 36-3 of the instruction register 36 to the general-purpose register group 34 via lines L305 and L306, respectively. It also outputs on line L303 a signal indicating that a RECEIVE instruction is being executed, which activates the memory access circuit 38. As a result, the first operand, serving as the memory address, is output on line L8 to the third port of the receive memory 8 together with a memory read request signal that is generated by the memory access circuit 38 and passes through line L301. The receive memory 8 then outputs the tag value on line L9 and the data on line L10. The data output on line L10 is set in the general-purpose register specified by the second operand.

The tag value output on line L9 is input to the memory access circuit 38. When this value is 0, the memory access circuit 38 generates the memory read request signal on line L301 again and repeats the memory access described above. When the value output on line L9 is 1, the output of the AND circuit 300 in the invalidation circuit 33 becomes 1. The first operand being output on line L6 is then used as the address, and the value 0 and the output of the AND circuit 300 are output together on line L7 to the second port of the reception memory 8 as the tag write data and the write request signal, respectively. The tag of the word specified by the first operand therefore becomes 0 (the value indicating that the data is invalid).

When line L9 is 1, the memory access circuit 38 also, in parallel with the above, sends a signal on line L302 to the program counter 35 and updates its contents. This completes the RECEIVE instruction.
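The tag handling described above can be illustrated in software. The following is only a minimal sketch under stated assumptions, not the circuit of the embodiment: the names `Word`, `ReceptionMemory`, `remote_write`, `try_receive` and `receive` are hypothetical, and the `max_polls` bound exists only so that a single-threaded model terminates (the hardware described above simply repeats the memory access until the tag is 1).

```python
from dataclasses import dataclass


@dataclass
class Word:
    data: int = 0
    tag: int = 0  # 1 = contents valid, 0 = contents invalid


class ReceptionMemory:
    """Hypothetical software model of a tagged reception memory."""

    def __init__(self, size):
        self.words = [Word() for _ in range(size)]

    def remote_write(self, addr, value):
        # Models the receiver's action when data arrives:
        # store the data and set the tag to 1 (valid).
        word = self.words[addr]
        word.data = value
        word.tag = 1

    def try_receive(self, addr):
        # Models one memory access of RECEIVE: read tag and data together.
        # If the tag is 1, clear it to 0 (invalidate) and return the data;
        # otherwise return None so the caller can retry.
        word = self.words[addr]
        if word.tag == 0:
            return None
        word.tag = 0
        return word.data


def receive(mem, addr, max_polls=1000):
    # Repeat the access until the tag is valid. The poll limit is an
    # assumption of this sketch, not part of the described hardware.
    for _ in range(max_polls):
        value = mem.try_receive(addr)
        if value is not None:
            return value
    raise TimeoutError("tag never became valid")
```

In this model a second `receive` on the same word waits until another `remote_write` arrives, which corresponds to the guarantee of reference order that the tag provides.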

Except that the storage used to pass data between processor elements is the reception memory 8, the parallel computer of this embodiment operates in the same way as that of the first embodiment and provides the same effects.

In addition, this embodiment has the following effect of its own.

In this embodiment, the memory through which data is passed (reception memory 8) is separate from the memory that stores programs and data (local memory 9). From the programmer's standpoint there therefore appear to be two memory spaces, as shown in FIG. 7: the reception memory space 80 and the local memory space 90. Consequently, when the programs to be run on this parallel computer are link-edited and unresolved external references are resolved, (1) the resolution of unresolved external references among the reception memories of the processor elements and (2) the resolution of unresolved external references within each processor element can be carried out independently.

Accordingly, once a program that computes while passing data between processor elements has been written as a subroutine and its unresolved external references of type (1) have been resolved, incorporating that subroutine into another program requires only the conventional resolution of type (2). In other words, programs become easier to reuse.

<Third Embodiment>
The third embodiment is described with reference to the drawings. This embodiment is a modification of the first embodiment. The configuration of the parallel computer of this embodiment is described with reference to FIG. 9. Components denoted by the same reference numerals in FIG. 9 and FIG. 1 are identical.

In this embodiment, the effect obtained in the second embodiment is achieved by making slight changes to the first embodiment. The effect unique to the second embodiment arises from separating the memory space 80 formed by the reception memory from the memory space 90 formed by the local memory, as shown in FIG. 7. The spaces can be separated by physically separating the memories, as in the second embodiment, but this embodiment uses a different method. As shown in FIG. 8, the reception memory space 81 is mapped into the local memory space 91, so that spaces that are separate at the architectural level are implemented on the same physical memory. The mapping used, shown in FIG. 8, adds a constant value "x" to an address in the reception memory space to obtain the corresponding local memory address.

In FIG. 9, 100 is an address translation unit, the details of which are shown in FIG. 11. In FIG. 11, 101 is a register that stores the constant value "x" mentioned above.

The value may be set in register 101 by, for example, the processor 11 of FIG. 9 (the lines needed for this are omitted from the figure).

In FIG. 11, 102 is an adder. The address translation unit 100 outputs on line L3 the value obtained by adding the constant value "x" to the contents of register 40-1.

In FIG. 9, 110 is an address translation unit, the details of which are shown in FIG. 12. In FIG. 12, 113 is a register that stores the constant value "x" mentioned above.

The value may be set in register 113 in the same way as for register 101.

114 is an adder. The address translation unit 110 outputs on line L6 the value obtained by adding the constant value "x" to the value input from line L313.

FIG. 10 shows the configuration of the processor 11.

Components denoted by the same reference numerals in FIG. 10 and FIG. 2 are identical. In FIG. 10, 110 is the address translation unit already described.

111 is an instruction decoding control. 112 is a selector, which is controlled by line L312.

The operation of the processor element 1-1 is described with reference to FIG. 9 and FIG. 10. The operation of the processor 11 is almost the same as that of the processor 3 of the first embodiment. The difference is that, for every instruction other than the one singled out below (the RECEIVE instruction), the instruction decoding control 111 controls the selector 112 so that it always passes line L313 through to line L314.

The operation of a processor element that executes the SEND instruction is the same as in the first embodiment. A processor element whose register 40 has been loaded with data through the network operates as follows. The receiver 10 of this processor element writes, through the first port of the local memory 6, the contents of 40-2 into the data part of the word specified by the value obtained by applying the address translation 100 to the contents of 40-1, and writes the value 1 into the tag part of that word.

Next, the newly added RECEIVE instruction is described. The RECEIVE instruction has the same format as the RECEIVE instruction of the first embodiment.

First, the instruction decoding control 37 of the processor 11 outputs the contents of fields 36-2 and 36-3 of the instruction register 36 to the general-purpose register group 34 on lines L305 and L306, respectively. It also outputs on line L303 a signal indicating that a RECEIVE instruction is to be executed, and activates the memory access circuit 38. As a result, the first operand, which serves as the memory address, is output on line L314 by way of line L313, the address translation unit 110 and the selector 112.

This address and the memory read request signal generated by the memory access circuit 38 and passed over line L301 and through the OR circuit 31 are conveyed together, as line L8, to the third port of the local memory 6. The local memory 6 then outputs the tag value on line L9 and the data on line L10. The data output on line L10 is set in the general-purpose register specified by the second operand. The tag value output on line L9 is input to the memory access circuit 38. When this value is 0, the memory access circuit 38 generates the memory read request signal on line L301 again and repeats the memory access described above. When the value output on line L9 is 1, the output of the AND circuit 300 in the invalidation circuit 33 becomes 1.

The value that the first operand produces on line L6 by way of line L313 and the address translation unit 110 is then used as the address, and the value 0 and the output of the AND circuit 300 are output together on line L7 to the second port of the local memory 6 as the tag write data and the write request signal. The tag of the word in the reception memory space specified by the first operand therefore becomes 0. When line L9 is 1, the memory access circuit 38 also, in parallel with the above, sends a signal on line L302 to the program counter 35 and updates its contents. This completes the RECEIVE instruction.

In the above description, the address translation that accompanies execution of the SEND instruction is performed by the address translation unit 100 in the receiver 10 of FIG. 9. This translation, however, only needs to take place at some point between execution of the SEND instruction and the actual writing of the value into the local memory. For example, as shown in FIG. 13, an address translation unit 120 may instead be placed in the transmitter 12 and configured to perform the same translation as the address translation unit 110.

The mapping has been described above as the addition of a constant value "x", but the mapping need not be limited to this. Any mapping will do provided that each address in the reception memory space determines a unique address in the local memory space and that the mapped addresses do not overlap.

Furthermore, when the SEND instruction writes data into the local memory of another processor element, writes to any part of the local memory space 91 of FIG. 8 other than the part onto which the reception memory space 81 is mapped can be prevented. Data local to a processor element is then protected from being destroyed by another processor element as a result of, for example, a program error.

For example, let the capacity of the part of the local memory space used as the reception memory space be n words. When the SEND instruction is executed, a comparator or the like checks that the reception memory space address given as the second operand of the instruction is below address n, and the write is suppressed if the address is n or greater. Alternatively, after the address translation that accompanies the SEND instruction has been performed, it may be checked that the translated address is at least x and less than x + n.
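The add-a-constant mapping and the range check can be sketched together as follows. This is an illustrative model only: the function names `translate` and `guarded_write`, the use of a Python list for the local memory, and the boolean return value are assumptions of the sketch, whereas the embodiment performs the check with a comparator and simply suppresses the write.

```python
def translate(a, x):
    # Mapping of the third embodiment: reception-space address a
    # corresponds to local-memory address a + x.
    return a + x


def guarded_write(local_memory, a, value, x, n):
    """Write value at reception-space address a, or suppress the write.

    n is the number of words of local memory used as reception space.
    Returns True if the write was performed, False if it was suppressed.
    """
    if not (0 <= a < n):
        # Address outside the reception space: suppress the write so that
        # data local to the processor element is not overwritten.
        return False
    b = translate(a, x)
    # Equivalent post-translation check: x <= b < x + n always holds here.
    local_memory[b] = value
    return True
```

For instance, with x = 16 and n = 8, a write to reception address 2 lands at local address 18, while a write to reception address 8 is suppressed and leaves the memory unchanged.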

This embodiment provides not only the effect shown for the second embodiment but also the following effect.

In the second embodiment, the reception memory 8 and the local memory 9 are separate in hardware. If the capacity of the local memory 9 becomes insufficient during a large computation, spare capacity in the reception memory 8 cannot be diverted to the computation, and the same holds in the opposite case. In this embodiment, by contrast, any spare part of the reception memory space can be used as local memory. Thus, if the sum of the capacities of the reception memory 8 and the local memory 9 of the second embodiment equals the capacity of the local memory 6 of this embodiment, the parallel computer of this embodiment may be able to perform larger computations.

<Fourth Embodiment>
This embodiment is a modification of the third embodiment. In the third embodiment, the local memory of each processor element was addressed in units of words.

Actual computers must handle data in 1-byte units, such as character data, and are therefore often addressed in units of bytes (8 bits). Ordinary numerical processing, on the other hand, often operates on units of 4 or 8 bytes. If a word were one byte and a tag were attached to every byte, guaranteeing the reference order of, say, 8 bytes of data would require executing the SEND and RECEIVE instructions of the third embodiment eight times each. This would not only lengthen execution time but also increase the amount of hardware needed to implement the tags. In this embodiment, therefore, data within a processor element can still be processed in 1-byte units, while tags are attached in 8-byte units and the SEND and RECEIVE instructions operate in units of 8 bytes.

The mapping method of this embodiment is described with reference to FIG. 14.

In FIG. 14, 82 is the reception memory space, in which an address is assigned to every 8 bytes. 92 is the local memory space, in which an address is assigned to every byte. Letting a be an address in the reception memory space and b the corresponding address in the local memory space, the mapping is b = a × 8 + x. The value x has the same meaning as the constant value "x" of the third embodiment.

The parallel computer of this embodiment can be realized by replacing, in FIG. 9 and FIG. 10 showing the configuration of the parallel computer of the third embodiment, the address translation unit 100 with the address translation unit 104 of FIG. 15 and the address translation unit 110 with the address translation unit 114 of FIG. 16.

In FIG. 15, 103 is a 3-bit left shifter.

In FIG. 16, 113 is a 3-bit left shifter. Because a 3-bit left shift of integer data such as an address is equivalent to multiplication by 8, providing these shifters makes the mapping described above possible.
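As a small worked illustration of this mapping, the shift-and-add can be written as below. The function name `byte_address` is hypothetical, and the sketch assumes non-negative integer addresses that do not overflow the address width.

```python
def byte_address(a, x):
    # b = a * 8 + x, with the multiplication by 8 done as a 3-bit left
    # shift, as the shifters of FIG. 15 and FIG. 16 do in hardware.
    return (a << 3) + x
```

Consecutive reception-space addresses thus map to local byte addresses that are 8 apart, one tagged 8-byte unit per reception-space address.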

<Fifth Embodiment>
The fifth embodiment is described with reference to the drawings. This embodiment is a modification of the first embodiment. Components denoted by the same reference numerals in this embodiment and the first embodiment have the same configuration. First, an overview of the parallel computer of this embodiment is given with reference to FIG. 19. In FIG. 19, 1-1 to 1-n are n processor elements that can operate independently. 2 is a network, which accepts a data transmission request issued by any of the processor elements 1-1 to 1-n and transfers the data to any designated processor element.

Next, the configuration of the processor elements 1-1 to 1-n is described. The processor elements 1-1 to 1-n all have the same configuration, and for simplicity FIG. 19 shows the interior of 1-1 only. The processor element 1-1 consists of a processor 13, a receiver 4, a transmitter 5 and a local memory 6. Details of the processor 13 are shown in FIG. 20. In FIG. 20, 30 is an instruction fetch circuit and 35 is a program counter.

36 is an instruction register, divided into a field 36-1 that stores the instruction code and fields 36-2, 36-3 and 36-4 that store operands. 1301 is an instruction decoding control, which decodes instructions and controls their execution.

1307 is a transmission control circuit, used when executing the VSEND instruction described later. 1302 is a memory access circuit, used when executing the VRECEIVE instruction described later. 1308 is a selector, 31 is an OR circuit, 33 is an invalidation circuit, 300 is an AND circuit, and 1309 and 1317 are selectors. 34 is a general-purpose register group, 1303 is a vector register group, and 1304 and 1305 are arithmetic units. The vector register group 1303 consists of a plurality of vector registers, each of which is accessed sequentially from its first element on both reads and writes. 1306 is an address generation circuit, the interior of which is shown in FIG. 21. In FIG. 21, 1310 is a register, 1311 is an increment circuit, 1312 is a register, 1313 is a decrement circuit, 1314 is a zero detection circuit, and 1315 and 1316 are selectors.

The processor 13 is a so-called vector computer. For this embodiment it can execute a few newly added instructions in addition to the instruction set of an ordinary vector computer (memory read, memory write and arithmetic instructions for scalar data, memory read, memory write and arithmetic instructions for vector data, and so on). The newly added instructions are described later.

The function and configuration of the local memory 6 of FIG. 19 are the same as those of the local memory 6 of the first embodiment.

The transmitter 5, the receiver 4 and the network 2 are likewise the same as the transmitter 5, the receiver 4 and the network 2 of the first embodiment, respectively.

The operation of the processor element 1-1 is described with reference to FIG. 19, FIG. 20 and FIG. 21. First, the instruction fetch circuit 30 in the processor 13 outputs the contents of the program counter 35 and a data read request signal on line L12 to the fourth port of the local memory 6. The local memory 6 is then read, and its contents are output on line L13 and set in the instruction register 36.

The instruction decoding control 1301 decodes the instruction code stored in field 36-1 of the instruction set in the instruction register 36 and distributes, within the processor 13, the signals needed to carry out the operation specified by that instruction, thereby operating the arithmetic unit 1304, the arithmetic unit 1305, the general-purpose register group 34, the vector register group 1303 and so on. When the operation specified by the instruction has finished, the instruction decoding control 1301 updates the value of the program counter 35 by way of line L1308, and the above operations are repeated.

Next, the newly added instructions are described, beginning with the VSEND instruction. The VSEND instruction writes vector data held by the processor element executing it into the local memory of another processor element. FIG. 18 shows the format of the VSEND instruction. The VSEND instruction has three operands: (1) the destination, (2) the base address and (3) the vector data.

These operands are held, respectively, in the general-purpose register specified by R1 of the instruction format, the general-purpose register specified by R2, and the vector register specified by the VR3 field. The instruction writes the vector data specified by the third operand into the contiguous area that starts at the address specified by the second operand in the local memory of the processor element specified by the first operand. The number of elements of the vector data is assumed to have been stored beforehand, by a separate instruction, in a particular register of the general-purpose register group 34. When this instruction is executed, the parallel computer of this embodiment operates as follows.

First, the instruction decoding control 1301 of the processor 13 conveys the values (register numbers) stored in fields 36-2 and 36-3 of the instruction register 36 to the general-purpose register group 34 on lines L1302 and L1303, respectively, and conveys the value (vector register number) stored in field 36-4 to the vector register group 1303 on line L1304. It also conveys a signal indicating that a VSEND instruction is to be executed to the transmission control circuit 1307 on line L1310, thereby activating it. The first operand then passes over line L1311 and, together with the register write request signal generated by the transmission control circuit 1307 and passed over line L1312 and through the selector 1308, is output as line L20 and set in field 50-1 of the register 50 of the transmitter 5. The second operand is stored in register 1310 by way of line L1301 and the selector 1315, and the number of elements of the vector data is stored in register 1312 by way of line L1301 and the selector 1316.

The following operations are then repeated.

First, the contents of register 1310 in the address generation circuit 1306 are output on line L1320 by way of line L1315 and the selector 1309, and the write request signal generated by the transmission control circuit 1307 is output on line L1319. Together these are conveyed to the transmitter 5 as line L21 and set in field 50-2. In parallel with this, one element of the vector register specified by the third operand is read out onto line L1311 and, together with the write signal generated by the transmission control circuit 1307 and output to the selector 1308 by way of line L1312, is conveyed to the transmitter 5 as line L20 and set in field 50-3. The transmission control circuit 1307 then sends signals over lines L1313 and L1306 to the increment circuit 1311 and the decrement circuit 1313 in the address generation circuit 1306, so that the contents of register 1310 are increased by 1 and the contents of register 1312 are decreased by 1. If the result of the decrement circuit 1313 decreasing the contents of register 1312 by 1 is 0, the zero detection circuit 1314 sends a signal to the transmission control circuit 1307 on line L1307.

Each time a value is set in register 50, the transmitter 5 sends its contents through the network to the processor element specified by the contents of 50-1. Operation after transmission is the same as in the first embodiment.

Because the number of elements of the vector data was set in register 1312 at the start of instruction execution, the above operations are repeated as many times as there are elements, after which the signal is sent to the transmission control circuit 1307 on line L1307.

When this signal arrives, the transmission control circuit 1307 updates the contents of the program counter by way of line L1308. This completes the operation of the VSEND instruction.

Next, the VRECEIVE instruction is described.

The VRECEIVE instruction reads valid data from the local memory of the processor element executing it and stores the data in a vector register. FIG. 17 shows the format of the VRECEIVE instruction. The VRECEIVE instruction has two operands: (1) the base address and (2) the vector register number.

The first operand is held in the general-purpose register specified by the R1 field of the instruction format, and the second operand is held in the VR2 field. The instruction sequentially reads the valid data in the contiguous area that starts at the address specified by the first operand and stores it in the vector register specified by the second operand. The number of elements of the vector data is assumed to have been stored beforehand, by a separate instruction, in a particular register of the general-purpose register group 34. When this instruction is executed, the parallel computer of this embodiment operates as follows.

First, the instruction decoding control 1301 of the processor 13 conveys the value (register number) stored in field 36-2 of the instruction register 36 to the general-purpose register group 34 on line L1302, and conveys the value (vector register number) stored in field 36-3 to the vector register group 1303 on line L1303. It also conveys a signal indicating that a VRECEIVE instruction is to be executed to the memory access circuit 1302 on line L1309, thereby activating it. The first operand is then stored in register 1310 by way of line L1301 and the selector 1315, and the number of elements of the vector data is stored in register 1312 by way of line L1301 and the selector 1316.

The following operations are then repeated.

First, the contents of register 1310 in the address generation circuit 1306 are output from the selector 1309 by way of line L1315, and the data read request signal generated by the memory access circuit 1302 is output from the OR circuit 31 by way of line L1305. Together these are conveyed, as line L8, to the third port of the local memory 6. The local memory 6 then outputs the tag value on line L9 and the data on line L10.

The tag value output on line L9 is input to the memory access circuit 1302. When this value is 0, the memory access circuit 1302 again generates a memory read request signal on line L1305 and reads local memory 6. When the value output on line L9 is 1, the memory access circuit 1302 instructs the vector register group 1303, over line L1321, to write the data output on line L10 into the vector register specified by the second operand. Over line L1306, it also causes the contents of registers 1310 and 1312 in the address generation circuit 1306 to be increased by 1 by the increment circuit 1311 and decreased by 1 by the decrement circuit 1313, respectively. If the contents of register 1312 become 0 as a result of this decrement, the zero detection circuit 1314 notifies the memory access circuit 1302 over line L1307. In addition, when the value output on line L9 is 1, the output of the AND circuit 300 in the invalidation circuit 33 becomes 1. In parallel with the operation of the memory access circuit 1302 described here, the value output on line L6 via line L1315 and selector 1317 is therefore delivered to the second port of local memory 6 as the address, together with the value 0 and the output of the AND circuit 300 on line L7 as the tag write data and the write request signal, respectively.

As a result, the tag of the word addressed by the value output on line L6 becomes 0. This operation is controlled so that it is performed before the incrementing of the contents of register 1312 described above.

Because the number of vector data elements is set in register 1312 at the start of instruction execution, the above operation is repeated as many times as there are elements, after which a signal is sent to the memory access circuit 1302 over line L1307. When this signal arrives, the memory access circuit 1302 updates the contents of the program counter via line L1308. This completes the operation of the VRECEIVE instruction.
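The repeated operation of the VRECEIVE instruction can be illustrated in software. The following is a minimal sketch under stated assumptions, not a description of the circuit: the class `TaggedMemory`, the function `vreceive`, and the `poll` hook are hypothetical names introduced only for illustration. The tag values (0 for invalid, 1 for valid) and the roles of registers 1310 and 1312 follow the description above.

```python
class TaggedMemory:
    """Hypothetical model of local memory 6: every word carries a 1-bit tag."""

    def __init__(self, size):
        self.data = [0] * size
        self.tag = [0] * size          # 0 = data invalid, 1 = data valid

    def remote_write(self, addr, value):
        # A write from another processor element stores the data
        # and sets the tag of that word to "valid".
        self.data[addr] = value
        self.tag[addr] = 1


def vreceive(mem, base, count, poll):
    """Read `count` valid words starting at `base`.

    `poll` is a hypothetical hook called each time a tag is found to be 0;
    the hardware described above simply reissues the read request.
    """
    vector_register = []
    addr = base                        # role of register 1310
    remaining = count                  # role of register 1312
    while remaining > 0:
        if mem.tag[addr] == 0:         # tag on line L9 is 0: read again
            poll()
            continue
        vector_register.append(mem.data[addr])   # tag is 1: take the data
        mem.tag[addr] = 0              # invalidation circuit 33 clears the tag
        addr += 1                      # increment circuit 1311
        remaining -= 1                 # decrement circuit 1313
    return vector_register             # count reached 0: instruction completes
```

In this sketch the tag is cleared before the address is advanced, which mirrors the ordering constraint stated for the invalidation operation.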

Each processor element of the parallel computer of this embodiment operates like an ordinary vector computer when it is not exchanging data with other processor elements. When data must be exchanged with another processor element, the program is written so that the sending processor element uses the VSEND instruction described above to write the data into the area of the receiving processor element's local memory that is used for the exchange (this area is determined in advance by the programmer or the compiler), and the receiving processor element reads the data with the VRECEIVE instruction. This achieves the same effect as the first embodiment while allowing data to be exchanged faster than in the first embodiment.
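The programming pattern described above, in which the sender writes with VSEND and the receiver reads with VRECEIVE from an agreed area, can be sketched with two threads standing in for two processor elements. This is only an illustrative model under assumptions: `ProcessorElement`, `vsend`, `vreceive`, `exchange`, and the base address 4 are hypothetical, and the network and hardware ports are not modeled.

```python
import threading
import time


class ProcessorElement:
    """Hypothetical stand-in for one processor element with tagged local memory."""

    def __init__(self, size):
        self.data = [0] * size
        self.tag = [0] * size          # 0 = invalid, 1 = valid


def vsend(dest, base, values):
    """Write `values` into a contiguous area of the destination's local memory."""
    for offset, value in enumerate(values):
        dest.data[base + offset] = value
        dest.tag[base + offset] = 1    # store the data first, then mark it valid


def vreceive(pe, base, count):
    """Read `count` words from the agreed area, waiting on each tag."""
    received = []
    for addr in range(base, base + count):
        while pe.tag[addr] == 0:       # inspect the tag again until it is valid
            time.sleep(0)              # yield to the sender thread
        received.append(pe.data[addr])
        pe.tag[addr] = 0               # invalidate so the area can be reused
    return received


def exchange(values, base=4):
    """Run one sender and one receiver; no interrupt or lock is involved."""
    receiver = ProcessorElement(size=16)
    sender = threading.Thread(target=vsend, args=(receiver, base, values))
    sender.start()
    result = vreceive(receiver, base, len(values))
    sender.join()
    return result, receiver.tag
```

The receiver obtains the elements in order regardless of how the two threads are scheduled, because it never reads a word whose tag is still 0.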

In the above, the VSEND and VRECEIVE instructions access a contiguous area of local memory 6. If the value added by the increment circuit 1311 in the address generation circuit 1306 is made programmable, however, groups of data arranged at equal intervals in local memory can also be handled.
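The effect of a programmable increment can be shown with a small address generator. This is a sketch under assumptions: the function name `generate_addresses` and the parameter `stride` are hypothetical, and `stride=1` corresponds to the contiguous case described for VSEND and VRECEIVE.

```python
def generate_addresses(base, count, stride=1):
    """Yield the addresses a model of address generation circuit 1306 would produce.

    `stride` is the hypothetical programmable increment value.
    """
    addr = base              # role of register 1310
    remaining = count        # role of register 1312
    while remaining > 0:
        yield addr
        addr += stride       # increment circuit 1311 adds `stride` instead of 1
        remaining -= 1       # decrement circuit 1313


# Example: one column of a 3x4 matrix stored row by row starting at address 0.
column_1 = list(generate_addresses(base=1, count=3, stride=4))
```

With a stride equal to the row length, one instruction could cover a matrix column, which a contiguous-only instruction cannot express.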

For still greater flexibility, the following VSENDL and VRECEIVEL instructions may be introduced.

Fig. 23 shows the format of the VSENDL instruction. The VSENDL instruction has three operands.

(1) Destination, (2) address vector, (3) vector data. The operands are held, respectively, in the general-purpose register specified by R1 of the instruction format and in the vector registers specified by VR2 and VR3.

This instruction writes the j-th element of the vector of the third operand to the address specified by the j-th element (j = 1, 2, ..., number of vector data elements) of the vector of the second operand, in the local memory of the processor element specified by the first operand. The number of elements of the vector data is assumed to have been stored beforehand, by another instruction, in a specific register of the general-purpose register group.
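The meaning of the VSENDL instruction can be expressed as a scatter operation. The following is a minimal sketch, assuming each processor element is modeled as a dictionary with `data` and `tag` lists; the function name `vsendl` and this data layout are hypothetical, and setting the tag to 1 on each write follows the validation behavior described for remote writes.

```python
def vsendl(processor_elements, destination, address_vector, data_vector):
    """Scatter data_vector[j] to address_vector[j] in the destination's local memory.

    processor_elements: list of {"data": [...], "tag": [...]} dictionaries
                        (hypothetical model of the local memories).
    destination:        index of the receiving processor element (first operand).
    """
    if len(address_vector) != len(data_vector):
        raise ValueError("address vector and data vector must have equal length")
    target = processor_elements[destination]
    for addr, value in zip(address_vector, data_vector):
        target["data"][addr] = value   # write the j-th data element
        target["tag"][addr] = 1        # the written word becomes valid


def make_processor_elements(n, size):
    """Create n hypothetical processor elements with `size` words each."""
    return [{"data": [0] * size, "tag": [0] * size} for _ in range(n)]
```

For example, `vsendl(pes, 1, [7, 2, 5], [1.0, 2.0, 3.0])` would place the three values at addresses 7, 2, and 5 of processor element 1 and leave all other words untouched.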

This instruction can be executed in almost the same way as the VSEND instruction described above, except for the following point.

For the VSEND instruction, the value set in field 50-2 of register 50 of the transmitter 5 is the value generated by the address generation circuit 1306, which is set in field 50-2 via line L1315, selector 1309, line L1320, and line L21. For the VSENDL instruction, the vector register in the vector register group 1303 specified by the contents of field 36-3 of the instruction register 36 is selected in advance via line L1303, and the contents of this vector register are set in field 50-2 via line L1314, selector 1309, line L1320, and line L21.

Next, Fig. 22 shows the format of the VRECEIVEL instruction. The VRECEIVEL instruction has two operands.

(1) Address vector, (2) vector register number. The first operand is held in the vector register specified by the VR1 field of the instruction format, and the second operand is given in the VR2 field. This instruction reads valid data from the address specified by the j-th element (j = 1, 2, ..., number of vector data elements) of the vector of the first operand and writes it to the j-th element of the vector register of the second operand. The number of elements of the vector data is assumed to have been stored beforehand, by another instruction, in a specific register of the general-purpose register group.
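The meaning of the VRECEIVEL instruction can be expressed as a gather operation that waits on each tag. This is a sketch under assumptions: the function name `vreceivel` and the `wait_for_writer` hook are hypothetical, and clearing the tag after each read is assumed to carry over from the VRECEIVE instruction.

```python
def vreceivel(data, tags, address_vector, wait_for_writer):
    """Gather valid words from the addresses listed in `address_vector`.

    data, tags:       lists modeling the words and tags of local memory 6.
    wait_for_writer:  hypothetical hook called with the address while its tag
                      is 0; the hardware would simply read the word again.
    """
    vector_register = []
    for addr in address_vector:            # j-th element gives the j-th address
        while tags[addr] == 0:             # wait until the word is valid
            wait_for_writer(addr)
        vector_register.append(data[addr]) # becomes the j-th element of the result
        tags[addr] = 0                     # invalidate the word after reading
    return vector_register
```

The result keeps the order of the address vector, not the order in which the words happened to arrive.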

This instruction can be executed in almost the same way as the VRECEIVE instruction described above, except for the following point.

For the VRECEIVE instruction, the read address for local memory 6 is the value generated by the address generation circuit 1306, which is delivered to local memory 6 via line L1315, selector 1309, and line L8, or via line L1315, selector 1317, and line L6. For the VRECEIVEL instruction, the vector register in the vector register group 1303 specified by the contents of field 36-2 of the instruction register 36 is selected in advance via line L1302, and the contents of this vector register are delivered to local memory 6 via line L1314, selector 1309, and line L8, or via line L1315, selector 1317, and line L6.
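The difference between the two instruction families lies only in where each address comes from, which can be sketched as a single selection step. The following is an illustrative model, not the circuit: the function `address_stream` and its keyword arguments are hypothetical names, with `base` standing for the address generation circuit 1306 path and `address_vector` standing for the vector register path.

```python
def address_stream(count, base=None, address_vector=None):
    """Return the list of addresses used by one instruction.

    Exactly one source must be given:
      base            -> addresses base, base+1, ... (VSEND / VRECEIVE style)
      address_vector  -> addresses taken element by element from a vector
                         register (VSENDL / VRECEIVEL style)
    """
    if (base is None) == (address_vector is None):
        raise ValueError("give exactly one of base or address_vector")
    if address_vector is not None:
        return list(address_vector[:count])    # vector register path
    return [base + j for j in range(count)]    # address generator path


# The rest of the instruction (tag check, data transfer, tag update)
# would be the same for either source of addresses.
contiguous = address_stream(3, base=8)
indexed = address_stream(3, address_vector=[9, 1, 4, 7])
```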

With the above, a sequence of addresses in local memory 6 is set beforehand in a vector register of the vector register group 1303 and specified as the second operand of a VSENDL instruction or the first operand of a VRECEIVEL instruction. Executing these instructions allows any group of data in local memory 6 to be exchanged between processor elements efficiently and with the reference order guaranteed.

[Effect of the invention]

According to the present invention, in a parallel computer that consists of a plurality of processor elements each having a local memory, and in which data can be written to a local memory from other processor elements, the overhead incurred when data is passed between processors can be significantly reduced, allowing the parallel computer to achieve high performance.

[Brief description of the drawings]

Fig. 1 is an overall configuration diagram of the parallel computer of the first embodiment of the present invention.
Fig. 2 is a detailed diagram of the processor constituting the first embodiment.
Fig. 3 shows the format of the RECEIVE instruction.
Fig. 4 shows the format of the SEND instruction.
Fig. 5 is an overall configuration diagram of the parallel computer of the second embodiment of the present invention.
Fig. 6 is a detailed diagram of the processor constituting the second embodiment.
Fig. 7 shows the memory space visible from the architecture of the parallel computer of the second embodiment.
Fig. 8 shows the relationship between the two memory spaces visible from the architecture of the parallel computer of the third embodiment of the present invention.
Fig. 9 is an overall configuration diagram of the parallel computer of the third embodiment of the present invention.
Fig. 10 is a detailed diagram of the processor constituting the third embodiment.
Fig. 11 shows one of the address translation circuits used in the third embodiment.
Fig. 12 shows the other address translation circuit used in the third embodiment.
Fig. 13 is an overall configuration diagram of a parallel computer that is another implementation of the third embodiment.
Fig. 14 shows the relationship between the two memory spaces visible from the architecture of the parallel computer of the fourth embodiment of the present invention.
Fig. 15 shows one of the address translation circuits used in the fourth embodiment.
Fig. 16 shows the other address translation circuit used in the fourth embodiment.
Fig. 17 shows the format of the VRECEIVE instruction.
Fig. 18 shows the format of the VSEND instruction.
Fig. 19 is an overall configuration diagram of the parallel computer of the fifth embodiment of the present invention.
Fig. 20 is a detailed diagram of the processor constituting the fifth embodiment.
Fig. 21 shows the address generation circuit used in the fifth embodiment.
Fig. 22 shows the format of the VRECEIVEL instruction.
Fig. 23 shows the format of the VSENDL instruction.

Explanation of symbols: 1: processor element; 2: network; 3: processor; 4: receiver; 5: transmitter; 6: local memory; 30: instruction fetch circuit; 31: OR circuit; 32: ALU; 33: invalidation circuit; 34: general-purpose register group; 35: program counter; 36: instruction register; 37: instruction decoding control; 38: memory access circuit; 300: AND circuit; 40: register; 50: register; 41: write control.

Claims

[Claims]

1. A parallel computer consisting of a plurality of processor elements that have local memory and can operate independently, and a network connecting the plurality of processor elements, in which each of the plurality of processor elements can write data into the local memory of other processor elements via the network, characterized in that: a tag is provided for some or all of the words in the local memory of the plurality of processor elements, the tag attached to a word indicating whether the data held by the word is valid or invalid; and the parallel computer comprises validating means for setting the content of the tag attached to a word to indicate validity when data is written to the word from an arbitrary processor element, and tag access means for inspecting the content of the tag attached to a word and repeating the inspection of the word until the tag indicates validity.

2. A parallel computer consisting of a plurality of processor elements that have local memory and can operate independently, and a network connecting the plurality of processor elements, characterized in that: each of the plurality of processor elements is provided with a reception memory that can be written to from an arbitrary processor element via the network; a tag is provided for some or all of the words constituting the reception memory, the tag attached to a word indicating whether the data held by the word is valid or invalid; and the parallel computer comprises validating means for setting the content of the tag attached to a word to indicate validity when data is written to the word from an arbitrary processor element, and tag access means for inspecting the content of the tag attached to a word and repeating the inspection of the word until the tag indicates validity.

3. The parallel computer according to claim 1, characterized in that the logical address specified when writing data to the local memory in another processor element differs from the real address assigned to that local memory, and in that the parallel computer comprises a write address translation circuit that converts the logical address specified when writing data to the local memory in another processor element into a real address, and a read address translation circuit that converts a logical address into a real address when reading data written using the write address translation circuit.

4. The parallel computer according to claim 1, characterized by comprising address generation means for sequentially generating the addresses specified when writing data to the local memory in another processor element and the addresses specified when reading data from the local memory.