JPH06259581A

JPH06259581A - Parallel computer

Info

Publication number: JPH06259581A
Application number: JP4034193A
Authority: JP
Inventors: Sanehisa Doi; 実久土肥
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1993-03-02
Filing date: 1993-03-02
Publication date: 1994-09-16

Abstract

PURPOSE: To accelerate arithmetic operations by providing each processor element (PE) with an internal register file consisting of a plurality of registers and a pointer register that designates a register of the internal register file. CONSTITUTION: The internal register file 35 provided in each of the PEs 4-11 and 4-21 to 4-23 consists of a plurality of registers and stores data for arithmetic operations. The pointer register 30 designates a register of the internal register file 35; its content is specified by an instruction, and it holds, for example, the content of memory or of the instruction. With this configuration, the content of the pointer register 30 can be changed even when the same instruction is issued, so that different addresses of the internal register file 35 can be accessed and the degree of freedom of the program is maintained. In addition, data at an arbitrary memory address can be accessed by transferring data from the memory of the PEs 4-11 and 4-21 to 4-23 to the internal register files 35 simultaneously and in a batch.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a parallel computer.

[0002]

[Prior Art] With the recent demand for faster, higher-capacity computer systems, there is a need for technology that achieves high processing power and reliability by providing many processors in parallel and distributing the processing among them. To this end, so-called massively parallel computers have been provided, in which a large number of processor elements (hereinafter abbreviated as PEs) of simple configuration are connected.

[0003] Such massively parallel computers include the SIMD (single instruction, multiple data) type, in which all PEs run the same program and process the data given to each of them with the same instruction, and the MIMD (multiple instruction, multiple data) type, in which each PE executes different instructions.

[0004] In such a massively parallel computer, unless the individual computers are miniaturized, the distances between computers, between arithmetic circuits and memory, and so on cause propagation delays in instruction and data signals, which in turn delay execution.

[0005] It is therefore important, in realizing a massively parallel computer, to build the PEs on the same physical carrier so that they are close together, and at the same time to reduce the number of signal terminals. Here, the same physical carrier means, for example, the same semiconductor chip or wafer, or the same printed wiring board.

[0006] Accordingly, in the SIMD type, where every PE executes the same instruction, the signal lines that supply instructions can be shared, which reduces the number of signal terminals and makes it relatively easy to mount the PEs on the same physical carrier. However, because a large number of PEs are integrated, each PE must have a simple configuration.

[0007] On the other hand, in a SIMD-type massively parallel computer every PE is always supplied with the same instruction and must finish within the same instruction cycle. A selection instruction is therefore provided for cases where the processing must vary depending on the data.

[0008] FIG. 5 is a block diagram of a conventional SIMD-type massively parallel computer. In the figure, reference numeral 1 denotes a control processor, which supplies instructions and memory addresses in common to a plurality of PEs provided in parallel, and thereby controls the entire massively parallel computer.

[0009] Reference numeral 2 denotes an instruction broadcast signal line, through which the control processor 1 transmits instructions and memory addresses to each PE 4. Reference numeral 3 denotes a data collection signal line, on which each PE places its output when responding to the control processor 1.

[0010] Reference numeral 4 denotes a PE, which performs arithmetic operations, logical operations, reads and writes to the memory 6, and so on in accordance with instructions broadcast from the control processor 1. A plurality of these PEs 4 are formed on a single semiconductor wafer, and a massively parallel computer is built from a plurality of such wafers. Typically about 10,000 PEs are connected in parallel.

[0011] FIG. 6 is a block diagram of a conventional PE. The PE used here basically adopts a simple architecture, but also has a selection-instruction function. In the figure, the PE 4 consists of an arithmetic unit 5 and a memory 6.

[0012] Reference numeral 5 denotes the arithmetic unit, which consists of a control unit 7, an ALU 8, a control flag 9, and an output control circuit 13. Reference numeral 6 denotes the memory, a three-port memory that receives three addresses from the control unit 7, held in the first input address register 10, the second input address register 11, and the output address register 12. It reads the contents of the addresses specified by the first and second input address registers, and writes data to the address specified by the output address register.

[0013] Reference numeral 7 denotes the control unit, which has the first and second input address registers 10 and 11 and the output address register 12. It interprets instructions from the control processor 1 and controls the arithmetic logic unit (hereinafter abbreviated as ALU) 8, the control flag 9, and the output control circuit 13.

[0014] Reference numeral 8 denotes the ALU, which performs the numerical or logical operation specified by the instruction on the contents of the first input and the second input. Reference numeral 9 denotes the control flag, a flip-flop that is set or reset by the result of an ALU 8 operation or by the control unit 7.

[0015] Reference numerals 10 and 11 denote the first and second input address registers, which specify the read addresses of the memory 6. Reference numeral 12 denotes the output address register, which specifies the write address of the memory 6.

[0016] Reference numeral 13 denotes the output control circuit, which sends the PE's output to the data collection signal line 3 in response to a command from the control processor 1. Reference numeral 14 denotes an address decoder, which decodes the outputs of the three address registers and supplies them to the memory; it is built integrally with the memory 6.

[0017] The operation of a massively parallel computer configured in this way will now be described. The contents of the memory 6 of each PE 4 have been written by the control processor 1 through a write circuit (not shown).

[0018] An instruction is given from the control processor 1 to each PE 4 over the instruction broadcast signal line. Each PE 4 sets the address part of the instruction in the three address registers or, depending on the type of instruction, sets the contents of the memory 6 or the result of an ALU 8 operation in the three address registers, and supplies them to the memory 6. Thus, even when the instruction from the control processor 1 is the same, the address actually supplied to the memory 6 can be different.

[0019] This increases the degree of freedom of the program and broadens the range of problems that a massively parallel computer can handle. In this scheme, however, each PE must convey three addresses to its memory 6, which requires a large number of address signal terminals and wires. Accommodating these terminals and wires limits, for example, the number of processors that fit on one physical carrier, and this has been an obstacle to realizing faster computers.

[0020]

[Problem to be Solved by the Invention] In a conventional massively parallel computer, each PE must convey addresses to its memory, which requires address signal terminals and wiring. In addition, because each memory receives a different address, decoding has been performed separately for each memory.

[0021] In view of these points, an object of the present invention is to provide a means that reduces the address signal terminals and wiring of a SIMD-type massively parallel computer and allows the memory decoder to be shared, without impairing the degree of freedom of the program.

[0022]

[Means for Solving the Problem] The above problem is solved by a massively parallel computer configured as follows. FIG. 1 is a diagram showing the principle of the present invention.

[0023] In a massively parallel computer composed of a plurality of processor elements 4 that perform arithmetic operations according to the same instruction, each processor element 4 is provided with an internal register file 35 consisting of a plurality of registers and a pointer register 30 that designates a register of the internal register file 35.

[0024]

[Operation] The internal register file 35 provided for each processor element 4 consists of a plurality of registers and stores data for arithmetic operations.

[0025] The pointer register 30 designates a register of the internal register file 35. The contents of the pointer register 30 are specified by an instruction, and may be, for example, the contents of the memory 6 or the contents of the instruction.

[0026] With this configuration, even when the instruction is the same, the contents of the pointer register 30 can be varied so that different addresses of the internal register file 35 are accessed, which preserves the degree of freedom of the program.

[0027] Furthermore, by transferring data in a batch from the memory 6 to the internal register file 35 in all PEs simultaneously, data at any address of the memory 6 can be accessed.

[0028]

[Embodiment] By providing a register file inside each PE, the present invention makes the address signal that specifies each PE's memory address common to all PEs, reduces address signal terminals and wiring, and allows the address decoder to be shared, without impairing the degree of freedom of the program. An embodiment is described below with reference to the drawings.

[0029] FIG. 2 is a block diagram of a PE according to an embodiment of the present invention. In the figure, reference numeral 35 denotes the internal register file, which consists of a plurality of registers, stores data for arithmetic operations, and is implemented as a three-port memory. Reference numeral 30 denotes the pointer register, which designates a register of the internal register file 35. In the following, this is expressed by saying that the pointer register 30 specifies an address of the internal register file 35.

[0030] Reference numeral 36 denotes an internal register selection circuit, which selects a register of the internal register file by switching between the register-specifying part of the instruction received by the control unit 7 of the arithmetic unit 5 and the pointer register 30. Details are shown in FIG. 3. Other items bearing the same reference numerals as in FIG. 5 are the same items.

[0031] FIG. 3 is a block diagram of the internal register selection circuit. The internal register selection circuit 36 consists of three multiplexers (hereinafter abbreviated as MPX) 37, 38, and 39. Each MPX switches between one of the three internal register file addresses specified by the instruction and the contents of the pointer register 30. The three internal register file addresses are the first read address, the second read address, and the write address, which are delivered from the control processor 1 over the instruction broadcast signal line 2 as the address part of a register-to-register operation instruction. Other items bearing the same reference numerals as in FIG. 6 are the same items.
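
The switching performed by the three MPXs can be sketched in Python as follows. This is a minimal illustration, not the circuit disclosed in FIG. 3: the function name `select_addresses`, the per-port `use_pointer` select flags, and the tuple layout are assumptions introduced here, since the text does not state how the select signals are encoded.

```python
from typing import Tuple


def select_addresses(
    instr_addrs: Tuple[int, int, int],
    pointer: int,
    use_pointer: Tuple[bool, bool, bool],
) -> Tuple[int, ...]:
    """Model of three 2-to-1 multiplexers (MPX 37, 38, 39).

    instr_addrs: (first read, second read, write) addresses from the instruction.
    pointer:     contents of the pointer register 30.
    use_pointer: hypothetical per-port select signals; True picks the pointer.
    """
    return tuple(
        pointer if sel else addr
        for addr, sel in zip(instr_addrs, use_pointer)
    )


# Same broadcast instruction, two PEs whose pointer registers differ.
broadcast = (3, 5, 7)
pe_a = select_addresses(broadcast, pointer=10, use_pointer=(True, False, False))
pe_b = select_addresses(broadcast, pointer=42, use_pointer=(True, False, False))
```

In this sketch both PEs receive the identical address fields, yet the first read port resolves to a different register in each PE because only the pointer contents differ.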

[0032] The length of each address is chosen so that the entire capacity of the internal register file can be accessed. If there are 64 registers, the address must be 6 bits long.
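
The relation between register count and address length can be checked with a short Python sketch. The helper name `address_bits` is an assumption introduced for illustration; it computes the smallest bit width that can address the given number of registers.

```python
def address_bits(num_registers: int) -> int:
    """Smallest number of bits that can address num_registers registers."""
    if num_registers < 1:
        raise ValueError("need at least one register")
    # The largest address is num_registers - 1; its bit length is the width.
    return (num_registers - 1).bit_length()


bits_for_64 = address_bits(64)
```

For 64 registers this gives the 6 bits stated in the text; a register-to-register instruction with three such fields would then carry 18 address bits.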

[0033] The three outputs of the internal register selection circuit 36 are supplied to the internal register file 35. The internal register file 35 is a three-port memory, so the contents of two addresses can be read in parallel while one address is written. Two independent outputs are therefore obtained, and these become the two inputs to the ALU 8.

[0034] In accordance with commands from the control unit, the ALU 8 performs the specified operation on the data in the internal register file 35 at the first read address and the data in the internal register file 35 at the second read address, and writes the result into the internal register file 35 at the write address.
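
A register-to-register operation of this kind can be modelled in a few lines of Python. This is a sketch under stated assumptions: the class name `RegisterFile`, the method `execute`, and the default size of 64 registers are hypothetical, and the ALU operation is represented by an ordinary Python callable rather than an encoded opcode.

```python
import operator
from typing import Callable, List


class RegisterFile:
    """Three-port register file: two reads and one write per operation."""

    def __init__(self, size: int = 64) -> None:
        self.regs: List[int] = [0] * size

    def execute(
        self,
        op: Callable[[int, int], int],
        read1: int,
        read2: int,
        write: int,
    ) -> None:
        a = self.regs[read1]          # first read port
        b = self.regs[read2]          # second read port
        self.regs[write] = op(a, b)   # write port receives the ALU result


rf = RegisterFile()
rf.regs[1], rf.regs[2] = 20, 22
rf.execute(operator.add, read1=1, read2=2, write=3)
```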

[0035] FIG. 4 is a block diagram of a massively parallel computer according to an embodiment of the present invention. In the figure, reference numeral 21 denotes a PE chip, a semiconductor chip carrying a plurality of PEs 4. The instruction broadcast signal line 2, the data collection signal line 3, and so on are shared by the plurality of PEs 4 to save terminals. Reference numeral 22 denotes a memory chip, a semiconductor chip carrying a plurality of memories 6 and a shared address decoder 24. The address broadcast signal line from the control processor 1 enters the shared address decoder 24, whose output is shared by the memories 6. The memory chip 22 is therefore a memory with a multi-bit organization, for example 4 megabytes organized 16 bits wide.

[0036] The operation of a massively parallel computer configured in this way will now be described. The contents of the memory 6 of each PE have been written by the control processor 1 through a write circuit (not shown).

[0037] A block transfer instruction is given from the control processor 1 to each PE over the instruction broadcast signal line. Each PE writes the data sent from its memory into its own internal register file 35.
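
The block transfer with a shared memory address can be sketched as below. The function name `block_transfer` and its parameters (`mem_base`, `reg_base`, `count`) are assumptions introduced for illustration; the text does not specify the instruction's operands. The point of the sketch is that a single address sequence is applied to every memory, as a shared address decoder would do, while each PE receives its own data.

```python
from typing import List


def block_transfer(
    memories: List[List[int]],
    register_files: List[List[int]],
    mem_base: int,
    reg_base: int,
    count: int,
) -> None:
    """Copy count words from every PE's memory into its register file.

    One address sequence (mem_base .. mem_base + count - 1) is applied to
    all memories, modelling the shared address decoder 24.
    """
    for offset in range(count):
        addr = mem_base + offset  # single address, common to all PEs
        for mem, regs in zip(memories, register_files):
            regs[reg_base + offset] = mem[addr]


# Three PEs, each with its own 8-word memory and 4-register file.
memories = [[pe * 100 + i for i in range(8)] for pe in range(3)]
register_files = [[0] * 4 for _ in range(3)]
block_transfer(memories, register_files, mem_base=2, reg_base=0, count=4)
```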

[0038] Thereafter, each PE processes the data in its internal register file 35 in accordance with the instructions broadcast from the control processor 1. At this time, each PE decodes the instruction and sets the first read address, the second read address, and the write address in three address registers that specify locations in the internal register file 35.

[0039] Depending on the instruction, the first read address, the second read address, or the write address is replaced by the contents of the pointer register 30. By varying the contents of the pointer register 30 according to data stored in the memory beforehand or according to the results of processing, the contents of the internal register file 35 can be specified indirectly.
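
The indirect addressing described here can be illustrated with a small Python model of several PEs receiving one broadcast instruction. Everything named below is an assumption made for the sketch: the `PE` class, the `POINTER` marker value that stands for "use the pointer register on this port", and the 8-register file size. The patent describes the substitution being selected by the instruction but does not give an encoding.

```python
from dataclasses import dataclass, field
from typing import List

POINTER = -1  # hypothetical marker meaning "use the pointer register"


@dataclass
class PE:
    regs: List[int] = field(default_factory=lambda: [0] * 8)
    pointer: int = 0

    def resolve(self, addr: int) -> int:
        """Return the pointer contents when the port is switched to it."""
        return self.pointer if addr == POINTER else addr

    def add(self, read1: int, read2: int, write: int) -> None:
        a = self.regs[self.resolve(read1)]
        b = self.regs[self.resolve(read2)]
        self.regs[self.resolve(write)] = a + b


# Two PEs hold the same data but different pointer contents.
pes = [PE(regs=[10, 11, 12, 13, 0, 0, 0, 0], pointer=p) for p in (0, 3)]
for pe in pes:
    pe.add(POINTER, 1, 7)  # identical broadcast instruction for every PE
```

Although both PEs execute the same instruction, the first operand comes from register 0 in one PE and register 3 in the other, so the results written to register 7 differ.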

[0040] For example, when normalizing the numerical data given to each PE, if the contents of the pointer register 30 are set to the bit position of the most significant "1", the instruction broadcast from the control processor 1 shifts from the position in the internal register file 35 designated by the pointer register 30, so that the numerical data given to all PEs can be normalized at once.
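
One possible reading of this normalization example is sketched below in Python. It is an interpretation, not the disclosed implementation: the text does not say how bit positions map onto the internal register file, so the sketch treats the pointer simply as a bit index into a word. The 16-bit `WIDTH` and the helper names `msb_position` and `normalize` are assumptions.

```python
WIDTH = 16  # assumed word width


def msb_position(value: int) -> int:
    """Bit index of the most significant 1 (value must be positive)."""
    if value <= 0:
        raise ValueError("value must be positive")
    return value.bit_length() - 1


def normalize(value: int, pointer: int) -> int:
    """Shift left so that the bit at index `pointer` lands in the top bit."""
    return (value << (WIDTH - 1 - pointer)) & ((1 << WIDTH) - 1)


values = [0b1, 0b101, 0b1100000]              # one value per PE
pointers = [msb_position(v) for v in values]  # per-PE pointer contents
# The same "normalize" step is applied to every PE; only the pointer differs.
normalized = [normalize(v, p) for v, p in zip(values, pointers)]
```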

[0041] In this way, even when the instruction from the control processor 1 is the same, the address actually designated in the internal register file 35 can be different. This increases the degree of freedom of the program and broadens the range of problems that a massively parallel computer can handle.

[0042] In addition, this configuration reduces address signal terminals and wiring and allows the memory decoder to be shared. Therefore, by using the present invention to form a plurality of PEs on a single semiconductor substrate and providing a plurality of such substrates, a high-performance massively parallel computer can be constructed.

[0043]

[Effects of the Invention] As is apparent from the above description, the present invention provides the following significant industrial effects.

[0044] Because the instructions supplied to the processors are common, fewer terminals are needed and a high degree of integration can be achieved. Because data is stored in internal registers, fewer bits are needed to specify an address, and register-to-register operation instructions become possible.

[0045] Because operations are performed entirely inside the PE, wiring is shorter and higher operation speed can be expected. Because operations use only the internal registers, new data can be written to the memory 6 in parallel while the PE is computing, which enables higher speed.

[0046] Because the internal registers can be designated through the register pointer, the degree of freedom of the program is improved. The memory address decoder can be shared, which is advantageous for miniaturization.

[0047] As a result of the above, a massively parallel computer capable of high-speed operation can be constructed.

[Brief description of drawings]

FIG. 1 is a diagram showing the principle of the present invention.

FIG. 2 is a block diagram of a PE according to an embodiment of the present invention.

FIG. 3 is a block diagram of the internal register selection circuit.

FIG. 4 is a block diagram of a massively parallel computer according to an embodiment of the present invention.

FIG. 5 is a block diagram of a conventional SIMD-type massively parallel computer.

FIG. 6 is a block diagram of a conventional PE.

[Explanation of symbols]

1 Control processor
2 Instruction broadcast signal line
3 Data collection signal line
4 PE
5 Arithmetic unit
6 Memory
7 Control unit
8 ALU
9 Control flag
10 First input address register
11 Second input address register
12 Output address register
13 Output control circuit
14 Address decoder
15 Address broadcast signal line
21 PE chip
22 Memory chip
24 Shared address decoder
30 Pointer register
35 Internal register file
36 Internal register selection circuit
37 to 39 MPX

Claims

[Claims]

1. A parallel computer comprising a plurality of processor elements (4) that perform arithmetic operations according to the same instruction, characterized in that each processor element (4) is provided with an internal register file (35) comprising a plurality of registers and a pointer register (30) that designates a register of the internal register file (35).