JPS63318635A

JPS63318635A - Data processor

Info

Publication number: JPS63318635A
Application number: JP15448987A
Authority: JP
Inventors: Hidekazu Matsumoto; 松本　秀和; Tetsuaki Nakamikawa; 哲明中三川; Hideo Inayoshi; 秀夫稲吉
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1987-06-23
Filing date: 1987-06-23
Publication date: 1988-12-27
Anticipated expiration: 2011-03-21
Also published as: JPH0827719B2

Abstract

PURPOSE: To execute data processing at high speed with simple hardware by providing a wait field in each instruction, and by providing an execution wait control circuit in the execution means and a flag in the storage means of each pipeline processor element. CONSTITUTION: Pipeline processor elements 101, 102, 103 and 104 have identical components: an instruction decoding unit 122, an execution unit 123 and a store unit 124. The unit 123 is provided with AND gates 203, 207, etc., which perform wait control. The unit 124 is provided with a control circuit 208 and a flag 209 that turns ON when processing of an instruction has ended. When the wait field (w) of a decoded instruction is OFF, the execution wait control circuit formed by the gates 203, 207, etc. in the unit 123 starts execution of the instruction immediately. When the field (w) is ON, execution of the instruction starts once the flags 209 of the elements 101-104 have turned ON. In this way, data processing can be executed at high speed with simple hardware.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a parallel data processing device, and in particular to a data processing device comprising a plurality of pipeline processors.

[Conventional technology]

Pipeline processing is an effective means of speeding up a computer, and is an especially important technique for speeding up a single process. However, there is a limit to how much a single pipeline can contribute to improved performance.

For this reason, low-level parallel computers that aim at parallel processing at the microprogram level are being researched. These are discussed, for example, in the 13th Symposium on Computer Architecture, pp. 280-289, 1986.

[Problem that the invention seeks to solve]

Because the above conventional technology performs parallel processing at the microprogram level, it has the following problems: creating the microprograms is complicated; the parallelization rate does not become very high, since the parallelism is limited to the microprogram level; the hardware becomes complex; the number of macro instructions needed for fine-grained control becomes enormous; and, as a result, it is difficult to develop an efficient compiler.

An object of the present invention is to apply the multiprocessor concept to speeding up a single process, and to provide a high-speed data processing device built from simple hardware that consists mainly of repeated identical structures.

[Means for solving problems]

The above object is achieved by configuring a data processing device from a plurality of pipeline processors that execute a single instruction stream, and by providing each pipeline processor with a mechanism for wait (synchronization) control between processors.

[Effect]

Each individual pipeline processor element fetches instructions from a single instruction stream and executes them. The data processing device contains a plurality of pipeline processor elements that operate in parallel, so the single instruction stream is processed in parallel. When wait control is needed between processor elements, an instruction that performs waiting is provided. When a processor element executes such an instruction, it monitors the state of the other processors and waits for their operations to finish before it starts executing the instruction itself. Wait control is therefore simple, and processing can be sped up with a simple mechanism.

[Example]

An embodiment of the present invention is described below with reference to the drawings. FIG. 2 shows the overall configuration of a data processing device embodying the present invention. The processing device consists of a processor unit (PU) 10, a memory control unit (MCU) 20, a main memory (MM) 30, an I/O processor (IOP) 40, and the signal line groups 15, 25, and 35 that connect them. Other components exist as well, but they are omitted because they are not needed to understand the present invention. The processor unit 10 fetches the instructions and operand data stored in the main memory 30 via the memory control unit 20 and executes the instructions. The I/O processor 40 is a processor for input/output control. Of these components, the present invention concerns the processor unit 10, so the description below focuses on it. Since the I/O processor 40 is also a processor, the present invention can of course be applied to it as well.

FIG. 3 is a block diagram showing the internal configuration of the processor unit 10. The processor unit 10 consists of a plurality of pipeline processor elements (PP#1, PP#2, PP#3, PP#4) 101, 102, 103, 104 (hereinafter simply called processor elements) and a common resource management unit (CRM) 105.

This embodiment has four processor elements, but the number is not limited to four; the present invention is applicable to any number.

The processor elements 101 to 104 are connected to the common resource management unit 105 by the signal line groups 111, 112, 113, and 114, and each can process microinstructions on its own. As long as no resource conflict occurs, each processor element can operate independently of the others.

FIG. 4 is a block diagram showing the configuration of the processor element 101; the other processor elements have the same configuration. It consists of an instruction fetch unit (IU) 121, an instruction decode unit (DU) 122, an execution unit (EU) 123, and a store unit (SU) 124. These units operate independently of one another and together implement four-stage pipeline processing.

The instruction fetch unit 121 sends out the address of an instruction via the signal line 134 and receives the fetched instruction via the signal line 135. The decode unit 122 receives the instruction fetched by the instruction fetch unit 121 via the signal line 131 and sends the decoded result to the execution unit 123 via the signal line 132. The execution unit 123 executes the instruction on receiving the decoded result and sends the execution result to the store unit 124 via the signal line 133.

The store unit 124 sends the execution result it receives to the common resource management unit 105 via the signal line 140. The bus 130 is a path for transferring data between the units. The signal line 136 sends operand addresses and register addresses to the common resource management unit 105, the signal lines 138 and 139 receive register read data, and the signal line 137 receives memory read operands.

FIG. 5 is a block diagram showing the internal configuration of the common resource management unit 105. The instruction fetch selector (IFS) 151 receives the fetch requests and instruction fetch addresses of the processor elements via the signal line 134. When the requests access the same memory block, it combines them into a single memory access; when they access different memory blocks, it handles them in order of arrival. In either case it sends the access request and memory address to the memory access controller 153 via the signal line 161.
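The merging behavior of the instruction fetch selector can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not give the memory block size or say over what time window requests are combined, so `BLOCK_SIZE` and the function name `merge_fetch_requests` are hypothetical, and the sketch simply merges all requests that are pending together.

```python
BLOCK_SIZE = 16  # hypothetical: the source does not state the memory block size


def merge_fetch_requests(requests):
    """Combine pending fetch requests that fall in the same memory block.

    `requests` is a list of (element_id, address) pairs in arrival order.
    Returns a list of (block_base, [element_ids]) pairs, one per memory
    access, ordered by the arrival of the first request for each block.
    """
    accesses = {}  # dicts keep insertion order, which gives arrival order
    for element_id, address in requests:
        block_base = address - (address % BLOCK_SIZE)
        accesses.setdefault(block_base, []).append(element_id)
    return list(accesses.items())


# Three elements fetch from one block, the fourth from the next block.
pending = [(101, 0x100), (102, 0x104), (103, 0x108), (104, 0x110)]
merged = merge_fetch_requests(pending)
```

Here `merged` holds two memory accesses instead of four. The operand access selector described next applies the same rule to operand fetches.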

The operand access selector (OAS) 152 receives the fetch requests and addresses of the processor elements via the signal line 136. When the requests access the same memory block, it combines them into a single memory access; when they access different memory blocks, it handles them in order of arrival. In either case it sends the access request and memory address to the memory access controller 153 via the signal line 162. The memory access controller (AM) 153 converts the address received from the instruction fetch selector 151 or the operand access selector 152 into a physical address and accesses the main memory 30 via the signal line 165.

The read buffer (RB) 157 buffers the read data received from the main memory 30 via the signal line 166, and sends it to the instruction distributor 154 if it is instruction fetch data or to the operand distributor 156 if it is operand data. The instruction distributor (ID) 154 sends the instructions received from the read buffer 157 via the signal line 163 to the processor element that requested them, via the signal line 135. The operand distributor (OD) 156 sends the operand data received from the read buffer 157 via the signal line 164 to the processor element that requested it, via the signal line 137.

The register file (RF) 158 has eight read ports and four write ports, twelve ports in total. For each processor element, it sends two sets of read data via the signal lines 138 and 139 and receives write data via the signal line 140. The store buffer (SB) 159 receives the write data arriving via the signal line 140 and the memory address arriving via the signal line 165, and writes to the main memory 30 via the signal line 167.

FIG. 6 shows the format of the microinstructions (machine language) executed by each processor element. All instructions have a fixed length of 32 bits, so the position of the next instruction can be found without decoding the current one. This is the instruction format known as RISC (Reduced Instruction Set Computer). For RISC, see, for example, IEEE Computer, September 1982, p. 11.

In this instruction format, all instructions other than load, store, and branch instructions take register operands or immediate operands.

FIG. 6(a) shows the instruction format used when all operands are registers. It contains an opcode (OP) field that indicates the type of operation, a (W) field that specifies whether a wait test is to be performed, and (Rs1, Rs2, Ri) fields that specify register numbers.

FIG. 6(b) shows the format of instructions that require a memory address calculation, such as load, store, and branch instructions. The register fields (Rs, R-) specify register numbers, and the Disp field gives an offset value. In FIG. 6 the W field is shown as separate from the OP field, but it is of course also possible to encode the W field inside the OP field.
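As an illustration of the register format of FIG. 6(a), the following sketch packs and unpacks a 32-bit instruction word. The source states only that instructions are 32 bits long and which fields exist; the field widths and bit positions used here (7-bit OP, 1-bit W, three 8-bit register fields) and the names `r1`, `r2`, `r3` are assumptions chosen purely so the example adds up to 32 bits.

```python
INSTRUCTION_BYTES = 4  # all instructions are 32 bits long (stated in the text)

# Hypothetical layout, not taken from the source:
#   bits 31-25: OP, bit 24: W, bits 23-16 / 15-8 / 7-0: three register fields
OP_SHIFT, W_SHIFT, R1_SHIFT, R2_SHIFT, R3_SHIFT = 25, 24, 16, 8, 0


def encode_register_format(op, w, r1, r2, r3):
    """Pack the fields of a register-format instruction into one 32-bit word."""
    if not (0 <= op < 128 and w in (0, 1)):
        raise ValueError("op must fit in 7 bits and w must be 0 or 1")
    if not all(0 <= r < 256 for r in (r1, r2, r3)):
        raise ValueError("register numbers must fit in 8 bits")
    return ((op << OP_SHIFT) | (w << W_SHIFT) |
            (r1 << R1_SHIFT) | (r2 << R2_SHIFT) | (r3 << R3_SHIFT))


def decode_register_format(word):
    """Unpack a 32-bit word into its fields (inverse of the encoder)."""
    return {
        "op": (word >> OP_SHIFT) & 0x7F,
        "w": (word >> W_SHIFT) & 0x1,
        "r1": (word >> R1_SHIFT) & 0xFF,
        "r2": (word >> R2_SHIFT) & 0xFF,
        "r3": (word >> R3_SHIFT) & 0xFF,
    }


def next_instruction_address(address):
    """Fixed-length instructions: the next address needs no decoding."""
    return address + INSTRUCTION_BYTES


word = encode_register_format(op=0x15, w=1, r1=3, r2=4, r3=5)
fields = decode_register_format(word)
```

Because the W field sits at a fixed position in this sketch, the wait test can be read directly from the word without interpreting the opcode. If W were encoded inside the OP field, as the text allows, the decoder would have to derive it from the opcode instead.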

Next, the operation of each processor element is described.

FIGS. 7 and 8 show the pipeline stage flow of the processor unit 10 as a whole.

Of these, FIG. 7 shows the case in which no pipeline bubbles occur in any processor element.

In each processor element, the decode, execute, and store units 122, 123, and 124 process instructions at a pitch of one machine cycle. Each processor element sequentially executes instructions that lie four apart in the instruction stream. For example, the processor element 101 executes instruction n, then instruction n+4, then instruction n+8. Likewise, the processor element 102 executes instruction n+1, then instruction n+5, then instruction n+9. Unless a branch instruction is issued, the instructions in the same pipeline stage of adjacent processor elements differ by only 1 (or 3). For example, the instructions in the execution units 123 in cycle t5 are instruction n+8 in the processor element 101, instruction n+9 in the processor element 102, instruction n+10 in the processor element 103, and instruction n+11 in the processor element 104.
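The cyclic distribution of instructions described above can be written down in a few lines. This is a minimal sketch that assumes no branch is taken; the function names are illustrative and do not come from the source.

```python
NUM_ELEMENTS = 4  # the embodiment uses four processor elements, 101 to 104


def element_for(k, num_elements=NUM_ELEMENTS):
    """Reference numeral of the processor element that handles instruction n+k."""
    return 101 + (k % num_elements)


def offsets_for(element, count, num_elements=NUM_ELEMENTS):
    """Offsets k of the first `count` instructions n+k handled by `element`."""
    first = element - 101
    return [first + i * num_elements for i in range(count)]


# Element 101 handles n, n+4, n+8; element 102 handles n+1, n+5, n+9.
sequence_101 = offsets_for(101, 3)
sequence_102 = offsets_for(102, 3)
```

With four elements, instructions n+8 to n+11 map to elements 101 to 104 in order, which is the assignment given for cycle t5.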

FIG. 8 shows the pipeline stage flow when an instruction is executed only after other instructions have finished executing. In FIG. 8, instructions n, n+1, n+2, n+3, n+4, n+5, n+6, and n+7 are executed regardless of the execution state of the preceding instructions. To indicate this, the W field of the instruction word is set to "0" (meaning no wait). Instruction n+8, by contrast, is executed only after execution of the preceding instructions has completed.

Such waiting is needed, for example, when an operation uses the execution result of a preceding instruction. The processor element 101, which executes instruction n+8, finishes decoding the instruction in cycle t4, then waits until all processor elements have finished executing the preceding instructions (that is, until the store units 124 of all processor elements have completed), and only then starts execution in the execution unit 123. To wait for the preceding instructions to finish in this way, the W field of the instruction word is set to "1".

To keep the other processor elements 102 to 104 synchronized with the processor element 101, the W field is likewise set to "1" for the three instructions n+9, n+10, and n+11 that follow instruction n+8. The resulting wait states are shown by dotted lines in FIG. 8; in cycle t5 the execution units of all processor elements are waiting.
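The timing of FIG. 8 can be reproduced with a small cycle-level model. This is a simplified sketch, not the circuit itself, and it rests on assumptions that are not in the source: instruction n is fetched in cycle 1, each stage takes one cycle, a store unit is busy exactly in the cycle after its element's execution unit executed an instruction, and an instruction with W = 1 may execute only in a cycle in which no store unit is busy. The function name is illustrative.

```python
def execution_cycles(w_fields, num_elements=4):
    """Return {k: cycle} giving the cycle in which instruction n+k executes.

    `w_fields[k]` is the W field (0 or 1) of instruction n+k. Instruction
    n+k runs on element k % num_elements and, without waiting, reaches the
    execution stage in cycle k // num_elements + 3 (fetch, decode, execute).
    """
    pending = [[k for k in range(len(w_fields)) if k % num_elements == p]
               for p in range(num_elements)]
    executed_last_cycle = [False] * num_elements  # True -> store unit busy now
    result = {}
    cycle = 0
    while len(result) < len(w_fields):
        cycle += 1
        all_stores_idle = not any(executed_last_cycle)
        executed_now = [False] * num_elements
        for p in range(num_elements):
            if not pending[p]:
                continue
            k = pending[p][0]
            decoded = cycle >= k // num_elements + 3
            if decoded and (w_fields[k] == 0 or all_stores_idle):
                result[k] = cycle
                executed_now[p] = True
                pending[p].pop(0)
        executed_last_cycle = executed_now
    return result


# n .. n+7 have W = 0; n+8 .. n+11 have W = 1, as in FIG. 8.
cycles = execution_cycles([0] * 8 + [1] * 4)
```

Under these assumptions, instructions n+8 to n+11 reach the execution stage in cycle 5, find the store units still busy with n+4 to n+7, and all execute together in cycle 6, which matches the all-waiting cycle t5 described above.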

FIG. 1 shows an embodiment of the circuit that characterizes the present invention and implements the instruction wait function described above.

In FIG. 1, the four processor elements 101 to 104 (PP#1 to #4) have the same components, so to distinguish them, #n (n = 1, 2, 3, 4) is appended to the symbol of each unit.

The instruction decode unit 122 has an instruction decoder (ID#1) 201 that decodes instructions and a latch (DR#1) 202 that latches the decoded result. The execution unit 123 has AND gates 203, 204, and 207, an OR gate 206, and an inverter 205 for wait control. The store unit 124 has a control circuit (SC#1) 208 and a flag (ST#1) 209 that indicates that the store unit is idle or will become idle in the next cycle. The flag 209 is set and reset via the signal line 229 from the control circuit 208.

In this configuration, when the instruction decoder 201 decodes an instruction whose W field is "0" and sets the result in the latch 202, the signal line 211 becomes "0", which tells the execution unit 123 that the instruction does not require wait control.

As a result, the output of the OR gate 206 becomes "1", and the clock signal 223 passes through the AND gate 207 and is output as the clock signal 224 to all the latches of the execution unit 123.

On the other hand, when the W field of the instruction is "1" and wait control is required, the signal line 211 becomes "1". In this case the output of the OR gate 206 becomes "1" only when the output 222 of the AND gate 203 is "1". If the store unit (SU#1 to #4) of any of the processor elements 101 to 104 is still operating, the corresponding flag 209 (ST#1 to #4) is "0", so the output of the AND gate 203 is also "0" and the execution unit 123 enters the wait state. When all the processor elements 101 to 104 have finished their operations and all the flags have become "1", the output 222 of the gate 203 becomes "1", the clock signal 224 is output, and the execution unit 123 carries out execution.
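The gate behavior just described amounts to a small Boolean function. The sketch below models it under one assumption about the wiring that the text implies but does not state outright: the inverter 205 feeds the inverted signal 211 into the OR gate 206 alongside the output 222 of the AND gate 203. The role of the AND gate 204 is not described in the text, so it is left out, and the function names are illustrative.

```python
def execution_enabled(w, store_flags):
    """Model of the wait control in the execution unit 123.

    `w` is the W field of the decoded instruction (signal line 211).
    `store_flags` holds the flags 209 (ST#1 to ST#4), where 1 means the
    store unit is idle or will become idle in the next cycle.
    """
    all_stores_idle = all(store_flags)        # AND gate 203 -> output 222
    no_wait_needed = not w                    # inverter 205 (assumed wiring)
    return no_wait_needed or all_stores_idle  # OR gate 206


def gated_clock(clock, w, store_flags):
    """AND gate 207: clock signal 223 becomes clock signal 224 when enabled."""
    return int(bool(clock) and execution_enabled(w, store_flags))


# W = 0: the clock passes even while store units are busy.
no_wait = gated_clock(1, 0, [0, 0, 0, 0])
# W = 1 with one busy store unit: the execution unit waits.
waiting = gated_clock(1, 1, [1, 0, 1, 1])
# W = 1 with all flags on: execution proceeds.
released = gated_clock(1, 1, [1, 1, 1, 1])
```

In this model, stopping the clock to the latches of the execution unit is what holds the instruction in place while it waits, so no separate stall logic is needed for the wait state.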

[Effect of the invention]

According to the present invention, a data processing device can be built from a plurality of pipeline processor elements that are themselves built by repeated use of simple logic. Compared with a conventional data processing device built from a single processor with complex logic, this has the effect of achieving higher performance.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing an embodiment of the wait control of the device of the present invention. FIG. 2 is a block diagram showing the overall configuration of a data processing device to which the present invention is applied. FIG. 3 is a block diagram showing the configuration of the processor section to which the present invention is applied. FIGS. 4 and 5 are block diagrams showing details of FIG. 3. FIG. 6 is a diagram showing the instruction formats used in an embodiment of the present invention. FIGS. 7 and 8 are stage flow diagrams explaining the operation sequence of an embodiment of the present invention.

101, 102, 103, 104: pipeline processor elements; 122: instruction decode unit; 123: execution unit; 124: store unit; 203, 204, 205, 206, 207: gates; 209: flag.

Claims

[Claims]

1. A data processing device comprising a plurality of processor elements of identical pipeline structure, each having as pipeline stages an instruction decoding means, an execution means for executing the instruction decoded by the decoding means, and a storage means for storing the result of processing by the execution means in memory, the device being configured so that a single stream of instructions is processed sequentially and cyclically by the processor elements, characterized in that: a wait field is provided in the instructions; the storage means of each processor element is provided with a flag that turns on when processing of an instruction has completed; and the execution means of each processor element is provided with an execution wait control circuit that controls the execution means so that execution of a decoded instruction is started immediately when the wait field of the instruction is off, and is started once the flags of all the storage means have turned on when the wait field of the instruction is on.