JP2668987B2

JP2668987B2 - Data processing device

Info

Publication number: JP2668987B2
Application number: JP63248108A
Authority: JP
Inventors: 正人鈴木; 雅士出口; 幸伸西川; 隆坂尾
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1988-09-30
Filing date: 1988-09-30
Publication date: 1997-10-27
Anticipated expiration: 2012-10-27
Also published as: JPH0296234A

Description

Field of Industrial Application

The present invention relates to a data processing device intended to increase the processing speed of a computer.

Prior Art

An example of a conventional data processing device is described in Tohru Moto-oka, Computer System Technology (計算機システム技術), Ohmsha, April 20, 1973, pp. 93-99.

FIG. 10 is a block diagram of this conventional data processing device. In FIG. 10, 11 is an instruction prefetch unit that prefetches instruction codes; 12 is an instruction decode unit; 13 is an address calculation unit that calculates operand addresses; 14 is an operand prefetch unit that prefetches operands; 15 is an arithmetic unit; 16 is an input/output bus to which memory, I/O devices, and the like are connected; and 17 is a bus control unit that arbitrates requests from the instruction prefetch unit 11, the operand prefetch unit 14, and (when an operand is written) the arithmetic unit 15, and controls the input/output bus 16.

The instruction decode unit 12 decodes the instruction code prefetched by the instruction prefetch unit 11 and issues the following to the address calculation unit 13: control information for instruction execution; when a memory operand must be fetched, control information for calculating the operand address and prefetching the operand; and, when a write to memory is involved, control information for calculating the operand address.

The address calculation unit 13 calculates the operand address and sends the operand address, the control information accompanying the memory reference, and the control information for instruction execution to the operand prefetch unit 14.

When a memory operand must be fetched, the operand prefetch unit 14 issues a request to the bus control unit 17 and prefetches the operand from the operand address received from the address calculation unit 13. It then sends to the arithmetic unit 15 the prefetched operand data received from the bus control unit 17, the operand write address received from the address calculation unit 13 (when the instruction writes to memory), and the control information for instruction execution.

The arithmetic unit 15 executes the operation according to the prefetched data and the control information for instruction execution received from the operand prefetch unit 14. When the result must be written to memory, it issues a request to the bus control unit 17 and writes the result to memory at the write address received from the operand prefetch unit 14.
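The data flow just described can be illustrated with a minimal Python sketch. It is a model under stated assumptions, not the patent's circuitry: the `Bundle` fields, the toy `memory` dictionary, and the placeholder `+ 1` operation are hypothetical. What it shows is that, in this first conventional device, every instruction's control information travels through the address calculation unit 13 and the operand prefetch unit 14 on its way to the arithmetic unit 15, even when neither unit has anything to do.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Bundle:
    """Control information flowing from decode unit 12 toward unit 15 (hypothetical fields)."""
    name: str
    fetch: bool = False            # a memory operand must be fetched
    write: bool = False            # the result is written back to memory
    base: Optional[int] = None     # stand-in for the address-calculation inputs
    address: Optional[int] = None
    data: Optional[int] = None
    trace: List[str] = field(default_factory=list)


def address_calculation(b: Bundle) -> Bundle:                      # unit 13
    if b.fetch or b.write:
        b.address = b.base             # real hardware would combine base, index, displacement
        b.trace.append("13:calc")
    else:
        b.trace.append("13:pass")      # nothing to do, but the stage is still occupied
    return b


def operand_prefetch(b: Bundle, memory: Dict[int, int]) -> Bundle:  # unit 14
    if b.fetch:
        b.data = memory[b.address]
        b.trace.append("14:fetch")
    else:
        b.trace.append("14:pass")      # write address (if any) is simply forwarded
    return b


def arithmetic(b: Bundle, memory: Dict[int, int]) -> int:           # unit 15
    result = (b.data if b.data is not None else 0) + 1   # placeholder operation
    if b.write:
        memory[b.address] = result
    b.trace.append("15:exec")
    return result


def run(b: Bundle, memory: Dict[int, int]) -> int:
    # Fixed route in the first conventional device: 13 -> 14 -> 15 for every instruction.
    return arithmetic(operand_prefetch(address_calculation(b), memory), memory)


if __name__ == "__main__":
    mem = {0x100: 41}
    for b in (Bundle("load-op", fetch=True, base=0x100), Bundle("reg-reg")):
        run(b, mem)
        print(b.name, b.trace)
```

The register-to-register bundle records two "pass" entries: it does no work in units 13 and 14 but still occupies them, which is the root of the problems discussed later in the description.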

The operation of the conventional data processing device configured as described above is explained below.

FIG. 11 is an operation timing chart. It shows, clock by clock, the instruction being processed in each of the instruction prefetch unit 11, the instruction decode unit 12, the address calculation unit 13, the operand prefetch unit 14, and the arithmetic unit 15. The instruction prefetch unit 11 is assumed to contain an internal cache memory, and the units require the following numbers of clocks: instruction prefetch unit 11, 1 clock; instruction decode unit 12, 1 clock; address calculation unit 13, 1 clock; operand prefetch unit 14, 3 clocks; and arithmetic unit 15, 1 clock. The instruction sequence repeats a group of three instructions: one instruction that requires a memory operand fetch, followed by two instructions that do not. Specifically, instructions 1 and 4 require a memory operand fetch, and instructions 2, 3, 5, and 6 do not. The pipeline is initially empty (as, for example, after a conditional branch).

Instruction 1 is prefetched by the instruction prefetch unit 11 at clock t1, and its instruction code is issued to the instruction decode unit 12. At clock t2 it is decoded by the instruction decode unit 12, which issues the control information for instruction execution and the control information for operand address calculation and prefetch to the address calculation unit 13. At clock t3 the address calculation unit 13 calculates the operand address according to the control information for operand address calculation and prefetch, and sends the operand address, the control information accompanying the memory reference, and the control information for instruction execution to the operand prefetch unit 14. During clocks t4 to t6 the operand prefetch unit 14 prefetches the operand according to the operand address and the memory-reference control information, and sends the prefetched data and the control information for instruction execution to the arithmetic unit 15. At clock t7 the arithmetic unit 15 executes the operation according to the control information for instruction execution.

Instruction 2 is prefetched by the instruction prefetch unit 11 at clock t2 and decoded by the instruction decode unit 12 at clock t3. Because it does not require a memory operand fetch, only the control information for instruction execution is issued to the address calculation unit 13. At clock t4 instruction 2 passes through the stage of the address calculation unit 13; there is no control information for operand address calculation or prefetch, so nothing is done, and the unit attempts to send the control information for instruction execution to the operand prefetch unit 14. However, because instruction 1 has not yet finished in the operand prefetch unit 14, this transfer is delayed until clock t7. At clock t7 instruction 2 passes through the stage of the operand prefetch unit 14; again there is no control information for operand address calculation or prefetch, so nothing is done, and the control information for instruction execution is sent to the arithmetic unit 15. At clock t8 the arithmetic unit 15 executes the operation according to that control information.
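The clock-by-clock behavior of FIG. 11 can be approximated with a small scheduling model. This is a sketch under assumptions not spelled out in the text: instructions proceed strictly in order, each unit holds one instruction, and there are no buffers between units, so an instruction keeps its stage until the next unit accepts it; an instruction without a memory operand spends 1 clock passing through the operand prefetch stage. The names `schedule` and `durations` are illustrative.

```python
from typing import List, Sequence

STAGES = ("IF", "ID", "AC", "OF", "EX")   # units 11, 12, 13, 14, 15


def durations(needs_fetch: bool) -> List[int]:
    # Clock counts from the FIG. 11 example; an instruction without a memory
    # operand still spends one clock passing through the operand prefetch stage.
    return [1, 1, 1, 3 if needs_fetch else 1, 1]


def schedule(program: Sequence[bool]) -> List[List[int]]:
    """Return, for each instruction, the first clock (t1 = 1) of each stage."""
    table: List[List[int]] = []
    prev_start: List[int] = []
    prev_dur: List[int] = []
    for needs_fetch in program:
        dur = durations(needs_fetch)
        start: List[int] = []
        for s in range(len(STAGES)):
            # The instruction must have finished its own previous stage...
            t = 1 if s == 0 else start[s - 1] + dur[s - 1]
            if prev_start:
                if s + 1 < len(STAGES):
                    # ...and the previous instruction must have moved on to stage s + 1.
                    t = max(t, prev_start[s + 1])
                else:
                    # The last stage is free once the previous instruction finishes it.
                    t = max(t, prev_start[s] + prev_dur[s])
            start.append(t)
        table.append(start)
        prev_start, prev_dur = start, dur
    return table


if __name__ == "__main__":
    # Instructions 1 and 4 fetch a memory operand; 2, 3, 5 and 6 do not.
    for i, row in enumerate(schedule([True, False, False, True, False, False]), 1):
        print(f"instruction {i}: " + "  ".join(f"{n}=t{t}" for n, t in zip(STAGES, row)))
```

Under these assumptions the model gives the clocks stated in the text: instruction 1 executes at t7, and instruction 2 enters the operand prefetch stage at t7 and executes at t8. The clocks it prints for instructions 3 to 6 follow from the assumed buffer-less blocking and are not given in the description.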

However, with the configuration described above, even instructions that do not require a memory operand fetch, such as register-to-register operations, must needlessly pass through pipeline stages such as operand address calculation and operand prefetch. As a result, when the pipeline is disrupted, for example by a branch, overhead is incurred in refilling the pipeline. In addition, this needless passage through the pipeline stages by instructions that do not fetch memory operands limits the bus bandwidth available for the operand accesses of instructions that genuinely need to fetch operands.

To solve these problems, patent applications have been filed, for example Japanese Patent Laid-Open No. 63-167935, described below.

FIG. 12 is a block diagram of that data processing device. In FIG. 12, 18 is an instruction prefetch unit that prefetches instruction codes; 19 is an instruction decode unit; 20 is an address calculation unit that calculates operand addresses; 21 is an operand prefetch unit that prefetches operands; 22 is an arithmetic control unit; 23 is an arithmetic unit; 24 is an input/output bus to which memory, I/O devices, and the like are connected; and 25 is a bus control unit that arbitrates requests from the instruction prefetch unit 18, the operand prefetch unit 21, and (when an operand is written) the arithmetic unit 23, and controls the input/output bus 24.

The instruction decode unit 19 decodes the instruction code prefetched by the instruction prefetch unit 18. It issues to the address calculation unit 20 control information for calculating the operand address and prefetching the operand when a memory operand must be fetched, and control information for calculating the operand address when a write to memory is involved. Control information for instruction execution is issued to the arithmetic control unit 22.

The address calculation unit 20 calculates the operand address and sends the operand address and the control information accompanying the memory reference to the operand prefetch unit 21.

When a memory operand must be fetched, the operand prefetch unit 21 issues a request to the bus control unit 25, prefetches from memory at the operand address received from the address calculation unit 20, and queues the data read. When a write to memory is required, it queues the operand (write) address. The queuing status of the prefetched data and the write addresses is reported to the arithmetic control unit 22.

The arithmetic control unit 22 queues the control information for the arithmetic unit 23 received from the instruction decode unit 19, and issues it in response to requests from the arithmetic unit 23. If the control information to be issued requires prefetched data or a write address, the arithmetic control unit 22 first checks the queuing status of the prefetched data and write addresses in the operand prefetch unit 21. If they are not yet ready, issuing the control information to the arithmetic unit 23 is delayed until the prefetched data or write address is ready.
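The issue rule of the arithmetic control unit 22 can be sketched as an in-order queue whose head entry is released only when the operand prefetch unit 21 holds the data or write address that entry needs. The class and field names below are hypothetical, and the model ignores clock timing; it only shows the readiness check, and that an entry needing nothing from unit 21 still waits behind an entry that does.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass
class ControlInfo:
    """Control information for arithmetic unit 23 (hypothetical fields)."""
    name: str
    needs_data: bool = False    # requires prefetched operand data
    needs_waddr: bool = False   # requires a queued write address


class OperandQueues:
    """Hypothetical model of the two queues held by operand prefetch unit 21."""
    def __init__(self) -> None:
        self.data: Deque[int] = deque()
        self.waddr: Deque[int] = deque()


class ArithmeticControl:
    """Hypothetical model of arithmetic control unit 22: an in-order queue."""
    def __init__(self, operands: OperandQueues) -> None:
        self.pending: Deque[ControlInfo] = deque()
        self.operands = operands

    def accept(self, info: ControlInfo) -> None:
        # Control information arriving from instruction decode unit 19.
        self.pending.append(info)

    def issue(self) -> Optional[Tuple[ControlInfo, Optional[int], Optional[int]]]:
        """Called when unit 23 requests work. Returns (info, data, write address),
        or None when the queue is empty or the head entry is not ready yet."""
        if not self.pending:
            return None
        head = self.pending[0]
        if head.needs_data and not self.operands.data:
            return None             # prefetched data not queued yet
        if head.needs_waddr and not self.operands.waddr:
            return None             # write address not queued yet
        self.pending.popleft()
        data = self.operands.data.popleft() if head.needs_data else None
        waddr = self.operands.waddr.popleft() if head.needs_waddr else None
        return head, data, waddr
```

Because only the head of the queue is examined, a register-to-register entry queued behind a memory-operand entry cannot be issued early; this head-of-line behavior is what produces the delays analyzed in the problems below.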

The arithmetic unit 23 executes the operation according to the control information received from the arithmetic control unit 22 and the prefetched data received from the operand prefetch unit 21. When the result must be written to memory, it issues a request to the bus control unit 25 and writes the result to memory at the write address received from the operand prefetch unit 21.

The operation of the second conventional data processing device configured as described above is explained below.

FIG. 13 is an operation timing chart. It shows, clock by clock, the instruction being processed in each of the instruction prefetch unit 18, the instruction decode unit 19, the address calculation unit 20, the operand prefetch unit 21, and the arithmetic unit 23. The instruction prefetch unit 18 is assumed to contain an internal cache memory, and the units require the following numbers of clocks: instruction prefetch unit 18, 1 clock; instruction decode unit 19, 1 clock; address calculation unit 20, 1 clock; operand prefetch unit 21, 3 clocks; and arithmetic unit 23, 1 clock. The instruction sequence is the same as in the timing chart of FIG. 11: a group of three instructions, one that requires a memory operand fetch followed by two that do not, is repeated. Specifically, instructions 1 and 4 require a memory operand fetch, and instructions 2, 3, 5, and 6 do not. The pipeline is initially empty (as, for example, after a conditional branch).

Instruction 1 is prefetched by the instruction prefetch unit 18 at clock t1, and its instruction code is issued to the instruction decode unit 19. At clock t2 it is decoded by the instruction decode unit 19, which issues the control information for operand address calculation and prefetch to the address calculation unit 20 and the control information for instruction execution to the arithmetic control unit 22. Because the operand data is not yet ready, the control information for instruction execution remains queued in the arithmetic control unit 22, and its issue to the arithmetic unit 23 is delayed. Following the control information for operand address calculation and prefetch, the address calculation unit 20 calculates the operand address at clock t3, and the operand prefetch unit 21 prefetches the operand during clocks t4 to t6. Once the data is ready, the queued control information for instruction execution is issued from the arithmetic control unit 22, and the operation is executed in the arithmetic unit 23 at clock t7.

Instruction 2 is prefetched by the instruction prefetch unit 18 at clock t2 and decoded by the instruction decode unit 19 at clock t3. Because it does not require a memory operand fetch, only the control information for instruction execution is issued to the arithmetic control unit 22. However, because the control information for instruction 1 has not yet been issued, the control information for instruction 2 remains queued in the arithmetic control unit 22, and its issue to the arithmetic unit 23 is delayed until clock t8.
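A scheduling sketch of the second conventional device, under these assumptions: one instruction is prefetched per clock and decoded the next clock (the queue in unit 22 absorbs any backlog, so the front end never stalls); only memory-operand instructions use units 20 and 21; and unit 23 executes the queued control information strictly in order. The function name `schedule2` is illustrative.

```python
from typing import Dict, List, Sequence


def schedule2(program: Sequence[bool], fetch_clocks: int = 3) -> List[Dict[str, int]]:
    """program[i] is True if instruction i+1 needs a memory operand fetch.
    Returns the clock (t1 = 1) of each stage; 'OF' is the first prefetch clock."""
    rows: List[Dict[str, int]] = []
    of_free = 1     # first clock at which operand prefetch unit 21 is idle
    last_ex = 0     # clock of the previous operation in arithmetic unit 23
    for i, needs_fetch in enumerate(program):
        row = {"IF": i + 1, "ID": i + 2}
        ready = row["ID"] + 1                  # no operand needed: ready right after decode
        if needs_fetch:
            row["AC"] = row["ID"] + 1          # address calculation unit 20, 1 clock
            row["OF"] = max(row["AC"] + 1, of_free)
            of_free = row["OF"] + fetch_clocks
            ready = of_free                    # data queued after the last prefetch clock
        # Arithmetic control unit 22 issues strictly in order, one operation per clock.
        row["EX"] = max(ready, last_ex + 1)
        last_ex = row["EX"]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(schedule2([True, False, False, True, False, False]), 1):
        print(f"instruction {i}: {row}")
```

The model reproduces instruction 1 executing at t7 and instruction 2 at t8, and instructions 2 and 3 never occupy units 20 and 21. It also makes visible that the register-to-register instructions now wait only in the queue of unit 22.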

According to the second conventional example described above, instructions that do not require a memory operand fetch, such as register-to-register operations, no longer pass needlessly through pipeline stages such as operand address calculation and prefetch. Consequently, even when the pipeline is disrupted, for example by a branch, no overhead is incurred for refilling the pipeline. The restriction on the bus bandwidth available for the operand accesses of instructions that genuinely need to fetch operands, which was caused by that needless passage, is also removed.

Problems to Be Solved by the Invention

However, the configuration described above has the following shortcomings. When an instruction that requires an operand fetch is followed by an instruction that requires a write to memory, the operation stage of the latter must wait until the operand prefetch stage and the operation stage of the former have completed, even if the latter does not use the former's result. Furthermore, if an instruction that requires neither an operand fetch nor a write to memory follows, its operation stage must wait, even though it does not use the result of the operand-fetching instruction, until the operand prefetch stage and operation stage of the preceding operand-fetching instruction and the operation stage of the preceding memory-writing instruction have completed. Likewise, when an instruction that requires a write to memory is followed by an instruction that requires an operand fetch, the operation stage of the latter must wait until the operand prefetch stage and the operation stage of the former have completed, even if it does not use the former's result. Furthermore, if an instruction that requires neither an operand fetch nor a write to memory follows, its operation stage must wait, even though it does not use the result of the operand-fetching instruction, until the operation stage of the preceding memory-writing instruction and the operand prefetch stage and operation stage of the preceding operand-fetching instruction have completed. Thus, when instructions requiring an operand fetch and instructions requiring a write to memory, both of which occur frequently in programs, appear in succession, the number of execution clocks increases, and the execution of subsequent instructions that require neither an operand fetch nor a write to memory, such as register-to-register operations, is delayed. This was the first problem.

This problem is explained with reference to the operation timing chart of FIG. 14. The number of clocks required by each unit is the same as in FIG. 13, except that queuing a write address in the operand prefetch unit 21 takes 1 clock, and an operation plus a write to memory in the arithmetic unit 23 takes 4 clocks. FIG. 14 shows an instruction sequence in which an instruction requiring an operand fetch (instruction 1) is followed by an instruction that writes a register to memory (instruction 2) and then by an instruction requiring neither an operand fetch nor a write to memory (instruction 3). For instruction 1, the address is calculated at clock t3, the operand is fetched during clocks t4 to t6, and the operation is performed at clock t7. Instruction 2, however, after its address is calculated at clock t4, must wait for the operation stage of instruction 1 to complete, so its operation and memory write take place during clocks t8 to t11. The operation of instruction 3 is delayed until clock t12.
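The FIG. 14 sequence can be checked with a variant of the scheduling sketch for the second conventional device. Assumptions beyond the clock counts stated in the text: one instruction is prefetched per clock and decoded the next; the write-address queuing of a store occupies unit 21 for 1 clock, after any prefetch already in progress; bus contention between the operand fetch and the memory write is ignored; and unit 23 handles operations strictly in order. The kind labels `"load"`, `"store"`, and `"reg"` are hypothetical.

```python
from typing import Dict, List, Sequence

AC_CLOCKS = 1
UNIT21_CLOCKS = {"load": 3, "store": 1}        # operand prefetch / write-address queuing
EX_CLOCKS = {"load": 1, "store": 4, "reg": 1}  # store = operation plus memory write


def schedule(kinds: Sequence[str]) -> List[Dict[str, object]]:
    rows: List[Dict[str, object]] = []
    unit20_free = unit21_free = ex_free = 1    # first idle clock of units 20, 21 and 23
    for i, kind in enumerate(kinds):
        decode = i + 2                         # prefetched at t(i+1), decoded next clock
        ready = decode + 1
        row: Dict[str, object] = {"kind": kind}
        if kind in UNIT21_CLOCKS:
            ac = max(decode + 1, unit20_free)
            unit20_free = ac + AC_CLOCKS
            u21 = max(ac + AC_CLOCKS, unit21_free)
            unit21_free = u21 + UNIT21_CLOCKS[kind]
            ready = unit21_free                # data or write address is now queued
            row["AC"] = ac
            row["U21"] = (u21, unit21_free - 1)
        start = max(ready, ex_free)            # in-order use of arithmetic unit 23
        ex_free = start + EX_CLOCKS[kind]
        row["EX"] = (start, ex_free - 1)       # first and last clock in unit 23
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in schedule(["load", "store", "reg"]):
        print(row)
```

The store occupies unit 23 from t8 to t11 and the register instruction is pushed to t12, as in FIG. 14, even though neither instruction uses the load's result.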

In view of this, an object of the present invention is to provide a data processing device in which, even when an instruction requiring an operand fetch and an instruction requiring a write to memory appear in succession, both instructions can be executed in parallel, so that the execution of subsequent instructions that require neither an operand fetch nor a write to memory, such as register-to-register operations, is not delayed.

In the conventional configuration, moreover, once an instruction requiring an operand fetch has been executed, the operation stages of all subsequent instructions fall behind their instruction prefetch stages in time, even for instructions that do not require an operand fetch. As a result, gaps appear in the operation stage, and the execution time of the first operand-fetching instruction to appear in an empty pipeline is not 1 clock but the sum of the clocks required by the address calculation unit 20, the operand prefetch unit 21, and the arithmetic unit 23. This was the second problem. Furthermore, when a conditional branch instruction follows, the instruction prefetch stage of the branch-target instruction is delayed, which lengthens the pipeline interlock time and greatly reduces pipeline efficiency. This was the third problem.

The second problem is explained with reference to the operation timing chart of FIG. 15, and the third with reference to that of FIG. 16. In both figures, the number of clocks required by each unit is the same as in FIG. 13. FIG. 15 shows an instruction sequence in which two instructions that do not require an operand fetch (instructions 1 and 2) are followed by one instruction that does (instruction 3) and then by two more that do not (instructions 4 and 5). Instructions 1 and 2 are executed at clocks t3 and t4, respectively, but instruction 3, which must pass through the address calculation stage and the operand prefetch stage, is executed at clock t9. The operation stage is therefore idle during clocks t5 to t8; the execution time of instruction 3, the first operand-fetching instruction to appear in the empty pipeline, is 5 clocks; and the operation stages of instructions 4 and 5 lag 6 clocks behind their instruction prefetch stages.

FIG. 16 shows an instruction sequence in which one instruction requiring an operand fetch (instruction 1) and four instructions not requiring one (instructions 2 to 5) are followed by a conditional branch instruction (instruction 6); the branch is taken, and control passes to an instruction that does not require an operand fetch (instruction n). Because instruction 1 requires the address calculation stage and the operand prefetch stage, the operation stages of the subsequent instructions 2 to 6 lag 6 clocks behind their instruction prefetch stages. Consequently, the instruction prefetch stage of the branch-target instruction n falls at clock t13, and the pipeline suffers an interlock of as many as 6 clocks, from t7 to t12, in the instruction prefetch stage. Thus, in the conventional configuration, conditional branch instructions, which account for as much as 20 percent of the instructions in a program, cause a large drop in pipeline efficiency.
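Both effects follow from in-order issue to unit 23, as a short sketch shows. Assumptions: one instruction is prefetched per clock until a taken branch executes, and the branch target is prefetched on the clock after the branch executes; the branch itself has no memory operand; all other clock counts are those of FIG. 13. The kind labels and the function name `run` are illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def run(program: Sequence[str], fetch_clocks: int = 3) -> Tuple[List[int], Optional[int]]:
    """program: 'mem', 'reg' or 'branch' (a taken branch without a memory operand).
    Returns the EX clock of each instruction and the clock at which the branch
    target is prefetched (None if there is no branch)."""
    ex: List[int] = []
    of_free, last_ex = 1, 0
    for i, kind in enumerate(program):
        prefetch = i + 1
        ready = prefetch + 2                        # decoded at prefetch + 1
        if kind == "mem":
            of_start = max(prefetch + 3, of_free)   # address calculated at prefetch + 2
            of_free = of_start + fetch_clocks
            ready = of_free
        last_ex = max(ready, last_ex + 1)           # in-order issue from unit 22
        ex.append(last_ex)
        if kind == "branch":
            return ex, last_ex + 1                  # target prefetched after the branch executes
    return ex, None


if __name__ == "__main__":
    print(run(["reg", "reg", "mem", "reg", "reg"]))                 # FIG. 15 sequence
    print(run(["mem", "reg", "reg", "reg", "reg", "branch"]))       # FIG. 16 sequence
    print(run(["reg", "reg", "reg", "reg", "reg", "branch"]))       # no operand fetch
```

For the FIG. 15 sequence the model leaves unit 23 idle at t5 to t8 and puts instructions 4 and 5 six clocks behind their prefetch. For FIG. 16 the branch executes at t12 and the target is prefetched at t13, a 6-clock interlock after the branch's own prefetch at t6; with no operand-fetching instruction, the model gives t9, a 2-clock interlock.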

In view of this, another object of the present invention is to provide a data processing device in which, even when an instruction requiring an operand fetch is followed by instructions that do not require one, no gaps appear in the operation stage and the execution time of the first operand-fetching instruction to appear in an empty pipeline is 1 clock, and in which, even when a conditional branch instruction follows, the pipeline interlock time is the same as when there is no operand-fetching instruction, so that the loss of pipeline efficiency is minimized.

Also, in the conventional configuration, when a register rewritten by the operation of an instruction that does not require an operand fetch is read during the address calculation of a subsequent instruction that does require an operand fetch, the address calculation stage is interlocked until the operation stage of the former instruction completes (this is called address calculation interference), which lowers pipeline efficiency. This was the fourth problem. It is explained with reference to the operation timing chart of FIG. 17, in which the number of clocks required by each unit is the same as in FIG. 13. FIG. 17 shows an instruction sequence in which an instruction requiring an operand fetch (instruction 1) is followed by two instructions that do not (instructions 2 and 3) and then by another instruction that does (instruction 4), and the result of instruction 3 is used in the address calculation of instruction 4. Without address calculation interference, the address calculation of instruction 4 could be performed at clock t6, but because of the interference it must wait until clock t10, the clock after the operation of instruction 3 completes. The pipeline is thus interlocked in the address calculation stage for as many as 4 clocks, t6 to t9, and address calculation interference lowers pipeline efficiency.
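A sketch of the interference check: an operand-fetching instruction's address calculation may not start until the most recent earlier instruction that writes its address register has executed. Assumptions: the clock counts and in-order issue of FIG. 13; one instruction prefetched per clock; address calculations performed in program order. The register name `"Rn"`, the tuple layout, and the function name are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (needs_fetch, destination register or None, address register or None)
Instr = Tuple[bool, Optional[str], Optional[str]]


def schedule(program: Sequence[Instr], fetch_clocks: int = 3) -> List[Dict[str, int]]:
    rows: List[Dict[str, int]] = []
    of_free, last_ex, last_ac = 1, 0, 0
    produced_at: Dict[str, int] = {}      # register -> EX clock of its latest writer
    for i, (needs_fetch, dest, addr_reg) in enumerate(program):
        row = {"IF": i + 1, "ID": i + 2}
        ready = row["ID"] + 1
        if needs_fetch:
            ac = max(row["ID"] + 1, last_ac + 1)
            if addr_reg in produced_at:
                # Address calculation interference: wait for the producing operation.
                ac = max(ac, produced_at[addr_reg] + 1)
            row["AC"] = last_ac = ac
            row["OF"] = max(ac + 1, of_free)
            of_free = row["OF"] + fetch_clocks
            ready = of_free
        row["EX"] = max(ready, last_ex + 1)   # in-order issue to arithmetic unit 23
        last_ex = row["EX"]
        if dest is not None:
            produced_at[dest] = row["EX"]
        rows.append(row)
    return rows


if __name__ == "__main__":
    fig17 = [(True, None, None), (False, None, None), (False, "Rn", None), (True, None, "Rn")]
    for i, row in enumerate(schedule(fig17), 1):
        print(f"instruction {i}: {row}")
```

With the dependency, instruction 4's address calculation moves from t6 to t10, the 4-clock interlock described for FIG. 17; removing the dependency (setting its address register to `None`) restores t6.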

In view of this, the present invention also aims to provide a data processing device in which no interlock occurs, and pipeline efficiency therefore does not drop, even when a register rewritten by the operation of an instruction that does not require an operand fetch is read during the address calculation of a subsequent instruction that does require one.

Further, the conventional configuration has a fifth problem, which arises when an instruction requiring a plurality of operand fetches, or an instruction involving the writing of a plurality of operands to memory, is executed. The operation of the following instruction must then wait a long time until that instruction completes, and the efficiency of the pipeline is reduced. This happens even if the following instruction does not require an operand fetch and does not need the execution result of the multiple-operand instruction. The conventional configuration also has a sixth problem. Software can optimize a program so that an instruction following an instruction requiring multiple operand fetches or multiple operand writes to memory does not need that instruction's execution result, but the conventional configuration cannot realize the performance benefit of that optimization. The fifth and sixth problems will be described with reference to the operation timing chart of FIG. 18. The number of clocks required by each device is the same as that shown in FIG. 13.

FIG. 18 shows an instruction sequence in which an instruction performing three operand fetches (instruction 1) is followed by two instructions that do not require an operand fetch (instructions 2 and 3). Neither instruction 2 nor instruction 3 requires the execution result of instruction 1. Instruction 1 handles its operands as follows:

- It prefetches the first operand at clocks t4 to t6 and stores it in register R1 at clock t7.
- It prefetches the second operand at clocks t7 to t9 and stores it in register R2 at clock t10.
- It prefetches the third operand at clocks t10 to t12 and stores it in register R3 at clock t13.

The arithmetic device 23 is not in use at clock t4, and instruction 2 does not use registers R1, R2 and R3, which hold the execution results of instruction 1. Even so, the operation of instruction 2 is made to wait until clock t14, the clock after the third operation stage of instruction 1. Likewise, the arithmetic device 23 is not in use at clock t5, and instruction 3 does not use registers R1, R2 and R3, yet the operation of instruction 3 is made to wait until clock t15. A long idle period therefore arises in the arithmetic device 23, and the efficiency of the pipeline drops markedly.
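A small timing model reproduces the cost shown in FIG. 18. This Python sketch is hypothetical and compares two rules. The first is the conventional rule, under which an operation waits for the preceding instruction's last operation stage. The second lets an independent operation use any free clock of the arithmetic device. The second rule only shows how much idle time is being lost; it is not the mechanism of the embodiment.

```python
from typing import Dict, List, Set, Tuple

# FIG. 18: instruction 1 stores its three operands in R1, R2 and R3 at
# clocks t7, t10 and t13, using the arithmetic device in each of those clocks.
INSN1_WRITES: Dict[str, int] = {"R1": 7, "R2": 10, "R3": 13}


def in_order_starts(ready: List[int], prev_last_op: int) -> List[int]:
    """Conventional behaviour: each following operation starts no earlier
    than the clock after the previous instruction's last operation stage."""
    starts = []
    prev = prev_last_op
    for r in ready:
        prev = max(r, prev + 1)
        starts.append(prev)
    return starts


def dependency_aware_starts(followers: List[Tuple[int, Set[str]]],
                            writes: Dict[str, int]) -> List[int]:
    """Start each operation in the first clock, at or after it is ready,
    in which the arithmetic device is free and every register it reads
    has already been written by instruction 1."""
    busy = set(writes.values())
    starts = []
    for ready, reads in followers:
        t = max([ready] + [writes[r] + 1 for r in reads if r in writes])
        while t in busy:
            t += 1
        busy.add(t)
        starts.append(t)
    return starts


print(in_order_starts([4, 5], prev_last_op=13))                         # [14, 15]
print(dependency_aware_starts([(4, set()), (5, set())], INSN1_WRITES))  # [4, 5]
```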

In view of this, an object of the present invention is to provide a data processing device in which the efficiency of the pipeline is not reduced. When an instruction requiring a plurality of operand fetches, or an instruction involving the writing of a plurality of operands to memory, is executed, a following instruction may require no operand fetch and no result from that instruction. In that case the device performs the operation of the following instruction in parallel with, or ahead of, the operation of the multiple-operand instruction. A further object of the present invention is to provide a data processing device that realizes the performance of software which optimizes a program so that an instruction following an instruction requiring multiple operand fetches or multiple operand writes to memory does not need that instruction's execution result.

All of the problems described above become more pronounced as the number of clocks required by the operand prefetch device 21 increases. In the conventional configuration, it is therefore possible to alleviate the first to sixth problems by giving the operand prefetch device 21 a data cache memory, which reduces the number of clocks it requires on a cache hit. However, as the capacity of the built-in cache memory grows, its hit rate improves but its access time also grows. This creates a new, seventh problem: it becomes difficult to reduce the number of clocks the operand prefetch device 21 requires on a cache hit.

In view of this, an object of the present invention is to provide a data processing device that solves the first to sixth problems regardless of how many clocks are needed to read or write an operand. That number may vary with whether a cache memory is built in, with the cache hit rate, and with the cache capacity. A further aim is that the number of clocks needed to read or write an operand can be set independently of the number of execution clocks of a register-register operation.

Further, in the conventional configuration, the arithmetic control device 22 must wait for prefetched data. To shorten that wait, the instruction decoding device 19 must send control information for address calculation and prefetching to the address calculation device 20 as early as possible, ahead of the control information for instruction execution that it sends to the arithmetic control device 22. This complicates the control of the instruction decoding device 19. The configuration also requires complex control in the arithmetic control device 22 to wait for prefetched data or write addresses. As the control of each device becomes more complicated, the delay time of the control circuits increases. This gives rise to an eighth problem: it is not easy to raise the clock frequency at which the processor operates.

In view of this, an object of the present invention is to provide a data processing device whose operating clock frequency can easily be raised.

A further object of the present invention is to provide a data processing device that updates the flags without any inconsistency with sequential execution, even when a preceding instruction and a succeeding instruction that would normally be executed sequentially are executed in parallel.

Yet another object of the present invention is to provide a data processing device that updates the flags without any inconsistency with in-order completion, even when a preceding instruction and a succeeding instruction that would normally complete in program order complete in reverse order.

Means for Solving the Problems
The invention according to claim 1 is a pipelined data processing device. It comprises at least a read stage, a decoding stage and an execution stage, and the execution stage further includes a first execution stage and a second execution stage connected in series. The device comprises:

- instruction prefetch means acting as the read stage;
- instruction decoding means acting as the decoding stage; and
- instruction execution means comprising at least first instruction execution means acting as the first execution stage and second instruction execution means acting as the second execution stage.

Suppose a first instruction is processed in at least the first instruction execution means and then the second instruction execution means, in that order. Suppose also that a second instruction following the first is processed in at least the first instruction execution means and does not require processing in the second instruction execution means. The instruction execution means then processes the second instruction in the first instruction execution means, and all processing of the second instruction completes there. This happens even while the second instruction execution means is still executing the processing of the first instruction.
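The following minimal Python sketch illustrates the behaviour described in claim 1, in which an instruction that needs only the first execution stage (EX1) completes there while the second stage (EX2) is still busy. The clock counts are made-up values for illustration; the claim specifies no timing. The sketch also assumes the instructions have no data dependencies on one another.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Insn:
    name: str
    ex2_clocks: int  # clocks needed in EX2; 0 means all work finishes in EX1


def completion_clocks(insns: List[Insn], wait_for_predecessor: bool) -> List[int]:
    """Clock in which each instruction completes.

    One instruction enters EX1 per clock and spends one clock there.
    Instructions with ex2_clocks > 0 then use the single EX2 unit.
    With wait_for_predecessor=True, an instruction may not enter EX1 until
    the previous instruction has completed (in-order behaviour). With False,
    an EX1-only instruction completes in EX1 even while EX2 is still busy
    with an earlier instruction (the behaviour of claim 1).
    """
    done: List[int] = []
    ex1_clock = 0
    ex2_free_at = 1
    prev_done = 0
    for insn in insns:
        ex1_clock += 1
        if wait_for_predecessor:
            ex1_clock = max(ex1_clock, prev_done + 1)
        if insn.ex2_clocks == 0:
            finish = ex1_clock                     # completes entirely in EX1
        else:
            start = max(ex1_clock + 1, ex2_free_at)
            finish = start + insn.ex2_clocks - 1   # occupies EX2 serially
            ex2_free_at = finish + 1
        done.append(finish)
        prev_done = finish
    return done


program = [Insn("I1", 3), Insn("I2", 0), Insn("I3", 0)]
print(completion_clocks(program, wait_for_predecessor=True))   # [4, 5, 6]
print(completion_clocks(program, wait_for_predecessor=False))  # [4, 2, 3]
```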

The invention according to claim 2 is a data processing device comprising:

- instruction prefetch means for prefetching instructions;
- instruction decoding means for decoding the instructions read by the instruction prefetch means;
- instruction execution means for executing instructions according to the decoding results of the instruction decoding means;
- flag generation means for generating flags that represent the state of the execution results of the instruction execution means; and
- a status register.

The instruction execution means processes a first instruction and a second instruction following the first instruction in parallel. In that case, the flag generation means distinguishes two kinds of flag among those contained in the status register. A first flag is updated by execution of the first instruction but not by execution of the second instruction. A second flag is updated by execution of the second instruction. The flag generation means generates the first flag from the execution result of the first instruction and the second flag from the execution result of the second instruction, and reflects each in the status register.

The invention according to claim 3 is a data processing device comprising:

- instruction prefetch means for prefetching instructions;
- instruction decoding means for decoding the instructions read by the instruction prefetch means;
- instruction execution means for executing instructions according to the decoding results of the instruction decoding means;
- flag generation means for generating flags that represent the state of the execution results of the instruction execution means; and
- a status register.

The instruction execution means completes the processing of a first instruction and the processing of a second instruction following it in an order unrelated to program order. The processing of the first instruction may complete after the processing of the second instruction has completed. In that case, the flag generation means generates the flags that are updated by execution of the first instruction and were not updated by execution of the second instruction, and reflects them in the status register.

Operation
With the configuration described above, the invention according to claim 1 completes all processing of the second instruction in the first instruction execution means without waiting for the second instruction execution means to complete the processing of the first instruction. With the configuration described above, the invention according to claim 2 handles the two kinds of flag as follows. A first flag is one updated by execution of the preceding first instruction and not by execution of the following second instruction. When such a flag is identified, it is generated from the execution result of the first instruction and reflected in the status register. A second flag is one updated by execution of the following second instruction, whether or not execution of the preceding first instruction also updates it. When such a flag is identified, it is generated from the execution result of the second instruction and reflected in the status register.
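The flag merge of claim 2 can be sketched with bit masks. In this hypothetical Python model, `m1`/`m2` are the sets of flags each instruction updates and `f1`/`f2` are the values it produces. The four-bit N/Z/V/C assignment is an assumption for illustration and is not taken from the source. The check at the end confirms that the merged result matches executing the two instructions one after the other.

```python
# Hypothetical flag bit assignment for a 4-bit condition code (not from the source).
N, Z, V, C = 0b1000, 0b0100, 0b0010, 0b0001


def apply_flags(status: int, mask: int, flags: int) -> int:
    """Reflect the flags selected by mask into the status register."""
    return (status & ~mask) | (flags & mask)


def merge_parallel(status: int, m1: int, f1: int, m2: int, f2: int) -> int:
    """Status after instructions 1 and 2 execute in parallel.

    The "first flag" (updated by 1 but not by 2) comes from instruction 1's
    result; the "second flag" (updated by 2) comes from instruction 2's result.
    The two masks are disjoint, so the order of the two updates does not matter.
    """
    first_flags = m1 & ~m2
    second_flags = m2
    status = apply_flags(status, first_flags, f1)
    return apply_flags(status, second_flags, f2)


# Instruction 1 updates N, Z, V, C; instruction 2 updates only N and Z.
print(format(merge_parallel(0, N | Z | V | C, Z | C, N | Z, N), "04b"))  # 1001
```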

With the configuration described above, the invention according to claim 3 updates the status register in two steps. When the processing of the second instruction completes, it generates the flags updated by execution of the second instruction and reflects them in the status register. When the processing of the first instruction completes afterwards, it generates the flags that are updated by execution of the first instruction and were not updated by execution of the second instruction, and reflects them in the status register.
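The two-step update of claim 3 can be checked against in-order completion with a short Python sketch. As in any bit-mask model of this kind, the N/Z/V/C bit positions and the function names are hypothetical assumptions. The final assertion compares the out-of-order result against in-order completion.

```python
# Hypothetical flag bit assignment (not from the source).
N, Z, V, C = 0b1000, 0b0100, 0b0010, 0b0001


def apply_flags(status: int, mask: int, flags: int) -> int:
    """Reflect the flags selected by mask into the status register."""
    return (status & ~mask) | (flags & mask)


def complete_out_of_order(status: int, m1: int, f1: int, m2: int, f2: int) -> int:
    """Instruction 2 completes before the preceding instruction 1."""
    # Instruction 2 completes first: reflect every flag it updates.
    status = apply_flags(status, m2, f2)
    # Instruction 1 completes afterwards: reflect only the flags it updates
    # that instruction 2 did not, so the later instruction's values survive.
    return apply_flags(status, m1 & ~m2, f1)


# Instruction 1 sets C and clears Z; instruction 2 then sets Z.
print(format(complete_out_of_order(0, Z | C, C, Z, Z), "04b"))  # 0101
```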

Embodiment
FIG. 1 is a block diagram of a data processing device according to an embodiment of the present invention. In FIG. 1:

- 1 is an instruction prefetch device that prefetches instruction codes;
- 2 is an instruction decoding device;
- 3 is a first arithmetic device;
- 4 is an operand reading device that reads operands;
- 5 is a second arithmetic device;
- 6 is an operand writing device that writes operands to memory;
- 7 is an input/output bus connecting memory, I/O and the like; and
- 8 is a bus control device that arbitrates requests from the instruction prefetch device 1 and the operand reading device 4 and controls the input/output bus 7.

The instruction decoding device 2 decodes the instruction codes prefetched by the instruction prefetch device 1 and issues three kinds of control information to the first arithmetic device 3. The first is control information for instruction execution. The second, issued when a memory operand is fetched, is control information for calculating the operand address and reading the operand. The third, issued when a write to memory is involved, is control information for calculating the operand address and writing the operand.

When the instruction is a register-register operation or a register-immediate operation, the first arithmetic device 3 executes the operation according to the control information received from the instruction decoding device 2. When the instruction involves fetching a memory operand, the first arithmetic device 3 calculates the operand address according to the control information received from the instruction decoding device 2. It then sends the operand address, the control information for the memory reference, and the control information for instruction execution to the operand reading device 4. When the instruction involves writing to memory, the first arithmetic device 3 calculates the write address according to the control information received from the instruction decoding device 2. It then sends the write address and the control information for the memory write to the operand writing device 6.

The operand reading device 4 issues a request to the bus control device 8 and reads memory at the operand address received from the first arithmetic device 3. The read data is sent to the second arithmetic device 5 together with the control information for instruction execution.

The second arithmetic device 5 executes the operation according to the data received from the operand reading device 4 and the control information for instruction execution. The operation result is returned to the first arithmetic device 3.

The operand writing device 6 takes in write data from the first arithmetic device 3 and issues a request to the bus control device 8 through the operand reading device 4. It then writes to memory at the write address received from the first arithmetic device 3.

FIG. 2 is a detailed block diagram of the first arithmetic device 3, the operand reading device 4, the second arithmetic device 5 and the operand writing device 6 shown in FIG. 1.

The following elements in FIG. 2 are implemented in the first arithmetic device 3:

- 301 is an arithmetic control circuit that controls the first arithmetic device 3 according to the control information for instruction execution and for operand address calculation received from the instruction decoding device 2.
- 302 is a general-purpose register.
- 303 is a register control circuit that controls the general-purpose register 302.
- 304 is a first arithmetic logic circuit that operates on data stored in the general-purpose register 302.
- 305 is a status register.
- 306 is a flag generation circuit that generates the flag information of operation results held in the status register 305.

The following elements are implemented in the operand reading device 4:

- 401 is a read address register that holds the address from which memory is read.
- 402 is a read control circuit. It receives from the arithmetic control circuit 301 the control information for a memory read, including a register list that specifies the registers in which the results of a multiple-operand read are to be stored. It controls the operand reading device 4 and the register control circuit 303.
- 403 is a read data buffer that holds read data received from the bus control device 8.
- 404 is a read address increment circuit that increases the contents of the read address register 401 by a fixed value.

501 is a second arithmetic logic circuit that operates on the data in the read data buffer 403 and the data stored in the general-purpose register 302. It is implemented in the second arithmetic device 5.

The following elements are implemented in the operand writing device 6:

- 601 is a write address register that holds the address to which memory is written.
- 602 is a write control circuit. It receives from the arithmetic control circuit 301 the control information for a memory write, including a register list that specifies the registers holding the operands of a multiple-operand write. It controls the operand writing device 6 and the register control circuit 303.
- 603 is a write data buffer that holds write data, either taken in from the general-purpose register 302 or received from the second arithmetic logic circuit 501.
- 604 is a write address increment circuit that increases the contents of the write address register 601 by a fixed value.

The flag generation circuit 306 receives flag-generation information from the first arithmetic logic circuit 304 and the second arithmetic logic circuit 501 and generates the flags.

The operation of the data processing device of this embodiment, configured as described above, will now be described.

FIG. 3 shows how each instruction type flows through the pipeline stages. The stages are as follows:

- IF is the instruction prefetch stage in the instruction prefetch device 1.
- DEC is the instruction decoding stage in the instruction decoding device 2.
- EX1 is the first operation stage in the first arithmetic device 3.
- OF is the operand read stage in the operand reading device 4.
- EX2 is the second operation stage in the second arithmetic device 5.
- OS is the operand write stage in the operand writing device 6.

Every instruction is prefetched by the instruction prefetch device 1 and decoded by the instruction decoding device 2. The processing flow after that depends on the instruction type.
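As a reading aid, the stage sequences of FIG. 3 can be written as lists. This Python sketch is hypothetical. The type labels "a" to "e" follow the cases described in the paragraphs that follow, and the stage names are those just defined. The lists give only the order of stages, not timing. In case (e), for example, the OF of one operand actually overlaps the EX2 of the previous one.

```python
from typing import List


def stage_sequence(kind: str, n_operands: int = 1) -> List[str]:
    """Ordered pipeline stages for each instruction type of FIG. 3."""
    front = ["IF", "DEC", "EX1"]
    if kind == "a":   # register-register or register-immediate operation
        return front
    if kind == "b":   # memory-register operation (including loads)
        return front + ["OF", "EX2"]
    if kind == "c":   # store
        return front + ["OS"]
    if kind == "d":   # register-memory operation, result written to memory
        return front + ["OF", "EX2", "OS"]
    if kind == "e":   # several memory operands at consecutive addresses
        if n_operands < 1:
            raise ValueError("case (e) needs at least one operand")
        return front + ["OF", "EX2"] * n_operands
    raise ValueError(f"unknown instruction type: {kind!r}")


print(stage_sequence("d"))  # ['IF', 'DEC', 'EX1', 'OF', 'EX2', 'OS']
```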

(a) is the case of a register-register or register-immediate operation instruction. Processing ends with the operation in the first arithmetic device 3. The first arithmetic device 3 uses the first arithmetic logic circuit 304 to operate on data in the general-purpose register 302, or on an immediate value obtained from the arithmetic control circuit 301. It then stores the result in the general-purpose register 302.

(b) is the case of an instruction that performs a memory-register operation and stores the result in a register. Instructions that involve no operation (loads) are also included. Processing passes through three steps: operand address calculation in the first arithmetic device 3, operand reading in the operand reading device 4, and the operation in the second arithmetic device 5.

The steps work as follows. The first arithmetic device 3 calculates the operand read address in the first arithmetic logic circuit 304 and sends it to the read address register 401. The operand reading device 4 passes the read address held in the read address register 401 to the bus control device 8. It then temporarily holds the read data received from the bus control device 8 in the read data buffer 403. The second arithmetic device 5 uses the second arithmetic logic circuit 501 to operate on the read data held in the read data buffer 403, with the contents of the general-purpose register 302 as the other operand. The result is stored in the general-purpose register 302. When the instruction involves no operation, the second arithmetic logic circuit 501 simply passes the contents of the read data buffer 403 through unchanged.
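The data path of case (b) can be sketched with dictionaries standing in for the registers and memory. This Python sketch makes several hypothetical assumptions. The base-plus-displacement addressing mode is assumed, since the source does not specify the addressing mode. The argument order of `op` (register value, read data) is also assumed. Local variables stand in for the read address register 401 and the read data buffer 403.

```python
from typing import Callable, Dict, Optional


def memory_register_op(regs: Dict[str, int], mem: Dict[int, int], dest: str,
                       base: str, disp: int,
                       op: Optional[Callable[[int, int], int]] = None) -> int:
    # EX1: the first arithmetic logic circuit 304 computes the operand
    # address, which goes to the read address register 401.
    read_address = regs[base] + disp
    # OF: the operand reading device 4 reads memory via the bus control
    # device 8 and holds the data in the read data buffer 403.
    read_data = mem[read_address]
    # EX2: the second arithmetic logic circuit 501 combines the read data with
    # the register operand, or passes it through unchanged for a load (op=None).
    result = read_data if op is None else op(regs[dest], read_data)
    regs[dest] = result
    return result


regs = {"R0": 0x100, "R1": 5}
mem = {0x108: 7}
print(memory_register_op(regs, mem, "R1", "R0", 8, lambda a, b: a + b))  # 12
print(memory_register_op(regs, mem, "R2", "R0", 8))                      # 7 (load)
```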

(c) is the case of an instruction that writes the contents of a register to memory (a store). Processing passes through two steps: write address calculation in the first arithmetic device 3, and the write to memory in the operand writing device 6. The first arithmetic device 3 calculates the operand write address in the first arithmetic logic circuit 304 and sends it to the write address register 601. The operand writing device 6 takes in the write data from the general-purpose register 302 and temporarily holds it in the write data buffer 603. It then sends the data to the bus control device 8 together with the write address held in the write address register 601, which writes it to memory.

(d) is the case of an instruction that performs a register-memory operation and writes the result to memory. Here the memory operand is read, updated and written back. Processing passes through four steps: operand address calculation in the first arithmetic device 3, operand reading in the operand reading device 4, the operation in the second arithmetic device 5, and writing of the result to memory in the operand writing device 6.

The steps work as follows. The first arithmetic device 3 calculates the operand read/write address in the first arithmetic logic circuit 304 and sends it to both the read address register 401 and the write address register 601. The operand reading device 4 passes the read address held in the read address register 401 to the bus control device 8. It then temporarily holds the read data received from the bus control device 8 in the read data buffer 403. The second arithmetic device 5 uses the second arithmetic logic circuit 501 to operate on the read data held in the read data buffer 403, with the contents of the general-purpose register 302 as the other operand. The result is stored in the write data buffer 603. The operand writing device 6 sends the write address held in the write address register 601 and the write data held in the write data buffer 603 to the bus control device 8, which writes the data to memory.
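Case (d) differs from case (b) in two ways. The address goes to both address registers, and the result goes to the write data buffer rather than to a register. The following Python sketch is a hypothetical illustration. It assumes base-plus-displacement addressing and the operand order `op(read_data, register_value)`, neither of which the source specifies.

```python
from typing import Callable, Dict


def memory_rmw_op(regs: Dict[str, int], mem: Dict[int, int], src: str,
                  base: str, disp: int, op: Callable[[int, int], int]) -> int:
    # EX1: one address goes to both the read address register 401 and
    # the write address register 601.
    address = regs[base] + disp
    read_address = write_address = address
    # OF: read the memory operand into the read data buffer 403.
    read_data = mem[read_address]
    # EX2: the result goes to the write data buffer 603, not to a register.
    write_data = op(read_data, regs[src])
    # OS: the operand writing device 6 writes the buffer back to memory.
    mem[write_address] = write_data
    return write_data


regs = {"R0": 0x200, "R1": 5}
mem = {0x208: 7}
memory_rmw_op(regs, mem, "R1", "R0", 8, lambda a, b: a + b)
print(mem[0x208])  # 12
```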

(e) is the case of an instruction that performs operations between a plurality of memory operands at consecutive addresses and registers, and stores the plurality of results in a plurality of registers. Instructions that perform no operation (multiple load) are also included. After the first arithmetic unit 3 calculates the address of the first memory operand, processing consists of repeating the operand read in the operand reading device 4 and the operation in the second arithmetic unit 5. The read of the next operand by the operand reading device 4 and the operation by the second arithmetic unit 5 are executed in parallel. That is, the first arithmetic unit 3 calculates the read address of the first operand in the first arithmetic logic operation circuit 304 and sends it to the read address register 401. It also sends a register list, indicating the group of registers that will receive the results, from the arithmetic control circuit 301 to the read control circuit 402. At this time, the register control circuit 303 prohibits reading of the registers in the general-purpose register 302 that will receive the results. The operand reading device 4 passes the read address held in the read address register 401 to the bus control device 8. At the same time, it increments the address by a fixed value in the read address increment circuit 404 and holds it again in the read address register 401. The read data received from the bus control device 8 is temporarily held in the read data buffer 403. The second arithmetic unit 5 operates, in the second arithmetic logic operation circuit 501, on the read data held in the read data buffer 403 together with the contents of the general-purpose register 302 as the other operand, and stores the result in the general-purpose register 302. When the instruction involves no operation, the second arithmetic logic operation circuit 501 passes the contents of the read data buffer 403 through unchanged. At the same time, the read control circuit 402 notifies the register control circuit 303 that one of the results has been stored in the general-purpose register 302. In response, the register control circuit 303 releases the read prohibition on that register of the general-purpose register 302. For the second and subsequent operands, the operand reading device 4 and the second arithmetic unit 5 repeat the following steps independently of the first arithmetic unit 3: increment the operand address, read the memory operand, perform the operation, store the result, and release the read prohibition on the corresponding register. During this time, the operand reading device 4 and the second arithmetic unit 5 form a pipeline and operate in parallel.
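A minimal behavioral sketch of case (e) follows. It shows the register list being locked when the first address is computed, and each register being unlocked as soon as its own result is stored. Later instructions that use an already-loaded register therefore need not wait for the whole instruction to finish. The sketch rests on several illustrative assumptions that are not from the patent:

- Memory is modeled as a Python dict.
- The fixed address increment is 4.
- The names `RegisterLocks` and `multiple_load` are hypothetical.
- The concurrent hardware steps are serialized.

```python
from typing import Callable, Dict, List, Optional, Set


class RegisterLocks:
    """Stands in for register control circuit 303: registers whose reads are prohibited."""

    def __init__(self) -> None:
        self.locked: Set[str] = set()

    def lock(self, regs: List[str]) -> None:
        self.locked.update(regs)

    def release(self, reg: str) -> None:
        self.locked.discard(reg)

    def can_read(self, reg: str) -> bool:
        return reg not in self.locked


def multiple_load(
    memory: Dict[int, int],
    regs: Dict[str, int],
    locks: RegisterLocks,
    first_address: int,
    register_list: List[str],
    op: Optional[Callable[[int, int], int]] = None,
    step: int = 4,  # assumed fixed increment of read address increment circuit 404
) -> List[List[str]]:
    """Case (e): load, or operate on, consecutive memory words into register_list.

    Returns the registers still locked after each element, to show the
    one-register-at-a-time release.
    """
    # First arithmetic unit 3: first address computed, destination registers locked.
    address = first_address
    locks.lock(register_list)
    history: List[List[str]] = []
    for reg in register_list:
        data = memory[address]   # operand reading device 4 -> read data buffer 403
        address += step          # read address increment circuit 404
        # Second arithmetic unit 5: combine with the register, or pass through for a plain load.
        regs[reg] = data if op is None else op(regs[reg], data)
        locks.release(reg)       # read control circuit 402 notifies circuit 303
        history.append(sorted(locks.locked))
    return history
```

For example, loading three words at 0x100 into R1 to R3 unlocks R1 first, then R2, then R3. Passing `op` models the operate-then-store variant instead of a plain multiple load.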

(f) is the case of an instruction that writes the contents of a plurality of registers to memory at consecutive addresses (multiple store). After the first arithmetic unit 3 calculates the write address for the first register, processing consists of repeated writes to memory by the operand writing device 6. That is, the first arithmetic unit 3 calculates the write address of the first operand in the first arithmetic logic operation circuit 304 and sends it to the write address register 601. It also sends a register list, indicating the group of registers to be written to memory, from the arithmetic control circuit 301 to the write control circuit 602. At this time, the register control circuit 303 prohibits writing to the registers in the general-purpose register 302 that are to be written to memory. Under the control of the write control circuit 602, the operand writing device 6 takes write data from the general-purpose register 302 and temporarily holds it in the write data buffer 603. It then sends the data, together with the write address held in the write address register 601, to the bus control device 8 to write it to memory. At the same time, the write address increment circuit 604 increments the write address by a fixed value and holds it again in the write address register 601. Also at the same time, the write control circuit 602 notifies the register control circuit 303 that one of the registers has been captured in the write data buffer 603. In response, the register control circuit 303 releases the write prohibition on that register of the general-purpose register 302. For the second and subsequent operands, the operand writing device 6 repeats the following steps independently of the first arithmetic unit 3: read the register, increment the operand address, write to memory, and release the write prohibition on the corresponding register.
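The multiple store of case (f) can be sketched in the same style. The write lock on each register is released as soon as its value is captured in the write data buffer, before the memory write itself completes. The sketch assumes a dict for memory, a fixed increment of 4 and a plain Python set for the write-prohibited registers. The name `multiple_store` is hypothetical.

```python
from typing import Dict, List, Set


def multiple_store(
    regs: Dict[str, int],
    memory: Dict[int, int],
    write_locked: Set[str],
    first_address: int,
    register_list: List[str],
    step: int = 4,  # assumed fixed increment of write address increment circuit 604
) -> List[List[str]]:
    """Case (f): store register_list to consecutive addresses from first_address.

    write_locked stands in for register control circuit 303 prohibiting writes
    to registers not yet captured into write data buffer 603. Returns the
    registers still locked after each element.
    """
    write_locked.update(register_list)  # set when the first address is computed
    address = first_address
    history: List[List[str]] = []
    for reg in register_list:
        write_data_buffer = regs[reg]       # capture into write data buffer 603
        write_locked.discard(reg)           # circuit 602 tells 303 the register may be overwritten
        memory[address] = write_data_buffer # bus control device 8 performs the write
        address += step                     # write address increment circuit 604
        history.append(sorted(write_locked))
    return history
```

Releasing each lock at capture time is what lets a later instruction overwrite R4 while R5 and R6 are still waiting to be written.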

As described above, in the data processing device of this embodiment, the first arithmetic unit 3 hands off the remaining processing of an instruction. The operand reading device 4 and the second arithmetic unit 5, or the operand writing device 6, then execute that remaining processing independently of the instruction prefetching device 1, the instruction decoding device 2 and the first arithmetic unit 3.

FIG. 4 shows an operation timing chart. It shows, clock by clock, the instructions being executed in the instruction prefetching device 1, the instruction decoding device 2, the first arithmetic unit 3, the operand reading device 4 and the second arithmetic unit 5, together with the changes in the contents of the status register 305. Here, the instruction prefetching device 1 has an internal cache memory. The number of clocks required by each device is as follows:

- instruction prefetching device 1: 1 clock
- instruction decoding device 2: 1 clock
- first arithmetic unit 3: 1 clock
- operand reading device 4: 3 clocks
- second arithmetic unit 5: 1 clock

The instruction sequence executed is the same as that shown in the operation timing chart of FIG. 13. A memory-register operation instruction that stores its result in a register is followed by two register-register operation instructions, and these three instructions repeat. Specifically, instructions 1 and 4 are memory-register operation instructions storing to a register, and instructions 2, 3, 5 and 6 are register-register operation instructions. The pipeline starts empty (for example, after a conditional branch).

For instruction 1, the instruction prefetching device 1 prefetches the instruction code at clock t1 and issues it to the instruction decoding device 2. At clock t2 the instruction decoding device 2 decodes the instruction. It then issues to the first arithmetic unit 3 the control information for calculating the operand address and reading the operand, together with the control information for executing the instruction. Following the control information for operand address calculation and reading, the first arithmetic unit 3 calculates the operand address at clock t3. The operand reading device 4 then reads the operand at clocks t4 to t6. The read data and the control information for instruction execution are sent to the second arithmetic unit 5, which performs the operation at clock t7.

For instruction 2, the instruction prefetching device 1 prefetches the instruction code at clock t2. The instruction decoding device 2 decodes it at clock t3 and issues only the control information for instruction execution to the first arithmetic unit 3. The first arithmetic unit 3 performs the operation at clock t4. Similarly, instruction 3 is operated on by the first arithmetic unit 3 at clock t5.
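The FIG. 4 timing can be reproduced with a small scheduling model. This is a sketch under simplifying assumptions that are not stated in the patent:

- The pipeline starts empty.
- The prefetch, decode and first-arithmetic-unit stages take one clock each and never stall.
- A single operand reading device serves reads in program order.
- Register dependencies are ignored.

The function name `schedule` and the stage labels are hypothetical.

```python
from typing import Dict, List


def schedule(kinds: List[str], read_clocks: int = 3) -> List[Dict[str, object]]:
    """Clock-level schedule for 'M' (memory-register) and 'R' (register-register) instructions.

    IF/ID/EX1 = prefetching device 1 / decoding device 2 / first arithmetic unit 3,
    OR = operand reading device 4 (start, end), EX2 = second arithmetic unit 5.
    """
    rows: List[Dict[str, object]] = []
    reader_free = 1  # first clock at which the operand reading device is idle
    for i, kind in enumerate(kinds):
        fetch, decode, ex1 = i + 1, i + 2, i + 3
        row: Dict[str, object] = {"insn": i + 1, "kind": kind, "IF": fetch, "ID": decode, "EX1": ex1}
        if kind == "M":
            # The read starts after address calculation, once the reader is free.
            start = max(ex1 + 1, reader_free)
            end = start + read_clocks - 1
            reader_free = end + 1
            row["OR"] = (start, end)
            row["EX2"] = end + 1
        rows.append(row)
    return rows
```

For the sequence M, R, R, M, R, R, instruction 1 gets EX1 at t3, OR at t4 to t6 and EX2 at t7. Instructions 2 and 3 finish in EX1 at t4 and t5, ahead of instruction 1. Instruction 5 is in EX1 at t7, the same clock as instruction 1 in EX2, which matches the description.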

In this way, instructions 2 and 3 are operated on ahead of instruction 1, and instruction 5 is operated on in parallel with instruction 1. The flag generation circuit 306 handles the flags as follows:

1. At clock t4, it generates the flags of instruction 2 based on information from the first arithmetic logic operation circuit 304. It reflects them in the status register 305 at clock t5 and records which flags were updated.
2. At clock t5, it generates the flags of instruction 3 based on information from the first arithmetic logic operation circuit 304. It reflects them in the status register 305 at clock t6 and again records the updated flags.
3. At clock t7, it generates the flags of instruction 5 based on information from the first arithmetic logic operation circuit 304. In the same clock, it generates the flags of instruction 1 based on information from the second arithmetic logic operation circuit 501.

At clock t7, no new value is generated for any flag of instruction 1 that was already updated at clocks t5 and t6. If a flag is updated by both instruction 5 and instruction 1, the value from instruction 5 takes priority, even if that flag is one of those updated at clocks t5 and t6. The generated flags are reflected in the status register 305 at clock t8.
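The arbitration rule can be illustrated with a sketch that models the flags as a dictionary. The flag names Z, N, C and V, the flag values and the helper names are assumptions for illustration only. The test checks that the rule reproduces the program-order result for instructions 1, 2, 3 and 5. Instruction 4 completes later and is left out of the model.

```python
from typing import Dict, Iterable, Set


def apply_in_order(status: Dict[str, int], updates: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Reference behavior: apply each instruction's flag updates in program order."""
    result = dict(status)
    for upd in updates:
        result.update(upd)
    return result


def merge_late_flags(
    status: Dict[str, int],
    late_older: Dict[str, int],
    already_updated: Set[str],
    same_clock_younger: Dict[str, int],
) -> Dict[str, int]:
    """Arbitrate flags when an older instruction finishes after younger ones.

    late_older: flags of the older memory instruction (instruction 1, via circuit 501).
    already_updated: flags recorded as written by younger instructions (t5, t6).
    same_clock_younger: flags of a younger instruction in the same clock (instruction 5).
    """
    result = dict(status)
    for flag, value in late_older.items():
        if flag not in already_updated:  # a younger instruction already overrode this flag
            result[flag] = value
    result.update(same_clock_younger)    # the younger instruction has priority
    return result
```

In the test, instruction 5 overwrites N even though N was already updated by instruction 3. This mirrors the rule that instruction 5 takes priority regardless of the t5 and t6 updates.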

As described above, according to this embodiment, the order in which instructions are operated on does not necessarily match program order. However, the flag generation circuit 306 arbitrates the flags it generates. The flags are therefore reflected in the status register 305 without any inconsistency with the case in which the instructions are operated on in program order.

FIG. 5 is an operation timing chart explaining how this embodiment solves the first problem. The number of clocks required by each device is the same as in FIG. 4, except that a memory write by the operand writing device 6 takes 3 clocks. The instruction sequence executed is the same as that shown in the operation timing chart of FIG. 14. A memory-register operation instruction storing to a register (instruction 1) is followed by an instruction that transfers a register to memory (store; instruction 2) and then a register-register operation instruction (instruction 3).

For instruction 1, the address is calculated at clock t3, the operand is fetched at clocks t4 to t6, and the operation is performed at clock t7. For instruction 2, however, once the first arithmetic unit 3 has calculated the address at clock t4, the remaining processing is handed off. The operand writing device 6 then transfers the register to memory independently of the first arithmetic unit 3. That is, the operand writing device 6 takes the write data from the general-purpose register 302 into the write data buffer 603 at clock t5. At clocks t7 to t9 it sends the write address held in the write address register 601 and the write data held in the write data buffer 603 to the bus control device 8, which writes them to memory. Instruction 3 is operated on at clock t5, in parallel with the memory read of instruction 1 and the memory write of instruction 2.
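Why the store of instruction 2 occupies t7 to t9 can be sketched by serializing bus requests on one bus. This is an assumed model, not stated in the patent:

- The bus control device 8 serves requests in the order listed.
- Each transfer needs a fixed number of consecutive clocks.
- The store's earliest bus clock is t6, the clock after its data is captured at t5.

The function name `bus_schedule` is hypothetical.

```python
from typing import List, Tuple


def bus_schedule(requests: List[Tuple[str, int, int]]) -> List[Tuple[str, int, int]]:
    """Serialize (name, ready_clock, clocks) requests on a single bus.

    Returns (name, start_clock, end_clock) for each request, in order.
    """
    busy_until = 0  # last clock in which the bus is occupied
    out: List[Tuple[str, int, int]] = []
    for name, ready, clocks in requests:
        start = max(ready, busy_until + 1)  # wait for both the data and the bus
        end = start + clocks - 1
        busy_until = end
        out.append((name, start, end))
    return out
```

The read of instruction 1 holds the bus from t4 to t6, so the write of instruction 2 runs from t7 to t9. Meanwhile, instruction 3 needs no bus and completes in the first arithmetic unit at t5.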

As described above, according to this embodiment, each of the following pairs can be executed in parallel, which shortens the execution time of an instruction sequence:

- an instruction requiring an operand fetch, followed by an instruction requiring neither an operand fetch nor a memory write;
- an instruction requiring a memory write, followed by an instruction requiring neither an operand fetch nor a memory write;
- an instruction requiring an operand fetch, followed by an instruction requiring a memory write;
- an instruction requiring a memory write, followed by an instruction requiring an operand fetch.

FIG. 6 is an operation timing chart explaining how this embodiment solves the second problem. The number of clocks required by each device is the same as in FIG. 4. The instruction sequence executed is the same as that shown in the operation timing chart of FIG. 15. Two register-register operation instructions (instructions 1 and 2) are followed by one memory-register operation instruction storing to a register (instruction 3), and then by two more register-register operation instructions (instructions 4 and 5). Instruction 3 calculates its address at clock t5, its operand is read from memory at clocks t6 to t8, and it is operated on at clock t9. However, the operand reading device 4 and the second arithmetic unit 5 operate independently of the first arithmetic unit 3. Instructions 4 and 5 can therefore be operated on in the first arithmetic unit 3 at clocks t6 and t7 respectively, in parallel with the memory read of instruction 3, and no empty pipeline stage occurs.

As described above, according to this embodiment, no empty slot arises in the operation stage even when an instruction requiring an operand fetch is followed by an instruction that does not require one. The first operand-fetching instruction to emerge from an empty pipeline can therefore execute in one clock.

FIG. 7 is an operation timing chart explaining how this embodiment solves the third problem. The number of clocks required by each device is the same as in FIG. 4. The instruction sequence executed is the same as that shown in the operation timing chart of FIG. 16. One memory-register operation instruction storing to a register (instruction 1) and four register-register operation instructions (instructions 2 to 5) are followed by a conditional branch instruction (instruction 6). The branch at instruction 6 is taken, to a register-register operation instruction (instruction n). Instruction 1 requires an operand address calculation and an operand read from memory. However, the operand reading device 4 and the second arithmetic unit 5 operate independently of the first arithmetic unit 3. The following instructions 2 to 6 can therefore be operated on in the first arithmetic unit 3 at clocks t4 to t8, in parallel with the memory read of instruction 1. As a result, the instruction prefetch stage of the branch target instruction n moves up to clock t9, and the pipeline interlock in the instruction prefetch stage lasts only two clocks, t7 and t8.

As described above, according to this embodiment, the pipeline interlock time is the same whether or not an instruction requiring an operand fetch precedes a conditional branch instruction. This minimizes the loss of pipeline efficiency.

FIG. 8 is an operation timing chart explaining how this embodiment solves the fourth problem. The number of clocks required by each device is the same as in FIG. 4. The instruction sequence executed is the same as that shown in the operation timing chart of FIG. 17. A memory-register operation instruction storing to a register (instruction 1) is followed by two register-register operation instructions (instructions 2 and 3), and then by another memory-register operation instruction storing to a register (instruction 4). The result of instruction 3 is used in the address calculation of instruction 4. Because both address calculations and register-register operations are performed in the first arithmetic unit 3, the address calculation of instruction 4 can take place at clock t6. That is the clock immediately after the operation of instruction 3 completes, so no pipeline interlock due to address calculation interference occurs.

As described above, according to this embodiment, a register may be rewritten by an instruction that does not require an operand fetch and then read in the address calculation of a following instruction that does require one. Even in that case no interlock occurs, so pipeline efficiency is not reduced.

FIG. 9 is an operation timing chart explaining how this embodiment solves the fifth and sixth problems. The number of clocks required by each device is the same as in FIG. 4. The instruction sequence executed is the same as that shown in the operation timing chart of FIG. 18. An instruction that transfers three memory operands at consecutive addresses to three registers (multiple load; instruction 1) is followed by two register-register operation instructions (instructions 2 and 3), neither of which needs the result of instruction 1. Instruction 1 proceeds as follows:

- It reads the first operand at clocks t4 to t6 and stores it in register R1 at clock t7.
- It reads the second operand at clocks t7 to t9 and stores it in register R2 at clock t10.
- It reads the third operand at clocks t10 to t12 and stores it in register R3 at clock t13.

However, once the first arithmetic unit 3 has calculated the address of the first operand at clock t3, the operand reading device 4 and the second arithmetic unit 5 operate independently of it. The following instructions 2 and 3 can therefore be operated on in the first arithmetic unit 3 at clocks t4 and t5, in parallel with the memory reads and register stores of instruction 1. The first arithmetic unit 3 is therefore never left idle for the long time that the memory reads take.
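The FIG. 9 timing of the multiple load, and the check that lets instructions 2 and 3 proceed, can be sketched as follows. The model assumes three things, taken from the timings above:

- Each read takes three clocks.
- The next read begins in the clock in which the previous operand is stored.
- A later instruction may run in the first arithmetic unit only if it touches none of the still-locked destination registers.

The function names are hypothetical.

```python
from typing import List, Tuple


def multiple_load_timeline(
    first_read: int, registers: List[str], read_clocks: int = 3
) -> List[Tuple[str, int, int, int]]:
    """Return (register, read_start, read_end, store_clock) for each element."""
    rows: List[Tuple[str, int, int, int]] = []
    start = first_read
    for reg in registers:
        end = start + read_clocks - 1
        store = end + 1          # second arithmetic unit 5 stores the operand
        rows.append((reg, start, end, store))
        start = store            # next read overlaps the store clock
    return rows


def independent(insn_regs: List[str], locked: List[str]) -> bool:
    """True if a later instruction uses none of the registers still being loaded."""
    return not (set(insn_regs) & set(locked))
```

Starting from the first read at t4, this gives R1 stored at t7, R2 at t10 and R3 at t13, matching the description. An instruction using only R4 and R5 passes the independence check, while one reading R2 does not.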

The same applies to an instruction that writes the contents of a plurality of registers to memory at consecutive addresses (multiple store). After the first arithmetic unit 3 calculates the address of the first operand, the operand writing device 6 operates independently of the first arithmetic unit 3. Subsequent instructions can therefore be operated on in the first arithmetic unit 3 in parallel with the memory writes of the multiple store instruction. The first arithmetic unit 3 is therefore never left idle for the long time that the memory writes take.

As described above, this embodiment avoids a loss of pipeline efficiency when an instruction requiring a plurality of operand fetches, or an instruction writing a plurality of operands to memory, is executed. Suppose a following instruction requires no operand fetch and does not need the execution result of that instruction. That following instruction can then be operated on in parallel with, or ahead of, the operation of that instruction.

Also, this embodiment lets program-optimizing software deliver its full performance benefit. Such software arranges a program so that an instruction following one that requires a plurality of operand fetches, or one that writes a plurality of operands to memory, does not need that instruction's execution result.

Next, it will be explained how this embodiment solves the seventh problem. In the data processing device of this embodiment configured as described above, the first arithmetic unit 3 hands off the remaining processing after it calculates the operand address. The operand reading device 4, the second arithmetic unit 5 and the operand writing device 6 then operate independently of the first arithmetic unit 3. Subsequent instructions can therefore be operated on in the first arithmetic unit 3 in parallel with memory reads and writes. Consequently, even if the number of clocks required by the operand reading device 4 or the operand writing device 6 increases, the operation stage of an instruction that follows a memory-reading or memory-writing instruction is not delayed. Conversely, a built-in data cache memory may reduce the number of clocks required by the operand reading device 4 or the operand writing device 6. Even then, the timing of the operation stage of an instruction that follows a memory-reading or memory-writing instruction does not change.

As described above, according to this embodiment, the first to sixth problems can be solved regardless of changes in the number of clocks needed to read and write operands. Such changes may arise from the presence or absence of a built-in cache memory, the cache hit rate, the cache capacity and similar factors. In addition, the number of clocks needed to read and write operands can be set independently of the number of clocks needed to execute register-register operations.

Finally, it will be explained how this embodiment solves the eighth problem. The data processing device of this embodiment configured as described above does not prefetch memory operands. The instruction decoding device 2 therefore does not need to issue control information for address calculation ahead of the control information for instruction execution. It can issue all control information to the first arithmetic unit 3 at the same time, which simplifies the control of the instruction decoding device 2. Nor is complicated control needed to wait for prefetched data or write addresses.

As described above, according to this embodiment, simplifying the control of each device shortens the delay time of the control circuits. This makes it easy to raise the clock frequency at which the processor operates.

In this embodiment, some instructions read one or more operands from memory and store them in registers without an operation (load or multiple load). For these, the contents of the read data buffer 403 are passed unchanged through the second arithmetic logic operation circuit 501 and stored in the general-purpose register 302. Alternatively, a data line may be provided directly from the read data buffer 403 to the general-purpose register 302. Load and multiple load instructions appear frequently in programs, and with this arrangement they no longer need the second operation stage. This reduces the pipeline interlocks that occur when subsequent instructions use their results, which improves pipeline efficiency. However, when a load or multiple load instruction changes flags, information for flag generation must be sent from the read data buffer 403 to the flag generation circuit 306.

This embodiment assumes real memory and does not include an address translation mechanism. To support virtual memory, an address translation mechanism may be incorporated either in the instruction prefetching device 1 and the operand reading device 4, or in the bus control device 8.

Although this embodiment separates the operand reading device 4 and the second arithmetic unit 5, the two may be implemented as a single device serving as the operand reading device.

Although each device in this embodiment has been described as a single pipeline stage, a device may also be implemented with a plurality of pipeline stages.

Effects of the Invention

As described above, according to the invention of claim 1, the processing of a preceding instruction by the second instruction execution means may take a long time. Even so, a subsequent instruction that does not require processing by the second instruction execution means can be processed and completed. Moreover, the first instruction execution means and the second instruction execution means merely form a pipeline connected in series in that order, so complicated parallel stage control is not required.

According to the invention of claim 2, a first instruction and a second instruction that would normally execute sequentially may be executed in parallel. Even then, the flags can be updated with results fully consistent with sequential execution.
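The flag rule of claim 2 can be expressed with bit masks. In this sketch the status register layout (four bits N, Z, V, C) is an assumption made for illustration; the patent does not specify it.

- `mask1` and `mask2` mark which flags each instruction updates.
- `flags1` and `flags2` hold the flag values each instruction's result produces.
- The "first flag" of claim 2 corresponds to `mask1 & ~mask2`, and the "second flag" corresponds to `mask2`.

The function names are hypothetical.

```python
from itertools import product

# Hypothetical 4-bit status register: negative, zero, overflow, carry.
N, Z, V, C = 0b1000, 0b0100, 0b0010, 0b0001
ALL = N | Z | V | C


def update(status: int, mask: int, flags: int) -> int:
    """Sequential reference: write `flags` into the bits selected by `mask`."""
    return (status & ~mask) | (flags & mask)


def merge_parallel(status: int, mask1: int, flags1: int, mask2: int, flags2: int) -> int:
    """One-step flag update when instruction 1 and its successor run in parallel."""
    first_only = mask1 & ~mask2          # updated by instruction 1 but not by 2
    untouched = ALL & ~(mask1 | mask2)   # updated by neither instruction
    return (status & untouched) | (flags1 & first_only) | (flags2 & mask2)


def matches_sequential() -> bool:
    """Compare against running instruction 1 then instruction 2.

    Every operation is bitwise, so status values 0 and ALL cover both
    values of each bit.
    """
    for status, m1, f1, m2, f2 in product((0, ALL), *[range(16)] * 4):
        if merge_parallel(status, m1, f1, m2, f2) != update(update(status, m1, f1), m2, f2):
            return False
    return True
```

Take a compare that updates all four flags (result Z and C set) running in parallel with a move that updates only N and Z (result N set). In that case, `merge_parallel(0, ALL, Z | C, N | Z, N)` gives `N | C`, the same value sequential execution would leave.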

According to the invention of claim 3, a first instruction and a second instruction that would normally complete in program order may complete in reverse order. Even then, the flags can be updated with results fully consistent with in-order completion.
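The claim-3 rule can likewise be sketched with bit masks. The assumptions are the same: a hypothetical four-bit layout, and `mask`/`flags` pairs describing which flags each instruction updates and what values it produces.

Suppose the second instruction has already completed and written its flags. When the first instruction completes later, only the flags it updates and the second instruction does not (`mask1 & ~mask2`) are written. The function names are illustrative, not taken from the patent.

```python
from itertools import product

# Hypothetical 4-bit status register: negative, zero, overflow, carry.
N, Z, V, C = 0b1000, 0b0100, 0b0010, 0b0001
ALL = N | Z | V | C


def complete(status: int, mask: int, flags: int) -> int:
    """Write `flags` into the bits selected by `mask`, keeping the other bits."""
    return (status & ~mask) | (flags & mask)


def complete_first_late(status: int, mask1: int, flags1: int, mask2: int) -> int:
    """Instruction 1 completes after its successor (instruction 2) already wrote its flags."""
    return complete(status, mask1 & ~mask2, flags1)


def matches_in_order() -> bool:
    """Compare reverse-order completion with in-order completion.

    Every operation is bitwise, so status values 0 and ALL suffice.
    """
    for status, m1, f1, m2, f2 in product((0, ALL), *[range(16)] * 4):
        in_order = complete(complete(status, m1, f1), m2, f2)
        reversed_order = complete_first_late(complete(status, m2, f2), m1, f1, m2)
        if in_order != reversed_order:
            return False
    return True
```

Suppose a long-latency compare (result Z and C set) is overtaken by a move that updates only N and Z. The move first leaves `N`. The late compare then writes only V and C, giving `N | C`, exactly as if the two had completed in program order.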

As shown above, the present invention offers very substantial practical benefits.

[Brief description of the drawings]

FIG. 1 is a block diagram of the data processing device according to an embodiment of the present invention.
FIG. 2 is a detailed block diagram of the first arithmetic device 3, operand reading device 4, second arithmetic device 5 and operand writing device 6 of the embodiment.
FIG. 3 is a diagram showing the relationship between instruction types and pipeline stages in the embodiment.
FIG. 4 is an operation timing diagram of the embodiment.
FIG. 5 is an operation timing diagram of the embodiment illustrating how the first problem is solved.
FIG. 6 is an operation timing diagram of the embodiment illustrating how the second problem is solved.
FIG. 7 is an operation timing diagram of the embodiment illustrating how the third problem is solved.
FIG. 8 is an operation timing diagram of the embodiment illustrating how the fourth problem is solved.
FIG. 9 is an operation timing diagram of the embodiment illustrating how the fifth and sixth problems are solved.
FIG. 10 is a block diagram of a first conventional data processing device.
FIG. 11 is an operation timing diagram of the first conventional example.
FIG. 12 is a block diagram of a second conventional data processing device.
FIG. 13 is an operation timing diagram of the second conventional example.
FIG. 14 is an operation timing diagram of the second conventional example illustrating the first problem.
FIG. 15 is an operation timing diagram of the second conventional example illustrating the second problem.
FIG. 16 is an operation timing diagram of the second conventional example illustrating the third problem.
FIG. 17 is an operation timing diagram of the second conventional example illustrating the fourth problem.
FIG. 18 is an operation timing diagram of the second conventional example illustrating the fifth and sixth problems.

1 ... Instruction prefetching device
2 ... Instruction decoding device
3 ... First arithmetic device
4 ... Operand reading device
5 ... Second arithmetic device
6 ... Operand writing device
7 ... Input/output bus
8 ... Bus control device
301 ... Arithmetic control circuit
302 ... General-purpose register
303 ... Register control circuit
304 ... First arithmetic logic circuit
305 ... Status register
306 ... Flag generation circuit
401 ... Read address register
402 ... Read control circuit
403 ... Read data buffer
404 ... Read address increment circuit
501 ... Second arithmetic logic circuit
601 ... Write address register
602 ... Write control circuit
603 ... Write data buffer
604 ... Write address increment circuit

Continuation of front page
(72) Inventor: Takashi Sakao, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture
(56) References cited: JP-A-62-262141 (JP, A); JP-A-63-261428 (JP, A); JP-A-58-90247 (JP, A); JP-A-58-106641 (JP, A); JP-A-57-168350 (JP, A); JP-A-54-67348 (JP, A); JP-A-58-189738 (JP, A); JP-A-54-47438 (JP, A)

Claims

(57) [Claims]

1. A pipelined data processing device comprising at least a read stage, a decoding stage and an execution stage, the execution stage further including a first execution stage and a second execution stage connected in series, the device comprising: instruction prefetching means acting as the read stage; instruction decoding means acting as the decoding stage; and instruction execution means comprising at least first instruction execution means acting as the first execution stage and second instruction execution means acting as the second execution stage; wherein a first instruction is processed in at least the first instruction execution means and the second instruction execution means in this order, and, when a second instruction following the first instruction is processed in at least the first instruction execution means and does not require processing in the second instruction execution means, the instruction execution means causes the first instruction execution means to perform the processing of the second instruction and to complete all processing of the second instruction, even while the second instruction execution means is still executing the processing of the first instruction.

2. A data processing device comprising: instruction prefetching means for prefetching instructions; instruction decoding means for decoding an instruction read by the instruction prefetching means; instruction execution means for performing instruction execution processing according to the decoding result of the instruction decoding means; flag generation means for generating flags indicating the state of the execution results of the instruction execution means; and a status register; wherein the instruction execution means performs the processing of a first instruction and the processing of a subsequent second instruction in parallel, and, when the instruction execution means executes the processing of the first instruction and the processing of the second instruction in parallel, the flag generation means identifies, among the flags held in the status register, a first flag that is updated by execution of the first instruction and not updated by execution of the second instruction, and a second flag that is updated by execution of the second instruction, generates the first flag based on the execution result of the first instruction and the second flag based on the execution result of the second instruction, and reflects each of them in the status register.

3. A data processing device comprising: instruction prefetching means for prefetching instructions; instruction decoding means for decoding an instruction read by the instruction prefetching means; instruction execution means for performing instruction execution processing according to the decoding result of the instruction decoding means; flag generation means for generating flags indicating the state of the execution results of the instruction execution means; and a status register; wherein the instruction execution means completes the processing of a first instruction and the processing of a subsequent second instruction in an order independent of program order, and, when the processing of the first instruction is completed after the processing of the second instruction has been completed, the flag generation means generates the flags that are updated by execution of the first instruction and not updated by execution of the second instruction, and reflects them in the status register.