JP2904624B2 - Parallel processing unit

Info

Publication number: JP2904624B2
Application number: JP26458491A
Authority: JP
Inventors: 龍宏五島
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1991-10-14
Filing date: 1991-10-14
Publication date: 1999-06-14
Anticipated expiration: 2014-06-14
Also published as: JPH05108348A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Industrial Field of Application] The present invention relates to a pipelined parallel processing unit capable of processing a plurality of instructions in parallel, and more particularly to a load instruction processing control scheme suited to the case where a plurality of preceding instructions include at least one load instruction.

【０００２】[0002]

[Prior Art] Conventionally, in computers (parallel processing units) that process multiple instructions in parallel (simultaneously), such as VLIW (Very Long Instruction Word) and superscalar machines, when a reference relationship through registers, memory, or the like exists between groups of simultaneously issued instructions (each group is called a compound instruction here), a wait time arises to guarantee the execution order, and processing performance drops significantly. For this reason, it has been known in this type of computer to reduce the waits caused by such reference relationships through static program scheduling by a compiler.

【０００３】[0003]

[Problems to Be Solved by the Invention] However, static scheduling by a compiler has difficulty handling dependencies that are determined dynamically, such as those involving memory addresses. These dynamic dependencies must therefore be analyzed with hardware support so that the execution order is guaranteed.

[0004] A typical instruction that gives rise to such dynamic dependencies is the load instruction (a read of memory data). Memory access by a load instruction normally goes through a cache memory. As long as the access hits in the cache, guaranteeing the ordering of instructions that depend on the load instruction therefore has little effect on performance. On a cache miss, however, a block read occurs, so obtaining the load data incurs an overhead several times to several tens of times that of a cache hit. A conventional parallel processing unit simply made the subsequent compound instruction wait unconditionally when a load instruction missed, which caused a large drop in performance.
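
To give a rough sense of the cost described above, the following sketch applies the usual "hit time plus miss rate times miss penalty" model of average load latency. The cycle counts and the miss rate are illustrative assumptions, not figures from this specification, which states only that a miss costs several to several tens of times as much as a hit.

```python
def average_load_latency(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average cycles per load: every load pays the hit time, misses also pay the penalty."""
    return hit_cycles + miss_rate * miss_penalty_cycles


# Assumed figures for illustration only: 1-cycle hit, 5% misses, 20-cycle block read.
print(average_load_latency(1, 0.05, 20))
```

With these assumed numbers the average load takes about twice as long as a hit, and a design that stalls every following compound instruction on each miss pays that penalty across all four instruction slots.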

[0005] The present invention has been made in view of the above circumstances, and its object is to provide a parallel processing unit that can keep the wait time of subsequent compound instructions as short as possible, and thereby improve processing performance, even when a load instruction misses in the cache.

【０００６】[0006]

[Means for Solving the Problems] The present invention is a pipelined parallel processing unit capable of processing in parallel a plurality of instructions (a compound instruction) fetched by an instruction fetch mechanism, comprising: dependency detection means for detecting that at least one dependency exists between a compound instruction fetched earlier (the preceding compound instruction) and a compound instruction fetched later (the subsequent compound instruction); and a pipeline control mechanism that controls the flow of the pipeline according to the detection result of the dependency detection means. The invention is characterized in that, when no dependency exists between the preceding compound instruction and the subsequent compound instruction, the subsequent compound instruction is not made to wait even if the preceding compound instruction includes a load instruction.

[0007] The present invention is further characterized in that the dependency detection means comprises: state holding means which, when the preceding compound instruction includes a load instruction, indicates that the load data write destination specified by that load instruction is in use; and match detection means for detecting that the load data write destination indicated by the state holding means matches an access destination specified by any instruction in the subsequent compound instruction.

【０００８】[0008]

[Operation] In the above configuration, the dependency detection means detects whether a dependency exists between the preceding compound instruction and the subsequent compound instruction. State holding means and match detection means are provided for this detection. When decoding of a fetched compound instruction shows that it includes a load instruction, the state holding means holds state information indicating that the load data write destination specified by that load instruction is in use. When the instruction fetch mechanism fetches a new compound instruction (the subsequent compound instruction), the match detection means checks whether the load data write destination that the state holding means indicates to be in use at that point, that is, the load data write destination specified by a load instruction included in the preceding compound instruction, matches an access destination specified by any instruction in the subsequent compound instruction, and notifies the pipeline control mechanism of the result.

[0009] On receiving a match or mismatch notification from the match detection means (within the dependency detection means), the pipeline control mechanism acts as follows. In the case of a match notification, it treats the subsequent compound instruction and the preceding compound instruction as having at least one dependency between them and makes execution of the subsequent compound instruction wait. In the case of a mismatch notification, it treats the two as having no dependency and permits execution of the subsequent compound instruction. Consequently, even if the preceding compound instruction includes a load instruction that has missed in the cache and is in the middle of a block read, the subsequent compound instruction is executed without waiting for the block read to complete, and unnecessary wait time is avoided.
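
The match/mismatch decision described above can be sketched at a purely functional level. In this minimal sketch the state holding means is modeled as a set of pending load destinations and the match detection means as a set intersection; the function name and the use of Python sets are assumptions made for illustration, not part of the specification.

```python
def must_wait(pending_load_destinations, accessed_by_subsequent):
    """Return True when the subsequent compound instruction has to be held.

    A match means that some access destination of the subsequent compound
    instruction is still marked as the write destination of an earlier load.
    """
    return bool(set(pending_load_destinations) & set(accessed_by_subsequent))


# A preceding load writes R1; the subsequent instructions touch only R2: mismatch.
print(must_wait({"R1"}, {"R2"}))        # False
# The subsequent instructions also touch R1: match, so they must wait.
print(must_wait({"R1"}, {"R1", "R2"}))  # True
```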

【００１０】[0010]

[Embodiment] FIG. 1 is a block diagram showing the configuration of a parallel processing unit according to one embodiment of the present invention. The parallel processing unit of FIG. 1 is assumed to be, for example, a four-stage pipelined computer that executes four instructions in parallel, with a compound instruction fetch stage (I stage), a compound instruction decode stage (D stage), a compound instruction execution stage (E stage, which includes the cache access of a load instruction), and a result write stage (W stage, register write-back). In the processing unit of FIG. 1, the four instructions processed in parallel are assumed to include at most one load instruction.
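
As a reading aid for the four-stage organization, the sketch below computes which stage a compound instruction occupies in a given cycle when the pipeline never stalls. The function name and the cycle numbering from 1 (T1) are assumptions of this sketch.

```python
STAGES = ("I", "D", "E", "W")  # fetch, decode, execute, write-back


def stage_in_cycle(fetch_cycle, cycle):
    """Stage occupied in `cycle` by a compound instruction fetched in `fetch_cycle`.

    Cycles are numbered from 1 (T1). Returns None when the instruction is not
    in the pipeline. Assumes that no stall ever occurs.
    """
    offset = cycle - fetch_cycle
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


# Three compound instructions fetched in T1, T2 and T3.
for cycle in range(1, 7):
    row = [stage_in_cycle(fetch, cycle) or "-" for fetch in (1, 2, 3)]
    print(f"T{cycle}: " + " ".join(row))
```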

[0011] In FIG. 1, reference numeral 1 denotes an instruction fetch mechanism that fetches the four instructions to be executed in parallel from a program storage device (not shown) such as an instruction cache or main memory, and 2 denotes an instruction register that holds the four instructions fetched by instruction fetch mechanism 1.

[0012] Reference numeral 3 denotes a general-purpose register file consisting of, for example, four registers R1 to R4, and 4 denotes a decode/register read mechanism. Decode/register read mechanism 4 decodes the four instructions held in instruction register 2 in parallel, one instruction at a time, and, when an instruction specifies a register in register file 3 as a source, reads the contents of that register as source data. Decode/register read mechanism 4 outputs two 4-bit signals: register use signal 41, which indicates for each of registers R1 to R4 in register file 3 whether the four decoded instructions include a load instruction with that register as its load data write destination, and register use signal 42, which indicates for each of registers R1 to R4 whether the four decoded instructions include an instruction with that register as an access destination (a reference or write destination).
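
The two register use signals can be illustrated with a small sketch. The dictionary encoding of an instruction (`op`, `dest`, `srcs`) is hypothetical and chosen only for this example; the bit strings are ordered R1, R2, R3, R4.

```python
REGISTERS = ("R1", "R2", "R3", "R4")


def register_use_signals(compound):
    """Return (signal_41, signal_42) as bit strings ordered R1, R2, R3, R4."""
    load_destinations = set()
    accessed = set()
    for instr in compound:
        dest = instr.get("dest")
        if instr["op"] == "load" and dest is not None:
            load_destinations.add(dest)   # contributes to signal 41
        if dest is not None:
            accessed.add(dest)            # write destinations count as accesses
        accessed.update(instr.get("srcs", ()))  # so do source registers

    def to_bits(registers):
        return "".join("1" if r in registers else "0" for r in REGISTERS)

    return to_bits(load_destinations), to_bits(accessed)


# A load into R2 plus an add that reads R1 and R2; the other two slots are
# assumed to touch no register in the register file.
compound = [
    {"op": "load", "dest": "R2"},
    {"op": "sub"},
    {"op": "add", "srcs": ("R1", "R2")},
    {"op": "mul"},
]
print(register_use_signals(compound))  # ('0100', '1100')
```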

[0013] Reference numeral 5 denotes a decode result register that holds, instruction by instruction, the decode results and related information for the four instructions produced by decode/register read mechanism 4. Depending on the instruction, the information held in decode result register 5 includes register data read by decode/register read mechanism 4 or load data read by load execution mechanism 7, described later.

[0014] Reference numeral 6 denotes an operation execution mechanism that, when decode result register 5 holds the decode result of an arithmetic instruction, executes the operation indicated by that decode result. Operation execution mechanism 6 consists of four arithmetic units so that it can execute up to four instructions' worth of operations in parallel. Reference numeral 7 denotes a load execution mechanism that, when decode result register 5 holds the decode result of a load instruction (an instruction that reads memory data as load data and writes it to a register within decode result register 5), executes the load processing indicated by that decode result. Load execution mechanism 7 contains an operand cache (not shown). It performs a hit check to determine whether the data specified by the load instruction is present in that cache and, on detecting a miss, executes a block read from memory. Load execution mechanism 7 also outputs hit detection signal 8, which takes the true value (logic "1") when a hit is detected.
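
The hit check and block read can be pictured with the following minimal sketch. The class name, the block size, and the dictionary-based cache are assumptions for illustration; the sketch models only the hit/miss decision and not the multi-cycle latency of a block read.

```python
BLOCK_SIZE = 4  # words per cache block (assumed)


class LoadUnit:
    def __init__(self, memory):
        self.memory = memory  # list of words standing in for main memory
        self.cache = {}       # block number -> list of words

    def hit_check(self, address):
        """Model of hit detection signal 8: True when the word's block is cached."""
        return address // BLOCK_SIZE in self.cache

    def block_read(self, address):
        """On a miss, bring the whole block containing `address` into the cache."""
        block = address // BLOCK_SIZE
        start = block * BLOCK_SIZE
        self.cache[block] = self.memory[start:start + BLOCK_SIZE]

    def load(self, address):
        """Return (hit, data) for one load."""
        hit = self.hit_check(address)
        if not hit:
            self.block_read(address)
        block = self.cache[address // BLOCK_SIZE]
        return hit, block[address % BLOCK_SIZE]


unit = LoadUnit(memory=list(range(100, 116)))
print(unit.load(5))  # first access misses: (False, 105)
print(unit.load(6))  # same block, now a hit: (True, 106)
```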

[0015] Reference numeral 9 denotes a dependency detection circuit which, when a preceding compound instruction includes a load instruction, detects that a dependency (reference relationship) concerning the use of a register in register file 3 exists between that load instruction and a subsequent compound instruction.

[0016] Dependency detection circuit 9 consists of a 4-bit (b0 to b3) flag register 10, which holds the true values of register use signal 41 output from decode/register read mechanism 4 for registers R1 to R4, and AND gates 11 to 14. Each AND gate takes the logical AND of three inputs: the logical value of the corresponding bit b0 to b3 of flag register 10, the logical value of the bit of register use signal 42 output from decode/register read mechanism 4 for the corresponding register R1 to R4, and the logical value of the inverted level of hit detection signal 8 output from load execution mechanism 7.
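
The gate logic just described amounts to one three-input AND per register. A minimal sketch follows, with all signals written as 4-character bit strings ordered R1 to R4 (the string encoding and the function name are assumptions of this sketch):

```python
def wait_signals(flag_register, signal_42, hit):
    """WAIT 21..24 as a bit string: flag AND use AND (NOT hit), per register."""
    not_hit = not hit
    return "".join(
        "1" if (flag == "1" and used == "1" and not_hit) else "0"
        for flag, used in zip(flag_register, signal_42)
    )


# R1 has a load pending and the instruction being decoded accesses R1 and R2.
print(wait_signals("1000", "1100", hit=False))  # 1000
# The instruction being decoded accesses only R2, which has no load pending.
print(wait_signals("1000", "0100", hit=False))  # 0000
```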

[0017] Each bit b0 to b3 of flag register 10 is reset when execution of the load instruction that has the corresponding register R1 to R4 as its write destination completes. When its AND condition is satisfied, each of AND gates 11 to 14 outputs the corresponding WAIT signal 21 to 24, which instructs that execution of the compound instruction currently being processed by decode/register read mechanism 4, and of the instructions after it, be made to wait.

[0018] Reference numeral 25 denotes a pipeline control mechanism that governs pipeline control of the unit of FIG. 1. When any of WAIT signals 21 to 24 from AND gates 11 to 14 (within dependency detection circuit 9) is true, pipeline control mechanism 25 makes execution of the compound instruction currently being processed by decode/register read mechanism 4, and of the instructions after it, wait.
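
One way to picture the effect of the WAIT signals on the pipeline is the following sketch of a single cycle boundary. The dictionary model of the stages and the `next_fetch` parameter are hypothetical. Letting the E-stage instruction drain into W while D and I are held is an assumption of this sketch, made because only the instruction in decode and those after it are described as waiting.

```python
def advance(stages, wait_signals, next_fetch):
    """Return the stage contents for the next cycle.

    `stages` maps "I", "D", "E", "W" to the compound instruction in that stage
    (or None). `wait_signals` is the 4-character WAIT 21..24 bit string.
    """
    if "1" in wait_signals:
        # Held: I and D keep their contents; E moves on and leaves a bubble.
        return {"I": stages["I"], "D": stages["D"], "E": None, "W": stages["E"]}
    # Normal flow: every compound instruction moves one stage forward.
    return {"I": next_fetch, "D": stages["I"], "E": stages["D"], "W": stages["E"]}


stages = {"I": "CI13", "D": "CI12", "E": "CI11", "W": None}
print(advance(stages, "1000", next_fetch="CI14"))
print(advance(stages, "0000", next_fetch="CI14"))
```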

[0019] Reference numerals 31 to 34 denote selectors (SEL), each of which selects either the operation result of operation execution mechanism 6 for the corresponding one of the four instructions or the load data read from memory by load execution mechanism 7. In the normal state, selectors 31 to 34 are set to select the operation execution mechanism 6 side. An identifier indicating the write destination (for example, a register number) is attached to each operation result and to the load data. Reference numeral 35 denotes an operation result register that holds the data selected by selectors 31 to 34. The data held in operation result register 35 is written to the write destination (for example, a register in register file 3) specified by the identifier attached to that data.
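
The selector and write-back path can be sketched as follows. Each value carries its destination identifier as a `(destination, data)` pair. The rule "pass the load data whenever it is present" is an assumption of this sketch; the text states only that the selectors normally select the operation execution side.

```python
def select(operation_result, load_data=None):
    """Model of one selector: load data when present, else the operation result."""
    return load_data if load_data is not None else operation_result


def write_back(register_file, result_register):
    """Write each held (destination, data) pair to the register it names."""
    for entry in result_register:
        if entry is not None:
            destination, data = entry
            register_file[destination] = data


register_file = {"R1": 0, "R2": 0, "R3": 0, "R4": 0}
result_register = [
    select(None, load_data=("R1", 42)),  # slot fed by the load unit
    select(("R3", 7)),                   # slot fed by an arithmetic unit
]
write_back(register_file, result_register)
print(register_file)  # {'R1': 42, 'R2': 0, 'R3': 7, 'R4': 0}
```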

[0020] Next, the operation of the configuration of FIG. 1 is described, taking as an example the sequential execution of three compound instructions that include load instructions. The case in which no register reference relationship (RAW: Read After Write) exists between the compound instructions (Case 1) and the case in which one does exist (Case 2) are described in turn.

[0021] (Case 1) First, Case 1 is described with reference to the timing chart of FIG. 2. As shown in FIG. 2, the first compound instruction CI1 consists of four instructions: a load instruction L→R1 (a load instruction that reads load data from memory and writes it to register R1), an add instruction +, a multiply instruction ×, and a subtract instruction −. Compound instruction CI2 consists of a load instruction L→R2, a subtract instruction −, an add instruction +, and a multiply instruction ×. Compound instruction CI3 consists of a load instruction L→R3, a subtract instruction −, a divide instruction ÷, and a subtract instruction −. No register reference relationship (RAW: Read After Write) is assumed to exist among compound instructions CI1, CI2, and CI3.

[0022] First, in cycle T1, instruction fetch mechanism 1 performs the I stage and fetches the first compound instruction CI1. Compound instruction CI1 is held in instruction register 2.

[0023] In the next cycle T2, decode/register read mechanism 4 performs the D stage, that is, instruction decoding, on each instruction of compound instruction CI1 held in instruction register 2, and the decode results are held instruction by instruction in decode result register 5. At the same time, instruction fetch mechanism 1 performs the next I stage: compound instruction CI2, which follows CI1, is fetched and held in instruction register 2.

[0024] In the D stage of compound instruction CI1, because CI1 includes load instruction L→R1, decode/register read mechanism 4 outputs 4-bit register use signal 41 with the value "1000", in which the bit corresponding to R1 among R1, R2, R3, and R4 is true. Register use signal 41 is supplied to the 4-bit flag register 10 of dependency detection circuit 9, and the bits of flag register 10 corresponding to the true bits of signal 41 are set at the end of cycle T2. Here, bit b0 of flag register 10, which corresponds to R1, is set, and the contents of flag register 10 become "1000".

[0025] In the next cycle T3, operation execution mechanism 6 performs the +, ×, and − operations (E stage) in parallel according to the decode results of the +, ×, and − instructions of compound instruction CI1 held in decode result register 5. The results are selected by selectors 32 to 34, respectively, and held in operation result register 35.

[0026] Also in cycle T3, load execution mechanism 7 performs processing to read the load data according to the decode result of load instruction L→R1 of compound instruction CI1 held in decode result register 5. This processing begins with a hit check to determine whether the target load data is present in the operand cache. Assuming a miss here, load execution mechanism 7 starts a block read to read the block containing the target data from memory and does not set hit detection signal 8 to the true value.

[0027] Furthermore, in cycle T3, decode/register read mechanism 4 performs the D stage on each instruction of the second compound instruction CI2 held in instruction register 2, and the results are held in decode result register 5. At the same time, compound instruction CI3, which follows CI2, is fetched by instruction fetch mechanism 1 and held in instruction register 2.

[0028] In the D stage of compound instruction CI2, because the four decoded instructions include load instruction L→R2, which has register R2 in register file 3 as its access destination (write destination), decode/register read mechanism 4 outputs 4-bit register use signal 41 with the value "0100". Assuming that none of the other three instructions has a register in register file 3 as an access destination (reference or write destination), decode/register read mechanism 4 outputs 4-bit register use signal 42 with the value "0100". At this time the contents of flag register 10 are "1000", so none of WAIT signals 21 to 24, the outputs of AND gates 11 to 14, becomes true.

[0029] When WAIT signals 21 to 24 are all false in this way, pipeline control mechanism 25 treats the compound instructions in the pipeline as having no register reference relationship and refrains from stopping the flow of the pipeline (a pipeline lock). As a result, in the next cycle T4, as described below, compound instruction CI1, currently in the E stage, enters the W stage without load instruction L→R1, which is left behind because its block read is still in progress; compound instruction CI2, in the D stage, enters the E stage; and compound instruction CI3, in the I stage, enters the D stage. The same result is obtained even if compound instruction CI2 includes an instruction that has a register in register file 3 as an access destination, provided that the access destination is not R1.

[0030] In cycle T3, as described above, decode/register read mechanism 4 outputs 4-bit register use signal 41 with the value "0100", so bit b1 of flag register 10, which corresponds to R2, is set at the end of cycle T3. The contents of flag register 10 thereby change from "1000" to "1100".
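
The way flag register 10 accumulates and clears bits can be sketched with two small helpers. The bit-string encoding (ordered R1 to R4) and the function names are assumptions of this sketch; setting follows register use signal 41, and resetting follows completion of the load into the corresponding register.

```python
def set_flags(flags, signal_41):
    """Bitwise OR, applied at the end of the cycle in which signal 41 is output."""
    return "".join(
        "1" if f == "1" or s == "1" else "0" for f, s in zip(flags, signal_41)
    )


def reset_flag(flags, register_index):
    """Clear one bit when the load into that register completes (0 means R1)."""
    return flags[:register_index] + "0" + flags[register_index + 1:]


flags = "0000"
flags = set_flags(flags, "1000")  # end of T2: load into R1 decoded
flags = set_flags(flags, "0100")  # end of T3: load into R2 decoded
print(flags)                      # 1100
print(reset_flag(flags, 0))       # 0100
```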

[0031] In the next cycle T4, the W stage is performed to write the operation results of the +, ×, and − instructions of compound instruction CI1, held in operation result register 35, to their specified write destinations. In addition, operation execution mechanism 6 performs the −, +, and × operations (E stage) in parallel according to the decode results of the −, +, and × instructions of compound instruction CI2 held in decode result register 5, and the results are held in operation result register 35 via selectors 32 to 34.

[0032] Also in cycle T4, load execution mechanism 7 performs processing to read the load data according to the decode result of load instruction L→R2 of compound instruction CI2 held in decode result register 5. Assuming that a miss is detected, meaning the target load data is not present in the operand cache, a block read is started and hit detection signal 8 remains false.

[0033] Furthermore, in cycle T4, decode/register read mechanism 4 performs the D stage on each instruction of the third compound instruction CI3 held in instruction register 2, and the results are held in decode result register 5.

[0034] In the D stage of compound instruction CI3, because the four decoded instructions include load instruction L→R3, decode/register read mechanism 4 outputs 4-bit register use signal 41 with the value "0010". Assuming that none of the other three instructions has a register in register file 3 as an access destination (reference or write destination), decode/register read mechanism 4 outputs 4-bit register use signal 42 with the value "0010". At this time the contents of flag register 10 are "1100", so none of WAIT signals 21 to 24, the outputs of AND gates 11 to 14, becomes true. No pipeline lock is therefore applied, and in the next cycle T5 compound instruction CI2, currently in the E stage, enters the W stage, leaving behind load instruction L→R2, whose block read is in progress, and compound instruction CI3, in the D stage, enters the E stage.
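
The Case 1 walkthrough can be replayed with a short trace that checks each decode against the flag bits accumulated so far. This is a sketch under stated assumptions: signals are 4-character bit strings ordered R1 to R4, the value "1000" for register use signal 42 of CI1 is inferred by analogy with CI2 and CI3, and no load completes during the trace, so no flag bit is reset. The trace is meaningful only while no WAIT is raised.

```python
def wait_signals(flags, signal_42, hit):
    """One three-input AND per register: flag AND use AND (NOT hit)."""
    return "".join(
        "1" if f == "1" and u == "1" and not hit else "0"
        for f, u in zip(flags, signal_42)
    )


def trace(decodes, hit=False):
    """Return (name, flags seen during decode, WAIT bits) for each decode.

    `decodes` lists (name, signal_41, signal_42) in decode order.
    """
    flags = "0000"
    rows = []
    for name, sig41, sig42 in decodes:
        rows.append((name, flags, wait_signals(flags, sig42, hit)))
        # Flag bits are set at the end of the decode cycle.
        flags = "".join(
            "1" if f == "1" or s == "1" else "0" for f, s in zip(flags, sig41)
        )
    return rows


case1 = [("CI1", "1000", "1000"), ("CI2", "0100", "0100"), ("CI3", "0010", "0010")]
for row in trace(case1):
    print(row)  # every WAIT string is '0000'
```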

[0035] (Case 2) Next, Case 2 is described with reference to the timing chart of FIG. 3. As shown in FIG. 3, the first compound instruction CI11 consists of four instructions: a load instruction L→R1, an add instruction +, a multiply instruction ×, and a subtract instruction −. Compound instruction CI12 consists of a load instruction L→R2, a subtract instruction −, an add instruction R1+R2 that adds the contents of registers R1 and R2, and a multiply instruction ×. Compound instruction CI13 consists of a load instruction L→R3, a subtract instruction −, a divide instruction ÷, and a subtract instruction −. A reference relationship on register R1 is assumed to exist between compound instructions CI11 and CI12.

[0036] In Case 2, the I stage that fetches compound instruction CI11 is performed first, in cycle T1. In the next cycle T2, the D stage that decodes compound instruction CI11 is performed, together with the I stage that fetches compound instruction CI12, which follows CI11. In the D stage of compound instruction CI11, because CI11 includes load instruction L→R1, decode/register read mechanism 4 outputs register use signal 41 with the value "1000", so that the contents of flag register 10 become "1000" at the end of cycle T2.

[0037] In the next cycle T3, operation execution mechanism 6 performs the +, ×, and − operations (E stage) in parallel according to the D-stage results of the +, ×, and − instructions of compound instruction CI11, and the results are held in operation result register 35 via selectors 32 to 34.

[0038] Also in cycle T3, load execution mechanism 7 performs processing to read the load data according to the D-stage result of load instruction L→R1 of compound instruction CI11. Assuming that a miss is detected, meaning the target load data is not present in the operand cache, a block read is started and hit detection signal 8 remains false.

[0039] Furthermore, in cycle T3, decode/register read mechanism 4 performs the D stage on each instruction of the second compound instruction CI12. At the same time, compound instruction CI13, which follows CI12, is fetched by instruction fetch mechanism 1.

[0040] In the D stage of compound instruction CI12, because CI12 includes the load instruction L→R2, the decoding / register read mechanism 4 outputs a register use signal 41 with the value “0100”. CI12 also includes the add instruction R1+R2, so, assuming that none of its remaining instructions has a register in the register file 3 as an access destination (reference destination or write destination), the decoding / register read mechanism 4 outputs a 4-bit register use signal 42 with the value “1100”. At this time the content of the flag register 10 is “1000” and the hit detection signal 8 is false, so of the WAIT signals 21 to 24 from AND gates 11 to 14, only WAIT signal 21 is true.
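
The gating described in this paragraph can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the circuit of the embodiment: the function name `wait_signals`, the string encoding of the 4-bit values (leftmost character taken as bit b0, corresponding to R1), and the assumption that each AND gate combines one flag bit, the matching bit of register use signal 42, and the inverted hit detection signal 8 are all introduced here for illustration. The sketch is meant to reproduce only the values given in this paragraph.

```python
def wait_signals(flag_register, use_signal_42, hit):
    """Hypothetical model of AND gates 11 to 14.

    flag_register, use_signal_42: 4-character bit strings, leftmost = bit b0 (R1).
    hit: value of hit detection signal 8.
    Returns one boolean per gate (WAIT signals 21 to 24, in that order).
    """
    if len(flag_register) != len(use_signal_42):
        raise ValueError("bit widths differ")
    # Assumed gate inputs: flag bit AND use bit AND NOT hit.
    return [
        f == "1" and u == "1" and not hit
        for f, u in zip(flag_register, use_signal_42)
    ]


# Values from the paragraph above: flag register "1000", use signal 42 "1100", miss.
waits = wait_signals("1000", "1100", hit=False)
pipe_lock = any(waits)  # a pipe lock is requested if any WAIT signal is true
print(waits, pipe_lock)  # [True, False, False, False] True
```

With `hit=True` the same inputs yield no WAIT signal in this model, which matches the cache-hit behaviour the description attributes to the hit detection signal.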

[0041] When at least one of the WAIT signals 21 to 24 (here, WAIT signal 21) is true in this way, the pipeline control mechanism 25 regards a register reference relationship as existing between the compound instruction currently being executed and the subsequent compound instruction, and performs a pipe lock that stops the flow of the pipeline. As a result, compound instruction CI12 (which includes the add instruction R1+R2), following the executing compound instruction CI11, is held in the D stage, and the next compound instruction CI13 is held in the I stage. As described below, each waits until execution of the load instruction L→R1 in compound instruction CI11 completes.

[0042] In this embodiment, assume that the block read processing of the load execution mechanism 7 for the load instruction L→R1 in compound instruction CI11 completes in cycle T5 and that the specified data (load data) is output from the load execution mechanism 7. This load data is selected by selector 31 and held in the operation result register 35 at the end of cycle T5. The load data is also held in the decoding result register 5, as if it were data read from register R1 by the decoding / register read mechanism 4.

[0043] As described above, when the processing of the load instruction L→R1 in compound instruction CI11 completes in cycle T5, bit b0 in the flag register 10, which corresponds to R1, is reset. As a result, the content of the flag register 10 changes from “1100” to “0100”. At this time, because compound instruction CI12, which includes the add instruction R1+R2, is held in the D stage, the register use signal 42 output from the decoding / register read mechanism 4 remains “1100”. However, since the content of the flag register 10 has become “0100”, the AND conditions of AND gates 11 to 14 are not satisfied, and WAIT signals 21 to 24 all become false. Consequently, the pipeline control mechanism 25 regards the register reference relationship between the compound instructions in the pipeline as having disappeared and releases the pipe lock. As a result, in the next cycle T6, as described below, compound instruction CI11, currently in the E stage, enters the W stage; compound instruction CI12, in the D stage, enters the E stage; and compound instruction CI13, in the I stage, enters the D stage.
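
The hold-and-release behaviour of the pipe lock over cycles T3 to T6 can be traced with a small Python sketch. It is a simplified illustration, not the pipeline control mechanism 25 itself: the function `step`, the dictionary representation of the I, D, E, and W stages, and the per-cycle lock values (locked in T3 and T4, released in T5) are assumptions made to match the narrative. What is fetched into the I stage in T6 is not stated in this passage, so it is left as `None`.

```python
def step(pipeline, locked, next_fetch=None):
    """Return the stage contents for the next cycle (hypothetical model).

    pipeline: dict mapping stage name ("I", "D", "E", "W") to a compound
    instruction name or None.
    locked: True while a pipe lock is in effect during the current cycle.
    """
    if locked:
        return dict(pipeline)  # pipe lock: every stage holds its instruction
    return {
        "I": next_fetch,
        "D": pipeline["I"],
        "E": pipeline["D"],
        "W": pipeline["E"],
    }


# Cycle T3: CI11 executing (load miss), CI12 decoding, CI13 being fetched.
state = {"I": "CI13", "D": "CI12", "E": "CI11", "W": None}
# Assumed lock state per cycle: WAIT true in T3 and T4, false once the load completes in T5.
locked_in = {"T3": True, "T4": True, "T5": False}

history = {"T3": state}
for cycle, following in (("T3", "T4"), ("T4", "T5"), ("T5", "T6")):
    state = step(state, locked_in[cycle])
    history[following] = state

for cycle in ("T3", "T4", "T5", "T6"):
    print(cycle, history[cycle])
```

In this trace the stage contents are identical in T3, T4, and T5, and in T6 CI11 is in the W stage, CI12 in the E stage, and CI13 in the D stage, as the paragraph describes.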

[0044] Note that if a cache hit is detected in the processing of the load instruction L→R1 in compound instruction CI11 by the load execution mechanism 7 and the hit detection signal 8 becomes true, then even if there is a reference relationship on register R1 between compound instruction CI11 and the subsequent compound instruction CI12, the AND condition of AND gate 11 (to 14) is not satisfied and WAIT signal 21 (to 24) is false. In this case, the pipeline control mechanism 25 does not stop the flow of the pipeline. This is because, when a cache hit is detected by the load execution mechanism 7, the load processing completes in that cycle, the target load data is held in the operation result register 35 and the decoding result register 5, and the add instruction R1+R2 using that load data can be executed in the next cycle. The above describes a parallel processing device that processes four instructions in parallel, but the present invention is applicable to parallel processing devices in general that process a plurality of instructions in parallel. Also, for simplicity, the description assumed that a compound instruction contains at most one load instruction, but the invention is of course not limited to this.

【００４５】[0045]

[Effects of the Invention] As described in detail above, according to the present invention, a pipeline-type parallel processing device checks whether at least one dependency exists between a preceding compound instruction and a subsequent compound instruction and, when no dependency exists, does not make the subsequent compound instruction wait even if the preceding compound instruction includes a load instruction. Therefore, even if a load instruction results in a cache miss and a lengthy block read is performed, only the load instruction undergoing the block read is left behind while the subsequent compound instruction is executed. The waiting time of the subsequent compound instruction is thus kept as short as possible, and processing performance is improved.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a parallel processing device according to one embodiment of the present invention.

FIG. 2 is a timing chart for explaining the operation of the embodiment when there is no register reference relationship between compound instructions.

FIG. 3 is a timing chart for explaining the operation of the embodiment when there is a register reference relationship between compound instructions.

[Explanation of symbols]

1 ... instruction fetch mechanism, 2 ... instruction register, 3 ... register file, 4 ... decoding / register read mechanism, 5 ... decoding result register, 6 ... operation execution mechanism, 7 ... load execution mechanism, 9 ... dependency detection circuit, 10 ... flag register (state holding means), 11 to 14 ... AND gates (match detecting means), 25 ... pipeline control mechanism, 35 ... operation result register.

Claims

(57) [Claims]

1. A pipeline-type parallel processing device capable of processing in parallel a plurality of instructions fetched by an instruction fetch mechanism, comprising: dependency detection means for detecting that at least one dependency exists between a plurality of preceding instructions fetched earlier by said instruction fetch mechanism and a plurality of subsequent instructions fetched later; and a pipeline control mechanism for controlling the flow of the pipeline according to a detection result of the dependency detection means, wherein, when the dependency detection means detects that no dependency exists between the plurality of preceding instructions and the plurality of subsequent instructions, the pipeline control mechanism does not make the plurality of subsequent instructions wait even if a load instruction is included in the plurality of preceding instructions.

2. The parallel processing device according to claim 1, wherein the dependency detection means comprises: state holding means for indicating, when a load instruction is included in the plurality of preceding instructions, that a load data write destination specified by the load instruction is in use; and match detecting means for detecting that the load data write destination indicated by the state holding means matches an access destination specified by any of the subsequent instructions.