JPH08263289A

JPH08263289A - Pipeline computer for plural instruction flows

Info

Publication number: JPH08263289A
Application number: JP8883895A
Authority: JP
Inventors: Masatoshi Hotta; 正利堀田
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1995-03-22
Filing date: 1995-03-22
Publication date: 1996-10-11

Abstract

PURPOSE: To improve the arithmetic processing ability of a pipeline computer. CONSTITUTION: An instruction flow identification tag generation unit 31 generates an instruction flow identification tag for identifying plural instruction flows. A PC unit 32 is provided with the same number of program counters as the instruction flows to select the program counter of an instruction flow shown by the tag. A decoder 3 decodes an instruction specified by the program counter. A renaming register 33 executes the renaming of registers by distinguishing the instruction flow based on the instruction flow identification tag. Each reservation station 10 to 13 holds the decoded instruction and the entry values, etc., of the instruction flow identification tag and the renaming register 33. Each arithmetic unit 20 to 23 executes an instruction and sends the execution result to an entry corresponding to the renaming register 33.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a multiple-instruction-flow pipeline computer capable of executing a plurality of instruction sequences, and more particularly to a configuration for controlling the use of general-purpose registers.

[0002]

[Prior Art] In pipeline computers that process a single instruction flow, processing performance is improved by the superscalar method, in which a plurality of instructions are executed simultaneously, and by out-of-order issue, in which instructions are issued in an order different from program order.

[0003] In such computers, register renaming is used as a technique for avoiding the data dependencies between instructions that degrade performance. Register renaming removes two kinds of dependency: the antidependency, in which a register cannot be written with a new value until a preceding instruction has read its current value, and the output dependency, in which instructions attempt to write to the same register at the same time. This suppresses the performance loss caused by dependencies.

[0004]

[Problems to Be Solved by the Invention] However, while a pipeline computer that processes a single instruction flow as described above is running, execution may stall because of true data dependencies, stores to memory, cache misses, and the like, so that the processor's execution resources sit unused and are wasted.

[0005] In the superscalar case in particular, even though the processor has the resources to execute, for example, four instructions simultaneously, dependencies between instructions may allow only two to execute, leaving the remaining resources unused. For this reason, a pipeline computer that makes effective use of its arithmetic units and improves processing capability has been desired.

[0006]

[Means for Solving the Problems] The multiple-instruction-flow pipeline computer of the present invention comprises: an instruction flow identification tag generation unit that generates a tag for identifying a plurality of instruction flows; a PC unit that has as many program counters as there are instruction flows and selects the corresponding program counter based on the instruction flow identification tag; and a renaming register that performs register renaming for each instruction flow based on the instruction flow identification tag. The arithmetic units are configured to execute the instruction designated by the program counter selected in the PC unit, using values from the renaming register.

[0007]

[Operation] In the multiple-instruction-flow pipeline computer of the present invention, the instruction flow identification tag generation unit sends out a signal that selects one of the plurality of instruction flows. The PC unit then selects the corresponding program counter, and as a result one of the instruction flows is selected. The renaming register performs register renaming for each instruction flow based on the instruction flow identification tag. The arithmetic units then execute the instructions in the instruction flow selected by the program counter, using values from the renaming register.

[0008]

[Embodiments] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram of a multiple-instruction-flow pipeline computer according to one embodiment of the present invention; before describing it, the configuration of a superscalar processor is explained.

[0009] FIG. 2 is a block diagram of that superscalar processor. The apparatus shown comprises an instruction memory 1, an instruction cache 2, a decoder 3, a reorder buffer 4, a register 5, a program counter (PC) 6, a data cache 7, a data memory 8, reservation stations 10 to 13, and arithmetic units 20 to 23.

[0010] The instruction memory 1 is connected to the decoder 3 through the instruction cache 2. The output of the decoder 3 is connected to the reservation stations 10 to 13 of the arithmetic units 20 to 23, to the reorder buffer 4, and to the register 5. The output of the program counter 6 is connected to the instruction memory 1 through the instruction cache 2. Each of the reservation stations 10 to 13 is connected to its corresponding arithmetic unit 20 to 23, and the outputs of the arithmetic units 20 to 22, that is, all except the load/store unit 23, are connected to the reorder buffer 4. The output of the load/store unit 23 is connected to the data memory 8 through the data cache 7. The arithmetic units 20 to 23 are, for example, a branch unit, an arithmetic operation (ALU) unit, a shifter, and a load/store unit.

[0011] FIG. 3 shows the structure of the reorder buffer 4. The reorder buffer 4 is a table with a v field, a dest field, a data field, and a C field. The v field is a 1-bit value indicating whether the entry is valid, and the dest field indicates which register's value the entry actually holds. The data field holds either the value of that register or a result tag. The C field is a 1-bit flag that distinguishes whether the data field of the entry holds an actual value or, because execution has not yet completed, a result tag.

[0012] The reorder buffer 4 is thus an associative memory that uses the dest field as its search key.

[0013] Next, register renaming in general is described. Even with out-of-order issue, in which instructions are issued almost without regard to the original program order, the constraints on issuing an individual instruction are almost the same as with in-order issue, in which instructions are issued in the original program order: an instruction is issued once its resource conflicts and dependencies have cleared. Out-of-order issue simply presents the processor with more candidate instructions and so makes it easier to find instructions that can execute in parallel. However, issuing instructions out of order introduces a new constraint on instruction issue, just as out-of-order completion introduced the constraint of output dependencies.

[0014] This is explained using an example instruction sequence.

R3 = R3 × R5 (1)
R4 = R3 + 1 (2)
R3 = R5 + 1 (3)
R7 = R3 ÷ R4 (4)

Here, instruction (1), for example, means that the value of register R3 is multiplied by the value of register R5 and the product is written to register R3.

[0015] In the above sequence, the third instruction cannot complete before the second instruction has begun execution; otherwise the third instruction would wrongly overwrite the first source operand (= R3) of the second instruction. The result of the third instruction has an antidependency on the first source operand of the second instruction. An antidependency imposes the same kind of constraint as a true dependency, except that the direction is reversed: in a true dependency the preceding instruction produces a value that the following instruction uses, whereas in an antidependency the following instruction destroys a value that the preceding instruction uses. To avoid this, the processor must not issue the third instruction until the second instruction has begun execution. And because the second instruction depends on the first, the third instruction must wait for the first instruction to complete, even though the third instruction is otherwise independent.
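
As an illustration only, the three kinds of dependency discussed above can be found mechanically in the example sequence. The following Python sketch is not part of the disclosure; the name `dependencies` and the tuple encoding of the instructions are assumptions made for the example.

```python
from typing import List, Tuple

# (destination, source registers) for instructions (1) to (4); immediates are omitted
PROGRAM: List[Tuple[str, Tuple[str, ...]]] = [
    ("R3", ("R3", "R5")),   # (1) R3 = R3 x R5
    ("R4", ("R3",)),        # (2) R4 = R3 + 1
    ("R3", ("R5",)),        # (3) R3 = R5 + 1
    ("R7", ("R3", "R4")),   # (4) R7 = R3 / R4
]

def dependencies(program):
    """Return (kind, earlier instruction, later instruction, register) tuples."""
    found = []
    last_writer = {}  # register -> number of the latest instruction that writes it
    for j, (dest, srcs) in enumerate(program, start=1):
        # True dependency: a source is produced by the most recent earlier writer.
        for reg in srcs:
            if reg in last_writer:
                found.append(("true", last_writer[reg], j, reg))
        for i, (dest_i, srcs_i) in enumerate(program[: j - 1], start=1):
            # Antidependency: this instruction overwrites a register an earlier one reads.
            if dest in srcs_i:
                found.append(("anti", i, j, dest))
            # Output dependency: both instructions write the same register.
            if dest == dest_i:
                found.append(("output", i, j, dest))
        last_writer[dest] = j
    return found

deps = dependencies(PROGRAM)
for dep in deps:
    print(dep)
```

The list includes the antidependency of instruction (3) on instruction (2) and the output dependency between instructions (1) and (3), both on R3, which are the two constraints that renaming is meant to remove.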

[0016] The technique that solves this is register renaming. The processor eliminates the memory (storage) conflicts by providing additional registers and re-establishing the correspondence between registers and values. These additional registers are allocated dynamically by hardware at run time and are associated, through register renaming, with the values the program needs. To implement register renaming, the processor allocates a new register for every newly produced value, that is, for every instruction that writes a register.

[0017] An instruction that reads a register obtains the value not from the register originally specified but from the newly allocated register. The hardware thus rewrites the original register name in the instruction so that it identifies the new register and the correct value. The same register name in different instructions may refer to different hardware registers, depending on where the register reference falls relative to the register allocations.

[0018] After renaming, the above instruction sequence becomes:

R3_b = R3_a × R5_a (1)
R4_b = R3_b + 1 (2)
R3_c = R5_a + 1 (3)
R7_b = R3_c ÷ R4_b (4)

[0019] In this sequence, each assignment to a register creates a new instance of that register, indicated by an alphabetic suffix. Creating a new instance of R3 for the third instruction eliminates its antidependency on the second instruction and its output dependency on the first instruction, while still supplying the correct operand to the fourth instruction. The assignment to R3 in the third instruction supersedes the assignment to R3 in the first instruction, so R3_c is the new R3 as seen by subsequent instructions until another instruction assigns a value to R3.
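
The renaming rule just described (read the current instance of each source, then create a new instance for the destination) can be sketched in a few lines of Python. This is an illustrative model only; the function name `rename` and the text encoding of the instructions are assumptions, and the letter suffixes limit the sketch to 26 instances per register.

```python
from string import ascii_lowercase
from typing import Dict, List, Tuple

# (destination, operator, (source 1, source 2)); "1" is an immediate operand
PROGRAM: List[Tuple[str, str, Tuple[str, str]]] = [
    ("R3", "*", ("R3", "R5")),
    ("R4", "+", ("R3", "1")),
    ("R3", "+", ("R5", "1")),
    ("R7", "/", ("R3", "R4")),
]

def rename(program) -> List[str]:
    version: Dict[str, int] = {}  # register -> index of its current instance

    def current(operand: str) -> str:
        if not operand.startswith("R"):
            return operand  # immediates are left unchanged
        return f"{operand}_{ascii_lowercase[version.setdefault(operand, 0)]}"

    renamed = []
    for dest, op, srcs in program:
        # Sources are read first, so they see the instances that existed before this write.
        left, right = (current(s) for s in srcs)
        # Every write creates a new instance of the destination register.
        version[dest] = version.get(dest, 0) + 1
        renamed.append(f"{current(dest)} = {left} {op} {right}")
    return renamed

renamed = rename(PROGRAM)
print("\n".join(renamed))
```

Run on the example, the sketch yields the same instance names as the renamed sequence in [0018].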

[0020] Next, the operation of the superscalar processor configured as above is described. The program counter 6 accesses the instruction cache 2, and the output of the instruction cache 2 is sent to the decoder 3 (suppose, for example, that the instruction sequence (1) to (4) above is sent). The decoder 3 interprets each instruction, determines which of the arithmetic units 20 to 23 it should go to, and sends its output to the corresponding reservation station 10 to 13 (instructions (1) to (4) above, for example, are arithmetic operations and are therefore sent to reservation station 11).

[0021] At the same time, the decoder 3 sends the source register values and the destination register value of the instruction (that is, the register designations named in the instruction) to the reorder buffer 4, and the source register values to the register 5. For instruction (1), for example, R3_a, R5_a, and R3_b are sent to the reorder buffer 4, and R3_a and R5_a are sent to the register 5.

[0022] The reorder buffer 4 searches its associative memory using these source register values as keys and sends the most recent value among the entries whose dest field matches to the corresponding reservation station 10 to 13. If the C field of the matching entry is invalid (that is, the entry holds a result tag), the result tag is sent in place of the value.

[0023] If no matching entry exists in the reorder buffer 4, the value in the register 5 is sent to the reservation station 10 to 13. The reorder buffer 4 also stores the destination register value in the dest field of a new entry, sets the v field valid, sets the C field invalid, and generates a result tag and stores it in the data field. The value identifying this entry of the reorder buffer 4 is further sent to the reservation station 10 to 13 and becomes the new destination of the instruction.

[0024] FIG. 4 shows the contents of the reorder buffer 4 for the above instruction sequence. From the top, the entry R3 is R3_b of the above sequence, the next entry R4 is R4_b, the following R3 is R3_c, and R7 is the entry for R7_b. In this state the values of R3_a and R5_a are in the register 5, and R3, R4, R3, and R7 are the destination register values corresponding to instructions (1) to (4). Using the reorder buffer 4 in this way, instruction (3) can be executed independently of instructions (1) and (2). In the v and C fields, "1" means valid and "0" means invalid. The values "105" to "108" in the data field are result tags.
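
The operand lookup and entry allocation of [0021] to [0023] can be modelled in software to show how the state described for FIG. 4 arises. The sketch below is a simplified illustration, not the circuit itself: the class `ReorderBuffer`, its method names, the initial register contents (R3 = 2, R5 = 3), and the choice to start result tags at 105 are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Tuple

@dataclass
class Entry:
    v: int      # 1 = entry valid
    dest: str   # register this entry stands for
    data: int   # value if c == 1, result tag if c == 0
    c: int      # 1 = data holds an actual value

class ReorderBuffer:
    def __init__(self, registers: Dict[str, int], first_tag: int = 105):
        self.registers = registers          # stands in for the register 5
        self.entries: List[Entry] = []      # oldest first
        self._tags = count(first_tag)

    def read_operand(self, reg: str) -> Tuple[str, int]:
        """Newest matching entry wins; fall back to the register file on a miss."""
        for e in reversed(self.entries):
            if e.v and e.dest == reg:
                return ("value" if e.c else "tag", e.data)
        return ("value", self.registers[reg])

    def allocate(self, dest: str) -> int:
        """Create the entry for a destination and return its result tag."""
        tag = next(self._tags)
        self.entries.append(Entry(v=1, dest=dest, data=tag, c=0))
        return tag

rob = ReorderBuffer({"R3": 2, "R5": 3})
program = [("R3", ["R3", "R5"]), ("R4", ["R3"]), ("R3", ["R5"]), ("R7", ["R3", "R4"])]
issued = []
for dest, srcs in program:
    operands = [rob.read_operand(s) for s in srcs]   # sources are read before allocation
    issued.append((operands, rob.allocate(dest)))

for e in rob.entries:
    print(e)
```

In this model instruction (3) receives only actual values for its operands, while instructions (2) and (4) receive result tags, which is why (3) need not wait for (1) or (2).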

[0025] Each reservation station 10 to 13 holds the decoded instruction, the value of the reorder buffer 4 entry (the destination), and the source register values (or result tags). When the corresponding arithmetic unit 20 to 23 is free and all source register values are available, the instruction is sent to that arithmetic unit. For example, if the configuration of FIG. 2 had a plurality of arithmetic operation units 21 and the values of the source registers R3_a and R5_a were available, instructions (1) and (3) would be executed simultaneously.

[0026] The result of execution in an arithmetic unit 20 to 23 is sent, except in the case of a store instruction, to the aforementioned entry of the reorder buffer 4, where the result is written to the data field and the C field is set valid. At that time, the reservation stations 10 to 13 are searched for any entry holding a tag that matches the result tag previously stored in the data field, and if one is found the result is written to that reservation station as well. In addition, once all instructions that precede this one in actual program order have finished executing, the data value of the entry is sent to the register 5 indicated by the dest field and the v field is set invalid.
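
The write-back and in-order update of the register 5 described in [0026] can be sketched as follows. This is a hedged illustration under stated assumptions: the names `write_back`, `retire`, and `Slot` are hypothetical, the register contents are invented, and a Python list ordered oldest first stands in for the reorder buffer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Operand = Tuple[str, int]          # ("value", x) or ("tag", t)

@dataclass
class Entry:                       # one reorder-buffer entry
    v: int
    dest: str
    data: int
    c: int

@dataclass
class Slot:                        # one waiting instruction in a reservation station
    operands: List[Operand]

def write_back(entry: Entry, result: int, stations: List[Slot]) -> None:
    tag = entry.data               # result tag held in the data field until now
    entry.data, entry.c = result, 1
    for slot in stations:          # forward the result to operands waiting on that tag
        slot.operands = [("value", result) if op == ("tag", tag) else op
                         for op in slot.operands]

def retire(entries: List[Entry], registers: Dict[str, int]) -> None:
    """Update the register file only while every older instruction has finished."""
    for e in entries:              # oldest first
        if not e.v:
            continue
        if not e.c:
            break                  # an older instruction is still executing
        registers[e.dest] = e.data
        e.v = 0

registers = {"R3": 2, "R5": 3}
entries = [Entry(1, "R3", 105, 0), Entry(1, "R4", 106, 0), Entry(1, "R3", 107, 0)]
stations = [Slot([("tag", 105)])]            # instruction (2) waits for R3_b

write_back(entries[2], 4, stations)          # (3) finishes first: 3 + 1
retire(entries, registers)
r3_after_early_completion = registers["R3"]  # (1) is unfinished, so nothing retires

write_back(entries[0], 6, stations)          # (1) finishes: 2 x 3
retire(entries, registers)
print(registers, stations)
```

The early result of instruction (3) stays in its entry until the older instructions have finished, so the register file is only ever updated in program order.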

[0027] When an interrupt or exception occurs, the entries of the reservation stations 10 to 13 are invalidated and the v fields of the entries of the reorder buffer 4 are invalidated.

[0028] When there are a plurality of instruction flows, as when a plurality of applications are running, the arithmetic units can be used effectively and processor performance improved if instructions can be executed flow by flow.

[0029] FIG. 5 illustrates such a plurality of instruction flows. The example shows the instruction sequences of two applications, AP1 and AP2, whose instructions (1A) to (4A) and (1B) to (4B) are entirely unrelated to each other in program execution order. In this embodiment, the configuration of FIG. 1 is adopted so that a plurality of instruction flows can be executed simultaneously in such a case. It is described in detail below.

[0030] The apparatus shown in FIG. 1 comprises the instruction memory 1, instruction cache 2, data cache 7, data memory 8, reservation stations 10 to 13, and arithmetic units 20 to 23, together with an instruction flow identification tag generation unit 31, a PC unit 32, and a renaming register 33. The components from the instruction memory 1 through the arithmetic unit 23 are the same as in FIG. 2 described above.

[0031] The instruction flow identification tag generation unit 31 generates tags for identifying a plurality of instruction flows, and its output is connected to the reservation stations 10 to 13 of the arithmetic units 20 to 23, to the renaming register 33, and to the PC unit 32. The PC unit 32 is a program counter unit that has as many program counters (PCs) as there are instruction flows, and its output is connected to the instruction memory 1 through the instruction cache 2.

[0032] FIG. 6 shows the structure of the renaming register 33. The renaming register 33 is the ordinary reorder buffer 4 shown in FIG. 2 with two fields added: an instruction flow indication bit field IN of ⌈log₂ n⌉ bits (rounded up to the next integer, where n > 1 is the number of instruction flows) and a 1-bit completion flag field IC. It is an associative memory that uses the dest field and the IN field as search keys. The IN field identifies which instruction flow the entry belongs to, and the IC field is used for handling interrupts.
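
The width of the IN field follows directly from the number of instruction flows. As a small worked example (the function name `in_field_bits` is an assumption made here), the rounded-up base-2 logarithm can be computed with integer arithmetic:

```python
def in_field_bits(n_flows: int) -> int:
    """Bits needed to tell n_flows instruction flows apart: ceil(log2(n_flows))."""
    if n_flows <= 1:
        raise ValueError("the description assumes n > 1 instruction flows")
    # For n > 1, (n - 1).bit_length() equals ceil(log2(n)) and avoids floating point.
    return (n_flows - 1).bit_length()

widths = {n: in_field_bits(n) for n in (2, 3, 4, 5, 8, 9)}
print(widths)
```

Two flows therefore need a single IN bit, three or four flows need two bits, and five to eight flows need three.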

[0033] First, the instruction flow identification tag generation unit 31 sends the PC unit 32 a signal that selects one of its PCs, of which there are as many as there are instruction flows. The PC unit 32 accesses the instruction cache 2 with the value of the selected PC, and the output of the instruction cache 2 is sent to the decoder 3. The decoder 3 interprets the instruction, determines which of the arithmetic units 20 to 23 it should go to, and sends its output to the corresponding reservation station 10 to 13.
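
The selection of one program counter per fetch can be pictured with the following Python sketch. The description does not state in what order the tag generation unit 31 selects the flows, so the round-robin order used here is purely an assumption, as are the class name `PCUnit`, the word-addressed instruction memory, and the start addresses.

```python
from itertools import cycle, islice
from typing import Dict, List

class PCUnit:
    """One program counter per instruction flow; the tag selects which one is used."""
    def __init__(self, start_addresses: List[int]):
        self.pcs = list(start_addresses)

    def fetch(self, tag: int, instruction_memory: Dict[int, str]) -> str:
        address = self.pcs[tag]
        self.pcs[tag] += 1          # assumes one instruction per address
        return instruction_memory[address]

# Two flows as in FIG. 5: AP1 starts at address 0, AP2 at address 100 (invented addresses).
instruction_memory = {0: "1A", 1: "2A", 100: "1B", 101: "2B"}
pc_unit = PCUnit([0, 100])
tags = cycle(range(2))              # assumed round-robin tag generation
fetched = [(tag, pc_unit.fetch(tag, instruction_memory)) for tag in islice(tags, 4)]
print(fetched)
```

Each fetched instruction travels with the tag that selected it, which is what later allows the renaming register to keep the two flows apart.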

[0034] At the same time, the instruction flow identification tag generation unit 31 sends the ⌈log₂ n⌉-bit instruction flow identification tag to the same reservation station. Also at the same time, the decoder 3 sends the source register values and destination register value of the instruction, and the instruction flow identification tag generation unit 31 sends the instruction flow identification tag, to the renaming register 33. The renaming register 33 searches its associative memory using the source register value and the instruction flow identification tag as the key, and sends the most recent value among the matching entries to the corresponding reservation station 10 to 13.

[0035] If the C field of the matching entry is invalid, the result tag is sent in place of the value. The renaming register 33 also stores the destination register value in the dest field of a new entry and the instruction flow identification tag in its IN field, sets the v field valid, sets the C and IC fields invalid, and generates a result tag and stores it in the data field. The value identifying this entry of the renaming register 33 is sent to the reservation station 10 to 13.
Enable the fields, disable the C and IC fields, generate result tags and hold them in the data field. Further, the value of the entry of the renaming register 33 is sent to the reservation stations 10 to 13.

[0036] FIG. 7 illustrates the contents of the renaming register 33 after the instruction sequences of the two applications AP1 and AP2 shown in FIG. 5 have been entered. The values of R3_a and R5_a of each of AP1 and AP2 have been written to the data field, so the C field of those entries is valid (= 1); their IC field is also valid (= 1), because the IC fields of all entries holding the results of earlier instructions are valid. Because the IN field values differ between AP1 and AP2, their instructions can be executed simultaneously. In the data field, "2", "3", "5", and "6" are values, and "105" to "108" and "201" to "204" are result tags.
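
The effect of adding the IN field to the search key can be illustrated with a small model. The sketch below is an assumption-laden illustration rather than the structure of FIG. 6 itself: the class `RenamingRegister`, its methods, the assignment of the values 2 and 3 to AP1 and 5 and 6 to AP2, and the use of tags 105 and 201 for the two flows are all choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Entry:
    v: int      # entry valid
    dest: str   # register named in the instruction
    flow: int   # IN field: which instruction flow the entry belongs to
    data: int   # value if c == 1, result tag if c == 0
    c: int      # data holds an actual value
    ic: int     # completion flag used for interrupt handling

class RenamingRegister:
    def __init__(self) -> None:
        self.entries: List[Entry] = []   # oldest first

    def preload(self, dest: str, flow: int, value: int) -> None:
        """Entry that already holds a completed value (C = IC = 1)."""
        self.entries.append(Entry(1, dest, flow, value, 1, 1))

    def read_operand(self, reg: str, flow: int) -> Optional[Tuple[str, int]]:
        """Search on (dest, IN); the newest valid match wins."""
        for e in reversed(self.entries):
            if e.v and e.dest == reg and e.flow == flow:
                return ("value" if e.c else "tag", e.data)
        return None

    def allocate(self, dest: str, flow: int, tag: int) -> None:
        self.entries.append(Entry(1, dest, flow, tag, 0, 0))

rr = RenamingRegister()
rr.preload("R3", 0, 2); rr.preload("R5", 0, 3)    # AP1 (flow 0)
rr.preload("R3", 1, 5); rr.preload("R5", 1, 6)    # AP2 (flow 1)
rr.allocate("R3", 0, 105)                         # AP1 writes R3
rr.allocate("R3", 1, 201)                         # AP2 writes R3

r3_ap1 = rr.read_operand("R3", 0)
r3_ap2 = rr.read_operand("R3", 1)
r5_ap2 = rr.read_operand("R5", 1)
print(r3_ap1, r3_ap2, r5_ap2)
```

Both flows use the name R3, yet each lookup returns only its own flow's entry, so no dependency is created between AP1 and AP2.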

[0037] Each reservation station 10 to 13 holds the decoded instruction, the instruction flow identification tag, the value of the renaming register 33 entry, and the source register values (or tags). When the corresponding arithmetic unit 20 to 23 is free and all source register values are available, the instruction is sent to that arithmetic unit.

[0038] The result of execution in an arithmetic unit 20 to 23, together with the instruction flow identification tag, is sent, except in the case of a store instruction, to the aforementioned entry of the renaming register 33, where the result is written to the data field and the C field is set valid. At that time, the reservation stations 10 to 13 are searched for any entry holding a tag that matches the result tag previously stored in the data field, and if one is found the result is written to that reservation station as well. The IC field of the entry is set valid if the IC fields of all entries holding the results of instructions that precede it in actual program order are valid. Furthermore, if among the entries whose IC field is valid there are entries with the same dest field, the v field of the older entry is set invalid.
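
The IC rule can be sketched as a single pass over the entries from oldest to newest. The following is one possible reading, not a definitive implementation: the description does not say explicitly that the ordering and the dest comparison are evaluated per instruction flow, so that is an assumption here, as are the function name `update_ic` and the sample values.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass
class Entry:
    v: int
    dest: str
    flow: int   # IN field
    data: int
    c: int
    ic: int

def update_ic(entries: List[Entry]) -> None:
    """entries is ordered oldest first; flows are treated independently (assumption)."""
    blocked: Set[int] = set()                  # flows with an older entry not yet IC-valid
    newest: Dict[Tuple[str, int], Entry] = {}  # (dest, flow) -> newest IC-valid entry so far
    for e in entries:
        if not e.v:
            continue
        if e.flow not in blocked and e.c:
            e.ic = 1                           # finished, and everything older is IC-valid
        if not e.ic:
            blocked.add(e.flow)                # younger entries of this flow must wait
            continue
        older = newest.get((e.dest, e.flow))
        if older is not None:
            older.v = 0                        # superseded by a newer IC-valid entry
        newest[(e.dest, e.flow)] = e

r3_a = Entry(1, "R3", 0, 2, 1, 1)      # initial R3 of flow 0
r3_b = Entry(1, "R3", 0, 6, 1, 0)      # (1) finished
r4_b = Entry(1, "R4", 0, 106, 0, 0)    # (2) still executing
r3_c = Entry(1, "R3", 0, 4, 1, 0)      # (3) finished early
other = Entry(1, "R3", 1, 5, 1, 1)     # R3 of flow 1
entries = [r3_a, other, r3_b, r4_b, r3_c]
update_ic(entries)
print(entries)
```

In this example R3_b becomes IC-valid and retires the older R3_a entry, while R3_c stays IC-invalid behind the unfinished instruction (2), and the entry of the other flow is untouched.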

[0039] When an interrupt or exception occurs, the entries of the reservation stations 10 to 13 that carry the same tag as the instruction flow identification tag of the flow that raised the interrupt or exception are invalidated, and, among the entries of the renaming register 33, the v field is invalidated for those whose IN field matches that instruction flow identification tag and whose IC field is invalid.
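
The selective invalidation on an interrupt or exception can be sketched as follows. The names `flush_flow`, `Entry`, and `Slot` and the sample contents are assumptions made for illustration; only the selection rule (matching tag, IC field invalid) is taken from the description.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Entry:          # renaming-register entry, reduced to the fields used here
    v: int
    dest: str
    flow: int         # IN field
    ic: int

@dataclass
class Slot:           # reservation-station entry with its flow identification tag
    flow: int
    instruction: str

def flush_flow(entries: List[Entry], stations: List[Slot], flow: int) -> None:
    """Discard the not-yet-completed state of one instruction flow only."""
    # Reservation-station entries carrying the faulting flow's tag are dropped.
    stations[:] = [s for s in stations if s.flow != flow]
    # Renaming-register entries of that flow whose IC field is invalid are invalidated.
    for e in entries:
        if e.flow == flow and not e.ic:
            e.v = 0

entries = [
    Entry(1, "R3", 0, 1),   # flow 0, IC valid: kept
    Entry(1, "R4", 0, 0),   # flow 0, IC invalid: invalidated
    Entry(1, "R3", 1, 0),   # flow 1: untouched
]
stations = [Slot(0, "R7 = R3 / R4"), Slot(1, "R4 = R3 + 1")]
flush_flow(entries, stations, flow=0)
print(entries, stations)
```

The other flow keeps both its renaming-register entries and its waiting instructions, so it can continue to execute while the interrupted flow is handled.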

[0040]

[Effects of the Invention] As described above, the multiple-instruction-flow pipeline computer of the present invention has program counters for accessing a plurality of instruction flows and performs register renaming with a renaming register to which a tag identifying the instruction flow has been added. Arithmetic units that previously sat idle because true dependencies prevented instructions from being issued can therefore be put to use by executing a plurality of instruction flows, so that high-throughput processing is achieved while high arithmetic performance is maintained.

[Brief description of drawings]

FIG. 1 is a block diagram of one embodiment of the multiple-instruction-flow pipeline computer of the present invention.

FIG. 2 is a block diagram of a general superscalar processor.

FIG. 3 shows the structure of the reorder buffer.

FIG. 4 is an explanatory diagram showing an example of an instruction sequence in the reorder buffer.

FIG. 5 is an explanatory diagram of a plurality of instruction flows.

FIG. 6 shows the structure of the renaming register in the multiple-instruction-flow pipeline computer of the present invention.

FIG. 7 is an explanatory diagram of the contents of the renaming register in the multiple-instruction-flow pipeline computer of the present invention.

[Explanation of symbols]

3 Decoder
20 to 23 Arithmetic units
31 Instruction flow identification tag generation unit
32 PC unit
33 Renaming register

Claims

[Claims]

1. A multiple-instruction-flow pipeline computer comprising: an instruction flow identification tag generation unit that generates an instruction flow identification tag identifying which instruction flow a given instruction belongs to; a PC unit that has as many program counters as there are instruction flows and selects the program counter of the instruction flow indicated by the instruction flow identification tag; a renaming register that performs register renaming for each instruction flow based on the instruction flow identification tag; and an arithmetic unit that executes the instruction designated by the program counter, using a value from the renaming register.

2. A multiple-instruction-flow pipeline computer comprising: an instruction flow identification tag generation unit that generates an instruction flow identification tag identifying which instruction flow a given instruction belongs to; a PC unit that has as many program counters as there are instruction flows and selects the program counter of the instruction flow indicated by the instruction flow identification tag; a decoder that decodes the instruction designated by the program counter selected in the PC unit and obtains the values of the source register and the destination register in the instruction; a renaming register that outputs the data corresponding to the value of the source register and the instruction flow identification tag, and performs register renaming by setting the value of the destination register in the instruction as a new entry for each instruction flow; and an arithmetic unit that executes the instruction decoded by the decoder using the value corresponding to the source register output from the renaming register, and writes the execution result to the renaming register as the value of the destination register.