JP2000285082A

JP2000285082A - Central processing unit and compiling method

Info

Publication number: JP2000285082A
Application number: JP11091725A
Authority: JP
Inventors: Yoshifumi Yoshikawa; 宜史吉川; Shigehiro Asano; 滋博浅野
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-03-31
Filing date: 1999-03-31
Publication date: 2000-10-13

Abstract

PROBLEM TO BE SOLVED: To provide a central processing unit (CPU) capable of effective speculative execution while reducing the amount of information to be saved, the amount of hardware, or the amount of processing. SOLUTION: A CPU that has a plurality of operation execution units 8 to 11 and allocates those units when executing interpreted instructions is provided with: a group value generation part 6 that gives each instruction, when it is executed, a group value identifying which instruction group the instruction belongs to, the groups being obtained by dividing the instruction sequence of a program at specific instructions; speculation judging parts 12 to 15 that judge, using the group value given to an instruction, whether execution of that instruction is speculative; and a register saving part 16 that stores the information needed to restore a register updated by a speculatively executed instruction to its state before that instruction was executed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001] [Technical Field of the Invention] The present invention relates to a central processing unit capable of executing a plurality of instructions in parallel, and to a compiling method for generating a program to be executed by that central processing unit.

[0002] [Description of the Related Art] Increasing instruction-level parallelism is one way to improve the performance of a computer system. There are two fundamentally different approaches to raising instruction-level parallelism: the VLIW (Very Long Instruction Word) approach, which allocates and uses resources statically at compile time, and the superscalar approach, which allocates resources dynamically at run time.

[0003] Programs executed on VLIW processors and superscalar processors also normally contain many conditional branch instructions and load/store instructions. When the next instruction is executed by predicting the target of a conditional branch before that target is determined, or when subsequent instructions are executed before a load/store instruction is guaranteed not to cause a page fault, the execution of those instructions is speculative. Enabling speculative instruction execution further increases the instruction-level parallelism of the system.

[0004] Conventional speculative instruction execution schemes include the reorder buffer scheme and the checkpoint scheme. These schemes are described below.

[0005] The reorder buffer scheme is the one used in conventional out-of-order superscalar processors. Such a processor has a reorder buffer as the mechanism that guarantees that results computed out of order are reflected in the processor state in program order. To explain speculative instruction execution using a reorder buffer, out-of-order execution with a reorder buffer is described first.

[0006] An out-of-order superscalar processor selects as many executable instructions as possible from the group of decoded instructions whose input operands are determined, and executes them simultaneously. An instruction can execute if its operands are available and an operation execution unit (arithmetic unit) is free, even if an instruction before it is not yet executable. Instructions are therefore sometimes executed out of order. An operation result, whether speculative or not, is not written directly to a register but is first registered in the reorder buffer.

[0007] When operation results are registered in the reorder buffer, each result is assigned a unique virtual register name. That is, even for multiple operations that write the same register, the virtual registers for their results all differ. In later instruction execution, input register names are replaced with virtual register names, and values are read from the reorder buffer and the registers. This is called the register renaming mechanism.

[0008] For example, in the program shown in FIG. 43, suppose that output register r2 of the instruction "li" (instruction number 1) and output register r2 of the instruction "lw" (instruction number 3) are assigned virtual register names such as $10 and $12, respectively, when they are registered in the reorder buffer. Then r2 is renamed to $10 when the instruction "add" (instruction number 2) is decoded, and r2 is renamed to $12 when the instruction "sll" (instruction number 4) is decoded. When instructions 2 and 4 read the register, the operation result corresponding to the virtual register is read if it exists in the reorder buffer; otherwise the value of the real register r2 is read.
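
The renaming in this example can be sketched in a few lines of Python. This is only an illustration: FIG. 43 is not reproduced here, so the destination register of "add" (`r3`), the class name `RenameTable`, and the sequential allocation of virtual names starting at $10 are assumptions chosen to match the names quoted above.

```python
class RenameTable:
    """Toy register renamer; virtual names are handed out in sequence."""

    def __init__(self, first_virtual=10):
        self.next_virtual = first_virtual
        self.latest = {}          # architectural name -> newest virtual name
        self.reorder_buffer = {}  # virtual name -> result, once computed
        self.registers = {}       # architectural name -> committed value

    def rename_dest(self, arch):
        """Give the result written to `arch` a fresh virtual name."""
        virt = f"${self.next_virtual}"
        self.next_virtual += 1
        self.latest[arch] = virt
        return virt

    def rename_src(self, arch):
        """Map a source register to its newest virtual name, if any."""
        return self.latest.get(arch, arch)

    def read(self, arch, virt):
        """Prefer the reorder buffer entry; otherwise read the real register."""
        if virt in self.reorder_buffer:
            return self.reorder_buffer[virt]
        return self.registers[arch]


table = RenameTable()
li_dest = table.rename_dest("r2")   # 1: li  writes r2 -> $10
add_src = table.rename_src("r2")    # 2: add reads r2 as $10
add_dest = table.rename_dest("r3")  #    add's destination r3 is assumed -> $11
lw_dest = table.rename_dest("r2")   # 3: lw  writes r2 -> $12
sll_src = table.rename_src("r2")    # 4: sll reads r2 as $12
```

Because each write to r2 gets its own virtual name, "add" and "sll" read different results even though both name r2 in the program.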

[0009] The result of an instruction registered in the reorder buffer is reflected in the register only after it is confirmed that the results of all instructions preceding it in program order have been written to the registers. This guarantees that results computed out of order are reflected in the state in program order.

[0010] In the out-of-order execution scheme using the reorder buffer described above, speculative execution of instructions is realized as follows (M. Johnson, "Superscalar Microprocessor Design", Prentice-Hall, 1991).

[0011] When the prediction for a conditional branch instruction is wrong, or when a load/store instruction causes a page fault, the speculatively executed operations are invalid, so their results must be cancelled without being reflected in the registers. In this case, the processor waits for completion of the instructions located before the instruction for which the misprediction or page fault was detected, and then raises a trap. In the trap, all instructions executing in the pipeline are cancelled first.

[0012] The results registered in the reorder buffer are also cancelled at the trap. Results of instructions before the trapping instruction are no longer in the reorder buffer because those instructions have already completed. Results of the speculatively executed instructions after the trapping instruction, on the other hand, are not reflected in the registers because the reorder buffer is cancelled. In this way, only the speculatively executed operations are cancelled.

[0013] When cancellation of the pipeline and the reorder buffer is complete, the program counter is set to the correct branch target address for a conditional branch trap, or to the address of the trapping instruction for a load/store trap, and trap processing ends.

[0014] If the speculatively executed operation turns out to be correct, its result is reflected in the register after all preceding instructions complete.

[0015] Speculative execution using the reorder buffer is realized as described above.

[0016] The checkpoint scheme, on the other hand, is used in both conventional superscalar processors and VLIW processors. In this scheme, the processor state at a checkpoint in the program is saved at that point. Results of speculative instruction execution are reflected directly in the registers.

[0017] When execution of a conditional branch instruction or a load/store instruction reveals that speculative instruction execution was wrong, a trap is raised and the processor state is returned to the checkpoint taken before the results of the speculative execution were reflected, thereby cancelling those results. Execution then proceeds non-speculatively up to the instruction that caused the trap, which guarantees that speculative execution does not cause a malfunction again. Once the speculatively executed instructions are guaranteed to be correct, checkpoint states that are no longer needed are discarded.
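
As a rough illustration of the checkpoint scheme, the following Python sketch saves the whole register state at each checkpoint, lets speculative writes go straight to the registers, and either rolls back or discards checkpoints. The class and method names are hypothetical, and the model ignores everything except general registers.

```python
class CheckpointedProcessor:
    """Toy model of the checkpoint scheme (registers only)."""

    def __init__(self, registers):
        self.registers = dict(registers)
        self.checkpoints = []  # saved register states, oldest first

    def take_checkpoint(self):
        # Save a full copy of the current register state.
        self.checkpoints.append(dict(self.registers))

    def write(self, reg, value):
        # Speculative results are written directly to the registers.
        self.registers[reg] = value

    def rollback(self, index=-1):
        # Speculation failed: restore the chosen checkpoint and drop it
        # together with every checkpoint taken after it.
        self.registers = dict(self.checkpoints[index])
        del self.checkpoints[index:]

    def confirm_oldest(self):
        # Speculation confirmed: the oldest checkpoint is no longer needed.
        self.checkpoints.pop(0)


cpu = CheckpointedProcessor({"r1": 1, "r2": 2})
cpu.take_checkpoint()   # e.g. at a predicted branch
cpu.write("r1", 99)     # speculative result lands in the register
cpu.rollback()          # misprediction: r1 is 1 again
```

After a rollback, the instructions up to the trapping one would be re-executed non-speculatively; that step is outside this sketch.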

[0018] As for how checkpoints are taken, the first approach is to count elapsed clock cycles or executed instructions and take a checkpoint whenever the count reaches a fixed number. With this approach, a long checkpoint interval means that many instructions must be executed on re-execution, while a short interval makes the checkpointing itself a heavy load.

[0019] To address this problem, a method has also been proposed that analyzes in advance the minimum information that must be saved for instruction execution and attaches that information to the code, thereby reducing the information saved, and that executes non-speculatively when there is not enough space to save that information ("Processor Structure and Method for Checkpointing Instructions to Maintain Precise State", U.S. Pat. No. 5,659,721). This method can be applied to both conventional in-order-issue superscalar processors and VLIW processors.

[0020] In a general checkpoint scheme, the entire processor state, including status registers, is saved at every checkpoint. When instructions are executed speculatively across multiple checkpoints, the states of all checkpoints in between must be saved. Speculative execution is therefore limited by the number of states that can be saved.

[0021] On the other hand, a scheme that saves all registers can be configured with multiple sets of hardware registers, each set holding the state of one checkpoint. With this configuration, returning the processor state to a checkpoint only requires switching the register set in use, which has the advantage that state can be restored quickly.

[0022] [Problems to Be Solved by the Invention] Of the speculative execution schemes described above, the reorder buffer scheme has the problem that realizing the reorder buffer requires complex hardware. First, to decide whether a given operation result may be reflected in a register, comparison logic is needed to confirm that neither the reorder buffer nor the instruction window, which holds decoded instructions, contains an instruction that precedes that operation in program order. Increasing the opportunities for out-of-order execution requires a larger reorder buffer and instruction window, but the hardware for this comparison becomes correspondingly more complex as they grow. The comparison hardware likewise becomes more complex when the number of results that can be reflected in registers simultaneously is increased.

[0023] Furthermore, because the reorder buffer scheme does not reflect the results of speculative instruction execution directly in registers, a register renaming mechanism is essential for speculatively executing multiple instructions that write the same register. The hardware for this register renaming mechanism is also complex.

[0024] As described above, the reorder buffer scheme applies to processors that are premised on complex hardware, such as out-of-order superscalar processors. It therefore cannot be used in processors built on the VLIW philosophy of improving performance by simplifying hardware and raising the operating frequency. It is also difficult to apply to processors for which low power consumption and low cost matter, such as embedded processors.

[0025] In the checkpoint scheme, on the other hand, increasing the opportunities for speculative execution requires saving the state at more checkpoints. A general checkpoint scheme must save the state of all registers at every checkpoint, and saving and restoring state quickly requires multiple register sets inside the processor. The hardware cost of this is high, and it is difficult even for a general-purpose processor to have three or more register sets.

[0026] The scheme that reduces the state saved at checkpoints must attach the state to be saved to the code, so applying it to a superscalar processor requires changing the instruction set architecture, which makes it difficult to stay compatible with the existing instruction set architecture. In addition, this scheme is applicable only to in-order-issue processors.

[0027] The present invention has been made in view of the above circumstances, and its object is to provide a central processing unit and a compiling method that enable effective speculative execution while saving less information and requiring less hardware or less processing.

[0028] [Means for Solving the Problems] The present invention (first invention) is a central processing unit (for example, an in-order-issue superscalar processor) that has a plurality of operation execution units and allocates those units when executing the instruction sequence of a program, characterized by comprising: identification number assigning means (for example, a group value generation unit) that assigns to each instruction, during execution, an identification number (for example, a group value) identifying which program section the instruction belongs to when the instruction sequence of the program is divided at predetermined specific instructions; and speculativeness determining means (for example, a speculativeness determination unit) that determines, using the identification number assigned to each instruction, whether execution of the instruction is speculative.

[0029] The present invention (second invention) is a central processing unit (for example, an out-of-order-issue VLIW) that has a plurality of operation execution units and fetches and interprets the instruction sequence of a program in units of a series of instructions assigned in advance to the respective operation execution units, characterized by comprising: instruction storage means (for example, a pending queue) for temporarily holding an instruction that has been fetched but cannot be executed (so that subsequent instructions can be executed ahead of it); storage means for storing information on the usage status of each register; means for determining, based on the information stored in the storage means, whether a fetched instruction is executable, and for temporarily holding the instruction in the instruction storage means when it is determined not to be executable; identification number assigning means (for example, a group value generation unit) that assigns to each instruction, upon its execution, an identification number (for example, a group value) identifying which program section the instruction belongs to when the instruction sequence of the program is divided at predetermined specific instructions; and speculativeness determining means (for example, a speculativeness determination unit) that determines, using the identification number assigned to each instruction, whether execution of the instruction is speculative.

[0030] Preferably, the identification number assigning means may hold information indicating whether each identification number is in use, and may include means that, when the instruction to be assigned an identification number is one of the specific instructions, refers to that information and selects and assigns an identification number that is not in use, and that, when the instruction is not one of the specific instructions, assigns the same identification number as the one assigned immediately before (for example, the bit vector method).
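
A minimal Python sketch of such a bit-vector allocator is shown below. Several details are assumptions made for illustration and are not stated in this paragraph: the class name, the number of identification numbers, the use of ID 0 for instructions ahead of the first specific instruction, returning `None` when no ID is free, and an explicit `release` call to mark an ID unused.

```python
class BitVectorGroupAllocator:
    """Assigns group values using one in-use bit per identification number."""

    def __init__(self, num_ids=4):
        self.in_use = [False] * num_ids
        self.in_use[0] = True  # assumption: ID 0 covers the first section
        self.last_id = 0       # ID assigned to the previous instruction

    def assign(self, is_specific):
        if is_specific:
            # A specific instruction opens a new section: pick an unused ID.
            free = next(
                (i for i, used in enumerate(self.in_use) if not used), None
            )
            if free is None:
                return None    # assumption: caller stalls until an ID frees
            self.in_use[free] = True
            self.last_id = free
        # Other instructions reuse the ID assigned immediately before.
        return self.last_id

    def release(self, group_id):
        # Assumption: called once the section's ID is no longer needed.
        self.in_use[group_id] = False


allocator = BitVectorGroupAllocator(num_ids=3)
# True marks a specific instruction (for example, a conditional branch).
stream = [False, True, False, False, True, False]
group_values = [allocator.assign(is_specific) for is_specific in stream]
```

With three IDs, the stream above receives the group values 0, 1, 1, 1, 2, 2: each specific instruction and the instructions that follow it share one value.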

[0031] Preferably, the speculativeness determining means may use information that indicates, for each identification number assigned by the identification number assigning means, whether the specific instruction assigned that identification number is executing; when this information shows that the specific instruction assigned the same identification number as the instruction under determination is executing, the execution of the instruction under determination is determined to be speculative.
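
The determination described here amounts to a per-ID flag lookup. The following Python sketch assumes one boolean flag per identification number; the class and method names are hypothetical.

```python
class SpeculationJudge:
    """Tracks, per group value, whether its specific instruction is executing."""

    def __init__(self, num_ids=4):
        self.specific_executing = [False] * num_ids

    def start_specific(self, group_id):
        # The specific instruction with this ID has started executing.
        self.specific_executing[group_id] = True

    def complete_specific(self, group_id):
        # The specific instruction with this ID has finished.
        self.specific_executing[group_id] = False

    def is_speculative(self, group_id):
        # An instruction is speculative while the specific instruction
        # carrying the same ID is still executing.
        return self.specific_executing[group_id]


judge = SpeculationJudge()
judge.start_specific(1)                 # e.g. a branch with group value 1
during_branch = judge.is_speculative(1)
judge.complete_specific(1)
after_branch = judge.is_speculative(1)
```

The lookup needs no comparison against other in-flight instructions, only the flag for the instruction's own group value.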

[0032] Preferably, the identification number assigning means may include identification number counter means that holds the identification number most recently assigned to an instruction, and means that, when the instruction to be assigned an identification number is not one of the specific instructions, assigns the same identification number as the one held in the identification number counter means, and that, when the instruction is one of the specific instructions, increments the identification number held in the identification number counter means by 1 and then assigns the incremented identification number (for example, the hardware counter method).
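
The counter method can be sketched as follows in Python. The counter width and the wrap-around on overflow are assumptions (a hardware counter has a finite width, but this paragraph does not specify its behavior), and the names are hypothetical.

```python
class CounterGroupAllocator:
    """Assigns group values from a single wrapping counter."""

    def __init__(self, bits=3, initial=0):
        self.modulus = 1 << bits            # assumed finite counter width
        self.counter = initial % self.modulus

    def assign(self, is_specific):
        if is_specific:
            # Increment first, then assign the incremented value.
            self.counter = (self.counter + 1) % self.modulus
        # Other instructions receive the value currently held.
        return self.counter


counter_alloc = CounterGroupAllocator()
counter_values = [
    counter_alloc.assign(s) for s in [False, True, False, True, False]
]
```

Compared with the bit-vector method, no search for a free ID is needed; the price is that IDs are handed out strictly in sequence.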

[0033] Preferably, the central processing unit may further comprise executed-instruction counter means that has the same initial state as the identification number counter means and that increments the identification number it holds by 1 when execution of a specific instruction that was assigned an identification number using the identification number counter means completes. The speculativeness determining means then refers to the identification number counter means and the executed-instruction counter means, and determines that execution of the instruction under determination is speculative when the identification number assigned to that instruction corresponds to an identification number within the range from the identification number held by the executed-instruction counter means to the identification number held by the identification number counter means.
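
The two-counter check can be sketched in Python as below. The paragraph does not say whether the range endpoints are inclusive; this sketch assumes the range excludes the executed-instruction counter value and includes the identification number counter value, since only then are instructions ahead of the first specific instruction non-speculative. The counter width, the modular arithmetic, and all names are likewise assumptions.

```python
class TwoCounterJudge:
    """Speculation check using an ID counter and an executed counter."""

    def __init__(self, bits=3):
        self.modulus = 1 << bits
        self.id_counter = 0     # last identification number assigned
        self.exec_counter = 0   # number of completed specific instructions

    def assign(self, is_specific):
        if is_specific:
            self.id_counter = (self.id_counter + 1) % self.modulus
        return self.id_counter

    def complete_specific(self):
        # A specific instruction has finished executing.
        self.exec_counter = (self.exec_counter + 1) % self.modulus

    def is_speculative(self, group_id):
        # Assumed range: exec_counter < group_id <= id_counter (modulo width).
        outstanding = (self.id_counter - self.exec_counter) % self.modulus
        distance = (group_id - self.exec_counter) % self.modulus
        return 1 <= distance <= outstanding


two = TwoCounterJudge()
before = two.assign(False)   # instruction ahead of any branch -> 0
branch = two.assign(True)    # specific instruction           -> 1
after = two.assign(False)    # instruction after the branch   -> 1
spec_before = two.is_speculative(before)
spec_after = two.is_speculative(after)
two.complete_specific()      # the branch resolves
spec_after_resolved = two.is_speculative(after)
```

The modular comparison only works while fewer specific instructions are outstanding than the counter can distinguish; how the hardware enforces that is not covered here.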

[0034] Preferably, the central processing unit may further comprise means (for example, a register saving unit) for holding information used to restore a register updated by an instruction determined to have been executed speculatively to its state before execution of that instruction.
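
One way to picture such a register saving unit is as a log of overwritten values that is replayed in reverse on failure. The Python sketch below is an assumption about the bookkeeping, not the structure defined by the patent; the names are hypothetical.

```python
class RegisterSaveUnit:
    """Keeps the old value of every register overwritten speculatively."""

    def __init__(self):
        self.saved = []  # (register name, old value), in write order

    def speculative_write(self, registers, reg, value):
        # Record the value being overwritten, then update the register.
        self.saved.append((reg, registers[reg]))
        registers[reg] = value

    def restore(self, registers):
        # Undo the writes newest first so the oldest saved value wins.
        while self.saved:
            reg, old = self.saved.pop()
            registers[reg] = old

    def commit(self):
        # Speculation succeeded: the saved values are no longer needed.
        self.saved.clear()


regs = {"r1": 1, "r2": 2}
unit = RegisterSaveUnit()
unit.speculative_write(regs, "r1", 10)
unit.speculative_write(regs, "r1", 20)
unit.restore(regs)   # r1 returns to 1
```

Only registers that were actually updated are saved, in contrast to a checkpoint that copies the whole register set.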

[0035] Preferably, in the second invention, the instruction storage means may hold, together with the instruction that was fetched but cannot be executed, the identification number assigned to that instruction and information indicating whether that instruction is one of the specific instructions.

[0036] Also preferably, in the second invention, when instructions are issued from the instruction storage means, the specific instructions may be issued in program order.

[0037] Also preferably, when it is determined that speculative execution of an instruction determined to have been executed speculatively has failed, a trap corresponding to the cause of the failure may be raised, and a trap routine may restore the state that existed before the speculative execution.

[0038] Preferably, the specific instruction may be a conditional branch instruction. Also preferably, the specific instruction may be a load instruction or a store instruction. Further preferably, the specific instruction may be a Predicate set instruction.

[0039] Also preferably, in the second invention, the specific instruction may be a conditional branch instruction, a load instruction, a store instruction, or any other instruction determined to be unexecutable because its output operand cannot be used.

[0040] Preferably, the speculativeness determining means may determine whether execution is speculative only for a SYNC instruction, which is an instruction for restricting speculative execution; when the SYNC instruction is determined to have been executed speculatively, instructions after the SYNC instruction are not executed until the specific instructions issued before the SYNC instruction in program order complete.
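
The effect of a SYNC instruction on instruction issue can be illustrated with a small Python function. The instruction names, the `(name, kind)` tuples, and the `completed` parameter are hypothetical and exist only for this sketch; it models issue order, not timing.

```python
def issue_window(program, completed=frozenset()):
    """Return the instructions that may issue before a SYNC blocks.

    program:   list of (name, kind) with kind in
               {"specific", "sync", "other"}, in program order.
    completed: names of specific instructions that have already finished.
    """
    issued = []
    outstanding = set()  # specific instructions issued but not completed
    for name, kind in program:
        issued.append(name)
        if kind == "specific" and name not in completed:
            outstanding.add(name)
        elif kind == "sync" and outstanding:
            # The SYNC itself ran speculatively, so later instructions
            # wait until the earlier specific instructions complete.
            break
    return issued


program = [
    ("beq", "specific"),
    ("add", "other"),
    ("SYNC", "sync"),
    ("sw", "other"),
]
blocked = issue_window(program)
unblocked = issue_window(program, completed={"beq"})
```

While the branch is unresolved, issue stops after the SYNC; once the branch has completed, the SYNC is not speculative and the store may issue.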

[0041] The present invention (third invention) is a compiling method for generating a program to be executed by a central processing unit that allocates a plurality of operation execution units when executing the instruction sequence of a program (for example, an in-order-issue superscalar processor), or by a central processing unit that fetches and interprets the instruction sequence of a program in units of a series of instructions assigned in advance to each of a plurality of operation execution units (for example, an out-of-order-issue VLIW processor). The method is characterized by generating and placing a SYNC instruction that restricts speculative execution so that the following instructions are not executed speculatively in the central processing unit: an instruction placed on one path branching from a conditional branch instruction whose output operand is referenced by another instruction on the other path branching from that conditional branch instruction without being updated there; and an instruction whose output operand matches one of the input operands of a speculatively executed instruction. The SYNC instruction is an instruction that, when it is executed speculatively by the central processing unit, causes instructions after the SYNC instruction not to be executed until the specific instruction that was issued before the SYNC instruction in program order and that causes the speculative execution completes.

[0042] Next, the present invention (fourth invention) is a central processing unit (for example, an in-order-issue VLIW processor) that has a plurality of operation execution units and fetches and interprets the instruction sequence of a program in units of a series of instructions assigned in advance to the respective operation execution units, characterized by comprising: means for extracting, when an instruction is interpreted, an identification number (for example, a group value) identifying which program section the instruction belongs to when the instruction sequence of the program is divided at predetermined specific instructions, and information on the speculativeness of the instruction; speculativeness determining means that determines whether execution of an instruction is speculative based on the speculativeness information embedded in the instruction whose execution has started; speculation result determining means that determines whether the speculative execution succeeded based on the identification number and the speculativeness information embedded in an instruction determined to have been executed speculatively; and means for raising a trap corresponding to the cause of the failure when it is determined that speculative execution of an instruction determined to have been executed speculatively has failed.

[0043] Preferably, the speculativeness determining means determines that execution of an instruction is speculative when the speculativeness information embedded in the instruction under judgment indicates that it is speculative.

[0044] Preferably, the speculation result determining means determines that speculative execution of the instruction under judgment has failed when an instruction is executed that has the same identification number as the one embedded in the instruction under judgment and that corresponds to its speculativeness information.

[0045] Preferably, the central processing unit further comprises holding means (for example, a register save unit) that temporarily holds save information used to return to the state before the speculative execution when speculative execution of an instruction is determined to have failed. In this case, the save information is preferably held in association with the identification number given to the instruction determined to have been speculatively executed. Furthermore, restoration of the state based on the save information may be performed selectively by referring to the identification numbers in the holding means.

[0046] Next, the present invention (fifth invention) is a compiling method for generating a program to be executed by a central processing unit (for example, an in-order instruction-issuing VLIW processor) that fetches and interprets the instruction sequence of a program one series of instructions at a time, each series having been assigned in advance to a plurality of operation execution units. The method is characterized by giving each instruction an identification number (for example, a group value) that identifies which program section the instruction belongs to when the instruction sequence is divided at predetermined specific instructions, together with information on the speculativeness of the instruction, and by moving an instruction located after a specific instruction in program order forward past that specific instruction, which becomes the cause of the speculative execution.

[0047] The gist of the present invention also lies in a computer-readable recording medium on which is recorded a compile program for causing a computer to execute a compile procedure. The compile procedure generates a program executable by a central processing unit that comprises a plurality of operation execution units and allocates them when executing the instruction sequence of a program, or by a central processing unit that fetches and interprets the instruction sequence of a program one series of instructions at a time, each series having been assigned in advance to a plurality of operation execution units. The compile procedure includes a step of generating and placing a SYNC instruction that restricts speculative execution so that the following are not speculatively executed in the central processing unit: an instruction placed on one path branching from a conditional branch instruction, where its output operand is referenced by another instruction on the other path branching from that conditional branch instruction without first being updated there; and an instruction whose output operand matches one of the input operands of a speculatively executed instruction.

[0048] The gist of the present invention also lies in a computer-readable recording medium on which is recorded a compile program for causing a computer to execute a compile procedure that generates a program executable by a central processing unit that fetches and interprets the instruction sequence of a program one series of instructions at a time, each series having been assigned in advance to a plurality of operation execution units. The compile procedure includes a step of giving each instruction an identification number that identifies which program section the instruction belongs to when the instruction sequence is divided at predetermined specific instructions, together with information on the speculativeness of the instruction, and of moving an instruction located after a specific instruction in program order forward past that specific instruction, which becomes the cause of the speculative execution.

[0049] Note that the present invention relating to an apparatus also holds as an invention relating to a method, and the present invention relating to a method also holds as an invention relating to an apparatus.

[0050] The present invention relating to a compiling apparatus or method also holds as a computer-readable recording medium on which is recorded a program for causing a computer to execute the procedure corresponding to the invention (or for causing a computer to function as the means corresponding to the invention, or for causing a computer to realize the functions corresponding to the invention).

[0051] The present invention concerns an instruction processing scheme in which, before a specific instruction such as a conditional branch instruction or a load/store instruction has been executed, instructions that follow it in program order are executed speculatively. A unique identification number is attached to each program section (to each instruction in it) that runs from an instruction corresponding to one of the predetermined specific instructions to the instruction immediately before the next such instruction. The instructions from one specific instruction up to just before the next are thus managed as a group. This makes it easy to determine whether an instruction was executed speculatively (the determination requires only a comparison of identification numbers). It also reduces the information and hardware needed to cancel the results of erroneous speculative execution compared with conventional speculative execution schemes (for example, effective speculative execution becomes possible with hardware simpler than a reorder buffer and with less saved information than a general checkpoint-based scheme).

[0052] Thus, according to the present invention, more effective speculative execution of instructions becomes possible with simple hardware and a small amount of saved information.

[0053] This also makes speculative execution possible in processors that require simple hardware. In addition, because little information has to be saved, speculative execution can proceed past more branch instructions and load/store instructions, so performance improves over speculative execution based on a general checkpoint scheme. Furthermore, the present invention makes speculative execution with a small amount of saved information possible even in an out-of-order instruction-issuing processor that has no reorder buffer, such as an out-of-order instruction-issuing VLIW processor.

【００５４】[0054]

[Embodiments of the Invention] Embodiments of the invention are described below with reference to the drawings.

[0055] (First Embodiment) This embodiment concerns speculative execution of instructions on an in-order instruction-issuing superscalar processor.

[0056] FIG. 1 is a block diagram showing a configuration example of the processor according to this embodiment. As shown in FIG. 1, the processor has a memory 1, an address generation unit 28, an instruction fetch unit 2, an instruction queue 3, an instruction decode unit 4, a group value generation unit 6, a decoded instruction queue 7, an operand state determination unit 5, instruction execution units 8 and 9, a load/store unit 10, a branch instruction execution unit 11, a processor state control unit 30, speculativeness determination units 12 to 15, a register save unit 16, and a register 17.

[0057] The number and functions of the instruction execution units (8 to 11) of this processor are only an example and are not limiting. The example of FIG. 1 provides four register write buses (26 to 29), one for each of the instruction execution units (8 to 11), but if branch instructions that write registers are not used, the register write bus corresponding to the branch instruction execution unit 11 is omitted.

[0058] In this embodiment, within the instruction sequence to be executed by the processor, a portion that starts with a conditional branch instruction or a load/store instruction and ends with the instruction immediately before the next conditional branch instruction or load/store instruction is called an "instruction group". That is, the first instruction of an instruction group is a conditional branch instruction or a load/store instruction, and it is the only such instruction in that group. To identify within the processor which instruction group each instruction belongs to, each instruction is given a "group value" unique to its instruction group.
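The grouping rule can be illustrated with a minimal Python sketch. It is not the patent's hardware: the mnemonics in `GROUP_STARTERS`, the `Instr` record, the unbounded integer group values, and the choice of 0 for instructions that precede the first group-starting instruction are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical mnemonics standing in for "load/store" and "conditional branch".
GROUP_STARTERS = {"lw", "sw", "beq"}


@dataclass
class Instr:
    number: int
    opcode: str
    group: int = 0


def assign_groups(program):
    """Tag each instruction with the group value of its instruction group."""
    current = 0  # assumption: instructions before the first group starter get 0
    for ins in program:
        if ins.opcode in GROUP_STARTERS:
            current += 1  # a group starter opens a new instruction group
        ins.group = current
    return program


# A made-up six-instruction program.
program = [Instr(1, "lw"), Instr(2, "li"), Instr(3, "sw"),
           Instr(4, "add"), Instr(5, "beq"), Instr(6, "add")]
groups = [ins.group for ins in assign_groups(program)]
```

Each group contains exactly one group-starting instruction, at its head, which is the property the paragraph above states.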

[0059] The main units of the processor of FIG. 1 are outlined as follows. The group value generation unit 6 generates the group value to be given to each decoded instruction. The operand state determination unit 5 determines whether the operands used by a decoded instruction are available. The processor state control unit 30 ensures that the completion order among conditional branch instructions and load/store instructions does not change. The speculativeness determination units 12 to 15 use group values to judge whether an instruction being executed is speculative. The register save unit 16 saves the original value of a register before that register is updated by a speculatively executed instruction.

[0060] In outline, when the branch prediction for a conditional branch instruction turns out to be wrong, or when a page fault occurs during execution of a load/store instruction, the processor of this embodiment raises a trap, restores the values saved in the register save unit 16 to the register 17, and then re-executes the instructions. The speculativeness of an instruction can be judged by comparing group values alone, and instructions can be executed speculatively with simple hardware and a small amount of saved information.

[0061] Next, FIG. 2 shows an example of the flow of pipeline processing in the processor of this embodiment.

[0062] The instruction fetch unit 2 places the instruction address indicated by the address generation unit 28 on the address bus 33 and fetches a plurality of instructions, starting with the instruction at that address, from the memory 1 over the instruction fetch bus 34 (step S11). The fetched instructions are inserted into the instruction queue 3 in fetch order via the instruction queue bus 32.

[0063] The instruction decode unit 4 decodes a plurality of instructions from the head of the instruction queue 3 via the decode bus 35 and obtains the type and operands of each instruction (step S12). After obtaining a group value for each instruction from the group value generation unit 6 (step S13), it inserts the decoded instructions into the decoded instruction queue 7.

[0064] Starting from the head of the decoded instruction queue 7, the operand state determination unit 5 determines for each decoded instruction whether its operands are available (step S14). If the operands are available and an execution unit that can execute the instruction is free, the instruction is transferred together with its group value, according to its type, to the instruction execution unit 8, the instruction execution unit 9, the load/store unit 10, or the branch instruction execution unit 11 via the corresponding instruction bus 18 to 21. The operands are fetched from the register 17 via the corresponding register read bus 22 to 25, and the instruction is then executed (step S15).

[0065] After the instruction is executed, the corresponding speculativeness determination unit 12 to 15 determines whether the execution was speculative (step S17). If it was, the identification of the register to be changed, the value of that register before the change, and the group value of the instruction are registered in the register save unit 16 in association with one another (step S18).
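The bookkeeping of steps S17 and S18 can be sketched in Python as follows. The class and function names are hypothetical, the `speculative` flag stands in for the result of the speculativeness determination unit, and restoring entries newest-first (so that the oldest saved value of a register wins) is an assumption, since the text here does not fix the restore order.

```python
class RegisterSaveUnit:
    """Hypothetical model of the register save unit 16."""

    def __init__(self):
        # Each entry: (register id, value before the change, group value).
        self.entries = []

    def record(self, reg, old_value, group):
        self.entries.append((reg, old_value, group))

    def restore(self, regs):
        # Assumption: undo newest-first so the oldest saved value wins.
        for reg, old_value, _group in reversed(self.entries):
            regs[reg] = old_value
        self.entries.clear()


def write_result(regs, save_unit, reg, new_value, group, speculative):
    """Save the old value first when the execution was judged speculative."""
    if speculative:
        save_unit.record(reg, regs[reg], group)
    regs[reg] = new_value


regs = {"r3": 10, "r12": 20}
unit = RegisterSaveUnit()
write_result(regs, unit, "r12", 2, group=2, speculative=True)
write_result(regs, unit, "r12", 5, group=3, speculative=True)
write_result(regs, unit, "r3", 7, group=1, speculative=False)
after_writes = dict(regs)
unit.restore(regs)
```

Only speculative writes leave an entry, so the non-speculative update of `r3` survives the restore while `r12` returns to its value from before any speculation.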

[0066] An execution unit (8 to 11) that the processor state control unit 30 has permitted to update state then writes its execution result back to the register 17 and the memory 1 via the corresponding register write bus 26 to 29 and the memory bus 31 (step S19).

[0067] Execution of an instruction becomes speculative in the following cases.
・When the instruction at the predicted branch target is executed before the conditional branch instruction completes
・When a subsequent instruction is executed before a load/store instruction completes
First, speculative execution involving conditional branch instructions is described.

[0068] When a conditional branch instruction is executed, the branch prediction mechanism of the address generation unit 28 predicts the address of the instruction to fetch next (see, e.g., J. A. Fisher and S. M. Freudenberger, "Predicting Conditional Branch Directions from Previous Runs of a Program", in Proceedings of the 5th International Conference on ASPLOS, 1991). The next instruction fetch starts without waiting for the conditional branch instruction to complete. The branch instruction execution unit 11 compares the branch prediction with the actual branch target and, if they differ, raises a branch misprediction trap after all instructions in execution have completed.

[0069] The processor has means for examining the cause of a raised trap. Causes of traps include branch mispredictions and page faults. When this means determines that the trap was caused by a branch misprediction (step S16), the instruction queue 3 and the decoded instruction queue 7 are first emptied, invalidating the fetch and decode processing in progress (step S20). Next, the registers registered in the register save unit 16 are returned to their original values (step S21), and the state of the group value generation unit 6 is reset. The address indicated by the address generation unit 28 is then set to the correct branch target, and trap processing completes.

[0070] Next, speculative execution past load/store instructions is described.

[0071] In general, virtual addresses are used for reading from and writing to memory. Part of the virtual address space is held in the memory 1, and the rest is stored in an external storage device (not shown). If the contents at the specified virtual address are not present in the memory 1 when a load/store instruction is executed, a page fault trap is raised.

[0072] When the above means for examining the cause of a trap determines that the trap was caused by a page fault (step S16), the fetch and decode processing is invalidated (step S20), the register values are restored according to the contents of the register save unit 16 (step S21), and the group value generation unit 6 is reset, in the same way as for a branch misprediction trap. Paging, which loads the virtual address region into the memory 1, is performed at the same time. When the page fault trap completes, the address generation unit 28 is set to indicate the instruction address of the load/store instruction that raised the page fault trap.
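The two trap paths share most of their work and differ mainly in the address from which fetching resumes. The Python sketch below lays this out under stated assumptions: the processor is reduced to a plain dictionary, all key names are hypothetical, the page counter is only a stand-in for the paging step, and clearing the saved entries after the restore is an assumption.

```python
def handle_trap(cpu, cause, resume_address):
    """Sketch of steps S20 and S21, the generator reset, and the refetch address.

    resume_address is the correct branch target for a misprediction, or the
    address of the faulting load/store instruction for a page fault.
    """
    if cause not in ("branch_mispredict", "page_fault"):
        raise ValueError(f"unknown trap cause: {cause}")

    # Step S20: discard fetch and decode work in progress.
    cpu["instruction_queue"].clear()
    cpu["decoded_queue"].clear()

    # Step S21: put saved register values back (newest entry first).
    for reg, old_value in reversed(cpu["saved"]):
        cpu["regs"][reg] = old_value
    cpu["saved"].clear()

    # Reset the group value generator to its initial state.
    cpu["group_state"] = 0

    if cause == "page_fault":
        cpu["pages_loaded"] += 1  # stand-in for loading the missing page

    cpu["fetch_address"] = resume_address


cpu = {
    "instruction_queue": [7, 8],
    "decoded_queue": [5, 6],
    "regs": {"r4": 9},
    "saved": [("r4", 1)],
    "group_state": 3,
    "fetch_address": 0x40,
    "pages_loaded": 0,
}
handle_trap(cpu, "page_fault", resume_address=0x10)
```

After a page fault the same load/store instruction is fetched again, whereas after a misprediction fetching continues at the correct branch target; in this sketch both cases are expressed through `resume_address`.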

[0073] The following describes in more detail how each unit operates when instructions are speculatively executed in the processor of this embodiment, as well as the configuration or function of some of the units.

[0074] In the following description, the processing performed in each pipeline stage is defined as follows.
・Fetch stage (F stage): fetches instructions from memory
・Decode stage (D stage): decodes the fetched instructions
・Execution stage (E stage): performs operations using the instruction execution units and raises a trap if the result of an operation requires it
・Memory stage (M stage): checks for page faults, reads data from memory, and writes data to memory
・Write-back stage (W stage): writes the operation result to the output operand
In the following, "instruction i" denotes the instruction with instruction number i.

[0075] FIG. 3 shows an example of a program executed by this processor. In FIG. 3, instructions 1 and 3 are load/store instructions, and instruction 5 is a conditional branch instruction.

[0076] The following description follows the flow of cycles, taking execution of the program of FIG. 3 as an example. For simplicity, it is assumed that no instructions are being processed in the pipeline when the program of FIG. 3 starts.

[0077] First, in cycle 1, the instruction fetch unit 2 fetches instructions 1 to 4 in order in the F stage and inserts them into the instruction queue 3. The present invention is applicable to a processor with any number of simultaneous fetches and enables speculative execution of instructions on it; in this embodiment the number of simultaneous fetches is four.

[0078] Next, in cycle 2, the instruction fetch unit 2 in the F stage first fetches instruction 5, the conditional branch instruction "beq r4, r3, (address of) instruction 7". At this point the branch target of conditional branch instruction 5 is predicted (in this case, either instruction 6 or instruction 7). In this example the predicted target is taken to be instruction 7, so instructions 7 and 8 and the instruction after them (not shown) are fetched next. The predicted branch target (here, instruction 7) is saved until execution of conditional branch instruction 5 completes.

[0079] Meanwhile, in the D stage of cycle 2, the instruction decode unit 4 decodes instructions 1 to 4 in the instruction queue 3 and obtains the type and operands of each. Each instruction also obtains a group value from the group value generation unit 6 in order. The instructions are then inserted into the decoded instruction queue 7 in instruction order. If the decoded instruction queue 7 has no free space, instruction decoding stalls.

[0080] As described above, the group value generation unit 6 assigns a common group value to the individual instructions of an instruction group, which starts with a load/store instruction or a conditional branch instruction and ends with the instruction immediately before the next load/store instruction or conditional branch instruction. Instruction groups that are in execution at the same time are assigned different group values.

[0081] FIG. 4 shows one implementation example of the group value generation unit 6 (the program example shown in FIG. 4 differs from that of FIG. 3).

[0082] One possible implementation of the group value generation unit 6 uses a bit vector that indicates which group values are in use.

[0083] When a group value is assigned to a load/store instruction or a conditional branch instruction, one group value is chosen according to a predetermined selection procedure (for example, the largest value) from among those whose corresponding bit (the bit indicating whether the value is in use) is 0, and the chosen group value is assigned. At that time, the bit corresponding to that group value is changed to 1, and the buffer in the group value generation unit 6 is updated to that group value. When a group value is assigned to any other type of instruction, the value held in the buffer at that time is assigned.

[0084] When all instructions having a given group value have completed execution, the bit corresponding to that group value is set to 0. If every bit of the bit vector is 1, the instruction decode unit 4 stalls instruction decoding until an unused group value becomes available.
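The bit vector scheme can be sketched in Python as below. The class and method names, the `StallDecode` exception used to signal that decoding must wait, and the empty (`None`) buffer before the first group-starting instruction are assumptions; the "largest free value" rule is the example selection procedure mentioned in the text.

```python
class StallDecode(Exception):
    """Raised when every group value is in use and decoding has to wait."""


class BitVectorGroupGen:
    """Hypothetical model of a bit-vector group value generation unit."""

    def __init__(self, n_values=4):
        self.in_use = [0] * n_values  # bit g is 1 while group value g is in use
        self.buffer = None            # assumption: empty until the first group starter

    def assign(self, starts_group):
        """Return the group value for the next decoded instruction."""
        if starts_group:
            free = [g for g, bit in enumerate(self.in_use) if bit == 0]
            if not free:
                raise StallDecode
            chosen = max(free)        # example rule: pick the largest free value
            self.in_use[chosen] = 1
            self.buffer = chosen
        return self.buffer

    def release(self, group):
        """Call once all instructions carrying this group value have completed."""
        self.in_use[group] = 0


gen = BitVectorGroupGen(n_values=2)
trace = [gen.assign(True), gen.assign(False), gen.assign(True)]
try:
    gen.assign(True)
    stalled = False
except StallDecode:
    stalled = True
gen.release(1)
reused = gen.assign(True)
```

With only two group values, the third group-starting instruction stalls until one of the earlier groups has fully completed and its bit has been cleared.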

[0085] The flag in FIG. 4 that indicates, for each group value, whether it is in execution relates to the speculativeness determination unit 12 described later.

[0086] FIG. 5 shows another implementation example of the group value generation unit 6 (the program example shown in FIG. 5 differs from that of FIG. 3).

[0087] Another possible implementation of the group value generation unit 6 uses hardware counters.

[0088] When a group value is attached to a load/store instruction or a conditional branch instruction, the group value counter is first incremented by 1. When a group value is attached to any other type of instruction, the group value counter is left unchanged. The value of the group value counter is then attached to the instruction as its group value. If the group value counter is at its maximum value, it is set to 0 instead of being incremented, and 0 becomes the group value.

[0089] In the hardware counter scheme, the group value generation unit 6 has an execution instruction counter to guarantee that the group values of instructions in execution do not overlap with newly assigned group values. In the initial state the execution instruction counter has the same value as the group value counter, and it is incremented by 1 each time a load/store instruction or a conditional branch instruction completes execution. If the execution instruction counter is at its maximum value, it is set to 0 instead. The instruction decode unit 4 ensures that the instructions from the instruction group whose attached value equals the execution instruction counter onward are not transferred to their execution units.
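The two counters can be sketched in Python as follows. This is one reading of the text rather than a definitive model: the class and method names are hypothetical, `must_hold` expresses the dispatch restriction as a simple equality test against the execution instruction counter, and the sketch does not settle how the initial state, in which both counters are 0, is treated.

```python
class CounterGroupGen:
    """Hypothetical model of a counter-based group value generation unit."""

    def __init__(self, max_value=3):
        self.max_value = max_value
        self.group_counter = 0  # last group value handed out
        self.exec_counter = 0   # advances as group-starting instructions complete

    def _next(self, value):
        # Wrap to 0 instead of incrementing past the maximum value.
        return 0 if value == self.max_value else value + 1

    def assign(self, starts_group):
        """Return the group value for the next decoded instruction."""
        if starts_group:
            self.group_counter = self._next(self.group_counter)
        return self.group_counter

    def complete_starter(self):
        """Call when a load/store or conditional branch instruction completes."""
        self.exec_counter = self._next(self.exec_counter)

    def must_hold(self, group_value):
        # Assumed reading: a group whose value equals the execution instruction
        # counter, and everything after it, is not sent to the execution units.
        return group_value == self.exec_counter


gen = CounterGroupGen(max_value=2)
values = [gen.assign(True), gen.assign(False), gen.assign(True), gen.assign(True)]
held_before = gen.must_hold(values[-1])
gen.complete_starter()
held_after = gen.must_hold(values[-1])
```

With in-order issue, holding the first instruction of a group also holds everything behind it, so the wrapped-around group value is not reused until an older group-starting instruction has completed.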

[0090] In the initial state of the group value generation unit 6, all bits are 0 in the bit vector scheme of FIG. 4, and both the group value counter and the execution instruction counter are 0 in the hardware counter scheme of FIG. 5. When a branch misprediction trap or a page fault trap occurs, the group value generation unit 6 returns to the initial state after trap processing completes.

【００９１】次に、サイクル３に、デコード命令キュー
７の先頭の命令から順にオペランドが利用可能かどうか
をオペランド状態判定部５により判定し、利用可能であ
れば使用中でない実行ユニットに対して命令を転送す
る。この例では、まず、命令番号１の命令“ｌｗｒ
３，playlevel ”のレジスタｒ３が利用可能と判定さ
れ、該命令１がロード・ストアユニット１０に送られ
る。次に、命令番号２の命令“ｌｉｒ１２，２”のレ
ジスタｒ１２が利用可能と判定され、該命令２が命令実
行ユニット８に送られる。次に、命令番号３の命令“ｓ
ｗｒ１２，ｑｕｉｃｋｌｉｂｓ”はレジスタｒ１２が
利用できないため、命令の転送はここで終了する。な
お、この場合には、仮にレジスタｒ１２が利用可能であ
っても、命令１がロード・ストアユニット１０を使用し
ているために命令３は実行できないことになる。転送れ
さた命令（ここでは、命令１と命令２）については、Ｅ
ステージにて各実行ユニット上でその実行が開始され
る。Next, in cycle 3, the operand state determination unit 5 checks, in order from the head of the decoded instruction queue 7, whether each instruction's operands are available, and an instruction whose operands are available is transferred to an execution unit that is not in use. In this example, register r3 of instruction 1, “lw r3, playlevel”, is first determined to be available, and instruction 1 is sent to the load/store unit 10. Next, register r12 of instruction 2, “li r12, 2”, is determined to be available, and instruction 2 is sent to the instruction execution unit 8. Register r12 is not available to instruction 3, “sw r12, quicklibs”, so instruction transfer stops here. Even if register r12 were available, instruction 3 could not be executed because instruction 1 is using the load/store unit 10. The transferred instructions (here, instructions 1 and 2) begin execution on their execution units in the E stage.

【００９２】オペランド状態判定部５は、前述したよう
に、実行中の命令の中に指定されたレジスタに書く命令
が存在するかどうかを調べ、存在しなければレジスタは
利用可能、存在していれば利用不可能と判定する手段で
ある。この手段は、例えば従来のｉｎ−ｏｒｄｅｒスー
パースカラープロセッサが持つようなレジスタスコアボ
ード方式により実現可能である。レジスタスコアボード
方式は、レジスタごとに利用可能か否かを示すスコアボ
ードを持ち、命令を実行ユニットに送る際に、その命令
が書くレジスタのスコアボードを利用中の状態にし、命
令の実行が完了した時点で該レジスタのスコアボードを
利用可能の状態にするものである。As described above, the operand state determination unit 5 checks whether any instruction currently in execution writes to the specified register; if none does, the register is judged available, and otherwise unavailable. This can be implemented, for example, with the register scoreboard method used in conventional in-order superscalar processors. In the register scoreboard method, a scoreboard entry for each register indicates whether that register is available. When an instruction is sent to an execution unit, the scoreboard entry of the register it writes is marked in use, and when the instruction completes execution, the entry is marked available again.
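A minimal Python sketch of the register scoreboard idea follows. It models only the availability bookkeeping described above; the class and method names (`RegisterScoreboard`, `operands_available`, `issue`, `complete`) and the register count of 32 are assumptions made for illustration.

```python
class RegisterScoreboard:
    """Toy register scoreboard: one in-use flag per register (names assumed)."""

    def __init__(self, num_registers=32):
        self.in_use = [False] * num_registers

    def operands_available(self, registers):
        # An instruction may be transferred only if no instruction in
        # execution is still going to write any register it names.
        return all(not self.in_use[r] for r in registers)

    def issue(self, dest):
        # Sending an instruction to an execution unit marks the register
        # it writes as in use.
        self.in_use[dest] = True

    def complete(self, dest):
        # Completion of the instruction makes the register available again.
        self.in_use[dest] = False
```

Replaying the cycle 3 example: after `issue(3)` for “lw r3, playlevel” and `issue(12)` for “li r12, 2”, `operands_available([12])` is false, so “sw r12, quicklibs” is held back until `complete(12)` is called.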

【００９３】一方、サイクル３にＤステージでは、命令
デコード部４は、命令５、命令７、命令８およびその次
の命令（図示せず）をデコードしてデコード命令キュー
７に挿入する。Ｆステージでは、命令フェッチ部２は前
サイクルにフェッチした命令の後の命令（図示せず）を
フェッチする。以後のサイクルにおいても、命令フェッ
チ部２はＦステージで命令キュー３に空きがある限り命
令をフェッチし、命令デコード部４はＤステージでデコ
ード命令キュー７に空きがある限り命令をデコードす
る。On the other hand, in the D stage in cycle 3, the instruction decoding unit 4 decodes the instruction 5, the instruction 7, the instruction 8, and the next instruction (not shown) and inserts them into the decoded instruction queue 7. In the F stage, the instruction fetch unit 2 fetches an instruction (not shown) after the instruction fetched in the previous cycle. Also in the subsequent cycles, the instruction fetch unit 2 fetches instructions as long as there is a vacancy in the instruction queue 3 at the F stage, and the instruction decode unit 4 decodes the instructions as long as there is a vacancy in the decoded instruction queue 7 at the D stage.

【００９４】続いて、サイクル５に、命令番号２の命令
“ｌｉ”の演算が終了し、命令番号１の命令“ｌｗ”は
まだ実行中であるとする。投機性判定部１２は、命令２
の実行が投機的かどうかを判定する。この例では、命令
１がページフォルトトラップを起こすかどうかが確定し
ていないため、命令２の実行は投機的と判定される。Subsequently, suppose that in cycle 5 the operation of instruction 2, “li”, has finished while instruction 1, “lw”, is still executing. The speculativeness determination unit 12 determines whether the execution of instruction 2 is speculative. In this example, it is not yet known whether instruction 1 will cause a page fault trap, so the execution of instruction 2 is determined to be speculative.

【００９５】投機性判定部１２は、グループ値生成部６
がビットベクタ方式で実現される場合には、図４のよう
に、各グループ値に対して該グループ値を持つロード・
ストア命令または条件分岐命令に該当する命令が実行中
であるか実行完了したかを示すビットベクタを備える。
この場合、あるグループ値を持つロード・ストア命令ま
たは条件分岐命令の実行開始時に、そのグループ値に対
応するビットを１とし、その命令の実行完了時に０にす
る。投機性判定部１２は、判定する命令のグループ値に
対応するビットが１のときには命令は投機的であると判
定し、０のときには非投機的であると判定する。When the group value generation unit 6 is implemented by the bit vector method, the speculativeness determination unit 12 includes, as shown in FIG. 4, a bit vector that indicates, for each group value, whether the load/store instruction or conditional branch instruction having that group value is still executing or has completed. When a load/store instruction or conditional branch instruction with a given group value starts execution, the bit corresponding to that group value is set to 1, and when the instruction completes, the bit is cleared to 0. The speculativeness determination unit 12 judges an instruction to be speculative when the bit corresponding to its group value is 1, and non-speculative when the bit is 0.
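The bit vector test can be illustrated with the following Python sketch. It is a simplified model under assumed names (`SpeculationBitVector`, `group_head_started`, `group_head_completed`, `is_speculative`) and an assumed vector width of 8 group values; FIG. 4 itself is not reproduced here.

```python
class SpeculationBitVector:
    """Toy model of the per-group-value bit vector (names and width assumed)."""

    def __init__(self, num_group_values=8):
        self.bits = [0] * num_group_values  # all 0 in the initial state

    def group_head_started(self, group_value):
        # The load/store or conditional branch with this group value
        # has started execution.
        self.bits[group_value] = 1

    def group_head_completed(self, group_value):
        # The same instruction has completed execution.
        self.bits[group_value] = 0

    def is_speculative(self, group_value):
        # An instruction is speculative while the bit for its group is 1.
        return self.bits[group_value] == 1
```

For the cycle 5 situation, calling `group_head_started(1)` when “lw” begins makes `is_speculative(1)` true for “li”, and it becomes false again once `group_head_completed(1)` is called.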

【００９６】一方、グループ値生成部６がハードウェア
カウンタ方式で実現されている場合には、投機性判定部
１２は、ビットベクタを備える必要はない。この場合に
は、オペランド状態判定部５が備えるグループ値カウン
タの値ｘと、実行命令カウンタの値ｙと、判定する命令
のグループ値ｚとから、図６により投機的か否かを判定
することができる。When the group value generation unit 6 is instead implemented by the hardware counter method, the speculativeness determination unit 12 does not need a bit vector. In this case, whether an instruction is speculative can be determined according to FIG. 6 from the value x of the group value counter provided in the operand state determination unit 5, the value y of the execution instruction counter, and the group value z of the instruction being judged.

【００９７】さて、投機性の判定後、同じくサイクル５
に、プロセッサ状態制御部３０は命令番号２の命令“ｌ
ｉｒ１２，２”によりプロセッサの状態を更新してよ
いかどうか判定する。この例では、命令２により更新さ
れるレジスタｒ１２は汎用レジスタであるので、状態の
更新が許可される。プロセッサの状態の更新が許可され
た場合、ここでは命令２は投機的であるため、まず、レ
ジスタ番号“１２”と、レジスタｒ１２の（更新前の）
値と、命令２のグループ値“１”とがレジスタ退避部１
６に登録される。そして、サイクル５のＷステージでレ
ジスタｒ１２の値を命令２の演算結果に更新する。After the speculativeness determination, also in cycle 5, the processor state control unit 30 determines whether instruction 2, “li r12, 2”, may update the processor state. In this example, register r12, which instruction 2 updates, is a general-purpose register, so the state update is permitted. Because instruction 2 is speculative, once the update is permitted, the register number “12”, the value of register r12 before the update, and the group value “1” of instruction 2 are first registered in the register saving unit 16. Then, in the W stage of cycle 5, register r12 is updated with the result of instruction 2.

【００９８】プロセッサ状態制御部３０は、前述のよう
に、ロード・ストア命令および条件分岐命令の実行完了
順序が変化しないことを保証する。プロセッサ状態制御
部３０は、命令の種類とその命令の投機性から、図７に
例示した条件により状態更新を許可する。例えば、上記
の命令番号２の命令“ｌｉｒ１２，２”は図７におい
てはその他の命令に該当し投機的であっても更新が許可
される。As described above, the processor state control unit 30 guarantees that the completion order of load/store instructions and conditional branch instructions does not change. Based on the type of an instruction and whether it is speculative, the processor state control unit 30 permits state updates under the conditions illustrated in FIG. 7. For example, instruction 2 above, “li r12, 2”, falls under “other instructions” in FIG. 7, so its update is permitted even though it is speculative.

【００９９】なお、図７においては、その他の命令の代
表的なものが汎用レジスタを変える命令であり、制御レ
ジスタを変える命令の代表的なものが乗除算用レジスタ
を変える命令である。ただし、図７の条件は、一例であ
り、プロセッサの仕様や命令セットにより適宜設定すれ
ばよい。基本的には、非投機的の場合には全ての命令の
種類において「許可」であり、投機的の場合には少なく
ともロード・ストア命令または条件分岐命令に該当する
命令は「不許可」とする。そして、投機的の場合でロー
ド・ストア命令および条件分岐命令以外の命令の場合に
は、例えば、「出力先として明示されていない特殊レジ
スタに書く命令」あるいは「あるレジスタを変更する命
令であってそのレジスタを変更したらその実行時点で何
らかのアクションが発生するような命令」については
「不許可」とし、「出力先として明示される汎用レジス
タに書く命令」あるいは「あるレジスタを変更する命令
であってそのレジスタを変更してもその実行時点ではア
クションが発生せず後に他の命令による判断の結果とし
てアクションが発生するような命令」については「許
可」とする。例えば、コンディションレジスタを変える
命令が、投機的の場合に、あるプロセッサにおいては
「許可」となり、別のプロセッサにおいては「不許可」
となることもあり得る。In FIG. 7, the typical “other instruction” is one that changes a general-purpose register, and the typical instruction that changes a control register is one that changes the multiply/divide register. The conditions in FIG. 7 are only an example and may be set as appropriate for the processor specification and instruction set. Basically, a non-speculative instruction of any type is “permitted”, and a speculative instruction that is at least a load/store instruction or a conditional branch instruction is “not permitted”. For a speculative instruction other than a load/store or conditional branch instruction, “not permitted” applies, for example, to an instruction that writes a special register not explicitly named as its output destination, or to an instruction that changes a register whose modification triggers some action at the time of execution. “Permitted” applies to an instruction that writes a general-purpose register explicitly named as its output destination, or to an instruction that changes a register whose modification triggers no action at execution time, the action arising only later as the result of a test by another instruction. For example, a speculative instruction that changes the condition register may be “permitted” on one processor and “not permitted” on another.
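The basic permission rule can be written as a small decision function. The sketch below encodes only the rule stated in the text, not the actual table of FIG. 7; the enum `Kind`, its member names, and the function `update_permitted` are hypothetical, and a real processor may classify borderline cases such as condition-register writes differently.

```python
from enum import Enum


class Kind(Enum):
    """Hypothetical instruction classes used only for this illustration."""
    LOAD_STORE = "load/store"
    COND_BRANCH = "conditional branch"
    # Writes a general-purpose register named explicitly as the output.
    EXPLICIT_GENERAL_WRITE = "explicit general-purpose register write"
    # Writes an implicit special register, or a register whose
    # modification triggers an action at execution time.
    IMMEDIATE_EFFECT_WRITE = "implicit or immediate-effect register write"


def update_permitted(kind, speculative):
    """Return True if the instruction may update the processor state."""
    if not speculative:
        return True  # non-speculative: every instruction type is permitted
    # Speculative: only updates that can be undone from a saved register
    # value are allowed; group-opening instructions must wait.
    return kind is Kind.EXPLICIT_GENERAL_WRITE
```

Under this sketch a speculative “li r12, 2” is permitted, while a speculative conditional branch is held until it is no longer speculative.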

【０１００】また、レジスタ退避部１６は、投機的な命
令が誤りだった場合にプロセッサを投機的命令実行が行
なわれる前の状態に戻すために必要な情報を保持するキ
ューである。The register saving unit 16 is a queue that holds the information needed to return the processor to the state it was in before the speculative instruction execution, in case the speculation turns out to be wrong.

【０１０１】なお、レジスタ退避部１６には、同一のレ
ジスタに関する情報が複数組登録されることもある。図
８の例（なお、図８中に示されているプログラム例は図
３とは異なるものである）では、（２）の命令“ａｄｄ
ｒ２，ｒ３，ｒ４”と（５）の命令“ｌｉｒ２，１
００”が投機的に実行されたため、レジスタ退避部１６
にレジスタｒ２に関する情報が２つ登録されている。Note that the register saving unit 16 may hold multiple entries for the same register. In the example of FIG. 8 (whose program differs from that of FIG. 3), instruction (2), “add r2, r3, r4”, and instruction (5), “li r2, 100”, were both executed speculatively, so two entries for register r2 are registered in the register saving unit 16.

【０１０２】レジスタ退避部１６への情報の登録、削除
および情報に基づくレジスタ値の復帰は、以下の規則に
従い行なわれる。・投機的な命令（図７の場合、その他の命令）がプロセ
ッサ状態を更新する際、その命令が変更するレジスタの
識別情報と、そのレジスタの変更前の値と、その命令の
グループ値とが登録される・登録されている情報のグループ値と同一のグループ値
を持つロード・ストア命令または条件分岐命令が、トラ
ップを生じることなく完了した際、その情報を削除する・分岐予測ミストラップまたはページフォルトトラップ
が生じた際、登録されている全てのレジスタの値を最も
新しく登録されたものから順に登録値に戻し、その情報
を削除する図８の例では、（１）の命令“ｌｗ”がページフォルト
を発生させた場合にはレジスタｒ２は１５８３２に戻さ
れ、（４）の命令“ｂｅｑ”が命令（５）（図示せず）
に分岐すると予測し、分岐予測ミスを発生させた場合に
はレジスタｒ２は４５に戻される。Registration and deletion of entries in the register saving unit 16, and restoration of register values from those entries, follow these rules:
・When a speculative instruction (in the case of FIG. 7, an “other instruction”) updates the processor state, the identifier of the register it changes, the value of that register before the change, and the group value of the instruction are registered.
・When a load/store instruction or conditional branch instruction whose group value equals that of a registered entry completes without causing a trap, that entry is deleted.
・When a branch misprediction trap or a page fault trap occurs, the values of all registered registers are restored from their entries, starting with the most recently registered entry, and the entries are deleted.
In the example of FIG. 8, if instruction (1), “lw”, causes a page fault, register r2 is restored to 15832. If instruction (4), “beq”, is predicted to branch to instruction (5) (not shown) and the prediction turns out to be wrong, register r2 is restored to 45.
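The three rules above behave like an undo log, which the following Python sketch tries to make concrete. It is an illustrative model only: the class `RegisterSaveUnit` and its method names are assumptions, and the group values 1 and 2 used in the usage note are assumed for the FIG. 8 instructions, since the figure is not reproduced here.

```python
class RegisterSaveUnit:
    """Toy undo log for speculative register updates (names assumed)."""

    def __init__(self, registers):
        self.registers = registers  # dict: register name -> current value
        self.entries = []           # oldest first: (register, old value, group)

    def speculative_write(self, reg, new_value, group):
        # Rule 1: record the register, its previous value and the group
        # value of the instruction before updating the register.
        self.entries.append((reg, self.registers[reg], group))
        self.registers[reg] = new_value

    def group_head_completed(self, group):
        # Rule 2: the load/store or branch with this group value completed
        # without a trap, so entries with that group value are dropped.
        self.entries = [e for e in self.entries if e[2] != group]

    def trap(self):
        # Rule 3: restore every saved value, newest entry first, and
        # delete the entries.
        while self.entries:
            reg, old_value, _group = self.entries.pop()
            self.registers[reg] = old_value
```

Starting from r2 = 15832, a speculative write of 45 in group 1 followed by a speculative write of 100 in group 2 leaves two entries for r2. A trap at that point restores r2 to 15832; if `group_head_completed(1)` is called first, a later trap restores r2 to 45, matching the two outcomes described for FIG. 8.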

【０１０３】続いて、サイクル１０に、図３の命令番号
１の命令“ｌｗ”の実行がトラップを生じることなく完
了すると、レジスタ退避部１６に退避した情報で命令１
と同一のグループ値１を持つもの、すなわち先のサイク
ル５でレジスタ退避部１６に退避したレジスタｒ１２に
関する情報が削除される。この時点でロード・ストアユ
ニット１０が使用可になるため、命令番号３の命令“ｓ
ｗ”から順にオペランド状態判定部５による判定の後、
各実行ユニットへ転送される。命令番号３の命令“ｓ
ｗ”はロードストア・ユニット１０へ、命令番号４の命
令“ａｄｄｉｕ”は命令実行ユニット８へ、命令番号５
の命令“ｂｅｑ”は分岐命令実行ユニット１１へ、命令
番号７の命令“ｍｏｖｅ”は命令実行ユニット９へ転送
され、Ｅステージで命令が実行される。Subsequently, in cycle 10, when instruction 1 of FIG. 3, “lw”, completes without causing a trap, the entry in the register saving unit 16 that has the same group value 1 as instruction 1 is deleted, namely the entry for register r12 saved in cycle 5. At this point the load/store unit 10 becomes free, so the instructions from instruction 3, “sw”, onward are checked in order by the operand state determination unit 5 and then transferred to the execution units. Instruction 3, “sw”, is transferred to the load/store unit 10; instruction 4, “addiu”, to the instruction execution unit 8; instruction 5, “beq”, to the branch instruction execution unit 11; and instruction 7, “move”, to the instruction execution unit 9. These instructions are executed in the E stage.

【０１０４】次に、サイクル１１に、命令４、命令７の
演算が終了し、命令３、命令５はまだ実行中であるとす
る。命令４，７はそれぞれ投機性判定部１２，１３によ
り投機的と判定され、プロセッサ状態制御部３０により
状態更新が許可される。その後、命令４が変更するレジ
スタｒ１３に関する情報と、命令７が変更するレジスタ
ｒ６に関する情報とがレジスタ退避部１６に登録され、
Ｗステージにこれらのレジスタの値が更新される。Next, suppose that in cycle 11 the operations of instructions 4 and 7 have finished while instructions 3 and 5 are still executing. Instructions 4 and 7 are determined to be speculative by the speculativeness determination units 12 and 13, respectively, and the processor state control unit 30 permits their state updates. The information on register r13, which instruction 4 changes, and on register r6, which instruction 7 changes, is then registered in the register saving unit 16, and the values of these registers are updated in the W stage.

【０１０５】次に、サイクル１２に、命令番号５の命令
“ｂｅｑ”の演算は終了し、命令番号３の命令“ｓｗ”
はまだ実行中であるとする。命令５はプロセッサ状態制
御部３０により状態更新が許可されないため完了せず、
ロード・ストアユニット１０を占有したまま命令３の完
了を待つ。Next, suppose that in cycle 12 the operation of instruction 5, “beq”, has finished while instruction 3, “sw”, is still executing. Instruction 5 cannot complete because the processor state control unit 30 does not permit its state update, so it waits for instruction 3 to complete while continuing to occupy the load/store unit 10.

【０１０６】続いて、サイクル１５に、命令３の実行が
ページフォルトトラップを生じることなく完了すると、
レジスタ退避部１６のレジスタｒ１３に関する情報（命
令３と同一のグループ値２を持つ情報）は削除される。Subsequently, in cycle 15, when execution of instruction 3 is completed without generating a page fault trap,
Information on the register r13 of the register saving unit 16 (information having the same group value 2 as the instruction 3) is deleted.

【０１０７】そして、サイクル１６に、命令５はプロセ
ッサ状態制御部３０により状態更新が許可されるため、
実行を完了してレジスタ退避部１６のレジスタｒ６に関
する情報（命令５と同一のグループ値３を持つ情報）が
削除される。Then, in cycle 16, the processor state control unit 30 permits the state update of instruction 5, so instruction 5 completes and the entry for register r6 in the register saving unit 16 (the entry having the same group value 3 as instruction 5) is deleted.

【０１０８】一方、上述した図３のプログラムの流れに
おいて、命令番号５の命令“ｂｅｑｒ４，ｒ３，命令７
（のアドレス）”が分岐予測ミストラップを生じた場
合、および命令番号３の命令“ｓｗｒ１２，ｑｕｉｃ
ｋｌｉｂｓ”がページフォルトトラップを生じた場合の
動作は以下のようになる。In the program flow of FIG. 3 described above, the operation when instruction 5, “beq r4, r3, (address of) instruction 7”, causes a branch misprediction trap, and when instruction 3, “sw r12, quicklibs”, causes a page fault trap, is as follows.

【０１０９】サイクル１２の命令５の演算終了時に予測
した分岐先と実際の分岐先が異なることが判明したら
（本例では、予測した分岐先が命令７であったのに対し
て実際の分岐先が命令６であった場合）、命令の完了時
に分岐予測ミストラップを掛ける。分岐予測ミストラッ
プが生じると、命令キュー３およびデコード命令キュー
７の内容が全て消去される。この例では、命令８はデコ
ード命令キュー７から消去される命令の一つになる。そ
の後、レジスタ退避部１６に登録されているレジスタが
退避した値に戻される。この例では、命令５の完了時に
は命令３が完了し、レジスタｒ１３に関する情報はレジ
スタ退避部１６から削除されているため、レジスタｒ６
の値だけが変更される。その後、アドレス生成部２８の
示すアドレスを命令６の命令アドレスに設定してトラッ
プ処理を終える。If, when the operation of instruction 5 finishes in cycle 12, the predicted branch target turns out to differ from the actual branch target (in this example, the predicted target was instruction 7 whereas the actual target is instruction 6), a branch misprediction trap is raised when the instruction completes. When the branch misprediction trap occurs, the contents of the instruction queue 3 and the decoded instruction queue 7 are all discarded. In this example, instruction 8 is one of the instructions discarded from the decoded instruction queue 7. The registers registered in the register saving unit 16 are then restored to their saved values. In this example, instruction 3 has already completed by the time instruction 5 completes, and the entry for register r13 has been deleted from the register saving unit 16, so only the value of register r6 is changed. Finally, the address indicated by the address generation unit 28 is set to the instruction address of instruction 6, and the trap processing ends.

【０１１０】また、サイクル１４にＭステージで命令３
の実行時にストアする仮想アドレスが実メモリ領域に対
応しないことが判明したら、命令の完了時にページフォ
ルトトラップを掛ける。トラップ時の動作は分岐予測ミ
ストラップの動作と同様である。この例では、命令３が
ページフォルトトラップを生じた時点で、レジスタ退避
部１６にはレジスタｒ１３とレジスタｒ６が登録されて
いるので、これらのレジスタの値が変更される。その
後、アドレス生成部２８の示すアドレスを命令３の命令
アドレスに設定してトラップ処理を終える。If, while instruction 3 is executing in the M stage in cycle 14, the virtual address to be stored to turns out not to correspond to a real memory area, a page fault trap is raised when the instruction completes. The operation on this trap is the same as for a branch misprediction trap. In this example, registers r13 and r6 are registered in the register saving unit 16 at the time instruction 3 causes the page fault trap, so the values of these registers are changed. The address indicated by the address generation unit 28 is then set to the instruction address of instruction 3, and the trap processing ends.

【０１１１】ところで、ロード・ストア命令によるペー
ジフォルトの判定、および条件分岐命令による分岐予測
ミスの判定が、これらの命令と同一のグループ値を持つ
他の命令の完了までに行なわれるという条件が満たされ
れば、投機性判定部１２〜１５およびレジスタ退避部１
６を省くことができる。この場合、実行完了した命令は
トラップによりキャンセルされない場合には投機的では
ない。図１のように実行ユニットに転送されている命令
が最大でも１であるプロセッサでは、トラップの判定が
完了する時間と、各実行ユニットの演算が完了する時間
を等しくすることでこの条件を満たすことができる。一
方、図９のように、各実行ユニット３０８は命令デコー
ド部３０７との間にそれぞれキュー３０９を備え、デコ
ードされた命令は実行ユニットが使用できない場合には
そのキューに入れられ、実行ユニットはキューの先頭に
ある命令から順に実行するという構成では、命令の転送
から実行完了までの時間が一意に決まらないため、投機
性判定部１２〜１５およびレジスタ退避部１６を備える
必要がある。Incidentally, the speculativeness determination units 12 to 15 and the register saving unit 16 can be omitted if the following condition holds: the page fault determination for a load/store instruction and the branch misprediction determination for a conditional branch instruction are made before any other instruction with the same group value completes. In that case, an instruction that has completed execution is not speculative unless it is cancelled by a trap. In a processor such as that of FIG. 1, in which at most one instruction at a time has been transferred to an execution unit, this condition can be met by making the time needed to complete the trap determination equal to the time needed to complete the operation of each execution unit. By contrast, in a configuration such as that of FIG. 9, each execution unit 308 has a queue 309 between itself and the instruction decoding unit 307; a decoded instruction is placed in that queue when the execution unit is unavailable, and the execution unit executes instructions in order from the head of the queue. In this configuration the time from the transfer of an instruction to the completion of its execution is not uniquely determined, so the speculativeness determination units 12 to 15 and the register saving unit 16 are required.

【０１１２】（第２の実施形態）さて、本発明に係る投
機的命令実行機構は、ロード・ストア命令および条件分
岐命令を越えてｏｕｔ−ｏｆ−ｏｒｄｅｒで命令実行す
る場合に留まらない。以下では、本発明をＰｒｅｄｉｃ
ａｔｅ付き命令を含むプログラムコードに適用した場合
の実施形態について説明する。(Second Embodiment) The speculative instruction execution mechanism according to the present invention is not limited to out-of-order execution across load/store instructions and conditional branch instructions. The following describes an embodiment in which the present invention is applied to program code that includes instructions with Predicates.

【０１１３】Ｐｒｅｄｉｃａｔｅ付き命令実行機構は、
条件分岐を条件分岐命令とは異なる形で実現する機構で
ある。この機構では、各命令に付加されるＰｒｅｄｉｃ
ａｔｅと呼ばれるブール変数の値により、演算結果をレ
ジスタに反映するかどうかが制御される。The execution mechanism for instructions with Predicates implements conditional branching in a form different from conditional branch instructions. In this mechanism, the value of a Boolean variable called a Predicate, attached to each instruction, controls whether the result of the operation is reflected in the register.

【０１１４】図１０にＰｒｅｄｉｃａｔｅ付き命令を含
まないプログラムコードの一例を示し、図１１にＰｒｅ
ｄｉｃａｔｅ付き命令を含むプログラムコードの一例を
示す。図１１のプログラムコードは、図１０プログラム
コードと同じ内容をＰｒｅｄｉｃａｔｅを用いて表した
ものである。FIG. 10 shows an example of program code that contains no instructions with Predicates, and FIG. 11 shows an example of program code that contains instructions with Predicates. The program code of FIG. 11 expresses the same content as the program code of FIG. 10 using Predicates.

【０１１５】図１１の（１）のＰｓｅｑ命令（Ｐｒｅｄ
ｉｃａｔｅセット命令）は、条件が成立した場合、本例
ではレジスタｒ３の値とレジスタｒ４の値とが等しい場
合に、ｐ１にＰｒｅｄｉｃａｔｅ値として１を、ｐ２に
Ｐｒｅｄｉｃａｔｅ値として０をセットする。一方、条
件が成立しない場合、本例ではレジスタｒ３の値とレジ
スタｒ４の値とが等しくない場合に、ｐ１を０に、ｐ２
を１にセットする。また、図１１の（２）〜（４）のよ
うに、＜ｐ１＞あるいは＜ｐ２＞のようなＰｒｅｄｉｃ
ａｔｅが付いた命令については、＜＞内の変数がＰｓ
ｅｑ命令により１にセットされた場合にのみ、その命令
の結果がレジスタに反映される。一方、（５）のよう
に、Ｐｒｅｄｉｃａｔｅが付いていない命令について
は、常に、その命令の結果がレジスタに反映される。The Pseq instruction (Predicate set instruction) in (1) of FIG. 11 sets the Predicate value of p1 to 1 and that of p2 to 0 when its condition holds, which in this example means that the value of register r3 equals the value of register r4. When the condition does not hold, that is, when the value of register r3 differs from the value of register r4, it sets p1 to 0 and p2 to 1. For an instruction with a Predicate such as <p1> or <p2>, as in (2) to (4) of FIG. 11, the result of the instruction is reflected in the register only when the variable inside <> has been set to 1 by the Pseq instruction. For an instruction without a Predicate, as in (5), the result of the instruction is always reflected in the register.

【０１１６】したがって、図１１のプログラムコードに
おいて、レジスタｒ３の値とレジスタｒ４の値とが等し
いときは、Ｐｓｅｑ命令によりｐ１に１、ｐ２に０がセ
ットされ、このとき＜ｐ１＞が付いた命令“ｓｌｌｒ
６，ｒ１０，２”，“ｌｉｒ５，１”と、Ｐｒｅｄｉｃ
ａｔｅが付いていない命令“ｍｏｖｅｒ２，ｒ５”の
結果だけがレジスタに反映されることになるため、図１
０でＬ２に分岐した場合と同様の結果が得られる。同様
に、レジスタｒ３の値とレジスタｒ４の値とが等しくな
いときは、ｐ１は０、ｐ２は１にセットされ、このとき
＜ｐ２＞が付いた命令“ｌｉｒ５，０”と、Ｐｒｅｄ
ｉｃａｔｅが付いていない命令“ｍｏｖｅｒ２，ｒ
５”の結果だけがレジスタに反映されることになるた
め、図１０でＬ２に分岐しなかった場合と同様の結果と
なる。Therefore, in the program code of FIG. 11, when the value of register r3 equals the value of register r4, the Pseq instruction sets p1 to 1 and p2 to 0. Only the results of the instructions with <p1>, “sll r6, r10, 2” and “li r5, 1”, and of the instruction without a Predicate, “move r2, r5”, are then reflected in the registers, which gives the same result as when the branch to L2 is taken in FIG. 10. Similarly, when the value of register r3 differs from the value of register r4, p1 is set to 0 and p2 to 1. Only the results of the instruction with <p2>, “li r5, 0”, and of the instruction without a Predicate, “move r2, r5”, are then reflected in the registers, which gives the same result as when the branch to L2 is not taken in FIG. 10.
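The intended (non-speculative) meaning of such code can be checked with a short Python sketch. It models only the architectural effect described above, not the speculative pipeline; the function name `run_predicated`, the dictionary representation of registers, and the relative order assumed for instructions (2) to (4) are illustrative assumptions, since FIG. 11 is not reproduced here.

```python
def run_predicated(regs):
    """Apply FIG. 11-style predicated code to a dict of register values."""
    regs = dict(regs)  # work on a copy

    # (1) Pseq: p1 = 1, p2 = 0 if r3 == r4; otherwise p1 = 0, p2 = 1.
    equal = regs["r3"] == regs["r4"]
    pred = {"p1": 1 if equal else 0, "p2": 0 if equal else 1}

    # (guard, destination, operation); guard None means "no Predicate".
    program = [
        ("p2", "r5", lambda r: 0),                              # <p2> li r5, 0
        ("p1", "r6", lambda r: (r["r10"] << 2) & 0xFFFFFFFF),   # <p1> sll r6, r10, 2
        ("p1", "r5", lambda r: 1),                              # <p1> li r5, 1
        (None, "r2", lambda r: r["r5"]),                        # move r2, r5
    ]
    for guard, dest, operation in program:
        # A result reaches the register only if the guard is absent or 1.
        if guard is None or pred[guard] == 1:
            regs[dest] = operation(regs)
    return regs
```

With r3 equal to r4, the result has r5 = 1, r6 = r10 shifted left by 2, and r2 = 1; with r3 different from r4, r5 and r2 become 0 and r6 is left untouched.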

【０１１７】従来のＰｒｅｄｉｃａｔｅ付き命令実行機
構では、Ｐｒｅｄｉｃａｔｅの付いた命令は、そのＰｒ
ｅｄｉｃａｔｅが確定するまで実行されない。よって、
Ｐｒｅｄｉｃａｔｅをセットする命令の後にあるＰｒｅ
ｄｉｃａｔｅ付き命令を投機的に実行することはできな
い。本発明によれば、このようなＰｒｅｄｉｃａｔｅ付
き命令の投機的実行が可能となる。以下、本発明を適用
したＰｒｅｄｉｃａｔｅ付き命令の投機的実行方式につ
いて説明する。In a conventional execution mechanism for instructions with Predicates, an instruction with a Predicate is not executed until its Predicate is determined. Consequently, an instruction with a Predicate that follows the instruction setting that Predicate cannot be executed speculatively. The present invention makes speculative execution of such instructions possible. The speculative execution method for instructions with Predicates to which the present invention is applied is described below.

【０１１８】なお、本実施形態に係るプロセッサの構成
や動作はＰｒｅｄｉｃａｔｅ付き命令に関する部分以外
は基本的には第１の実施形態と同様であるので、以下で
は、第１の実施形態と相違する点を中心に説明する。The configuration and operation of the processor according to this embodiment are basically the same as in the first embodiment except for the parts related to instructions with Predicates, so the following description focuses on the differences from the first embodiment.

【０１１９】本実施形態では、図１１の（１）のような
Ｐｒｅｄｉｃａｔｅセット命令は、第１の実施形態にお
けるロード・ストア命令や条件分岐命令と同様に扱われ
る。すなわち、本実施形態では、プロセッサで実行すべ
き命令列において、条件分岐命令またはロード・ストア
命令またはＰｒｅｄｉｃａｔｅセット命令に該当する命
令から始まり、条件分岐命令またはロード・ストア命令
またはＰｒｅｄｉｃａｔｅセット命令に該当する次の命
令の直前の命令で終わる部分を、「命令グループ」とす
ることになる。同様に、プロセッサ状態制御部３０は、
条件分岐命令またはロード・ストア命令またはＰｒｅｄ
ｉｃａｔｅセット命令に該当する命令の相互間において
それらの完了順序が変わらないように制御することにな
る。例えば、図７においてもＰｒｅｄｉｃａｔｅセット
命令はロード・ストア命令や条件分岐命令と同様であ
る。In this embodiment, a Predicate set instruction such as (1) of FIG. 11 is handled in the same way as the load/store instructions and conditional branch instructions of the first embodiment. That is, in the instruction sequence to be executed by the processor, an “instruction group” is the portion that begins with an instruction that is a conditional branch instruction, a load/store instruction, or a Predicate set instruction and ends with the instruction immediately before the next such instruction. Similarly, the processor state control unit 30 ensures that the completion order among conditional branch instructions, load/store instructions, and Predicate set instructions does not change. In FIG. 7, for example, a Predicate set instruction is treated in the same way as a load/store instruction or a conditional branch instruction.
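The definition of an instruction group can be illustrated by splitting an instruction list at every group-opening instruction. The sketch below is only a reading aid: the kind labels, the constant `GROUP_OPENERS`, and the function `split_into_groups` are hypothetical, and the sample sequence in the usage note is made up for the example.

```python
# Kinds of instruction that open a new instruction group in this embodiment.
GROUP_OPENERS = {"load_store", "cond_branch", "predicate_set"}


def split_into_groups(instructions):
    """Split (mnemonic, kind) pairs into instruction groups.

    Each group starts at a group-opening instruction and runs up to the
    instruction just before the next one. Instructions that precede the
    first opener form a leading group of their own.
    """
    groups = []
    for mnemonic, kind in instructions:
        if kind in GROUP_OPENERS or not groups:
            groups.append([])  # start a new group
        groups[-1].append(mnemonic)
    return groups
```

For example, the sequence lw, li, sw, addiu, pseq, move (with lw and sw as load/store, pseq as a Predicate set instruction, and the rest as other instructions) is split into the groups [lw, li], [sw, addiu], and [pseq, move].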

【０１２０】また、プロセッサ内において、各命令がど
の命令グループに属するかを識別するために、各命令に
は、それが属する命令グループに固有の「グループ値」
が付与される点は、第１の実施形態と同様である。As in the first embodiment, each instruction is given a “group value” unique to the instruction group to which it belongs, so that the processor can identify which instruction group each instruction belongs to.

【０１２１】オペランド状態判定部５は、図１１の
（２）〜（４）のように命令に付加されたＰｒｅｄｉｃ
ａｔｅに関しては状態の判定をせず、他のオペランドが
利用可能であれば利用可能とする。ただし、Ｐｒｅｄｉ
ｃａｔｅセット命令のようにＰｒｅｄｉｃａｔｅを変更
するような命令や、Ｐｒｅｄｉｃａｔｅを入力として演
算を行なうような命令については、そのＰｒｅｄｉｃａ
ｔｅが利用可能かが判定される。また、オペランド状態
判定部５は、Ｐｒｅｄｉｃａｔｅ付き命令の実行が完了
した場合でも、命令のＰｒｅｄｉｃａｔｅが確定するま
では、その命令が書くレジスタを利用可能にはしないで
おく。これは、レジスタスコアボード方式でオペランド
状態判定部５を構成した場合には、Ｐｒｅｄｉｃａｔｅ
付命令の投機的実行完了時にはスコアボードを利用中の
状態に保ち、後述するＰｒｅｄｉｃａｔｅセットトラッ
プ時にスコアボードを利用可能とすることで実現でき
る。The operand state determination unit 5 does not check the state of a Predicate attached to an instruction, as in (2) to (4) of FIG. 11; it treats the instruction as ready if its other operands are available. However, for an instruction that changes a Predicate, such as a Predicate set instruction, or an instruction that performs an operation with a Predicate as input, it does check whether that Predicate is available. In addition, even after an instruction with a Predicate has finished executing, the operand state determination unit 5 does not make the register written by that instruction available until the instruction's Predicate is determined. When the operand state determination unit 5 is built on the register scoreboard method, this can be achieved by keeping the scoreboard entry in the in-use state when the speculative execution of an instruction with a Predicate completes, and making the entry available at the time of the Predicate set trap described later.

【０１２２】Ｐｒｅｄｉｃａｔｅセットトラップは、本
方式においてＰｒｅｄｉｃａｔｅセット命令の完了時に
掛けられるトラップである。トラップの要因を判定する
手段によりＰｒｅｄｉｃａｔｅセットによるトラップと
判定された場合には、パイプラインは消去されない。そ
して、レジスタ退避部１６に登録された情報が最も新し
く登録されたものから順に調べられ、その情報に基づき
Ｐｒｅｄｉｃａｔｅセットトラップ時の処理が行なわれ
る。The Predicate set trap is a trap raised in this method when a Predicate set instruction completes. When the means for determining the cause of a trap determines that the trap was caused by a Predicate set, the pipeline is not flushed. Instead, the entries registered in the register saving unit 16 are examined in order, starting with the most recently registered, and the Predicate set trap processing is performed based on them.

【０１２３】図１２に、Ｐｒｅｄｉｃａｔｅセットトラ
ップ時の処理手順の一例を示す。なお、図１２のｔｇは
トラップを発生した命令のグループ値である。この処理
では、Ｐｒｅｄｉｃａｔｅ値を０とする命令により投機
的に更新されたレジスタが、更新前の値に戻される。FIG. 12 shows an example of the processing procedure for a Predicate set trap. In FIG. 12, tg is the group value of the instruction that caused the trap. This processing restores registers that were speculatively updated by instructions whose Predicate value is 0 to their values before the update.

【０１２４】図１２において、まず、レジスタ退避部１
６に登録されている情報数ｎを求める（ステップＳ３
１）。ｎ＞０であれば（ステップＳ３２）、レジスタ退
避部１６に最後に登録された情報のグループ値ｇを得る
（ステップＳ３３）。ｇ＝ｔｇであれば（ステップＳ３
４）、レジスタ退避部１６に最後に登録された情報のｐ
ｒｅｄｉｃａｔｅ値ｐを得る（ステップＳ３５）。In FIG. 12, the number n of entries registered in the register saving unit 16 is obtained first (step S31). If n > 0 (step S32), the group value g of the entry most recently registered in the register saving unit 16 is obtained (step S33). If g = tg (step S34), the Predicate value p of the entry most recently registered in the register saving unit 16 is obtained (step S35).

【０１２５】ｐ＝０のとき（ステップＳ３６）、該当す
るレジスタが使用可でないならば（ステップＳ３７）、
該登録された情報をもとに該レジスタを元の値に戻し
（ステップＳ３８）、該レジスタを使用可とし、該レジ
スタ退避部１６の最後に登録された情報を削除する（ス
テップＳ３９）。When p = 0 (step S36) and the corresponding register is not marked available (step S37), the register is restored to its original value from the entry (step S38); the register is then marked available, and the most recently registered entry is deleted from the register saving unit 16 (step S39).

【０１２６】上記以外の場合、すなわち、ｐ＝０でない
場合、またはｐ＝０で且つ該当するレジスタが使用可に
なっている場合には、レジスタを元の値に戻さずに、該
レジスタを使用可とし、該レジスタ退避部１６の最後に
登録された情報を削除する（ステップＳ３９）。Otherwise, that is, when p is not 0, or when p = 0 and the corresponding register is already marked available, the register is not restored to its original value; it is marked available, and the most recently registered entry is deleted from the register saving unit 16 (step S39).

【０１２７】上記の処理は、ステップＳ３２でｎ＞０で
なくまたはステップＳ３４でｇ＝ｔｇでないことにより
ループを抜けるまで、繰り返し実行する。The above processing is repeated until the loop is exited, either because n > 0 no longer holds in step S32 or because g = tg does not hold in step S34.
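Steps S31 to S39 can be followed in the Python sketch below. It is a simplified model of the described procedure rather than the flowchart of FIG. 12 itself: the function name `predicate_set_trap`, the dictionary-based entry layout, and the `available` map standing in for the scoreboard are all assumptions made for illustration.

```python
def predicate_set_trap(entries, registers, available, predicates, tg):
    """Process a Predicate set trap for the instruction with group value tg.

    entries    : list of dicts, oldest first, with keys
                 "reg", "old_value", "group", "pred" (Predicate number)
    registers  : dict register -> current value
    available  : dict register -> True if the scoreboard marks it available
    predicates : dict Predicate number -> determined value (0 or 1)
    """
    while len(entries) > 0:                      # S31, S32: n > 0 ?
        last = entries[-1]                       # most recently registered
        if last["group"] != tg:                  # S33, S34: g = tg ?
            break
        p = predicates[last["pred"]]             # S35: Predicate value p
        reg = last["reg"]
        if p == 0 and not available[reg]:        # S36, S37
            registers[reg] = last["old_value"]   # S38: undo the update
        available[reg] = True                    # S39: mark available
        entries.pop()                            # S39: delete the entry
```

As a usage example, suppose the unit holds one older entry with a different group value and two entries with group value tg, one guarded by a Predicate that resolved to 0 and one by a Predicate that resolved to 1. After the call, only the register guarded by the 0-valued Predicate has been restored, both registers are marked available, and the older entry is still in the list.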

【０１２８】レジスタ退避部１６は、図１３に示すよう
に、レジスタ番号、そのレジスタの元の値、命令のグル
ープ値用のフィールドに加えて、Ｐｒｅｄｉｃａｔｅ番
号（Ｐ１やＰ２を識別する情報）のフィールドを持つ。As shown in FIG. 13, each entry in the register saving unit 16 has, in addition to fields for the register number, the original value of the register, and the group value of the instruction, a field for the Predicate number (information identifying P1, P2, and so on).

【０１２９】Ｐｒｅｄｉｃａｔｅ付き命令が投機的に実
行された場合、その命令が更新するレジスタと、その元
の値と、その命令のグループ値と共に、その命令に付加
されたＰｒｅｄｉｃａｔｅの番号がレジスタ退避部１６
に登録される。When an instruction with a Predicate is executed speculatively, the number of the Predicate attached to the instruction is registered in the register saving unit 16 together with the register the instruction updates, the original value of that register, and the group value of the instruction.

【０１３０】なお、Ｐｒｅｄｉｃａｔｅが付加されてい
ない命令については、常に値が１であるＰｒｅｄｉｃａ
ｔｅ変数が付加されているとして、同様の処理が行なわ
れる。例えば、ｐ３１を常に値が１であるＰｒｅｄｉｃ
ａｔｅ変数であるものとすると、Ｐｒｅｄｉｃａｔｅな
し命令の投機的実行時にはＰｒｅｄｉｃａｔｅ番号とし
て３１が登録される。An instruction with no Predicate attached is processed in the same way, as if a Predicate variable whose value is always 1 were attached to it. For example, if p31 is a Predicate variable whose value is always 1, then 31 is registered as the Predicate number when an instruction without a Predicate is executed speculatively.

[0131] For the other units of the processor of this embodiment, the units of the first embodiment can be used, with the rules and processing that the first embodiment applies to load/store instructions and conditional branch instructions applied unchanged to the Predicate set instruction as well.

[0132] The operation of this embodiment when the program code illustrated in FIG. 11 is executed speculatively is described below. To keep the explanation simple, it is assumed that no instruction is being processed in the pipeline when this code starts.

[0133] First, in cycle 1, instructions (1) to (4) are fetched in the F stage.

[0134] Next, in cycle 2, instructions (1) to (4) are decoded in the D stage while, in the F stage, four instructions starting from instruction (5) are fetched. The decoded instructions (1) to (4) have the same group value. Fetch and decode operations are omitted from the description from here on.

[0135] Next, in cycle 3, instruction (1), that is, the Predicate set instruction, is transferred to instruction execution unit 8 and instruction (2) is transferred to instruction execution unit 9, and execution of the instructions starts in the E stage.

[0136] Subsequently, suppose that in cycle 5 the execution of instruction (2) completes while instruction (1) is still executing. Instruction (2) is judged to be speculative by the speculativeness determination unit 12, and the register number "5", the value of register r5 before the update, the group value, and the Predicate number "2" are registered in the register saving unit 16. The value of register r5 is then updated in the W stage.

[0137] Subsequently, suppose that at cycle 9 the execution of instructions (3) and (4) has completed while instruction (1) is still executing. Instruction (3) is judged to be speculative by the speculativeness determination unit 12, and the register number "6", the value of register r6 before the update, the group value, and the Predicate number "1" are registered in the register saving unit 16. Likewise, instruction (4) is judged to be speculative by the speculativeness determination unit 12, and the register number "5", the value of register r5 before the update, the group value, and the Predicate number "1" are registered in the register saving unit 16. The values of registers r5 and r6 are then updated in the W stage. At this point the register saving unit 16 holds the information shown in FIG. 13.

[0138] Next, in cycle 10, when execution of instruction (1), that is, the Predicate set instruction, completes, a Predicate set trap occurs.

[0139] If the Predicate set instruction sets p1 to 1 and p2 to 0, then, following the procedure of FIG. 12, the entry with registration order 3 in FIG. 13 is examined first. Since P1 = 1, register r5 is left unchanged (not returned to its original value), register r5 is marked usable, and the entry with registration order 3 is deleted. Next, the entry with registration order 2 is examined. Since P1 = 1, register r6 is left unchanged, register r6 is marked usable, and the entry with registration order 2 is deleted. Finally, the entry with registration order 1 is examined. Although P2 = 0, register r5 is already usable, so nothing is done to the register and the entry with registration order 1 is deleted.

[0140] If, on the other hand, the Predicate set instruction sets p1 to 0 and p2 to 1, the entry with registration order 3 is examined first. Since P1 = 0 and register r5 is not usable, the value of register r5 is returned to 0, its value before the update, register r5 is marked usable, and the entry with registration order 3 is deleted. Next, the entry with registration order 2 is examined. Since P1 = 0 and register r6 is not usable, the value of register r6 is returned to its original value, register r6 is marked usable, and the entry with registration order 2 is deleted. Finally, the entry with registration order 1 is examined. Since P2 = 1, and register r5 is already usable, nothing is done to the register and the entry with registration order 1 is deleted.
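
The two outcomes walked through above can be reproduced with a short Python sketch of the trap handling. It is an illustration under stated assumptions, not the circuit of FIG. 12: the function and variable names are hypothetical, the loop test is taken to be "entries remain and the entry's group value equals the trap's group value", and all concrete register values except the 0 mentioned in the text are made up.

```python
def resolve_predicate_trap(save_unit, regs, usable, preds, trap_group):
    """Process save-unit entries from the most recently registered one.

    save_unit:  list of (reg, old_value, group, pred), oldest first
    regs:       dict register number -> current value
    usable:     dict register number -> True if already marked usable
    preds:      dict Predicate number -> 0 or 1, as set by the trap
    trap_group: group value of the Predicate set instruction (assumed test g = tg)
    """
    while save_unit and save_unit[-1][2] == trap_group:
        reg, old_value, _group, pred = save_unit.pop()  # delete the last entry
        if preds[pred] == 0 and not usable[reg]:
            regs[reg] = old_value  # undo the wrongly speculated update
        usable[reg] = True         # the register becomes usable in every case


def example_state():
    """State after cycle 9; all concrete values except the 0 are made up."""
    save_unit = [(5, 100, 1, 2),  # order 1: instruction (2), Predicate 2
                 (6, 200, 1, 1),  # order 2: instruction (3), Predicate 1
                 (5, 0, 1, 1)]    # order 3: instruction (4), Predicate 1
    regs = {5: 44, 6: 33}
    usable = {5: False, 6: False}
    return save_unit, regs, usable


for preds in ({1: 1, 2: 0}, {1: 0, 2: 1}):
    save_unit, regs, usable = example_state()
    resolve_predicate_trap(save_unit, regs, usable, preds, trap_group=1)
    print(preds, regs)
```

With p1 = 1 and p2 = 0 both registers keep their speculatively written values; with p1 = 0 and p2 = 1, r5 returns to 0 and r6 to its saved value. Walking from the newest entry and checking the usable flag is what prevents the older r5 entry from overwriting a value that a newer entry has already settled.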

[0141] In a processor such as that of FIG. 1, in which at most one instruction is transferred to an execution unit at a time, speculative execution of instructions with a Predicate is more beneficial when the number of cycles from issue to completion of the Predicate set instruction (which is uniquely determined) is longer than the processing time of the Predicate set trap. When the number of cycles from issue to completion of the Predicate set instruction is not constant, as in the configuration of FIG. 9 in which the execution units have queues, it is also possible to execute instructions with a Predicate speculatively only when doing so is more efficient. One way to judge efficiency is, for example, to check the number of instructions in the queue of the destination execution unit when the Predicate set instruction is issued, and to perform speculative execution if the queue holds at least a certain number of instructions.

[0142] (Third Embodiment) The first and second embodiments described a technique in which the original value of a register that is rewritten speculatively is saved, and that value is restored or otherwise handled when a trap occurs.

[0143] This embodiment describes a scheme in which the compiler performs register allocation and code generation so that register values never need to be restored even when speculative execution has wrongly modified a register. This allows the processor to execute instructions speculatively without the per-instruction speculativeness determination and register save/restore of the first and second embodiments.

[0144] The configuration and operation of the processor according to this embodiment are basically the same as in the first embodiment, except that the register saving unit 16 of the processor of FIG. 1 is not provided, that the processing related to the register saving unit 16 (registering and restoring information, and so on) is omitted from the procedure of FIG. 2, and that the speculativeness determination units 12 to 15 determine speculativeness only for the SYNC instruction described later. Instruction groups, group values, operand state determination, processor state control, and so on are the same as in the first embodiment.

[0145] When a speculatively executed instruction has wrongly modified a register, the register must be returned to its original value in order to prevent the processor from entering an incorrect state through instructions being executed with the wrong register value after the trap.

[0146] Such a malfunction occurs only when one of the following two kinds of instructions is executed speculatively and wrongly updates a register. FIG. 14 illustrates case (2) below.
(1) An instruction whose output operand matches one of the input operands of an instruction (possibly itself) whose speculative execution was erroneous.
(2) An instruction whose output operand is used, without its value being updated first, on the other path of the conditional branch instruction that branches to the path leading to that instruction.
FIG. 15 shows an example of program code containing an instruction of kind (1) and an instruction of kind (2).

[0147] In FIG. 15, instruction number 4, "move r6, r3", is an instruction of kind (1). If instruction number 3, "addu r4, r4, r6", and instruction number 4, "move r6, r3", are executed speculatively before instruction number 2, "lw r4, in(r0)", completes, registers r4 and r6 are updated. If instruction 2 then causes a page fault trap, register r4 is assigned again when instruction 2 is re-executed, but register r6 still holds the wrong value, so re-executing instruction 3 puts the processor into an incorrect state.

[0148] Meanwhile, instruction number 6 in FIG. 15, "li r6, 20", is an instruction of kind (2). Suppose that, before instruction number 1, "beq r2, r3, instruction 6", completes, the branch is predicted to go to instruction 6, instruction 6 is executed speculatively, and its result is written to register r6. If instruction 1 then causes a branch misprediction trap, the value of register r6 used by instruction number 3, "addu r4, r4, r6", has already been changed when execution resumes from instruction 2, so the processor enters an incorrect state.

[0149] Conversely, if the code contains no instruction satisfying condition (1) or (2) above, it can be guaranteed that the processor will not enter an incorrect state during execution after a trap, even without returning wrongly and speculatively updated registers to their original values.

[0150] The following describes a register allocation scheme and a code generation scheme by which the compiler enables speculative execution without the per-instruction speculativeness determination and register save/restore of the first embodiment.

[0151] In this scheme, the instruction set includes a special instruction called the "sync instruction". In a processor configuration based on FIG. 1, the sync instruction is executed by instruction execution units 8 and 9. When a sync instruction is sent to an execution unit, the transfer of instructions to all execution units is stopped until the sync instruction completes.

[0152] The sync instruction behaves as follows in the execution unit.
- If the sync instruction is executed non-speculatively, it does nothing.
- If the sync instruction is executed speculatively, it does not complete until the load/store instruction or conditional branch instruction that has the same group value as the one assigned to the sync instruction has completed.
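
The completion rule for the sync instruction can be stated as a single predicate. The Python sketch below is only an illustration; `sync_can_complete` and `pending_groups` are hypothetical names, and `pending_groups` is assumed to hold the group values of load/store and conditional branch instructions that have not yet completed.

```python
def sync_can_complete(speculative: bool, sync_group: int,
                      pending_groups: set) -> bool:
    """Return True when a sync instruction is allowed to complete.

    speculative:    result of the speculativeness determination for the sync
    sync_group:     group value assigned to the sync instruction
    pending_groups: group values of load/store or conditional branch
                    instructions still in flight (assumed bookkeeping)
    """
    if not speculative:
        return True  # a non-speculative sync does nothing and completes
    # A speculative sync waits for the instruction that shares its group value.
    return sync_group not in pending_groups
```

Because instruction transfer to all execution units is stopped while the sync is incomplete, a `False` result here corresponds to stalling every instruction that follows the sync.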

[0153] The compiler according to this embodiment has basically the same configuration as a conventional compiler, but in the course of register allocation it emits sync instructions so as to guarantee that execution after a trap on this processor does not malfunction (that is, it performs register allocation and code generation so that no instruction satisfying condition (1) or (2) above exists). As for the register allocation scheme, any register allocation scheme can be applied to the part that computes register live ranges (for example, Andrew W. Appel, "Modern Compiler Implementation in C", Cambridge University Press, 1998).

[0154] Register allocation based on this scheme is described below.

[0155] FIG. 16 shows an example of the procedure by which the compiler according to this embodiment performs register allocation and code generation for a basic block.

[0156] Here, a basic block is a sequence of instructions that begins at the start of the instructions or at the target of a conditional or unconditional branch, and ends at a conditional or unconditional branch or immediately before the target of a conditional or unconditional branch.

[0157] First, it is assumed that program code containing symbolic registers has been generated from, for example, a source program written in a high-level language.

[0158] In the initial state of this processing, all registers (r1, ...) are available. Register allocation starts from the basic block at the head of the program code.

[0159] First, for every symbolic register ($1, ...) in the basic block, the following is performed in instruction order until the registers run out.
- Select one available register and assign it.
- Mark that register unavailable.
This corresponds to the loop of steps S41, S42, S43, S44, S45, S46, and S52 in FIG. 16.

[0160] If the symbolic register being allocated also appears in another basic block whose allocation has already been completed, the register assigned in that completed block is assigned (steps S43, S50, S51). Likewise, a symbolic register whose assigned register is fixed, such as the stack pointer, is assigned that fixed register (steps S43, S50, S51). In these cases, if the assigned register is not in the set of available registers (step S50), a sync instruction is inserted immediately before the instruction that outputs to this symbolic register, and all registers that are not live at that point are made available (step S47).

[0161] If the registers run out (step S46), a sync instruction is inserted after the last instruction for which register allocation was performed, and all registers that are not live at that point are made available (step S47).

[0162] If no register newly becomes available, the inserted SYNC instruction is deleted and spill processing (changing part of the instructions so that they do not use registers) is performed (steps S48, S49), after which the procedure returns to step S43 and register allocation resumes.
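
The core of this per-block procedure (assign an available register, and insert a sync when the registers run out) can be sketched in Python as follows. This is a simplified illustration, not the flow of FIG. 16: the function name and data layout are hypothetical, liveness is assumed to be supplied by the caller, picking the lowest-numbered available register is an arbitrary choice, and the fixed or previously assigned register case and the spill path are not modelled.

```python
def allocate_block(defs, live_syms, all_regs, free, assigned=None):
    """Assign physical registers to the symbolic registers defined in one block.

    defs:      symbolic register written by each instruction, in order
    live_syms: live_syms[i] = symbolic registers still live after instruction i
               has read its inputs (their registers must not be reused)
    all_regs:  every physical register number
    free:      registers available at the head of the block
    assigned:  symbolic -> physical assignments inherited from earlier blocks
    Returns (assignment, output, remaining free set); output lists "sync"
    markers and (symbolic, physical) pairs in program order.
    """
    assign = dict(assigned or {})
    free = set(free)
    out = []
    for i, sym in enumerate(defs):
        if not free:
            # Registers ran out: a sync lets every non-live register be reused.
            in_use = {assign[s] for s in live_syms[i] if s in assign}
            free = set(all_regs) - in_use
            if not free:
                # The text deletes the sync and spills here; not modelled.
                raise RuntimeError("spill required")
            out.append("sync")
        reg = min(free)   # any available register would do
        free.remove(reg)  # mark it unavailable
        assign[sym] = reg
        out.append((sym, reg))
    return assign, out, free
```

For example, `allocate_block(["$11"], [set()], range(1, 9), free=set())` places a `"sync"` marker before the definition of `$11` and then assigns register 1, since an empty available set forces a sync before any register can be handed out.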

[0163] When register allocation for the basic block is complete, if that basic block forms part of a loop and is the last basic block in that loop to have its registers allocated, a SYNC instruction is inserted after its last instruction (step S53).

[0164] Once all symbolic registers of the basic block have been allocated by the above processing, one basic block satisfying the following condition is selected from among the basic blocks not yet allocated, and register allocation is performed on it.
- Every basic block that flows into that basic block either "has completed register allocation" or "forms part of a loop together with that basic block".
In the example of FIG. 17, basic block D is a block into which the three basic blocks A, B, and C flow. Basic block A satisfies the condition "has completed register allocation" and basic block C satisfies the condition "forms part of a loop together with basic block D", but block B has not been allocated and does not form part of a loop together with basic block D, so it satisfies neither condition and basic block D cannot be selected. However, once basic block B in FIG. 17 has been selected and its register allocation completed, basic blocks A and B satisfy the former condition and block C satisfies the latter, so block D is selected.
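
The selection condition can be checked mechanically. The Python sketch below encodes the FIG. 17 example as described in the text; the function name and the dictionaries `preds` and `loop_partners` are hypothetical, and loop membership is assumed to be known from a prior control-flow analysis.

```python
def selectable(block, preds, allocated, loop_partners):
    """True if every block flowing into `block` is either already allocated
    or forms part of a loop together with `block`."""
    partners = loop_partners.get(block, set())
    return all(p in allocated or p in partners for p in preds[block])


# FIG. 17 as described in the text: A, B and C flow into D; C loops with D.
preds = {"D": {"A", "B", "C"}}
loop_partners = {"D": {"C"}}

print(selectable("D", preds, {"A"}, loop_partners))       # B blocks the choice
print(selectable("D", preds, {"A", "B"}, loop_partners))  # D can now be chosen
```

The first call reports that D cannot be selected while B is unallocated, and the second that D becomes selectable once B has been allocated.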

[0165] At the head of a newly selected basic block whose registers have not yet been allocated, the set of available registers is given by the following equation.

[0166] (Equation 1)

[0167] In the above equation, I_1 ... I_n is the set of basic blocks that flow into the newly selected basic block and whose register allocation has been completed, and B_1 ... B_m is the set of basic blocks that flow out of I_1 ... I_n.

[0168] For example, in the case of FIG. 18, if B3 is the basic block to be allocated, I_1 ... I_3 are the basic blocks that flow into basic block B3 and do not form part of a loop together with basic block B3, and B_1 ... B_3 is the set of basic blocks that flow out of I_1 ... I_n. The basic block labeled i1 has not yet had its registers allocated when basic block B3 is allocated, so it is not included in I_1 ... I_n.

[0169] In the above equation, free denotes the set of registers available when register allocation of that basic block is completed, and dead denotes the set of registers that are not live at the head of that basic block.
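
Equation 1 itself is not reproduced in this text. One reading that agrees with the worked values given later (free(I_1) = r8, dead(B_1) = r4 to r8, dead(B_2) = r3 to r6 and r8, yielding r8 alone) is the intersection of all the free sets and all the dead sets. The Python sketch below implements that reading; it is an assumption, and the function name is hypothetical.

```python
def usable_at_head(free_sets, dead_sets, all_regs):
    """Registers assumed usable at the head of a newly selected block.

    free_sets: free(I_1) ... free(I_n) for allocated predecessor blocks
    dead_sets: dead(B_1) ... dead(B_m) for blocks flowing out of them
    all_regs:  every physical register number
    Assumption: Equation 1 is the intersection of all these sets.
    """
    result = set(all_regs)
    for s in list(free_sets) + list(dead_sets):
        result &= set(s)
    return result


# Values quoted in the worked example for the block of instructions 5 and 6.
print(usable_at_head([{8}], [{4, 5, 6, 7, 8}, {3, 4, 5, 6, 8}], range(1, 9)))
```

Under this reading, a register is handed out only if no allocated predecessor still holds it and no sibling successor expects a live value in it.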

[0170] The above processing is repeated, and the procedure ends when register allocation has been completed for all basic blocks.

[0171] The following describes the operation when register allocation and code generation based on this scheme are performed on the program code of FIG. 19, which is written with symbolic registers.

[0172] FIG. 20 shows the relationships among the basic blocks in the code of FIG. 19.

[0173] In this example, the registers are r1 to r8. Register allocation starts from the basic block of instruction 1, and at the head of this basic block registers r1, r5, r7, and r8 are assumed to be available and not live.

[0174] First, register r1 is assigned to symbolic register $1 of instruction 1, and r1 is marked unavailable. Similarly, r5 is assigned to $5 of instruction 2 and r7 to $7 of instruction 3, which completes allocation for the first basic block. The set of available registers at the end of allocation for this basic block is r8.

[0175] Next, register allocation is performed for the basic block of instructions 5 and 6. For basic block I_1 of instructions 1 to 4, basic block B_1 of instructions 5 and 6, and basic block B_2 of instruction 7, free(I_1) = r8, dead(B_1) = r4 to r8, and dead(B_2) = r3 to r6 and r8, so the only register that can be allocated at the head of this basic block is r8. Register r8 is therefore assigned to $8. As a result, no registers are available when allocation ends.

[0176] Next, register allocation is performed for the basic block of instruction 7. As with the basic block of instructions 5 and 6, the only register that can be allocated at the head of the basic block of instruction 7 is r8. Register r8 is therefore assigned to $9. As a result, no registers are available when allocation ends.

[0177] Finally, register allocation is performed for the basic block of instruction 8. For basic block I_3 of instructions 5 and 6 and basic block I_4 of instruction 7, free(I_3) = free(I_4) = φ (the empty set), so no registers are available at the head of this basic block. A sync instruction is therefore inserted before instruction number 8, "add $11, $1, r2" (as a result, the sync instruction becomes instruction number 8 and "add" becomes instruction number 9). This makes all registers available, so r1 is assigned to $11.

[0178] As a result, the code of FIG. 19 after register allocation and code insertion is as shown in FIG. 21.

[0179] Next, the operation when the code of FIG. 21 is executed on the processor of this embodiment is described.

[0180] First, instructions 1 and 2 are executed by instruction execution units 8 and 9, and their results are written to registers r1 and r5.

[0181] Assuming that the predicted branch target of instruction 4 is instruction 7, instructions 3, 4, and 7 are next executed by instruction execution unit 8, branch instruction execution unit 11, and instruction execution unit 9, respectively.

[0182] If instructions 3 and 7 complete before instruction 4 completes, instruction 8, that is, the sync instruction, is transferred to an execution unit. Because the sync instruction is judged to be speculative by the speculativeness determination unit, the transfer of instruction 9 to an execution unit is stopped until conditional branch instruction 4, which has the same group value as the sync instruction, completes.

[0183] Suppose that a branch misprediction trap occurs when instruction 4 completes. After the trap, code execution resumes from instruction 5. Because instruction 5 uses register r1, a malfunction would occur if instruction 9 had speculatively updated r1 before instruction 4 completed. The sync instruction at instruction 8 thus guarantees that instruction 9 is not executed speculatively.

[0184] Likewise, when instruction 4 is predicted to branch to instruction 5 and a branch misprediction trap occurs, a malfunction would occur on re-execution of instruction 9 if instruction 9 had speculatively updated r1, but the sync instruction at instruction 8 guarantees that instruction 9 is not executed speculatively.

[0185] (Fourth Embodiment) The first to third embodiments concern speculative execution of instructions on an in-order-issue superscalar processor, whereas this embodiment concerns speculative execution of instructions on an in-order-issue VLIW processor.

[0186] The pipeline flow in the processor of this embodiment is basically the same as in the first embodiment, but in this embodiment the group value is contained in the code of the VLIW instruction and is obtained from there. In other words, in this embodiment the group value is generated by the compiler. The definitions of instruction group and group value are the same as in the first embodiment.

[0187] FIG. 22 is a block diagram showing a configuration example of the processor according to this embodiment.

[0188] As shown in FIG. 22, the processor of this embodiment has a memory 101, an address generation unit 103, an instruction fetch unit 102, an instruction decode unit 104, an operand state determination unit 105, instruction execution units 108 to 110, a branch history storage unit 115, speculativeness determination units 111 to 113, a register saving unit 114, and registers 116.

[0189] It differs from the configuration example of FIG. 1 mainly in that there is no group value generation unit and in that a branch history storage unit 115 is provided.

[0190] In the following, to distinguish a "VLIW instruction" from the individual "instructions" that make it up, each of the instructions making up one VLIW instruction is also called an "atom".

[0191] This embodiment shows an example with three instruction execution units, that is, one VLIW instruction consisting of three atoms, but the present invention is also applicable when there are two, or four or more, instruction execution units, that is, when one VLIW instruction consists of two, or four or more, atoms.

[0192] The pipeline flow in the processor of this embodiment is basically the same as in FIG. 2 of the first embodiment.

[0193] The following description focuses on the points in which this embodiment differs from the first embodiment.

[0194] First, the flow of pipeline processing in the processor of this embodiment is described.

[0195] The instruction fetch unit 102 specifies, on the address bus 119, the VLIW instruction address indicated by the address generation unit 103, and fetches a VLIW instruction consisting of three atoms from the memory 101 via the instruction fetch bus 117 (step S11).

[0196] The instruction decode unit 104 decodes the fetched VLIW instruction via the decode bus 120 and obtains, for each atom, its type, the operands it uses, its group value, and information on whether it is speculative (step S12).

[0197] For the decoded VLIW instruction, the operand state determination unit 105 determines whether the operands used by each atom are usable (step S14). If all operands are usable, each atom is then transferred via instruction buses 124 to 126 to the corresponding instruction execution unit 108 to 110. The group value and the information on whether the atom is speculative are transferred along with it. After that, the operands are fetched from the registers 116 via register read buses 127 to 129 and each atom is executed (step S15).

[0198] After each atom is executed, the speculation determination units 111 to 113 determine whether the execution of the atom was speculative (step S17). If it was, the identification information of the register to be changed, the value of that register before the change, and the group value of the atom are saved in the register save unit 114 (step S18).

[0199] Then, if the atom is a conditional branch instruction, whether or not execution branched to the label written in that conditional branch instruction is recorded in the branch history storage unit 115. For other types of atoms, the execution result is written back to the register 116 and the memory 101 via the register write buses 139 to 141 and the memory bus 142 (step S19).

[0200] In an in-order-issue VLIW processor, in the integer pipeline and the floating-point pipeline, an atom in a given VLIW instruction never completes before the atoms in another VLIW instruction that precedes it in program order have completed. Consequently, speculative execution of a VLIW instruction past a VLIW instruction containing a conditional branch instruction cannot occur. For the same reason, speculative execution of a VLIW instruction past a VLIW instruction containing a load or store instruction cannot occur either.

[0201] Thus, in an in-order-issue VLIW processor, speculative execution caused by out-of-order completion cannot occur. However, in code generated by a speculative code placement scheme, in which an atom located at one of the targets of a conditional branch instruction (conditional branch atom) is placed before that conditional branch instruction, atoms are executed speculatively even on an in-order-issue VLIW processor.

[0202] Conventionally, in a code scheduling scheme that places atoms speculatively, if a register changed by a speculatively placed atom that turned out to be wrongly executed must be returned to its correct value, the compiler generates compensation code for that purpose.

[0203] The present invention makes it possible, in code in which atoms are placed speculatively, to return registers to their correct values without compensation code when a speculatively executed atom turns out to have been wrongly executed. This prevents the increase in code size caused by compensation code.

[0204] Next, a scheme by which a compiler generates speculatively placed code that runs on the processor of this embodiment is described.

[0205] The compiler according to this embodiment has the same basic configuration as a conventional compiler, but in the course of its processing it attaches group values to instructions (atoms), places instructions (atoms) speculatively, and inserts two kinds of branch trap instructions (branch trap atoms), tbt and tbn. Three instruction scheduling schemes are conceivable:
(1) placing atoms speculatively at the symbolic-register stage;
(2) speculatively moving atoms that were placed non-speculatively at the symbolic-register stage;
(3) speculatively moving atoms that have been placed non-speculatively and assigned registers.
The present scheme can be realized with any of these. The scheme based on (3) is described below.

[0206] First, group values are assigned to all conditional branch instructions (conditional branch atoms) before any instruction movement, so that as few conditional branch instructions as possible share the same group value. This can be done, for example, by assigning group values to the conditional branch instructions in code order, starting at the smallest group value and increasing by one each time, and starting again from the smallest group value once the maximum group value has been exceeded.
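
The wrap-around assignment described above can be illustrated by the following minimal Python sketch. It is an illustration only and not part of the disclosed compiler; the function name and the group range 0 to 3 (a 2-bit group value) are assumptions chosen for the example.

```python
def assign_group_values(num_branches, min_group=0, max_group=3):
    """Assign group values to conditional branches in code order.

    Values start at min_group, increase by one per branch, and wrap back
    to min_group after max_group, so that branches sharing a group value
    are spaced as far apart as possible.
    """
    if max_group < min_group:
        raise ValueError("max_group must be >= min_group")
    span = max_group - min_group + 1
    return [min_group + (i % span) for i in range(num_branches)]


# Six conditional branches with a hypothetical 2-bit group value (0..3).
print(assign_group_values(6))  # [0, 1, 2, 3, 0, 1]
```

With a wider group value, branches that share a value lie farther apart, which leaves more room for the atom movement described next.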

[0207] Next, the atoms to be moved are selected and moved. Any previously proposed method can be used for this (e.g., J. A. Fisher, "Trace scheduling: A technique for global microcode compaction", IEEE Transactions on Computers, 30(7), 1981). However, the destination of an atom must satisfy the following condition.
- The conditional branch instructions (atoms) on the path from the atom's destination to its original position all have mutually different group values.
An atom may not be moved to a position that does not satisfy this condition. Hence, the farther apart branch atoms with the same group value are, the more opportunities there are for speculative instruction movement. The group value assignment scheme described above takes this into account. A moved atom is given a flag indicating that it is speculative and is assigned the group value of the first conditional branch instruction it crossed. Finally, a branch trap instruction (branch trap atom) is placed at the head of whichever target of that first-crossed conditional branch instruction does not contain the atom, and the group value of that conditional branch instruction is assigned to the branch trap instruction. This is illustrated in FIG. 23.
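
The movement condition and the annotation of a moved atom can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the disclosed compiler: atoms are modelled as dictionaries, and the names `can_move` and `annotate_moved_atom` and the ordering convention of `crossed_groups` are hypothetical.

```python
def can_move(crossed_groups):
    """True if all conditional branches on the path have distinct group values."""
    return len(set(crossed_groups)) == len(crossed_groups)


def annotate_moved_atom(atom, crossed_groups, flag):
    """Return a copy of `atom` annotated for speculative placement.

    crossed_groups: group values of the conditional branches crossed by the
    move, listed in the order they are crossed (element 0 is the first
    branch crossed). flag: "st" or "sn".
    """
    if not can_move(crossed_groups):
        raise ValueError("two branches on the path share a group value")
    if not crossed_groups:
        return dict(atom)  # no branch crossed, so the atom stays non-speculative
    return dict(atom, group=crossed_groups[0], flag=flag)


moved = annotate_moved_atom({"op": "li"}, [1], "st")
print(moved)  # {'op': 'li', 'group': 1, 'flag': 'st'}
```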

[0208] The type of branch trap instruction (atom) to place is selected by the following criteria.
- If the atom was located at the branch target named by the label in the conditional branch instruction: the tbn instruction (trap if branch not taken).
- If the atom was located at the branch target not named by the label in the conditional branch instruction: the tbt instruction (trap if branch taken).
As with the selection and movement of atoms described above, conventional methods can be used for the criteria for ending code selection and movement.
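
The selection rule can be written as a small Python function. This is a sketch for illustration; the function name and its boolean parameter are assumptions. The pairing of each trap type with the st/sn speculation flag follows the example of FIGS. 24 and 25 and the mistrap handling described for this embodiment.

```python
def trap_and_flag(atom_was_on_label_side):
    """Choose the branch trap type and the speculation flag for a moved atom.

    atom_was_on_label_side: True if the atom originally sat at the branch
    target named by the label (the taken side), False if it sat on the
    fall-through side.
    """
    if atom_was_on_label_side:
        # The atom is wrong when the branch is NOT taken.
        return ("tbn", "sn")
    # The atom is wrong when the branch IS taken.
    return ("tbt", "st")


print(trap_and_flag(False))  # ('tbt', 'st')
```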

[0209] FIG. 24 shows an example of code for which non-speculative instruction placement and register allocation have been completed. The following describes the case where, under the present scheme, atom 1 of VLIW instruction 4, namely "li r21, 1024", is moved across the conditional branch instruction "beq r3, r18, (address of) VLIW instruction 6" in atom 1 of VLIW instruction 3 to the position of atom 2 of VLIW instruction 1.

[0210] First, atom 1 of VLIW instruction 4 is moved to VLIW instruction 1. If the conditional branch instruction of VLIW instruction 3 has been given group value "1", then atom 2 of VLIW instruction 1 after the move is given group value "1" (the group value of the first conditional branch instruction crossed). Atom 2 of VLIW instruction 1 is also given a speculation flag indicating that it is speculative. There are two kinds of speculation flag: st (speculative if taken) indicates that the atom is speculative if the conditional branch instruction branches to its label, and sn (speculative if not taken) indicates that it is speculative otherwise. In this example, st is attached. As a result of the move, VLIW instruction 4 of FIG. 24 is no longer needed and is deleted.

[0211] Next, a branch trap instruction is assigned. In this example, the original position of the moved atom (VLIW instruction 4 in FIG. 24) lies on the path not named by the label in the conditional branch instruction of VLIW instruction 3, so the tbt instruction is chosen as the type of branch trap instruction. The tbt instruction is placed at the head of the label written in that conditional branch instruction (VLIW instruction 6 in FIG. 24). In this example, the tbt instruction is given group value "1".

[0212] The code finally obtained in this way is as shown in FIG. 25. The values in [ ] represent group values.

[0213] Group values and speculation flags can be attached to atoms by, for example, placing in memory, as one VLIW instruction, atoms 1 to 3 together with an extended atom, as shown in FIG. 26. The instruction decode unit 104 obtains the group value and speculation flag of each atom from the extended atom at decode time. The extended atom consists, for example, of three pairs of group value and speculation flag arranged side by side, as shown in FIG. 27. A group value may have any number of bits, and the range of numbers representable in that number of bits is the range of group values. The speculation flag has 2 bits, with the meanings shown in FIG. 28 (01 corresponds to st and 10 corresponds to sn).
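
As an illustration of such an encoding, the following Python sketch packs and unpacks three (group value, speculation flag) pairs. The exact bit layout of FIG. 27 is not reproduced here: the 2-bit group width, the field order (atom 1 in the most significant field), and all names are assumptions of this sketch. Only the 2-bit flag width and the codes 00, 01 (st), and 10 (sn) come from the text.

```python
GROUP_BITS = 2  # hypothetical width; the text allows any number of bits
FLAG_BITS = 2   # fixed by the text
FLAG_NONE, FLAG_ST, FLAG_SN = 0b00, 0b01, 0b10
FIELD_BITS = GROUP_BITS + FLAG_BITS


def pack_extended_atom(pairs):
    """Pack three (group, flag) pairs, atom 1 first, into one integer."""
    if len(pairs) != 3:
        raise ValueError("expected one (group, flag) pair per atom")
    word = 0
    for group, flag in pairs:
        if not 0 <= group < (1 << GROUP_BITS):
            raise ValueError("group value out of range")
        if flag not in (FLAG_NONE, FLAG_ST, FLAG_SN):
            raise ValueError("unknown speculation flag")
        word = (word << FIELD_BITS) | (group << FLAG_BITS) | flag
    return word


def unpack_extended_atom(word):
    """Inverse of pack_extended_atom: return [(group, flag)] for atoms 1 to 3."""
    pairs = []
    for _ in range(3):
        field = word & ((1 << FIELD_BITS) - 1)
        pairs.append((field >> FLAG_BITS, field & ((1 << FLAG_BITS) - 1)))
        word >>= FIELD_BITS
    return pairs[::-1]


# Atom 2 carries group value 1 and flag st; atoms 1 and 3 are non-speculative.
word = pack_extended_atom([(0, FLAG_NONE), (1, FLAG_ST), (0, FLAG_NONE)])
print(bin(word))  # 0b1010000
```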

[0214] Next, how each unit operates when code with speculatively placed atoms is executed on the processor of this embodiment, and the configuration or function of some of the units, are described in more detail.

[0215] Here, as an example, the operation when the code of FIG. 25 is executed is described. For simplicity, it is assumed that no instructions are being processed in the pipeline when the code starts. The pipeline stages are defined as in the first embodiment. The speculation flags follow the example of FIG. 28.

[0216] First, in the F stage of cycle 1, the instruction fetch unit 102 fetches VLIW instruction 1.

[0217] Next, in cycle 2, the instruction decode unit 104 decodes VLIW instruction 1 in the D stage, and the instruction fetch unit 102 fetches VLIW instruction 2 in the F stage. Decoding yields group value "1" and speculation flag "01" for atom 2 of VLIW instruction 1.

[0218] Next, in cycle 3, the operand determination unit 105 determines whether the registers r2, r16, and r5 used by the atoms of VLIW instruction 1 are available. The operand determination unit 105 can be realized with the same configuration as in the first embodiment. Since all of these registers are available, atoms 1 to 3 of VLIW instruction 1 are transferred, together with their group values and speculation flags, to the instruction execution units 108 to 110 and executed in the E stage. Meanwhile, the instruction decode unit 104 decodes VLIW instruction 2 in the D stage, and the instruction fetch unit 102 fetches VLIW instruction 3 in the F stage.

[0219] When execution of VLIW instruction 1 completes, the speculation determination units 111 to 113 determine whether each atom was executed speculatively. They judge an atom non-speculative if its speculation flag is "00" and speculative if it is "01" or "10". In this example, atom 1 of VLIW instruction 1, "lbu r2, 0(r16)", is judged non-speculative from its speculation flag "00", and atom 2 of VLIW instruction 1, "li.st r5, 1024", is judged speculative from its speculation flag "01".

[0220] For atom 2 of VLIW instruction 1, which was judged speculative, the register number "5", the value of register r5, its group value "1", and its speculation flag "01" are registered in the register save unit 114. The values of registers r2 and r5 are then updated in the W stage.

[0221] The register save unit 114 is a buffer that holds the information needed to return the processor to a correct state when a speculative atom turns out to be wrong. Registration of information in the register save unit 114, deletion of that information, and restoration of register values based on it follow these rules.
- When a speculative atom updates the processor state, the register the atom changes, the value of that register before the change, the atom's group value, and its speculation flag are registered.
- A speculative-execution mistrap is raised when a tbt instruction is executed while the value in the branch history storage unit 115 (described later) is "branch" (taken), or when a tbn instruction is executed while the value is "straight" (not taken). In that case, the registers registered by atoms with the same group value as the tbt or tbn instruction are restored using the information in the register save unit 114.
- When a conditional branch instruction updates the branch history storage unit 115, the information in the register save unit 114 that has the group value held before the update is deleted.
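
The registration and deletion rules, together with the condition for raising a mistrap, can be modelled by the following Python sketch. It is a software illustration under stated assumptions, not the disclosed hardware: the class and function names, the dictionary used for the branch history, and the strings "branch" and "straight" are choices made for this example.

```python
class RegisterSaveUnit:
    """Toy model of the save buffer; entries are (reg, old_value, group, flag)."""

    def __init__(self):
        self.entries = []

    def register(self, reg, old_value, group, flag):
        # Record the pre-update value of a speculatively written register.
        self.entries.append((reg, old_value, group, flag))

    def drop_group(self, group):
        # Discard every entry that carries the given group value.
        self.entries = [e for e in self.entries if e[2] != group]


def execute_conditional_branch(history, save_unit, group, taken):
    """Update the branch history and drop entries of the group it held before."""
    if history["group"] is not None:
        save_unit.drop_group(history["group"])
    history["group"] = group
    history["direction"] = "branch" if taken else "straight"


def mistrap_raised(trap_kind, direction):
    """tbt traps after a taken branch; tbn traps after a not-taken branch."""
    return (trap_kind, direction) in {("tbt", "branch"), ("tbn", "straight")}


save_unit = RegisterSaveUnit()
history = {"group": None, "direction": None}
save_unit.register(5, 0, 1, "01")  # r5 saved with a hypothetical old value 0
execute_conditional_branch(history, save_unit, 1, taken=True)
print(mistrap_raised("tbt", history["direction"]))  # True
```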

[0222] In the pipeline, VLIW instruction 2 is executed after VLIW instruction 1 completes, and VLIW instruction 3, which contains a conditional branch instruction (atom), is executed after that. When the conditional branch instruction is executed, the outcome of the branch is stored in the branch history storage unit 115.

[0223] The following information is registered in the branch history storage unit 115.
- The group value of the conditional branch instruction.
- Information indicating "branch" if the conditional branch instruction branched toward its label, and information indicating "straight" otherwise.
After VLIW instruction 3 completes, VLIW instruction 4 is executed, and after that VLIW instruction 5. If the information that VLIW instruction 3 stored in the branch history storage unit 115 is "straight", the tbt instruction does not trap, and the information registered in the register save unit 114 by atom 2 of VLIW instruction 1 is simply deleted without being used. If instead the information stored in the branch history storage unit 115 is "branch", the tbt instruction raises a speculative-execution mistrap.

[0224] On a speculative-execution mistrap, the following processing is performed according to the type of branch trap instruction.
- If the mistrap was raised by a tbt instruction, the registers recorded in the register save unit 114 with the same group value as that tbt instruction and with speculation flag "01" are returned to their original values.
- If the mistrap was raised by a tbn instruction, the registers recorded in the register save unit 114 with the same group value as that tbn instruction and with speculation flag "10" are returned to their original values.
In this example, the register recorded in the register save unit 114 with group value "1" and speculation flag "01", namely register r5, is returned to its original value.
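
The restoration step can be illustrated with a short Python sketch that replays the r5 example. The function name, the data layout, and the assumed old value 0 of r5 are hypothetical. The newest-first traversal is likewise an assumption of this sketch, made so that the oldest saved value survives if a register was saved more than once.

```python
TRAP_FLAG = {"tbt": "01", "tbn": "10"}  # flag matched by each trap type


def handle_mistrap(regs, saved, trap_kind, trap_group):
    """Undo speculative register writes that match the trapping instruction.

    regs:  dict mapping register number to current value (updated in place)
    saved: list of (reg, old_value, group, flag) in registration order
    """
    wanted = TRAP_FLAG[trap_kind]
    for reg, old_value, group, flag in reversed(saved):
        if group == trap_group and flag == wanted:
            regs[reg] = old_value
    return regs


regs = {2: 11, 5: 1024}    # r5 was speculatively set to 1024
saved = [(5, 0, 1, "01")]  # entry registered by atom 2 of VLIW instruction 1
handle_mistrap(regs, saved, "tbt", 1)
print(regs[5])  # 0
```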

[0225] Thereafter, when the next conditional branch instruction is executed, the information with group value "1" is deleted from the register save unit 114.

[0226] (Fifth Embodiment) The first to third embodiments described the application of the present invention to an in-order-issue superscalar processor, and the fourth embodiment described its application to an in-order-issue VLIW processor. This embodiment describes the application of the present invention to an out-of-order-issue VLIW processor.

[0227] The present invention enables dynamic speculative execution in an out-of-order-issue VLIW (hereinafter, dynamic VLIW) processor. Before describing a dynamic VLIW processor that supports such dynamic speculative execution, the basic configuration of a dynamic VLIW processor is described.

[0228] In the following, as in the fourth embodiment, the individual instructions making up one VLIW instruction are sometimes called atoms. FIG. 29 shows an example of a VLIW instruction, in which one VLIW instruction is composed of three atoms. The position that each atom making up a VLIW instruction occupies is called a slot.

[0229] There are two ways to raise instruction-level parallelism: the VLIW approach, which allocates resources statically at compile time, and the superscalar approach, which allocates resources dynamically at run time. In the VLIW approach the compiler detects the instructions that can be executed simultaneously, so no run-time detection mechanism is needed, the run-time hardware is simpler, and a higher clock frequency may be achievable. However, when the compiler detects simultaneously executable instructions, there are parameters that the compiler cannot predict completely, or cannot realistically predict.

[0230] The dynamic VLIW approach lies midway between the superscalar and VLIW approaches. It is basically VLIW but executes part of the work dynamically, so that it can respond dynamically, to some extent, even to matters that are hard to predict at compile time, and can continue processing without stopping the whole processor. In other words, the dynamic VLIW approach seeks a new optimum between hardware and software (the compiler) in order to optimize performance.

[0231] The basic configuration of a dynamic VLIW processor includes a pending queue in which an atom that has been fetched but cannot be executed is temporarily held so that subsequent atoms can execute ahead of it. The processor stores and manages information on the usage status of each register and uses it to decide whether a fetched atom can be executed. If it can, the fetched atom is executed; if not, it is stored in the pending queue. The processor likewise decides whether an atom stored in the pending queue can be executed and executes it if so. In this way, when a preceding atom cannot be executed immediately, it is held temporarily and subsequent atoms can execute first.

[0232] The dynamic VLIW approach fetches atoms one VLIW instruction at a time, just like conventional in-order-issue VLIW. However, when some of the atoms of a simultaneously fetched VLIW instruction cannot be executed, conventional in-order-issue VLIW always suspends fetching, whereas the dynamic VLIW approach may be able to avoid suspending it.

[0233] FIG. 30 is a conceptual diagram showing the basic configuration of such a dynamic VLIW processor. FIG. 30 takes as an example a processor with two pipeline units (1006-1, 1006-2). This dynamic VLIW processor achieves out-of-order execution by using two structures: the queues 1002-1 and 1002-2, called pending queues and provided independently for each slot, in which an atom fetched from the instruction sequence is held awaiting execution when it cannot be executed immediately; and a table called the scoreboard 1004, which manages information on the usage status of each register.

[0234] Among the atoms of a fetched VLIW instruction, any atom that is not executed is held in the corresponding pending queue until it becomes executable.

[0235] The pending queue is preferably a FIFO (first-in, first-out buffer). With a FIFO, atoms stored in the pending queue are executed in order from the head, which differs from the reorder buffer of a conventional superscalar processor. That is, in exchange for the performance restriction that an executable atom in the pending queue sometimes cannot be executed, the hardware is greatly simplified and can run faster.

[0236] Furthermore, a pending queue is preferably provided for each slot that holds an atom of a VLIW instruction. For example, with the VLIW instruction format illustrated in FIG. 30, there are two slots, so two pending queues are provided. An atom of a fetched VLIW instruction that is not executed is placed in the pending queue corresponding to its slot. Having one pending queue per slot, with no crossing between slots, is another of the restrictions that simplify the hardware and increase speed.

[0237] In each cycle and each slot, the atoms that may be given a chance to execute are an atom fetched from the normal instruction sequence and, if the pending queue is not empty, an atom from the pending queue. Execution priority is given in the order (1) the fetched atom, (2) the atom in the pending queue.

[0238] Whether an atom given a chance to execute (a fetched atom or the atom at the head of the pending queue) can be executed is decided from the contents of the scoreboard (the usage status of the registers relevant to the atom). Basically, if a register the atom uses is not available to it, the atom is judged not executable.

[0239] As described above, the dynamic VLIW approach achieves out-of-order execution by temporarily holding atoms that cannot be executed immediately in a pending queue and executing them once they become executable.
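
The behaviour of one slot in one cycle can be sketched in Python as follows. This is a simplified model for illustration only, not the disclosed circuit. It assumes that a fetched atom that cannot execute is appended to the slot's FIFO, that at most one atom executes per slot per cycle, and that a fetched atom must also wait for registers to be written by atoms still queued in the same slot. The scoreboard is reduced to a set `busy` of unavailable registers, and all names are hypothetical.

```python
from collections import deque


def issue(pending, fetched, busy):
    """Run one cycle of one slot and return the atom executed, or None.

    pending: deque used as the slot's FIFO pending queue
    fetched: atom fetched this cycle as {"op", "dst", "srcs"}, or None
    busy:    set of registers whose values are not yet available
    """
    queued_dsts = {atom["dst"] for atom in pending}
    if fetched is not None:
        # Priority (1): the fetched atom.
        if not (set(fetched["srcs"]) & (busy | queued_dsts)):
            return fetched
        pending.append(fetched)
    # Priority (2): only the head of the FIFO is considered.
    if pending and not (set(pending[0]["srcs"]) & busy):
        return pending.popleft()
    return None


pending = deque()
busy = {"R5"}  # R5 is awaited from an outstanding load
sub = {"op": "SUB", "dst": "R11", "srcs": ["R5", "R8"]}
ori = {"op": "ORI", "dst": "R24", "srcs": ["R21"]}
print(issue(pending, sub, busy))        # None: SUB needs R5 and is queued
print(issue(pending, ori, busy)["op"])  # ORI runs ahead of the queued SUB
busy.clear()                            # the load completes
print(issue(pending, None, busy)["op"]) # SUB
```

Because only the head of each FIFO is examined, an executable atom deeper in the queue may have to wait, which is the performance restriction accepted in exchange for simpler hardware.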

[0240] In this dynamic VLIW approach, the processor has no register renaming mechanism; registers are allocated by the compiler. Omitting register renaming simplifies the hardware. For this purpose, the compiler that generates the VLIW instruction sequence must allocate registers so that false dependencies do not arise (a known compiler may be used).

[0241] Next, to show the effect of dynamic VLIW, its outline is described using a simple example.

[0242] FIG. 31 shows an example of an instruction sequence to be executed, in which one VLIW instruction contains two atoms.

[0243] As shown in FIG. 31, this instruction sequence is fetched one pair at a time, in the following order:
ADD R8, R9, R10 and LD R5, (R3)
LDI R18, 1000 and ADDI R13, R9, 4
ADD R21, R18, R9 and SUB R11, R5, R8
LSR R22, R21, 5 and ORI R24, R21, 0xFF
SUBI R25, R24, 5 and NOP
BRZ R11, R0, ROOP_EXT and NOP

[0244] The following compares the cases where the instruction sequence illustrated in FIG. 31 is executed under the conventional in-order-issue VLIW approach and under the dynamic VLIW approach.

[0245] FIG. 32 shows this instruction sequence executed under the conventional in-order-issue VLIW approach, and FIG. 33 shows it executed under the dynamic VLIW approach.

【０２４６】図３２と図３３の例では、最初のＶＬＩＷ
命令の第２スロットのアトムであるＬＤ（ロード命令）
が１次キャッシュでミスを起こし、該当するデータが２
次キャッシュに存在したために、これをロードしてくる
のに４サイクル必要となったものとする。In the examples of FIGS. 32 and 33, it is assumed that the LD (load) atom in the second slot of the first VLIW instruction misses in the primary cache and that, because the data is present in the secondary cache, loading it takes four cycles.

【０２４７】図３２に示されるように、この命令列を従
来のｉｎ−ｏｒｄｅｒ命令発行ＶＬＩＷ方式により実行
した場合、サイクル１では、第１スロットのＡＤＤＲ
８，Ｒ９，Ｒ１０と第２スロットのＬＤＲ５，（Ｒ
３）が実行されるが、第２スロットのＬＤがキャシュミ
スを起こしたため、サイクル２〜５の４サイクルは第
１、第２スロットともにＬＤのミスによるストールにな
り（この間、フェッチが中断する）、その後は、順次命
令が実行され、結局、１０サイクルを要して処理が完了
している。As shown in FIG. 32, when this instruction sequence is executed by the conventional in-order-issue VLIW scheme, ADD R8, R9, R10 in the first slot and LD R5, (R3) in the second slot are executed in cycle 1. Because the LD in the second slot causes a cache miss, both slots stall for the four cycles 2 to 5 (fetching is suspended during this time). The remaining instructions are then executed in order, and processing completes after 10 cycles in total.
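The 10-cycle figure can be checked with a small calculation. The following is a minimal sketch, assuming one issue cycle per VLIW instruction and a stall that idles every slot; the function name `in_order_cycles` and its arguments are illustrative and do not appear in the specification.

```python
def in_order_cycles(num_vliw, stalls):
    """Count cycles for in-order issue: one cycle per VLIW instruction,
    plus any stall cycles, during which every slot is idle."""
    cycles = 0
    for index in range(num_vliw):
        cycles += 1                      # the VLIW instruction issues
        cycles += stalls.get(index, 0)   # all slots wait for the miss
    return cycles


# FIG. 31 has six VLIW instructions; the LD in the first one stalls 4 cycles.
print(in_order_cycles(6, {0: 4}))  # prints 10
```

Six issue cycles plus the four stall cycles of the LD miss give the 10 cycles stated above.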

【０２４８】次に、図３３に示されるように、この命令
列をダイナミックＶＬＩＷ方式により実行した場合、ま
ず、サイクル１では、第１スロットのＡＤＤＲ８，Ｒ
９，Ｒ１０と第２スロットのＬＤＲ５，（Ｒ３）が実
行され、ＬＤがキャシュミスを起す。次のサイクルから
は、このＬＤのディスティネーション・レジスタである
Ｒ５を使用するアトムは、ＬＤが完了するまで実行でき
なくなる（このレジスタＲ５の状況は、スコアボードに
反映される）。Next, as shown in FIG. 33, when this instruction sequence is executed by the dynamic VLIW scheme, ADD R8, R9, R10 in the first slot and LD R5, (R3) in the second slot are executed in cycle 1, and the LD causes a cache miss. From the next cycle on, any atom that uses R5, the destination register of this LD, cannot be executed until the LD completes (the status of register R5 is reflected in the scoreboard).

【０２４９】サイクル２では、ＶＬＩＷ命令の各アトム
はＬＤのディスティネーション・レジスタであるＲ５を
使用しないため、ＬＤＩＲ１８，１０００とＡＤＤＩ
Ｒ１３，Ｒ９，４が実行される。In cycle 2, neither atom of the VLIW instruction uses R5, the destination register of the LD, so LDI R18, 1000 and ADDI R13, R9, 4 are executed.

【０２５０】サイクル３では、第１スロットのＡＤＤ
Ｒ２１，Ｒ１８，Ｒ９はＲ５を使用しないため実行され
るが、第２スロットのＳＵＢＲ１１，Ｒ５，Ｒ８は、
Ｒ５を第１のソースレジスタとして参照するので、実行
できずにペンディングキューへ投入される（スコアボー
ドを参照することによってＲ５が使用できないことが分
かることから、実行できないことが分かる）。また、次
のサイクルからは、ＳＵＢのディスティネーション・レ
ジスタであるＲ１１を使用するアトム（このＳＵＢを除
く）は、このＳＵＢが完了するまで実行できなくなる
（このレジスタＲ１１の状況も、スコアボードに反映さ
れる）。In cycle 3, ADD R21, R18, R9 in the first slot is executed because it does not use R5. SUB R11, R5, R8 in the second slot refers to R5 as its first source register, so it cannot be executed and is placed in the pending queue (the scoreboard shows that R5 is unavailable, which is how the atom is known to be unexecutable). From the next cycle on, any atom other than this SUB that uses R11, the destination register of the SUB, cannot be executed until the SUB completes (the status of register R11 is also reflected in the scoreboard).

【０２５１】サイクル４では、Ｒ５もＲ１１も使用され
ないので、ＬＳＲＲ２２，Ｒ２１，５とＯＲＩＲ２
４，Ｒ２１，０ｘＦＦが実行される。In cycle 4, neither R5 nor R11 is used, so LSR R22, R21, 5 and ORI R24, R21, 0xFF are executed.

【０２５２】サイクル５では、Ｒ５もＲ１１も使用され
ないので、ＳＵＢＩＲ２５，Ｒ２４，５とＮＯＰが実
行される。In cycle 5, neither R5 nor R11 is used, so SUBI R25, R24, 5 and NOP are executed.

【０２５３】ここで、ＬＤが完了し、次のサイクルから
は、Ｒ５が使用可能となる（このレジスタＲ５の状況
も、スコアボードに反映される）。Here, the LD is completed, and from the next cycle, R5 becomes available (the status of this register R5 is also reflected on the scoreboard).

【０２５４】サイクル６では、まず、第１スロットのＢ
ＲＺＲ１１，Ｒ０，ＲＯＯＰ＿ＥＸは、Ｒ１１をディ
スティネーションとするので、実行できないことがわか
る。なお、詳しくは後述するが、ディスティネーション
とするレジスタが使用できない場合には、ペンディング
キューへは投入せずに、実行可能になるのを待つ（フェ
ッチを中断する）。従って、このサイクルは、空きスロ
ットとなる。フェッチが中断するので、フェッチした第
２スロットの命令も実行が保留される。In cycle 6, BRZ R11, R0, ROOP_EXT in the first slot has R11 as its destination and therefore cannot be executed. As described in detail later, when the register used as a destination is unavailable, the atom is not placed in the pending queue; instead it waits until it becomes executable (fetching is suspended). The first slot is therefore empty in this cycle. Because fetching is suspended, execution of the fetched second-slot instruction is also put on hold.

【０２５５】ここで、第２スロットでは、フェッチの中
断が発生したので、ペンディングキュー中のアトムに実
行の機会が与えられる。ペンディングキューにあるＳＵ
ＢＲ１１，Ｒ５，Ｒ８は、先のＬＤが完了し、Ｒ５が使
用可能となっているので、実行可能であり（スコアボー
ドを参照することによって実行できることが分かる）、
したがってＳＵＢＲ１１，Ｒ５，Ｒ８がペンディング
キューから取り出され、実行される。In the second slot, because fetching has been suspended, the atoms in the pending queue are given a chance to execute. SUB R11, R5, R8 in the pending queue is now executable because the earlier LD has completed and R5 is available (the scoreboard shows that it can be executed). SUB R11, R5, R8 is therefore taken out of the pending queue and executed.

【０２５６】ここで、ＳＵＢが完了し、次のサイクルか
らは、Ｒ１１が使用可能となる（このレジスタＲ１１の
状況も、スコアボードに反映される）。Here, SUB is completed, and R11 becomes available from the next cycle (the status of this register R11 is also reflected on the scoreboard).

【０２５７】サイクル７では、第１スロットで実行を待
っていたＢＲＺＲ１１，Ｒ０，ＲＯＯＰ＿ＥＸＴが、
実行可能となって、実行され、第２スロットでは実行を
待っていたＮＯＰが実行される。In cycle 7, BRZ R11, R0, ROOP_EXT, which was waiting in the first slot, becomes executable and is executed, and the NOP that was waiting in the second slot is executed.

【０２５８】この結果、７サイクルを要して処理が完了
したことになる。As a result, the process is completed in seven cycles.

【０２５９】以上のように、従来のｉｎ−ｏｒｄｅｒ命
令発行ＶＬＩＷ方式では１０サイクルかかるところが、
ダイナミックＶＬＩＷ方式ではＬＤアトムによるミスの
期間中に他のアトムが実行できるアウトオブオーダーの
機能により、７サイクルで実行が完了し、高速化できる
ことがわかる。As described above, the conventional in-order-issue VLIW scheme takes 10 cycles, whereas the dynamic VLIW scheme completes in 7 cycles because its out-of-order capability lets other atoms execute while the LD atom's miss is outstanding. The dynamic scheme is thus faster.
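The cycle-by-cycle behaviour described for FIG. 33 can be reproduced with a toy simulator. The following is a minimal sketch under stated assumptions, not the claimed hardware: the tuple layout of an atom, the names `simulate`, `ready_at` and `extra_latency`, and the one-cycle base latency are all illustrative. BRZ is modelled here as reading R11 (the text above refers to R11 as its destination); either way it must wait for the SUB.

```python
from collections import deque

# Atom = (mnemonic, destination or None, source registers, stalls_fetch).
# stalls_fetch marks branch/load/store atoms, which wait in place rather
# than entering a pending queue (modelling assumption for this sketch).
NOP = ("NOP", None, (), False)
PROGRAM = [
    [("ADD", "R8", ("R9", "R10"), False), ("LD", "R5", ("R3",), True)],
    [("LDI", "R18", (), False), ("ADDI", "R13", ("R9",), False)],
    [("ADD", "R21", ("R18", "R9"), False), ("SUB", "R11", ("R5", "R8"), False)],
    [("LSR", "R22", ("R21",), False), ("ORI", "R24", ("R21",), False)],
    [("SUBI", "R25", ("R24",), False), NOP],
    [("BRZ", None, ("R11", "R0"), True), NOP],
]


def simulate(program, extra_latency):
    ready_at = {}  # register -> first cycle it is usable (absent: usable)
    pending = [deque() for _ in program[0]]  # one pending queue per slot
    pc = cycle = 0

    def usable(regs, now):
        return all(ready_at.get(r, 0) <= now for r in regs)

    def execute(atom, now):
        name, dest, _, _ = atom
        if dest is not None:
            ready_at[dest] = now + 1 + extra_latency.get(name, 0)

    def must_stall(atom, now):
        _, dest, srcs, stalls_fetch = atom
        dest_busy = dest is not None and not usable((dest,), now)
        return dest_busy or (stalls_fetch and not usable(srcs, now))

    while pc < len(program) or any(pending):
        cycle += 1
        vliw = program[pc] if pc < len(program) else None
        stalled = vliw is None or any(must_stall(a, cycle) for a in vliw)
        for slot, queue in enumerate(pending):
            atom = None if stalled else vliw[slot]
            if atom is not None and usable(atom[2], cycle):
                execute(atom, cycle)
                continue
            # The slot is free: the head of its pending queue gets a chance.
            if queue and usable(queue[0][2], cycle):
                execute(queue.popleft(), cycle)
            if atom is not None:
                queue.append(atom)  # park the blocked atom
                if atom[1] is not None:
                    ready_at[atom[1]] = float("inf")  # result not produced yet
        if not stalled:
            pc += 1
    return cycle


print(simulate(PROGRAM, {"LD": 4}))  # prints 7
```

With the four extra cycles for the LD miss, the SUB is parked in cycle 3, issued from the pending queue in cycle 6 while the BRZ holds up the fetch, and the BRZ issues in cycle 7.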

【０２６０】以上、アウトオブオーダーを可能としたダ
イナミックＶＬＩＷプロセッサの基本的な構成について
説明した。The basic configuration of a dynamic VLIW processor capable of out-of-order execution has been described above.

【０２６１】さて、以下では、アウトオブオーダーを可
能とし更に条件分岐命令およびロード・ストア命令の動
的な投機的実行をも可能とした本実施形態に係るダイナ
ミックＶＬＩＷプロセッサについて詳しく説明する。Now, the dynamic VLIW processor according to the present embodiment, which enables out-of-order and enables dynamic speculative execution of a conditional branch instruction and a load / store instruction, will be described in detail below.

【０２６２】図３４に、本実施形態に係る動的な投機的
実行可能なダイナミックＶＬＩＷプロセッサの構成例を
示す。FIG. 34 shows a configuration example of the dynamic VLIW processor according to this embodiment, which is capable of dynamic speculative execution.

【０２６３】図３４に示されるように、本実施形態のプ
ロセッサは、メモリ２０１、アドレス生成部２０３、命
令フェッチ部２０２、プロセッサ状態レジスタ２０４、
命令デコード部２０５、グループ値生成部２０６、ペン
ディングキュー２０９〜２１１、命令実行許可部２０
７、オペランド状態判定部２０８、命令実行ユニット２
１２〜２１４、投機性判定部２１５〜２１７、レジスタ
退避部２１８、レジスタ２１９を持つ。As shown in FIG. 34, the processor of this embodiment comprises a memory 201, an address generation unit 203, an instruction fetch unit 202, a processor status register 204, an instruction decode unit 205, a group value generation unit 206, pending queues 209 to 211, an instruction execution permission unit 207, an operand state determination unit 208, instruction execution units 212 to 214, speculativeness determination units 215 to 217, a register saving unit 218, and a register 219.

【０２６４】図１の構成例と比べてプロセッサ状態レジ
スタ２０４と命令実行許可部２０７が設けられている点
が主に異なっている。The main difference from the configuration example of FIG. 1 is the addition of the processor status register 204 and the instruction execution permission unit 207.

【０２６５】本実施形態では、命令実行ユニットは３つ
であり、すなわち１つのＶＬＩＷ命令が３つの命令から
構成される場合の例を示しているが、命令実行ユニット
が２つあるいは４以上の場合、すなわち１つのＶＬＩＷ
命令が２あるいは４以上の命令から構成される場合にも
本発明は適用可能である。This embodiment shows an example with three instruction execution units, that is, one in which each VLIW instruction consists of three instructions. The present invention is also applicable when there are two, or four or more, instruction execution units, that is, when each VLIW instruction consists of two, or four or more, instructions.

【０２６６】以下では、本実施形態が第１の実施形態と
相違する点を中心に説明する。In the following, description will be made focusing on the points in which this embodiment is different from the first embodiment.

【０２６７】最初に、図３４のダイナミックＶＬＩＷプ
ロセッサの概要について説明する。First, the outline of the dynamic VLIW processor shown in FIG. 34 will be described.

【０２６８】命令のフェッチ、デコードは、従来のもし
くは第４の実施形態のｉｎ−ｏｒｄｅｒ命令発行ＶＬＩ
Ｗプロセッサと同様にして行なう。デコードされたＶＬ
ＩＷ命令は、オペランド状態判定部２０８により各アト
ムが用いるオペランドが使用可能かどうかが判定され
る。Instructions are fetched and decoded in the same manner as in a conventional in-order-issue VLIW processor or that of the fourth embodiment. For a decoded VLIW instruction, the operand state determination unit 208 determines whether the operands used by each atom are usable.

【０２６９】全てのオペランドが使用可能な場合には、
従来のもしくは第４の実施形態のｉｎ−ｏｒｄｅｒ命令
発行ＶＬＩＷプロセッサと同様にして命令の実行が開始
される。If all operands are usable, execution of the instruction starts in the same manner as in a conventional in-order-issue VLIW processor or that of the fourth embodiment.

【０２７０】使用できないオペランドが存在する場合に
は、そのオペランドを使用するアトムが、・条件分岐命令、ロード命令もしくはストア命令の場
合、または・命令の出力オペランドが使用できない場合にはストールし、オペランドが使用可能になった時点で
実行が開始される。If an unusable operand exists, the atom that uses it stalls when (a) the atom is a conditional branch instruction, a load instruction, or a store instruction, or (b) the output operand of the instruction is unusable. Execution starts once the operand becomes usable.

【０２７１】一方、他の種類の命令が、使用できないオ
ペランドを用いる場合には、ストールさせずに、これら
の命令のデコード情報を対応するペンディングキュー２
０９〜２１１に挿入する。On the other hand, when any other kind of instruction uses an unusable operand, the processor does not stall; instead, the decode information of the instruction is inserted into the corresponding one of the pending queues 209 to 211.
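The choice between executing, stalling, and queueing described in the preceding paragraphs can be summarised as a small decision function. This is a minimal sketch; the function name `dispatch_action`, the instruction-kind strings, and the return values are illustrative assumptions rather than terms from the specification.

```python
# Instruction kinds that stall instead of entering a pending queue.
STALLING_KINDS = {"branch", "load", "store"}


def dispatch_action(kind, inputs_usable, output_usable):
    """Decide what happens to a decoded atom.

    Returns "execute" when every operand is usable, "stall" for a
    conditional branch, load or store with an unusable operand (or any
    atom whose output operand is unusable), and "enqueue" otherwise.
    """
    if inputs_usable and output_usable:
        return "execute"
    if kind in STALLING_KINDS or not output_usable:
        return "stall"
    return "enqueue"


# The SUB R11, R5, R8 of FIG. 31 while R5 is still being loaded:
print(dispatch_action("alu", inputs_usable=False, output_usable=True))
```

Only ordinary atoms with an unusable input operand are parked; everything else either runs or holds up the fetch.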

【０２７２】そして、実行可能な命令のみを命令実行ユ
ニット２１２〜２１４に送って実行する。Then, only executable instructions are sent to the instruction execution units 212 to 214 and executed.

【０２７３】ペンディングキュー２０９〜２１１にある
アトムは、その後にデコードされたＶＬＩＷ命令の対応
するスロットのアトムが実行できない場合に、オペラン
ド状態判定部２０８により使用可能と判定されたなら
ば、ペンディングキュー２０９〜２１１の先頭から命令
実行ユニット２１２〜２１４に送られて実行される。An atom in the pending queues 209 to 211 is sent from the head of its pending queue to the instruction execution units 212 to 214 and executed if, when the atom in the corresponding slot of a subsequently decoded VLIW instruction cannot be executed, the operand state determination unit 208 determines that its operands are usable.

【０２７４】アトムの実行から完了までの動作は従来の
もしくは第４の実施形態のｉｎ−ｏｒｄｅｒ命令発行Ｖ
ＬＩＷプロセッサと同様である。The operation of an atom from execution to completion is the same as in a conventional in-order-issue VLIW processor or that of the fourth embodiment.

【０２７５】オペランド状態判定部２０８は、例えば、
以下のようにして第１の実施形態で示したスコアボード
方式を用いて実現することができる。・アトムがペンディングキューに入ったとき、そのアト
ムが用いる入力レジスタ、出力レジスタを利用中とす
る。・ロードアトムの実行が開始されたとき、その出力レジ
スタを利用中とする。・あるレジスタを入力として用いるアトムが全てのペン
ディングキューからなくなった時点で、そのレジスタを
利用可とする。・出力レジスタについては、上の条件に加えてそのレジ
スタに書くロードが完了した時点で利用可とする。The operand state determination unit 208 can be implemented, for example, with the scoreboard method described in the first embodiment, as follows.
- When an atom enters a pending queue, the input and output registers used by that atom are marked as in use.
- When execution of a load atom starts, its output register is marked as in use.
- A register becomes available once no atom that uses it as an input remains in any pending queue.
- An output register becomes available once, in addition to the above condition, the load that writes to it has completed.
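The four scoreboard rules above can be sketched with two counters per register. This is one possible reading, offered as an assumption: the class `Scoreboard`, its method names, and the decision to count input and output references of queued atoms together are illustrative and are not taken from the first embodiment.

```python
from collections import Counter


class Scoreboard:
    """Tracks which registers are in use (illustrative sketch)."""

    def __init__(self):
        self.pending_refs = Counter()     # references from queued atoms
        self.loads_in_flight = Counter()  # loads that will write a register

    def enter_pending(self, inputs, output):
        # Rule 1: a queued atom marks its input and output registers in use.
        for reg in (*inputs, output):
            self.pending_refs[reg] += 1

    def leave_pending(self, inputs, output):
        # Rule 3: the marks are dropped when the atom leaves the queue.
        for reg in (*inputs, output):
            self.pending_refs[reg] -= 1

    def load_started(self, output):
        # Rule 2: a load marks its output register in use when it starts.
        self.loads_in_flight[output] += 1

    def load_completed(self, output):
        # Rule 4: the output register is released when the load completes.
        self.loads_in_flight[output] -= 1

    def available(self, reg):
        return self.pending_refs[reg] == 0 and self.loads_in_flight[reg] == 0


sb = Scoreboard()
sb.load_started("R5")                      # LD R5, (R3) misses
sb.enter_pending(("R5", "R8"), "R11")      # SUB R11, R5, R8 is queued
print(sb.available("R5"), sb.available("R11"))
```

A register is reported as available only when no queued atom refers to it and no load that writes it is outstanding.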

【０２７６】次に、図３４のダイナミックＶＬＩＷプロ
セッサで投機的実行を行なう場合について詳しく説明す
る。Next, a case where speculative execution is performed by the dynamic VLIW processor shown in FIG. 34 will be described in detail.

【０２７７】図３５に、本実施形態のプロセッサにおけ
るパイプライン処理の流れの一例を示す。FIG. 35 shows an example of the flow of pipeline processing in the processor of this embodiment.

【０２７８】命令フェッチ（ステップＳ６１）、命令デ
コード（ステップＳ６２）、グループ値取得（ステップ
Ｓ６３）が、順次行われる。Instruction fetch (step S61), instruction decode (step S62), and group value acquisition (step S63) are sequentially performed.

【０２７９】デコードされたＶＬＩＷ命令中のアトムの
うち、命令実行許可部２０７により実行が許可されない
アトムは、命令の種類によらずペンディングキュー２０
９〜２１１に入れられる（ステップＳ６４，Ｓ７２）。
その際、グループ値生成部２０６によりアトムにグルー
プ値が付加され、このグループ値もペンディングキュー
２０９〜２１１に入れられる。Of the atoms in a decoded VLIW instruction, those whose execution is not permitted by the instruction execution permission unit 207 are placed in the pending queues 209 to 211 regardless of the instruction type (steps S64 and S72). At that time, the group value generation unit 206 attaches a group value to each atom, and this group value is also placed in the pending queues 209 to 211.

【０２８０】デコードされたＶＬＩＷ命令中のアトムの
うち、命令実行許可部２０７により実行が許可されたア
トムは、命令実行ユニット２１２〜２１４に転送され実
行される（ステップＳ６４，Ｓ６５）。この場合にも、
グループ値生成部２０６によりアトムにグループ値が付
加される。Of the atoms in a decoded VLIW instruction, those whose execution is permitted by the instruction execution permission unit 207 are transferred to the instruction execution units 212 to 214 and executed (steps S64 and S65). In this case too, the group value generation unit 206 attaches a group value to each atom.

【０２８１】一方、ペンディングキュー２０９〜２１１
にあるアトムは、その後にデコードされたＶＬＩＷ命令
中の対応するスロットのアトムが実行できないときに、
命令実行許可部２０７により実行が許可されれば（ステ
ップＳ７３，Ｓ７４）、ペンディングキューから命令実
行ユニット２１２〜２１４に転送されて実行される（ス
テップＳ６５）。On the other hand, an atom in the pending queues 209 to 211 is transferred from its pending queue to the instruction execution units 212 to 214 and executed (step S65) if, when the atom in the corresponding slot of a subsequently decoded VLIW instruction cannot be executed, the instruction execution permission unit 207 permits its execution (steps S73 and S74).

【０２８２】アトムの実行後、投機性判定部２１５〜２
１７により命令実行が投機的であったかどうかを判定し
（ステップＳ６７）、投機的な場合には、変更するレジ
スタの識別情報と、そのレジスタの変更前の値と、その
アトムのグループ値とをレジスタ退避部２１８に登録す
る（ステップＳ６８）。その後、プロセッサの状態が更
新される（ステップＳ６９）。After an atom is executed, the speculativeness determination units 215 to 217 determine whether the execution was speculative (step S67). If it was, the identification information of the register to be changed, the value of that register before the change, and the group value of the atom are registered in the register saving unit 218 (step S68). The processor state is then updated (step S69).

【０２８３】条件分岐命令の実行時に、フェッチ時に予
測した分岐先と実際の分岐先が異なることが判明した場
合、分岐予測ミストラップを掛けて（ステップＳ６
６）、フェッチ処理、デコード処理をキャンセルし、ト
ラップを起こしたアトムを含むＶＬＩＷ命令より後のＶ
ＬＩＷ命令に属するアトムを全て消去する（ステップＳ
７０）。命令フェッチはトラップ処理の完了まで停止さ
れる。実行中のアトムが全て完了したら、レジスタ退避
部２１８に登録された全てのレジスタを登録された元の
値に戻す（ステップＳ７１）。また、グループ値生成部
２０６を初期状態に戻す。そして、アドレス生成部２０
３の示すアドレスを正しい分岐先に設定してトラップ処
理を完了する。If, during execution of a conditional branch instruction, the branch destination predicted at fetch time turns out to differ from the actual branch destination, a branch misprediction trap is raised (step S66). Fetch and decode processing are cancelled, and all atoms belonging to VLIW instructions later than the VLIW instruction containing the atom that caused the trap are deleted (step S70). Instruction fetch is stopped until trap processing completes. Once all atoms in execution have completed, every register registered in the register saving unit 218 is returned to its registered original value (step S71). The group value generation unit 206 is also returned to its initial state. Finally, the address indicated by the address generation unit 203 is set to the correct branch destination, and trap processing completes.

【０２８４】ロード命令、ストア命令の実行時にページ
フォルトが発生した場合、ページフォルトトラップを掛
ける（ステップＳ６６）。ページフォルトトラップで
は、分岐予測ミストラップと同様に、フェッチ処理、デ
コード処理のキャンセル、ペンディングキュー２０９〜
２１１の消去（ステップＳ７０）、およびレジスタ値の
復帰を行ない（ステップＳ７１）、この間、命令フェッ
チは停止する。同時に、ページング処理が行なわれ、こ
れらの処理が完了したらグループ値生成部２０６を初期
状態に戻し、アドレス生成部２０３の示すアドレスをロ
ード命令、ストア命令を含むＶＬＩＷ命令のアドレスに
設定してトラップ処理を完了する。If a page fault occurs during execution of a load or store instruction, a page fault trap is raised (step S66). As with the branch misprediction trap, the page fault trap cancels fetch and decode processing, clears the pending queues 209 to 211 (step S70), and restores the register values (step S71); instruction fetch is stopped during this time. Paging is performed concurrently. When these operations complete, the group value generation unit 206 is returned to its initial state, the address indicated by the address generation unit 203 is set to the address of the VLIW instruction containing the load or store instruction, and trap processing completes.

【０２８５】図３６に、ペンディングキューの構成例を
示す。FIG. 36 shows a configuration example of the pending queue.

【０２８６】ペンディングキュー２０９〜２１１には、
命令のデコード情報、命令のグループ値とともに、ｉｎ
−ｏｒｄｅｒフラグと呼ぶ１ビットの情報が登録され
る。以下の命令（コミット命令と呼ぶ）がペンディング
キューに入れられるときはｉｎ−ｏｒｄｅｒフラグを１
とし、他の命令の場合には０とする。・条件分岐命令、ロード命令、ストア命令・オペランド状態判定部２０８により、出力オペランド
が使用できないと判定された命令プロセッサ状態レジスタ２０４は、２ビットからなり、
図３７の意味を持つ。値“０１”のときには、条件分岐
命令、ロード命令、ストア命令は、用いるオペランドが
使用可能であってもペンディングキュー２０９〜２１１
に入れられる。値“１０”のときには、命令フェッチ部
２０２は命令のフェッチを行なわない。In the pending queues 209 to 211, a 1-bit item called the in-order flag is registered together with the decode information and the group value of each instruction. The in-order flag is set to 1 when one of the following instructions (called commit instructions) is placed in a pending queue, and to 0 for any other instruction.
- A conditional branch instruction, load instruction, or store instruction
- An instruction whose output operand has been determined to be unusable by the operand state determination unit 208
The processor status register 204 consists of 2 bits, with the meanings shown in FIG. 37. When its value is “01”, conditional branch, load, and store instructions are placed in the pending queues 209 to 211 even if their operands are usable. When its value is “10”, the instruction fetch unit 202 does not fetch instructions.
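A pending-queue entry and the two status-register effects stated above can be sketched as follows. This is a minimal sketch under assumptions: the names `PendingEntry`, `in_order_flag`, `queued_despite_usable_operands` and `fetch_enabled` are illustrative, and only the status values described in the text are modelled, since FIG. 37 itself is not reproduced here.

```python
from dataclasses import dataclass

# Kinds that are always commit instructions when queued.
COMMIT_KINDS = {"branch", "load", "store"}


@dataclass
class PendingEntry:
    decode_info: str   # decoded form of the atom
    group_value: int   # value from the group value generation unit
    in_order: int      # 1 for commit instructions, 0 otherwise


def in_order_flag(kind, output_usable):
    """1 for a branch/load/store or an atom whose output is unusable."""
    return 1 if kind in COMMIT_KINDS or not output_usable else 0


def queued_despite_usable_operands(status, kind):
    """Status "01": branch/load/store atoms are queued even when ready."""
    return status == "01" and kind in COMMIT_KINDS


def fetch_enabled(status):
    """Status "10": the instruction fetch unit does not fetch."""
    return status != "10"


entry = PendingEntry("sw r12, libs(r0)", 2, in_order_flag("store", True))
print(entry)
```

The example entry corresponds to a store that is queued with group value 2 and an in-order flag of 1.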

【０２８７】グループ値生成部２０６は、第１の実施形
態と同様にビットベクタ方式による実現あるいはハード
ウェアカウンタ方式による実現が可能である。ただし、
本実施形態においては、ペンディングキュー２０９〜２
１１に入るコミット命令から、次にペンディングキュー
２０９〜２１１に入るコミット命令までが、一つの命令
グループとなる。同一のＶＬＩＷ命令中にあるアトム間
の順序は、アトム１からアトム３への順序とする。The group value generation unit 206 can be implemented by the bit vector method or the hardware counter method, as in the first embodiment. In this embodiment, however, one instruction group extends from a commit instruction that enters the pending queues 209 to 211 up to the next commit instruction that enters the pending queues 209 to 211. Atoms within the same VLIW instruction are ordered from atom 1 to atom 3.
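The grouping rule can be illustrated with a counter that advances each time a commit instruction enters a pending queue. This is a simplified sketch, assuming an unbounded counter; a real hardware counter would wrap around, and the class name `GroupCounter` and its `assign` method are illustrative.

```python
class GroupCounter:
    """Assigns group values; a queued commit instruction opens a new group."""

    def __init__(self):
        self.value = 0

    def assign(self, commit_enters_queue):
        # A commit instruction entering a pending queue starts the next group
        # and belongs to it; every other atom joins the current group.
        if commit_enters_queue:
            self.value += 1
        return self.value


counter = GroupCounter()
# Atoms in program order: ordinary, commit, ordinary, commit, commit.
sequence = [False, True, False, True, True]
print([counter.assign(flag) for flag in sequence])  # prints [0, 1, 1, 2, 3]
```

Atoms issued after a queued commit instruction share its group value until the next commit instruction is queued.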

【０２８８】なお、本実施形態において、コミット命令
間の実行順序は保存されなければならない。そのため、
グループ値生成部２０６をビットベクタ方式で実現する
場合には、これらの順序を把握するために別の機構が必
要となる。この機構は、以下のようなＦＩＦＯにより実
現することができる。ＦＩＦＯの先頭に近い命令ほど前
の命令と判断される。・命令に割り当てるグループ値が変更されたら、そのグ
ループ値をＦＩＦＯの最後尾に入れる・ＦＩＦＯの先頭にあるグループ値を持つｉｎ−ｏｒｄ
ｅｒフラグが１の命令がペンディングキューから発行さ
れたら、そのグループ値をＦＩＦＯから出す一方、ハードウェアカウンタによる実現は、グループ値
が重複しないことを保証するために実行命令カウンタ
（図５参照）を備えている。この実現では命令Ａ、命令
Ｂ間の順序は、命令Ａのグループ値をｘ、命令Ｂのグル
ープ値をｙ、実行命令カウンタの値をｚとすると、図３
８のように判定される。In this embodiment, the execution order among commit instructions must be preserved. Therefore, when the group value generation unit 206 is implemented by the bit vector method, a separate mechanism is needed to track this order. It can be implemented with a FIFO operated as follows; an instruction whose group value is closer to the head of the FIFO is judged to be earlier.
- When the group value assigned to instructions changes, the new group value is appended to the tail of the FIFO.
- When an instruction whose in-order flag is 1 and whose group value is at the head of the FIFO is issued from a pending queue, that group value is removed from the FIFO.
The hardware counter implementation, on the other hand, includes an execution instruction counter (see FIG. 5) to guarantee that group values do not overlap. In this implementation, the order between an instruction A and an instruction B is determined as shown in FIG. 38, where x is the group value of instruction A, y is the group value of instruction B, and z is the value of the execution instruction counter.
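The FIFO used with the bit vector method can be sketched directly from the two rules above. This is a minimal sketch; the class `GroupOrderFifo` and its method names are illustrative assumptions, and the group values used in the example are arbitrary labels.

```python
from collections import deque


class GroupOrderFifo:
    """Remembers the order in which group values were handed out."""

    def __init__(self):
        self.fifo = deque()

    def group_changed(self, group_value):
        # Rule 1: a newly assigned group value goes to the tail.
        self.fifo.append(group_value)

    def commit_issued(self, group_value):
        # Rule 2: a commit instruction may leave a pending queue only when
        # its group value is at the head; the value is then removed.
        if not self.fifo or self.fifo[0] != group_value:
            return False
        self.fifo.popleft()
        return True

    def is_earlier(self, a, b):
        # A value closer to the head belongs to the earlier instruction.
        return self.fifo.index(a) < self.fifo.index(b)


order = GroupOrderFifo()
for value in (5, 2, 7):          # bit-vector group values need not be sorted
    order.group_changed(value)
print(order.is_earlier(5, 7), order.commit_issued(2), order.commit_issued(5))
```

Because only the head value may be released, commit instructions leave the pending queues in the order in which their groups were created.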

【０２８９】命令実行許可部２０７は、命令の対応する
実行ユニットへの転送を許可する。ある命令が用いるオ
ペランドの一部が使用できない場合には、その命令の実
行は許可されない。全てのオペランドが使用可能な場合
には、プロセッサ状態レジスタ２０４の値が“００”な
ら命令の実行は許可される。プロセッサ状態レジスタ２
０４の値が“０１”の場合には、命令の発行場所（命令
デコード部またはペンディングキュー）および命令の種
類により、図３９のように実行の許可と不許可を決定す
る。The instruction execution permission unit 207 permits the transfer of an instruction to its corresponding execution unit. If any of the operands used by an instruction is unusable, execution of that instruction is not permitted. If all operands are usable and the value of the processor status register 204 is “00”, execution is permitted. If the value of the processor status register 204 is “01”, permission is decided as shown in FIG. 39 according to where the instruction is issued from (the instruction decode unit or a pending queue) and the type of instruction.

【０２９０】投機性判定部２１５〜２１７は、第１の実
施形態と同様に実現できる。The speculativeness determination units 215 to 217 can be implemented in the same manner as in the first embodiment.

【０２９１】レジスタ退避部２１８も第１の実施形態と
同様に実現できるが、登録された情報の削除に関する条
件は以下のように変更する。・ｉｎ−ｏｒｄｅｒフラグが１の命令がトラップを生じ
ることなく完了したら、その命令のグループ値と同一の
グループ値を持つ情報を削除する。The register saving unit 218 can also be implemented as in the first embodiment, except that the condition for deleting registered information is changed as follows.
- When an instruction whose in-order flag is 1 completes without causing a trap, the information having the same group value as that instruction is deleted.
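The register saving unit's three operations (register on speculative execution, delete on commit, restore on a trap) can be sketched as below. This is a minimal sketch under assumptions: the class `RegisterSaveUnit`, its method names, and the register values are illustrative, and restoring in reverse order of registration is a design choice of this sketch so that the oldest saved value wins when a register was saved more than once.

```python
class RegisterSaveUnit:
    """Keeps (register, original value, group value) for speculative writes."""

    def __init__(self):
        self.entries = []

    def record(self, register, original_value, group_value):
        # Called before a speculatively executed atom overwrites a register.
        self.entries.append((register, original_value, group_value))

    def commit(self, group_value):
        # A commit instruction of this group completed without a trap:
        # its saved values are no longer needed.
        self.entries = [e for e in self.entries if e[2] != group_value]

    def restore(self, registers):
        # A trap occurred: undo every remaining speculative write,
        # newest first, then forget the saved values.
        for register, original_value, _ in reversed(self.entries):
            registers[register] = original_value
        self.entries.clear()


registers = {"r12": 3, "r29": 100}       # hypothetical starting values
unit = RegisterSaveUnit()
unit.record("r12", registers["r12"], 1); registers["r12"] = 2
unit.record("r29", registers["r29"], 4); registers["r29"] = 112
unit.commit(1)                           # the group-1 branch completed
unit.restore(registers)                  # a later instruction trapped
print(registers)
```

After the group-1 entry is deleted, the trap restores only r29, so r12 keeps its speculatively written value.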

【０２９２】以下では、本実施形態のダイナミックＶＬ
ＩＷプロセッサにおいて、図４０のコードを実行した際
の動作について説明する。説明を簡明にするために、コ
ードの開始時点においてパイプラインで処理中の命令は
存在しないものとし、プロセッサ状態レジスタ２０４の
値は“００”であるとする。パイプラインステージの定
義は第１の実施形態と同様である。The following describes the operation of the dynamic VLIW processor of this embodiment when it executes the code of FIG. 40. To keep the description simple, it is assumed that no instruction is being processed in the pipeline when the code starts and that the value of the processor status register 204 is “00”. The pipeline stages are defined as in the first embodiment.

【０２９３】まず、サイクル１のＦステージでＶＬＩＷ
命令１がフェッチされ、サイクル２のＤステージでＶＬ
ＩＷ命令１のデコード、ＦステージでＶＬＩＷ命令２の
フェッチが行なわれる。デコードされたＶＬＩＷ命令１
の各アトムにはグループ値“０”が付加される。ここで
はグループ値生成部２０６はハードウェアカウンタ方式
で実現されているとする。First, VLIW instruction 1 is fetched in the F stage of cycle 1. In cycle 2, VLIW instruction 1 is decoded in the D stage and VLIW instruction 2 is fetched in the F stage. The group value “0” is attached to each atom of the decoded VLIW instruction 1. Here, the group value generation unit 206 is assumed to be implemented by the hardware counter method.

【０２９４】続いて、サイクル３に、ＶＬＩＷ命令１の
各アトムが用いるオペランドはオペランド状態判定部２
０８により全て使用可能と判定され、プロセッサ状態レ
ジスタ２０４の値が“００”であるので命令実行許可部
２０７により実行が許可されて、各アトムはそれぞれの
命令実行ユニット２１２〜２１４に転送されＥステージ
で実行される。一方、ＤステージではＶＬＩＷ命令２の
デコードが、ＦステージではＶＬＩＷ命令３のフェッチ
が行なわれ、ＶＬＩＷ命令２の各命令にグループ値
“０”が付加される。Next, in cycle 3, the operand state determination unit 208 determines that all operands used by the atoms of VLIW instruction 1 are usable, and because the value of the processor status register 204 is “00”, the instruction execution permission unit 207 permits execution. Each atom is transferred to its instruction execution unit 212 to 214 and executed in the E stage. Meanwhile, VLIW instruction 2 is decoded in the D stage and VLIW instruction 3 is fetched in the F stage, and the group value “0” is attached to each atom of VLIW instruction 2.

【０２９５】次に、サイクル４の開始時にＶＬＩＷ命令
１のアトム１“ａｄｄｉｕｒ１２，ｒ０，３”の実行
は完了し、アトム２“ｌｗｒ３，ｌｅｖｅｌ（ｒ
０）”は実行中であるがページフォルトは発生しなかっ
たとする。アトム１は投機性判定部２１５により投機的
でないと判定されるため、レジスタ退避部２１８に情報
を登録せずにＷステージでレジスタｒ１２を更新する。
一方、ＶＬＩＷ命令２の各アトムは実行が許可されてＥ
ステージで実行される。ＶＬＩＷ命令３はＤステージで
デコードされて各アトムにグループ値“０”が付加さ
れ、ＶＬＩＷ命令４はＦステージでフェッチされる。以
下、フェッチ、デコード処理については省略する。ま
た、ＶＬＩＷ命令１のアトム２の実行は、ＶＬＩＷ命令
１２の実行後まで完了しないとする。Next, assume that at the start of cycle 4 the execution of atom 1 of VLIW instruction 1, “addiu r12, r0, 3”, has completed, and that atom 2, “lw r3, level(r0)”, is still executing but has not caused a page fault. Because the speculativeness determination unit 215 determines that atom 1 is not speculative, register r12 is updated in the W stage without registering any information in the register saving unit 218. Meanwhile, each atom of VLIW instruction 2 is permitted to execute and is executed in the E stage. VLIW instruction 3 is decoded in the D stage, with the group value “0” attached to each atom, and VLIW instruction 4 is fetched in the F stage. Fetch and decode processing are omitted from the description below. It is also assumed that the execution of atom 2 of VLIW instruction 1 does not complete until after VLIW instruction 12 has been executed.

【０２９６】ＶＬＩＷ命令２、ＶＬＩＷ命令３は実行完
了後、投機的でないためレジスタ退避なしにレジスタの
状態を更新する。ここで、ＶＬＩＷ命令４のデコード後
もＶＬＩＷ命令１のアトム２“ｌｗｒ３，ｌｅｖｅｌ
（ｒ０）”は実行中であるので、ＶＬＩＷ命令４のアト
ム１“ｓｌｔｉｒ４，ｒ３，２０”は、オペランド状
態判定部２０８によりオペランドｒ３が使用できないと
判定されて、そのデコード情報がペンディングキュー２
０９にグループ値“０”とともに入れられ、ｉｎ−ｏｒ
ｄｅｒフラグは“０”にセットされる。After VLIW instructions 2 and 3 complete, they update the register state without saving any registers, because they are not speculative. Atom 2 of VLIW instruction 1, “lw r3, level(r0)”, is still executing after VLIW instruction 4 has been decoded. The operand state determination unit 208 therefore determines that operand r3 of atom 1 of VLIW instruction 4, “slti r4, r3, 20”, is unusable. The decode information of that atom is placed in the pending queue 209 together with the group value “0”, and its in-order flag is set to “0”.

【０２９７】同様に、ＶＬＩＷ命令５のデコード後もＶ
ＬＩＷ命令１のアトム２が実行中であるので、ＶＬＩＷ
命令５のアトム１“ｂｅｑｒ４，ｒ０，ＶＬＩＷ命令
８（のアドレス）”は、レジスタｒ４が使用できないた
め、ペンディングキュー２０９にグループ値“１”とと
もに入れられ、ｉｎ−ｏｒｄｅｒフラグは１にセットさ
れる。そして、プロセッサ状態レジスタ２０４を“０
１”とする。ここで、ＶＬＩＷ命令５のアトム１はフェ
ッチ時にＶＬＩＷ命令８に分岐しないと予測していたと
する。Similarly, atom 2 of VLIW instruction 1 is still executing after VLIW instruction 5 has been decoded. Because register r4 is unusable, atom 1 of VLIW instruction 5, “beq r4, r0, (address of VLIW instruction 8)”, is placed in the pending queue 209 together with the group value “1”, and its in-order flag is set to 1. The processor status register 204 is then set to “01”. Here it is assumed that, at fetch time, atom 1 of VLIW instruction 5 was predicted not to branch to VLIW instruction 8.

【０２９８】次に、ＶＬＩＷ命令６のアトム１“ａｄｄ
ｉｕｒ１２，ｒ０，２”のオペランドは使用可能であ
るので、命令実行ユニット２１２に転送されて実行され
る。実行後、投機性判定部２１５により投機的と判定さ
れるため、レジスタｒ１２、その元の値、グループ値
“１”をレジスタ退避部２１８に登録する。その後、レ
ジスタｒ１２が更新される。Next, the operands of atom 1 of VLIW instruction 6, “addiu r12, r0, 2”, are usable, so it is transferred to the instruction execution unit 212 and executed. After execution, the speculativeness determination unit 215 determines that the execution was speculative, so register r12, its original value, and the group value “1” are registered in the register saving unit 218. Register r12 is then updated.

【０２９９】次に、ＶＬＩＷ命令７のアトム２“ｓｗ
ｒ１２，ｌｉｂｓ（ｒ０）”のオペランドは使用可能で
あるが、プロセッサ状態レジスタ２０４が“０１”であ
るので、ストア命令は実行されずに、ペンディングキュ
ー２１０にグループ値“２”とともに入れられ、ｉｎ−
ｏｒｄｅｒフラグは“１”にセットされる。Next, the operands of atom 2 of VLIW instruction 7, “sw r12, libs(r0)”, are usable, but because the processor status register 204 is “01”, the store instruction is not executed. It is placed in the pending queue 210 together with the group value “2”, and its in-order flag is set to “1”.

【０３００】次に、ＶＬＩＷ命令８のアトム１“ｓｌｔ
ｉｒ１２，ｒ３，２”は、出力オペランドｒ１２が使
用できないため、コミット命令となる。よって、グルー
プ値“３”とともにペンディングキュー２０９に入り、
ｉｎ−ｏｒｄｅｒフラグを“１”とする。Next, atom 1 of VLIW instruction 8, “slti r12, r3, 2”, becomes a commit instruction because its output operand r12 is unusable. It therefore enters the pending queue 209 together with the group value “3”, and its in-order flag is set to “1”.

【０３０１】次に、ＶＬＩＷ命令９のアトム１“ｂｅｑ
ｒ１２，ｒ０，ＶＬＩＷ命令１２（のアドレス）”
も、オペランドｒ１２が使用できないため、グループ値
“４”とともにペンディングキュー２０９に入り、ｉｎ
−ｏｒｄｅｒフラグを“１”とする。ここで、ＶＬＩＷ
命令９のアトム１はフェッチ時にアトム１２に分岐する
と予測していたとする。Next, atom 1 of VLIW instruction 9, “beq r12, r0, (address of VLIW instruction 12)”, also cannot use operand r12, so it enters the pending queue 209 together with the group value “4”, and its in-order flag is set to “1”. Here it is assumed that, at fetch time, atom 1 of VLIW instruction 9 was predicted to branch to VLIW instruction 12.

【０３０２】次に、ＶＬＩＷ命令１２のアトム１“ａｄ
ｄｉｕｒ２９，ｒ２９，１２”は、オペランドｒ２９
が使用可能であるので、命令実行ユニット２１２に転送
されて実行される。一方、アトム２“ｍｏｖｅｒ２，
ｒ１２”は、オペランドｒ１２が使用できないため、ペ
ンディングキュー２１０に入れられる。アトム１の実行
は投機的であるため、レジスタｒ２９、その元の値、グ
ループ値“４”をレジスタ退避部２１８に登録する。そ
の後、レジスタｒ２９が更新される。Next, atom 1 of VLIW instruction 12, “addiu r29, r29, 12”, is transferred to the instruction execution unit 212 and executed because its operand r29 is usable. Atom 2, “move r2, r12”, is placed in the pending queue 210 because operand r12 is unusable. Because the execution of atom 1 is speculative, register r29, its original value, and the group value “4” are registered in the register saving unit 218. Register r29 is then updated.

【０３０３】この時点でのペンディングキュー２０９〜
２１１の状態を図４１に示す。各項目は左から順に、グ
ループ値、そのアトムが属するＶＬＩＷ命令の番号、ｉ
ｎ−ｏｒｄｅｒフラグを示す。また、この時点でのレジ
スタ退避部２１８の状態を図４２に示す。FIG. 41 shows the state of the pending queues 209 to 211 at this point. Each entry shows, from left to right, the group value, the number of the VLIW instruction to which the atom belongs, and the in-order flag. FIG. 42 shows the state of the register saving unit 218 at this point.

【０３０４】その後、ＶＬＩＷ命令１のアトム２“ｌｗ
ｒ３，ｌｅｖｅｌ（ｒ０）”の実行が完了すると、レ
ジスタｒ３が使用可能になるため、グループ値が“０”
であるＶＬＩＷ命令４のアトム１“ｓｌｔｉｒ４，ｒ
３，２０”がペンディングキュー２０９から命令実行ユ
ニット２１２に転送されて実行される。Thereafter, when atom 2 of VLIW instruction 1, “lw r3, level(r0)”, completes, register r3 becomes usable, so atom 1 of VLIW instruction 4, “slti r4, r3, 20”, whose group value is “0”, is transferred from the pending queue 209 to the instruction execution unit 212 and executed.

【０３０５】このＶＬＩＷ命令４のアトム１の実行完了
後、レジスタｒ４が使用可となるため、グループ値が
“１”であるＶＬＩＷ命令５の命令１“ｂｅｑｒ４，
ｒ０，ＶＬＩＷ命令８（のアドレス）”がペンディング
キュー２０９から命令実行ユニット２１２に転送されて
実行される。このとき、実行命令カウンタは“１”にな
る。After atom 1 of VLIW instruction 4 completes, register r4 becomes usable, so atom 1 of VLIW instruction 5, “beq r4, r0, (address of VLIW instruction 8)”, whose group value is “1”, is transferred from the pending queue 209 to the instruction execution unit 212 and executed. At this time, the execution instruction counter becomes “1”.

【０３０６】この条件分岐命令の分岐予測が正しかった
とすると、条件分岐命令はトラップを発生せずに実行完
了する。このとき、レジスタ退避部２１８に退避された
グループ値“１”に関する情報を削除する。図４２では
レジスタｒ１２に関する情報がそれに該当する。Assuming that the branch prediction for this conditional branch instruction was correct, the instruction completes without raising a trap. At this time, the information for group value “1” saved in the register saving unit 218 is deleted. In FIG. 42, this is the information on register r12.

【０３０７】続いて、グループ値が“２”であるＶＬＩ
Ｗ命令７のアトム２“ｓｗｒ１２，ｌｉｂｓ（ｒ
０）”がペンディングキュー２１０から実行され、実行
命令カウンタは“２”となる。Next, atom 2 of VLIW instruction 7, “sw r12, libs(r0)”, whose group value is “2”, is executed from the pending queue 210, and the execution instruction counter becomes “2”.

【０３０８】このストア命令がページフォルトを起こさ
ずに完了したとすると、次にグループ値が“３”である
ＶＬＩＷ命令８のアトム１“ｓｌｔｉｒ１２，ｒ３，
２”が実行され、実行命令カウンタは“３”となる。レ
ジスタ退避部２１８にはグループ値“２”および“３”
に関するレジスタ情報は登録されていないので、これら
の命令の完了時に削除される情報はない。Assuming that this store instruction completes without causing a page fault, atom 1 of VLIW instruction 8, “slti r12, r3, 2”, whose group value is “3”, is executed next, and the execution instruction counter becomes “3”. No register information for group values “2” and “3” is registered in the register saving unit 218, so no information is deleted when these instructions complete.

[0309] On the other hand, if the above store instruction causes a page fault, instruction fetch processing and decode processing are canceled first, and the pending queues 209 to 211 are cleared. In this example, atom 1 "beq r12, r0, (address of) VLIW instruction 12" of VLIW instruction 9 is removed from the pending queue. Then, after the processor status register 204 is set to "10", the registers registered in the register saving unit 218 are restored to their original values once the instructions in execution have completed, and paging processing is performed at the same time. In this example, register r29 is restored to its original value. Finally, the group value generation unit 206 is returned to its initial state, the processor status register 204 is set to "00", and the address indicated by the address generation unit 203 is set to the address of the VLIW instruction containing the store instruction, which completes the trap processing.
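The order of operations in the page fault trap described above can be sketched in Python as follows. This is only an illustrative software model, not the hardware of the embodiment: the class and field names (`Processor`, `SavedRegister`, `handle_trap`) are hypothetical, paging itself is not modeled, and restoring entries newest-first (so that the oldest saved value of a register wins) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SavedRegister:
    group_value: int  # group value of the speculative instruction that overwrote the register
    name: str         # register name, e.g. "r29"
    old_value: int    # value held before the speculative update


@dataclass
class Processor:
    registers: dict
    status: str = "00"                                   # models processor status register 204
    pending_queues: list = field(default_factory=lambda: [[], [], []])  # models 209 to 211
    save_unit: list = field(default_factory=list)        # models register saving unit 218
    group_counter: int = 0                               # models group value generation unit 206
    next_address: int = 0                                # models address generation unit 203


def handle_trap(cpu: Processor, restart_address: int) -> None:
    """Apply the trap steps in the order given in the text (hypothetical model)."""
    # Fetch/decode cancellation is not modeled; the pending queues are cleared.
    for queue in cpu.pending_queues:
        queue.clear()
    cpu.status = "10"
    # Restore saved registers. Assumption: undo newest-first so that the
    # oldest saved value of a register is the one that remains.
    for entry in reversed(cpu.save_unit):
        cpu.registers[entry.name] = entry.old_value
    cpu.save_unit.clear()
    # Return the group value generator to its initial state and resume.
    cpu.group_counter = 0
    cpu.status = "00"
    cpu.next_address = restart_address
```

For a page fault, `restart_address` would be the address of the VLIW instruction containing the store; the concrete register values and addresses used when calling `handle_trap` are up to the caller and are not taken from the patent.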

[0310] If the store instruction does not cause a page fault, then when execution of atom 1 "slti r12, r3, 2" of VLIW instruction 8 completes, atom 1 "beq r12, r0, (address of) VLIW instruction 12" of VLIW instruction 9 is transferred from the pending queue 209 to the instruction execution unit 212. At this time, the execution instruction counter becomes "4" and the processor status register 204 becomes "00".

[0311] If this conditional branch instruction completes execution without generating a branch misprediction trap, the register information for group value "4" in the register saving unit 218 (in this example, the information on register r29) is deleted.
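The bookkeeping followed in this walkthrough (information on r12 deleted for group value "1", nothing registered for "2" and "3", information on r29 deleted for "4") can be modeled as a table keyed by group value. The sketch below is a hypothetical Python model of that idea; the class name `RegisterSaveUnit`, its methods, and the saved values are assumptions made for illustration and do not describe the actual format of FIG. 13.

```python
from collections import defaultdict


class RegisterSaveUnit:
    """Hypothetical model: saved register values grouped by group value."""

    def __init__(self):
        # group value -> list of (register name, value before speculative update)
        self._entries = defaultdict(list)

    def record(self, group_value, register, old_value):
        """Register restore information for a speculatively updated register."""
        self._entries[group_value].append((register, old_value))

    def commit(self, group_value):
        """Delete and return the entries of a group whose instruction completed without a trap."""
        return self._entries.pop(group_value, [])

    def pending_groups(self):
        """Group values that still have restore information registered."""
        return sorted(self._entries)


unit = RegisterSaveUnit()
unit.record(1, "r12", 0)   # saved values are placeholders
unit.record(4, "r29", 0)
unit.commit(1)             # branch with group value 1 completes: r12 entry deleted
unit.commit(2)             # nothing registered for group value 2
unit.commit(3)             # nothing registered for group value 3
unit.commit(4)             # branch with group value 4 completes: r29 entry deleted
```

Keying by group value means a completed instruction only needs its own group value to discard the matching entries, which is the property the text relies on.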

[0312] The processing when the above conditional branch instruction generates a branch misprediction trap is the same as for the page fault trap of the store instruction described above, except that no paging processing is performed and that the address indicated by the address generation unit 203 is set to the address of VLIW instruction 10.

[0313] As with the in-order issue superscalar processor of the first embodiment, an embodiment in which the dynamic VLIW processor of the present embodiment speculatively executes instructions with a Predicate can also be realized in the same manner as the second embodiment. In addition, the method of the third embodiment, which uses a SYNC instruction so that speculativeness determination for each instruction and register save and restore are not required, can be applied to the present embodiment as it is.

[0314] The compiler in each embodiment can also be realized as software. The compiler in each embodiment can also be implemented as a computer-readable recording medium on which is recorded a program for causing a computer to execute predetermined means (or for causing a computer to function as predetermined means, or for causing a computer to realize predetermined functions).

[0315] The present invention is not limited to the embodiments described above and can be implemented with various modifications within its technical scope.

[0316]

[Effects of the Invention] According to the present invention, each instruction in a program section between predetermined specific instructions is given an identification number unique to that program section, and processing related to speculative execution is performed based on that identification number. This enables effective speculative execution while storing less information and using less hardware.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating a configuration example of a processor according to the first and second embodiments of the present invention.

FIG. 2 is a flowchart illustrating an example of pipeline processing in the first, second, and fourth embodiments of the present invention.

FIG. 3 is a diagram showing an example of a program.

FIG. 4 is a diagram for explaining the case where the group value generation unit is implemented with a bit vector.

FIG. 5 is a diagram for explaining the case where the group value generation unit is implemented with a hardware counter.

FIG. 6 is a diagram showing an example of criteria for determining whether execution of an instruction is speculative or non-speculative.

FIG. 7 is a diagram showing an example of criteria for determining whether to permit updating of the processor state.

FIG. 8 is a diagram for explaining the case where multiple pieces of information on the same register are registered in the register saving unit.

FIG. 9 is a diagram for explaining the case where an execution unit has a queue.

FIG. 10 is a diagram showing an example of a program that does not include instructions with a Predicate.

FIG. 11 is a diagram showing an example of a program that includes instructions with a Predicate.

FIG. 12 is a flowchart illustrating an example of a register restoration procedure at the time of a Predicate set trap in the second embodiment of the present invention.

FIG. 13 is a diagram showing an example of the format of information stored in the register saving unit.

FIG. 14 is a diagram for explaining a case where re-execution after a trap causes a malfunction.

FIG. 15 is a diagram showing an example of a program.

FIG. 16 is a flowchart illustrating an example of a procedure for register allocation and code generation without using the register saving unit in the third embodiment of the present invention.

FIG. 17 is a diagram for explaining a method of selecting the basic block for which register allocation is performed next.

FIG. 18 is a diagram for explaining the set of registers usable at the beginning of a basic block.

FIG. 19 is a diagram showing an example of a program written with symbolic registers.

FIG. 20 is a diagram for explaining the relationship between the basic blocks in the program of FIG. 19.

FIG. 21 is a diagram showing an example of the result of applying register allocation and code insertion to the program of FIG. 19.

FIG. 22 is a diagram illustrating a configuration example of a processor according to the fourth embodiment of the present invention.

FIG. 23 is a diagram for explaining the placement of branch trap instructions.

FIG. 24 is a diagram showing an example of a program with non-speculative instruction placement and register allocation.

FIG. 25 is a diagram showing an example of the result of processing the program of FIG. 24.

FIG. 26 is a diagram showing an example of a block arrangement of VLIW instructions having extended atoms.

FIG. 27 is a diagram showing an example of the format of an extended atom.

FIG. 28 is a diagram showing an example of a 2-bit speculativeness flag.

FIG. 29 is a diagram showing an example of a VLIW instruction.

FIG. 30 is a diagram for explaining the dynamic VLIW method.

FIG. 31 is a diagram showing an example of an instruction sequence of VLIW instructions.

FIG. 32 is a diagram for explaining the case where the instruction sequence of FIG. 31 is executed by the conventional VLIW method.

FIG. 33 is a diagram for explaining the case where the instruction sequence of FIG. 31 is executed by the dynamic VLIW method.

FIG. 34 is a diagram illustrating a configuration example of a processor according to the fifth embodiment of the present invention.

FIG. 35 is a flowchart illustrating an example of pipeline processing in that embodiment.

FIG. 36 is a diagram illustrating a configuration example of a pending queue.

FIG. 37 is a diagram showing an example of a 2-bit processor status register.

FIG. 38 is a diagram showing an example of criteria for determining the order between instructions.

FIG. 39 is a diagram showing an example of criteria for determining whether to permit execution.

FIG. 40 is a diagram showing an example of a program.

FIG. 41 is a diagram showing an example of the state of the pending queues.

FIG. 42 is a diagram showing an example of the state of the register saving unit.

FIG. 43 is a diagram showing an example of a program.

[Explanation of symbols]

1, 101, 201 ... Memory
2, 102, 202 ... Instruction fetch unit
3 ... Instruction queue
4, 104, 205 ... Instruction decode unit
5, 105, 208 ... Operand state determination unit
6, 206 ... Group value generation unit
7 ... Decoded instruction queue
8 to 9, 108 to 110, 212 to 214 ... Instruction execution unit
10 ... Load/store unit
11 ... Branch instruction execution unit
12 to 15, 111 to 113, 215 to 217 ... Speculativeness determination unit
16, 114, 218 ... Register saving unit
17, 116, 219 ... Register
18 to 21, 124 to 126 ... Instruction bus
22 to 25, 127 to 129 ... Register read bus
26 to 29, 139 to 141 ... Register write bus
28, 103, 203 ... Address generation unit
30 ... Processor state control unit
31, 142 ... Memory bus
32 ... Instruction queue bus
33, 119 ... Address bus
34, 117 ... Instruction fetch bus
35, 120 ... Decode bus
115 ... Branch history storage unit
204 ... Processor status register
207 ... Instruction execution permission unit
209 to 211, 1002-1, 1002-2 ... Pending queue
1004 ... Scoreboard
1006-1, 1006-2 ... Pipeline unit

Continued from the front page. F-terms (reference): 5B013 AA12 DD04 5B033 AA07 BE05 CA09 5B045 GG11 5B081 CC23 CC32

Claims

[Claims]

1. A central processing unit having a plurality of operation execution units and allocating the operation execution units when executing an instruction sequence of a program, comprising: identification number assigning means for assigning to each instruction, at the time of its execution, an identification number identifying to which program section the instruction belongs when the instruction sequence of the program is divided by a predetermined specific instruction; and speculativeness determining means for determining, using the identification number assigned to each instruction, whether execution of the instruction is speculative.

2. A central processing unit having a plurality of operation execution units and fetching and interpreting an instruction sequence of a program for each series of instructions assigned in advance to each of the operation execution units, comprising: instruction storage means for temporarily saving instructions; storage means for storing information on the use status of each register; means for determining, based on the information stored in the storage means, whether a fetched instruction is executable, and for temporarily saving the instruction in the instruction storage means when it is determined not to be executable; identification number assigning means for assigning to each instruction, at the time of its execution, an identification number identifying to which program section the instruction belongs when the instruction sequence of the program is divided by a predetermined specific instruction; and speculativeness determining means for determining, using the identification number assigned to each instruction, whether execution of the instruction is speculative.

3. The central processing unit according to claim 1 or 2, wherein the identification number assigning means comprises: means for holding information indicating whether each identification number is in use; means for, when the instruction to be assigned an identification number is the specific instruction, referring to the information and selecting and assigning an identification number that is not in use; and means for, when the instruction to be assigned an identification number is an instruction other than the specific instruction, assigning the same identification number as the one assigned immediately before.

4. The central processing unit according to claim 3, wherein the speculativeness determining means determines, based on information indicating, for each identification number assigned by the identification number assigning means, whether the specific instruction given that identification number is being executed, that execution of the instruction under determination is speculative when the information indicates that a specific instruction having the same identification number as that instruction is being executed.

5. The central processing unit according to claim 1, wherein the identification number assigning means comprises: identification number counter means for holding the identification number assigned to the immediately preceding instruction; means for, when the instruction to be assigned an identification number is an instruction other than the specific instruction, assigning the same identification number as that held in the identification number counter means; and means for, when the instruction to be assigned an identification number is the specific instruction, incrementing the identification number held in the identification number counter means by 1 and then assigning the incremented identification number.

6. The central processing unit according to claim 5, further comprising execution instruction counter means that has the same initial state as the identification number counter means and increments the identification number it holds by 1 when execution of the specific instruction given an identification number using the identification number counter means is completed, wherein the speculativeness determining means refers to the identification number counter means and the execution instruction counter means and determines that execution of the instruction under determination is speculative when the identification number given to that instruction falls within the range from the identification number held by the execution instruction counter means to the identification number held by the identification number counter means.
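The two-counter scheme of claims 5 and 6 can be illustrated with a short Python sketch. It is a reading of the claim wording under stated assumptions, not the claimed hardware: the class name `GroupCounters` is hypothetical, the counters are unbounded integers (wraparound is not modeled), and the range test is taken as half-open, that is, an identification number is speculative when it is greater than the execution instruction counter and not greater than the identification number counter.

```python
class GroupCounters:
    """Hypothetical model of the identification number and execution instruction counters."""

    def __init__(self):
        self.id_counter = 0    # identification number given to the latest instruction
        self.exec_counter = 0  # advanced each time a specific instruction completes

    def assign(self, is_specific):
        """Return the identification number for the next instruction in program order."""
        if is_specific:
            # A specific instruction opens a new program section.
            self.id_counter += 1
        return self.id_counter

    def complete_specific(self):
        """Record that the oldest outstanding specific instruction has completed."""
        self.exec_counter += 1

    def is_speculative(self, ident):
        # Assumption: half-open range (exec_counter, id_counter].
        return self.exec_counter < ident <= self.id_counter


counters = GroupCounters()
before_branch = counters.assign(is_specific=False)   # section 0
branch = counters.assign(is_specific=True)           # opens section 1
after_branch = counters.assign(is_specific=False)    # also section 1
```

Under this reading a specific instruction shares its identification number with the instructions that follow it, so a real design would have to treat the specific instruction itself separately; the sketch does not model that distinction.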

7. The central processing unit according to claim 1 or 2, further comprising means for holding information for restoring a register updated by an instruction determined to be speculatively executed to its state before execution of that instruction.

8. The central processing unit according to claim 2, wherein the instruction storage means holds, together with a fetched instruction that cannot be executed, the identification number assigned to the instruction and information indicating whether the instruction is the specific instruction.

9. The central processing unit according to claim 1, wherein, when speculative execution of an instruction determined to have been speculatively executed is determined to have failed, a trap corresponding to the cause of the failure is raised, and the trap routine returns the central processing unit to its state before the speculative execution.

10. The central processing unit according to claim 1, wherein the speculativeness determining means determines whether execution is speculative only for a SYNC instruction for restricting speculative execution, and, when the SYNC instruction is determined to be speculatively executed, instructions following the SYNC instruction are not executed until the specific instruction that precedes the SYNC instruction in program order has completed.

11. A compiling method for generating a program to be executed by a central processing unit that allocates a plurality of operation execution units when executing an instruction sequence of a program, or by a central processing unit that fetches and interprets an instruction sequence of a program for each series of instructions assigned in advance to each of a plurality of operation execution units, the method comprising: in a case where an output operand of an instruction placed on one path branching from a conditional branch instruction is referenced by another instruction on the other path branching from the conditional branch instruction without being updated on that other path, generating and placing a SYNC instruction for restricting speculative execution so that an instruction whose output operand matches one of the input operands of the instruction speculatively executed in that case is not speculatively executed in the central processing unit.

12. A central processing unit having a plurality of operation execution units and fetching and interpreting an instruction sequence of a program for each series of instructions assigned in advance to each of the operation execution units, comprising: means for extracting, when interpreting an instruction, an identification number identifying to which program section the instruction belongs when the instruction sequence is divided by a predetermined specific instruction, and information on the speculativeness of the instruction, both of which are embedded in the instruction; speculativeness determining means for determining, based on the speculativeness information embedded in the instruction, whether execution of the instruction is speculative; speculation result determining means for determining, based on the identification number and the speculativeness information embedded in an instruction determined to have been speculatively executed, whether the speculative execution succeeded; and means for raising a trap corresponding to the cause of the failure when the speculative execution of an instruction determined to have been speculatively executed is determined to have failed.

13. The central processing unit according to claim 12, wherein the speculativeness determining means determines that execution of an instruction is speculative when the speculativeness information embedded in the instruction under determination indicates that the instruction is speculative.

14. The central processing unit according to claim 12, wherein the speculation result determining means determines that speculative execution of the instruction under determination has failed when an instruction that has the same identification number as that embedded in the instruction under determination and that corresponds to the speculativeness information is executed.

15. The central processing unit according to claim 12, further comprising holding means for temporarily holding save information for returning to the state before speculative execution when speculative execution of an instruction is determined to have failed.

16. The central processing unit according to claim 15, wherein the save information is held in association with the identification number assigned to the instruction determined to have been speculatively executed.

17. The central processing unit according to claim 16, wherein restoration of the state based on the save information is performed selectively by referring to the identification numbers in the holding means.

18. A compiling method for generating a program to be executed by a central processing unit that fetches and interprets an instruction sequence of a program for each series of instructions assigned in advance to each of a plurality of operation execution units, the method comprising: adding to each instruction an identification number identifying to which program section the instruction belongs when the instruction sequence is divided by a predetermined specific instruction, together with information on the speculativeness of the instruction; and moving an instruction located after the specific instruction in program order forward beyond the specific instruction so that it is speculatively executed.

19. A computer-readable recording medium recording a compilation program for causing a computer to execute a compilation procedure for generating a program executable by a central processing unit that allocates a plurality of operation execution units when executing an instruction sequence of an execution program, or by a central processing unit that fetches and interprets an instruction sequence of an execution program for each series of instructions assigned in advance to each of a plurality of operation execution units, the compilation procedure including a procedure of: in a case where an output operand of an instruction placed on one path branching from a conditional branch instruction is referenced by another instruction on the other path branching from the conditional branch instruction without being updated on that other path, generating and placing a SYNC instruction for restricting speculative execution so that an instruction whose output operand matches one of the input operands of the instruction speculatively executed in that case is not speculatively executed in the central processing unit.

20. A computer-readable recording medium recording a compilation program for causing a computer to execute a compilation procedure for generating a program executable by a central processing unit that fetches and interprets an instruction sequence of an execution program for each series of instructions assigned in advance to each of a plurality of operation execution units, the compilation procedure including a procedure of: adding to each instruction an identification number identifying to which program section the instruction belongs when the instruction sequence of the program is divided by a predetermined specific instruction, together with information on the speculativeness of the instruction; and moving an instruction located after the specific instruction in program order forward beyond the specific instruction so that it is speculatively executed.