JP2000194555A

JP2000194555A - Arithmetic processor and instruction order control method

Info

Publication number: JP2000194555A
Application number: JP10373380A
Authority: JP
Inventors: Shigehiro Asano; 滋博浅野; Yoshifumi Yoshikawa; 宜史吉川
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1998-12-28
Filing date: 1998-12-28
Publication date: 2000-07-14

Abstract

PROBLEM TO BE SOLVED: To further raise the operating frequency by determining, on the basis of information on the usage state of the registers, whether a fetched instruction or an instruction held in an instruction storage means can be executed. SOLUTION: In an example with two pipeline units 6-1 and 6-2, this dynamic VLIW system provides independent Pending Queues 2-1 and 2-2 for slots 1 and 2, respectively. Each queue holds an instruction fetched from the normal instruction sequence as a waiting instruction when that instruction cannot be executed immediately. Out-of-order execution is realized with a table called a scoreboard 4, which manages information on the usage state of each register. Any atom of a fetched VLIW instruction that is not executed is held in Pending Queue 2-1 or 2-2 until it becomes executable.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field] The present invention relates to an arithmetic processor capable of processing instructions in parallel, and to an instruction order control method for it.

[0002]

[Prior Art] There are two approaches to raising instruction-level parallelism: the VLIW approach, in which resources are allocated statically at compile time, and the superscalar approach, in which resources are allocated dynamically at run time. In contrast to VLIW, which schedules statically, the superscalar approach is typified by high-performance RISC processors.

[0003] RISC processors originally gained speed by reaching high clock frequencies with a simple design. As higher performance was demanded, however, they adopted the superscalar approach, in which hardware detects instructions that can be executed simultaneously and dispatches them to multiple execution units to run in parallel.

[0004] In the superscalar approach, when one instruction cannot be executed, the hardware dynamically searches for another executable instruction and executes it. As a result, processing can continue without stalling the entire processor even when, for example, a cache miss occurs.

[0005] However, this hardware is complex and has become a serious obstacle to raising the operating frequency further.

[0006] The VLIW approach lies at the opposite extreme from this complex hardware. In VLIW, the compiler detects which instructions can be executed simultaneously, so no mechanism for detecting them at run time is needed. The run-time hardware is therefore simpler, and a higher frequency may be achievable.

[0007] However, detecting simultaneously executable instructions in the compiler must contend with parameters that the compiler cannot fully predict, or cannot realistically predict, which makes it difficult to sustain a high degree of parallelism. One such unpredictable parameter is the increase in load latency caused by a cache miss. The compiler can counter this by placing a load instruction and the instructions that use the loaded value as far apart as possible, but applying this to every load instruction is known to be extremely difficult.

[0008]

[Problems to Be Solved by the Invention] As described above, conventional VLIW simplifies the hardware by detecting simultaneously executable instructions at compile time (not at run time) and is therefore expected to reach higher frequencies. However, because some parameters cannot be predicted at compile time, it is difficult to raise instruction-level parallelism any further.

[0009] Conventional superscalar processors, on the other hand, can keep processing without stalling the entire processor by searching for and executing executable instructions at run time, but their hardware becomes very complex, which makes further frequency gains difficult.

[0010] The present invention was made in view of these circumstances. Its object is to provide an arithmetic processor that achieves a further improvement in operating frequency, and an instruction order control method for it.

[0011]

[Means for Solving the Problems] An arithmetic processor according to the present invention comprises: instruction storage means for temporarily holding an instruction that has been fetched but cannot be executed, so that subsequent instructions can be executed ahead of it; storage means for storing information on the usage state of each register; and determination means for determining, on the basis of the information stored in the storage means, whether a fetched instruction or an instruction held in the instruction storage means can be executed.

[0012] Preferably, the determination means may include means for determining whether an instruction that has been fetched but cannot be executed is to be saved in the instruction storage means, or whether fetching is to be interrupted instead.

[0013] Preferably, the determination means may include means for determining whether an instruction held in the instruction storage means is to be given an opportunity to execute.

[0014] Preferably, for the determination means to decide that a fetched instruction is to be placed in the instruction storage means, at least the following conditions may be required: if the instruction has a register to which its execution result is to be written, that register can be overwritten; and the value of a register referenced by the instruction has not yet been determined.

[0015] Preferably, the determination means may decide to interrupt fetching when the register to which the execution result of the fetched instruction is to be written cannot be overwritten.

[0016] Preferably, specific instructions that are never placed in the instruction storage means, under any circumstances, may be defined in advance.

[0017] Preferably, when determining whether an instruction held in the instruction storage means can be executed, the determination means may decide that it can be executed if the register to which its execution result is to be written (when it has one) can be overwritten and the values of the registers it references have been determined.

[0018] The present invention also provides an arithmetic processor that has a plurality of processing units that execute given instructions and a plurality of registers used in executing them, that simultaneously fetches a plurality of instructions each associated in advance with one of the processing units, and that can process those instructions in parallel, each in its associated processing unit. The processor comprises:
instruction storage means, provided for each processing unit, for temporarily holding those simultaneously fetched instructions that cannot be executed immediately, so that subsequent instructions can be executed ahead of them;
storage means for storing, for each register, first information indicating whether a preceding instruction that writes to the register has not yet completed execution, and second information indicating the number of preceding instructions that reference the register and have not yet completed execution; and
determination means that, for each of the simultaneously fetched instructions, determines from the instruction's type, the registers it uses, and the corresponding first and second information whether the instruction can be executed immediately, is to be placed in the instruction storage means, or requires fetching to be interrupted. If an instruction determined to be immediately executable is a NOP instruction or an instruction equivalent to a NOP, or if an instruction is not determined to be immediately executable, and the instruction storage means corresponding to that instruction holds an instruction, the determination means also determines, from the registers used by the held instruction and the corresponding first and second information, whether the held instruction can be executed immediately.

[0019] Preferably, the processor may further comprise first information-updating means and second information-updating means. When a fetched instruction is placed in the corresponding instruction storage means, the first information-updating means sets the first information of the instruction's destination register and increments the second information of the instruction's source registers in the storage means; when the instruction is taken out of the instruction storage means, it resets the first information of the destination register and decrements the second information of the source registers. When a load instruction being executed causes a fetch miss (a cache miss), the second information-updating means sets the first information of the register into which the loaded data is to be written; when that load instruction completes execution, it resets the first information of that register.
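The scoreboard bookkeeping described in [0018] and [0019] can be sketched as a small Python class. This is a minimal illustration under assumptions: the names `Scoreboard`, `write_pending` (standing for the "first information"), `read_count` (the "second information") and the `on_*` methods are hypothetical, registers are plain integer indices, and immediate operands are simply left out of the source list.

```python
from dataclasses import dataclass, field


@dataclass
class Scoreboard:
    """Per-register scoreboard state (hypothetical names).

    write_pending[r] -- "first information": an earlier instruction that
                        writes r has not yet completed
    read_count[r]    -- "second information": number of earlier, not yet
                        completed instructions that read r
    """
    num_regs: int
    write_pending: list = field(init=False)
    read_count: list = field(init=False)

    def __post_init__(self):
        self.write_pending = [False] * self.num_regs
        self.read_count = [0] * self.num_regs

    # First information-updating means: Pending Queue traffic.
    def on_enqueue(self, dest, srcs):
        if dest is not None:
            self.write_pending[dest] = True
        for r in srcs:
            self.read_count[r] += 1

    def on_dequeue(self, dest, srcs):
        if dest is not None:
            self.write_pending[dest] = False
        for r in srcs:
            self.read_count[r] -= 1

    # Second information-updating means: loads that miss in the cache.
    def on_load_miss(self, dest):
        self.write_pending[dest] = True

    def on_load_miss_complete(self, dest):
        self.write_pending[dest] = False
```

For example, if `LD R5, (R3)` misses the cache, `on_load_miss(5)` marks R5 as still owed a write; a later `SUB R11, R5, R8` parked in a Pending Queue would then call `on_enqueue(11, [5, 8])`, and `on_dequeue(11, [5, 8])` would undo that when it leaves the queue.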

[0020] Preferably, the instruction storage means may store, for each held instruction, a first tag indicating that the instruction's destination register and first source register are the same register, and a second tag indicating that its destination register and second source register are the same register. The processor may then further comprise third information-updating means that sets the first or second tag, as applicable, when a fetched instruction is placed in the corresponding instruction storage means, and resets the first or second tag, as applicable, when the instruction is taken out of the instruction storage means.

[0021] Preferably, when determining whether an instruction in the instruction storage means can be executed, the determination means may decide that it can be executed in either of the following cases:
if the instruction's first tag is set, the first information (in the storage means) of its second source register is reset, and the second information of its destination register is 0; or
if the instruction's second tag is set, the first information of its first source register is reset, and the second information of its destination register is 0.

[0022] Preferably, when determining whether a fetched instruction can be executed, the determination means may decide to interrupt fetching if, in the storage means, the first information of the fetched instruction's destination register is set and/or the second information of that register is not 0.

[0023] Preferably, when determining whether a fetched instruction can be executed, the determination means may decide that the instruction should be placed in the instruction storage means if fetching is not interrupted and, in the storage means, the first information of a source register of the fetched instruction is set.

[0024] Preferably, when a fetched instruction is a load instruction, a store instruction, or a conditional branch instruction, the determination means may decide never to place it in the instruction storage means.
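The fetch-time decision of [0022] to [0024] might be expressed as the function below. It is a hedged sketch: the function and enum names are hypothetical, the scoreboard is passed as two plain lists (`write_pending` for the first information, `read_count` for the second), and the mnemonics standing for load, store and conditional branch are placeholders. When an atom that may never be queued cannot run, the sketch interrupts fetch, following [0053].

```python
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute now"
    PEND = "place in Pending Queue"
    STALL = "interrupt fetch"


# Hypothetical mnemonics for load, store and conditional branch ([0024]).
NEVER_PENDED = {"LD", "ST", "BRZ"}


def classify_fetched(mnemonic, dest, srcs, write_pending, read_count):
    """Decide the fate of one fetched atom from the scoreboard state."""
    # [0022]: destination still to be written or read by an earlier,
    # unfinished instruction -> interrupt fetch.
    if dest is not None and (write_pending[dest] or read_count[dest] != 0):
        return Decision.STALL
    # [0023]: a source value is not yet determined -> park the atom ...
    if any(write_pending[r] for r in srcs):
        # ... unless it may never be parked ([0024]); per [0053] an atom
        # that cannot be parked stalls fetch until it becomes executable.
        if mnemonic in NEVER_PENDED:
            return Decision.STALL
        return Decision.PEND
    return Decision.EXECUTE
```

The order of the checks matters: the destination test comes first, so an atom is queued only when its destination is free, which is the "no fetch interruption" precondition stated in [0023].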

[0025] Preferably, the processor may further comprise means that, when the condition prediction for a conditional branch instruction being executed fails, cancels both the instructions that were fetched on the basis of the failed prediction and are being executed, and the instructions that have been placed and held in the instruction storage means.

[0026] Preferably, the instruction storage means may be implemented as a FIFO buffer.

[0027] Preferably, instructions may be executed by pipeline processing, with the executability determination made in the decode stage (D stage) and the information updated in the decode stage and the memory stage (M stage).

[0028] An instruction order control method according to the present invention comprises: fetching an instruction and determining whether it can be executed; if the fetched instruction is determined not to be executable and fetching has not been interrupted, deciding to store it in the temporary storage means (the instruction storage means); if it has been decided to store the fetched instruction in the instruction storage means, determining whether the other instruction stored earliest in the instruction storage means can be executed; and, if that other instruction is determined to be executable, deciding to execute it.

[0029] The present invention also provides an instruction order control method for an arithmetic processor that comprises a plurality of processing units that execute given instructions, a plurality of registers used in executing the instructions, storage means for storing information on the usage state of each register, and instruction storage means, provided for each processing unit, for saving a fetched instruction that cannot be executed immediately so that subsequent instructions can be executed ahead of it; the processor simultaneously fetches a plurality of instructions associated in advance with the processing units and processes them while changing their order by means of the instruction storage means. The method comprises:
fetching a plurality of instructions simultaneously;
determining, for each processing unit and on the basis of the information stored in the storage means, whether to execute the fetched instruction, an instruction held in the instruction storage means, or a NOP instruction, and, when the fetched instruction is not executed, whether to store it in the instruction storage means;
when it is determined that an instruction held in the instruction storage means is to be executed and/or that a fetched instruction is to be stored in the instruction storage means, updating the information stored in the storage means and performing the applicable removal of an instruction from, or storage of an instruction into, the instruction storage means; and
executing the instruction so determined.

[0030] The invention described in terms of an apparatus also holds as an invention of a method, and the invention described in terms of a method also holds as an invention of an apparatus.

[0031] In the present invention, when a preceding instruction cannot be executed immediately, it is temporarily set aside so that subsequent instructions can be executed first, which speeds up processing.

[0032] Moreover, because the hardware can be kept simple, an effective speed-up can be expected.

[0033]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0034] The scheme shown in this embodiment lies between the superscalar and VLIW approaches and is called the dynamic VLIW scheme. The dynamic VLIW scheme of this embodiment is fundamentally a VLIW scheme, but by executing part of the work dynamically it can respond, to a certain extent, to events that are hard to predict at compile time, and can continue processing without stalling the entire processor. In other words, the dynamic VLIW scheme seeks a new optimum balance between hardware and software (the compiler) in order to optimize performance.

[0035] As for hardware complexity, the dynamic VLIW scheme omits the reorder buffer and the instruction-issue logic, the superscalar components with a particularly large impact on cycle time. Its hardware is therefore relatively simple, and an effective speed-up in program execution can be expected.

[0036] The dynamic VLIW scheme of this embodiment is described in detail below.

[0037] First, an outline of the dynamic VLIW scheme of this embodiment is given with reference to FIGS. 1 and 2.

[0038] For simplicity, in the following the individual instructions that make up one VLIW instruction are called atoms, and the unit of multiple atoms fetched at one time is called a VLIW instruction. The position within a VLIW instruction that each atom occupies is called a slot. An actual program consists of a sequence of VLIW instructions.

[0039] FIG. 1 shows an example of a VLIW instruction, in which one VLIW instruction consists of three atoms.

[0040] In the dynamic VLIW scheme, registers have no renaming mechanism; the compiler allocates them. Omitting register renaming keeps the hardware simple. For this reason, the compiler that generates the VLIW instruction sequence is assumed to allocate registers so that no false dependencies arise (a known compiler may be used).

[0041] Instruction issue works as in conventional VLIW: the multiple instructions allocated by the compiler are executed simultaneously. The difference from conventional VLIW lies in what happens when some of the instructions in a VLIW instruction cannot be executed. (In that case, conventional VLIW interrupts fetching; that is, it does not execute out of order.)

[0042] FIG. 2 is a conceptual diagram of the dynamic VLIW scheme of this embodiment, showing an example with two pipeline units (6-1 and 6-2).

[0043] In outline, the dynamic VLIW scheme achieves out-of-order execution with two structures: queues called Pending Queues (2-1 and 2-2 in FIG. 2), one provided independently for each slot, which hold an instruction fetched from the normal instruction sequence as a waiting instruction when it cannot be executed immediately; and a table called a scoreboard (4 in FIG. 2), which manages information on the usage state of each register.

[0044] Atoms of a fetched VLIW instruction that are not executed are held in a Pending Queue until they become executable. The Pending Queue is preferably implemented as a FIFO (first-in, first-out buffer).

[0045] When the Pending Queue is a FIFO, the atoms held in it are executed in order, starting from the atom at its head; this is where it differs from the reorder buffer of a conventional superscalar processor. In other words, the hardware is greatly simplified, and thereby made faster, in exchange for a performance restriction: an executable atom in the Pending Queue may be unable to execute.

[0046] Further, a Pending Queue is preferably provided for each slot into which an atom of a VLIW instruction is placed. For example, with the VLIW instruction format illustrated in FIG. 2 there are two slots, so two Pending Queues are provided. An atom of a fetched VLIW instruction that is not executed is placed in the Pending Queue corresponding to its slot. Giving each slot its own Pending Queue, with no atom crossing from one slot to another, is another restriction adopted to simplify the hardware and raise speed.

[0047] In each cycle and for each slot, the atom given an opportunity to execute may be the atom fetched from the normal instruction sequence or, when the Pending Queue is not empty, an atom from the Pending Queue. Execution priority goes first to (1) the fetched atom and then to (2) the Pending Queue's atom.

[0048] That is, the atom at the head of each Pending Queue can be executed, provided it is executable, when the atom in the corresponding slot of the VLIW instruction fetched from the normal instruction sequence cannot be executed (including the cases of a fetch cache miss and of interrupted fetching), or when that fetched atom is a NOP atom. If the Pending Queue's atom cannot be executed either, the slot is left empty in that cycle.

[0049] Whether an atom given the opportunity to execute (the fetched atom or, in the cases above, the atom at the head of the Pending Queue) can be executed is decided from the contents of the scoreboard, that is, the usage state of the registers the atom involves. Basically, the atom is judged not executable if a register it uses is not available to it. (As detailed later, the criteria differ between atoms fetched from the normal instruction sequence and atoms from the Pending Queue.)

[0050] The scoreboard entries for the relevant registers are updated when a fetched atom is placed in a Pending Queue, when an atom is taken out of a Pending Queue, when a load instruction causes a cache miss, and when a load instruction that caused a cache miss completes.

[0051] As described above, the dynamic VLIW scheme of this embodiment achieves out-of-order execution by temporarily setting aside in a Pending Queue any atom that cannot be executed immediately and executing it once it becomes executable. Executability is judged with a scoreboard designed specifically for this scheme.

[0052] Not every atom that cannot be executed at the time it is fetched is placed in a Pending Queue; in some cases, for example to allow for traps, an atom is kept out of the Pending Queue. That is, a Pending Queue receives only atoms that were judged not executable when fetched and that also satisfy the conditions for entering the Pending Queue. These conditions are described in detail later.

[0053] If an atom judged not executable when fetched cannot be placed in a Pending Queue, fetching is interrupted until that atom becomes executable. As in conventional VLIW, when fetching is interrupted (even because of a single atom), none of the atoms of that VLIW instruction is executed.
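The all-or-nothing effect of a fetch interruption in [0053], in contrast with the per-atom choice between executing and queueing, can be summarized in a few lines. The string labels and the function name are hypothetical.

```python
def bundle_outcome(decisions):
    """Combine per-slot decisions for one VLIW instruction ([0053]).

    decisions -- one of "execute", "pend" or "stall" per slot
    """
    if "stall" in decisions:
        # A single atom that interrupts fetch holds back the whole VLIW
        # instruction: no atom of it executes or is queued this cycle.
        return ["wait"] * len(decisions)
    # Otherwise each atom proceeds on its own: run now or be parked.
    return list(decisions)
```

So a queueable atom costs nothing to its slot partner, while a non-queueable one (for example a load, store or conditional branch whose operands are not ready) stalls both slots, exactly as conventional VLIW would.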

[0054] A simple example follows to show the effect of dynamic VLIW.

[0055] FIG. 3 shows an example instruction sequence in which each VLIW instruction contains two atoms.

[0056] In the following, each atom is written in this order:
1. the mnemonic
2. the destination (dest) register
3. the first source (src1) register
4. the second source (src2) register

[0057] As shown in FIG. 3, this instruction sequence consists of the following six VLIW instructions, each listed as slot 1 atom and slot 2 atom. They are fetched one at a time in this order:
(1) ADD R8, R9, R10 and LD R5, (R3)
(2) LDI R18, 1000 and ADDI R13, R9, 4
(3) ADD R21, R18, R9 and SUB R11, R5, R8
(4) LSR R22, R21, 5 and ORI R24, R21, 0xFF
(5) SUBI R25, R24, 5 and NOP
(6) BRZ R11, R0, ROOP_EXT and NOP

[0058] A NOP atom may be either of the following:
- an instruction that performs no operation at all
- an instruction that executes ADD or the like but produces no change as a result

[0059] The following compares execution of the instruction sequence of FIG. 3 under the conventional VLIW method and under the dynamic VLIW method.

[0060] FIG. 4 shows how this instruction sequence executes under the conventional VLIW method. FIG. 5 shows how it executes under the dynamic VLIW method.

[0061] The examples of FIGS. 4 and 5 assume the following:
- LD (a load instruction), the slot 2 atom of the first VLIW instruction, misses in the primary cache.
- The data is present in the secondary cache, so loading it takes four cycles.

[0062] As shown in FIG. 4, the conventional VLIW method executes this sequence as follows. In cycle 1, ADD R8, R9, R10 in slot 1 and LD R5, (R3) in slot 2 are executed. Because the LD in slot 2 causes a cache miss, both slots stall for the four cycles 2 to 5, and fetching is suspended during this period. The remaining instructions then execute in order, and processing takes 10 cycles in total.

[0063] As shown in FIG. 5, the dynamic VLIW method executes the same sequence as follows. In cycle 1, ADD R8, R9, R10 in slot 1 and LD R5, (R3) in slot 2 are executed, and the LD causes a cache miss. From the next cycle on, no atom that uses R5 can execute until the LD completes, because R5 is the LD's destination register. The status of R5 is reflected on the scoreboard.

[0064] In cycle 2, neither atom of the VLIW instruction uses R5, the destination register of the LD. LDI R18, 1000 and ADDI R13, R9, 4 are therefore executed.

[0065] In cycle 3, ADD R21, R18, R9 in slot 1 does not use R5 and is executed. SUB R11, R5, R8 in slot 2 uses R5 as its first source register. The scoreboard shows that R5 is unavailable, so the SUB cannot execute and is put into the Pending Queue. From the next cycle on, no atom other than this SUB that uses R11, the SUB's destination register, can execute until the SUB completes. The status of R11 is also reflected on the scoreboard.

[0066] In cycle 4, neither R5 nor R11 is used, so LSR R22, R21, 5 and ORI R24, R21, 0xFF are executed.

[0067] In cycle 5, neither R5 nor R11 is used, so SUBI R25, R24, 5 and NOP are executed.

[0068] At this point the LD completes, and R5 becomes available from the next cycle. The status of R5 is also reflected on the scoreboard.

[0069] In cycle 6, BRZ R11, R0, ROOP_EXT in slot 1 cannot execute, because its destination is R11. The SUB waiting in the Pending Queue has set the P bit of R11. As described in detail later, an atom whose destination register is unavailable is not put into the Pending Queue. Instead, the processor waits until the atom becomes executable, which means fetching is suspended. Slot 1 is therefore empty in this cycle. Because fetching is suspended, execution of the fetched slot 2 atom is also held back.

[0070] Because fetching is suspended, the atom in slot 2's Pending Queue gets an opportunity to execute. That atom is SUB R11, R5, R8. The earlier LD has completed and R5 is available, so the scoreboard shows the SUB is executable. SUB R11, R5, R8 is therefore taken out of the Pending Queue and executed.

[0071] At this point the SUB completes, and R11 becomes available from the next cycle. The status of R11 is also reflected on the scoreboard.

[0072] In cycle 7, the two waiting atoms execute. BRZ R11, R0, ROOP_EXT in slot 1 is now executable and executes, and the NOP in slot 2 executes.

[0073] As a result, processing completes in seven cycles.

[0074] As described above, the conventional VLIW method takes 10 cycles, whereas the dynamic VLIW method completes in 7 cycles. The gain comes from the out-of-order capability, which lets other atoms execute while the LD atom's miss is outstanding.
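The two cycle counts can be tallied from the six VLIW instructions of FIG. 3 and the four-cycle miss penalty assumed in [0061]. This is a worked check of the arithmetic only. The speedup ratio is derived here and is not stated in the source.

```latex
\begin{aligned}
T_{\text{VLIW}} &= \underbrace{1}_{\text{cycle 1}} + \underbrace{4}_{\text{stall, cycles 2 to 5}} + \underbrace{5}_{\text{instructions 2 to 6}} = 10\\
T_{\text{dyn}} &= \underbrace{5}_{\text{instructions 1 to 5}} + \underbrace{1}_{\text{cycle 6, slot 1 empty}} + \underbrace{1}_{\text{instruction 6}} = 7\\
\frac{T_{\text{VLIW}}}{T_{\text{dyn}}} &= \frac{10}{7} \approx 1.43
\end{aligned}
```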

[0075] The present embodiment is described in more detail below.

[0076] FIG. 6 shows an example of the general procedure that decides, in each cycle and each slot, among three choices:
- execute an atom from the normal instruction sequence
- execute an atom from the Pending Queue
- execute neither

[0077] First, one VLIW instruction is fetched from the normal instruction sequence (step S1). Step S1 is skipped while fetching is suspended.

[0078] Next, the procedure checks the following conditions (steps S2 to S5):
- An instruction fetch miss has occurred.
- The fetched atom itself cannot be executed, whether or not it can be put into the Pending Queue.
- Fetching is suspended. This happens when at least one atom in this slot or another slot of the same VLIW instruction can neither execute nor enter the Pending Queue.
- The fetched instruction is a NOP atom.

If none of these conditions holds, the atom of this slot is executed (step S6).

[0079] If any of the above conditions holds (steps S2 to S5), the procedure checks whether the atom in this slot's Pending Queue is executable (step S7). If it is, that Pending Queue atom is executed (step S8).

[0080] Otherwise, a fetched NOP atom may simply be executed. If the fetched atom is not a NOP, the slot is left empty, or a NOP is executed (step S9).
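The per-slot decision of FIG. 6 can be sketched as a single function. This is a minimal model under stated assumptions. The boolean flags and the name `choose_slot_action` are hypothetical, and each flag is assumed to be computed elsewhere by the scoreboard checks.

```python
def choose_slot_action(fetch_miss, atom_blocked, fetch_suspended, is_nop,
                       pending_head_ready):
    """Decide what one slot does in one cycle, following the FIG. 6 outline.

    All arguments are booleans (hypothetical interface):
      fetch_miss:         an instruction fetch miss occurred
      atom_blocked:       the fetched atom itself cannot execute
      fetch_suspended:    fetching is suspended for this VLIW instruction
      is_nop:             the fetched atom is a NOP
      pending_head_ready: this slot's Pending Queue holds an executable atom
    Returns "fetched", "pending", "nop" or "empty".
    """
    # Steps S2-S5: if none of the conditions holds, run the fetched atom (S6)
    if not (fetch_miss or atom_blocked or fetch_suspended or is_nop):
        return "fetched"
    # Steps S7-S8: otherwise the Pending Queue gets a chance to issue
    if pending_head_ready:
        return "pending"
    # Step S9: execute the fetched NOP, or leave the slot empty
    return "nop" if is_nop else "empty"
```

Cycle 6 of FIG. 5 maps onto this as follows. Slot 1 (BRZ blocked, empty queue) gives `"empty"`. Slot 2 (fetch suspended, SUB ready in its queue) gives `"pending"`.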

[0081] FIG. 7 shows an example configuration of the Pending Queue.

[0082] The Pending Queue is a FIFO. Each element consists of a tag field and an atom field.

[0083] The atom field stores the atom itself.

[0084] The tag field consists of two one-bit fields, Bit0 and Bit1, which correspond to the atom's two srcs. Their use is explained below in the description of the scoreboard.
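A Pending Queue element as described in [0082] to [0084] can be modelled with a small record and a `deque` used as the FIFO. This is an illustrative sketch. The names `PendingEntry` and `pending_queue` are hypothetical, and the atom is kept as a plain string for readability.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingEntry:
    """One Pending Queue element: a tag (Bit0, Bit1) plus the atom itself."""
    atom: str           # atom field, e.g. "ADD R10, R8, R9"
    bit0: bool = False  # tag bit associated with src1
    bit1: bool = False  # tag bit associated with src2


# A FIFO per slot: atoms are appended at the tail and issued from the head.
pending_queue = deque()
pending_queue.append(PendingEntry("ADD R10, R8, R9"))
pending_queue.append(PendingEntry("ADD R6, R6, R8", bit0=True))

head = pending_queue[0]          # the entry that would be considered next
issued = pending_queue.popleft() # issuing removes it together with its tag
```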

[0085] FIG. 8 shows an example configuration of the scoreboard. In this example there are 32 registers, R0 to R31.

[0086] For each register, the scoreboard stores a set of bits that indicate the register's state: the P bit and the Pending Count. As described in detail later, whether an atom is executable is decided from the contents of this scoreboard.
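The per-register state of FIG. 8 can be sketched as two parallel arrays, one entry per register R0 to R31. The class name `Scoreboard` and its helper methods are hypothetical. The sketch shows only the data layout, not the hardware organization.

```python
from dataclasses import dataclass, field

NUM_REGS = 32  # R0..R31, as in the FIG. 8 example


@dataclass
class Scoreboard:
    # P bit: set while the register's value has not yet been produced
    p_bit: list = field(default_factory=lambda: [False] * NUM_REGS)
    # Pending Count: how many atoms in the Pending Queue read this register
    p_count: list = field(default_factory=lambda: [0] * NUM_REGS)

    def value_ready(self, reg):
        """True if reg may be read as a source (its P bit is clear)."""
        return not self.p_bit[reg]

    def nonzero(self):
        """Entries that differ from the reset state, like FIG. 12 (zeros omitted)."""
        return ({r for r in range(NUM_REGS) if self.p_bit[r]},
                {r: n for r, n in enumerate(self.p_count) if n})
```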

[0087] The basic rule of the scoreboard is that an operation cannot use a register as a source until that register's value is available. A value is unavailable when the result of a load instruction has not yet been written to the register. The starting point is therefore always a load that misses in the cache.

[0088] The major events a compiler cannot predict are cache misses and branch outcomes. The dynamic VLIW method addresses cache misses.

[0089] The scoreboard decides whether a fetched atom is executable, and an executable atom is executed. An atom judged unexecutable is handled in one of two ways.
(i) The atom is placed in the Pending Queue. This is not allowed if fetching has been suspended because of an atom in another slot.

[0090] (ii) New fetching is suspended until the atom becomes executable. This corresponds to a stall in a conventional processor. In this embodiment, however, atoms from the Pending Queue can still execute while new fetching is suspended.

[0091] The scoreboard that underlies these executability decisions is updated by the following rules.
(i) When a load instruction misses, the P bit of its dest (destination) register is set.
(ii) Suppose an atom is judged unexecutable, at least one of its src (source) registers has its P bit set, and case (iii) below does not apply. Then the P bit of the dest register is set and the atom is put into the Pending Queue, unless new fetching must be suspended.
(iii) Suppose an atom is judged unexecutable and its dest matches one of its srcs. If the matching src register's P bit is 0 and the other src register's P bit is 1, the P bit of the dest register is set and the atom is put into the Pending Queue, unless new fetching must be suspended.
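Rules (i) to (iii) of [0091] can be sketched as two functions over a per-register P bit list. The function names and the `(enqueue, tag)` return value are hypothetical. The sketch assumes the fetch-suspension checks have already passed, and it models immediate operands as `None`.

```python
NUM_REGS = 32


def on_load_miss(p_bit, dest):
    """Rule (i): a load that misses marks its destination as not yet written."""
    p_bit[dest] = True


def on_unexecutable(p_bit, dest, src1, src2=None):
    """Rules (ii) and (iii) for a fetched atom that cannot execute now.

    p_bit is the per-register P bit list and is updated in place.
    src2 may be None for an immediate operand.
    Assumes the fetch-suspension checks have already passed, so the atom
    is allowed to enter the Pending Queue.
    Returns (enqueue, tag) with tag None, "Bit0" or "Bit1".
    """
    def pending(reg):
        return reg is not None and p_bit[reg]

    if src1 == dest and not pending(src1) and pending(src2):
        tag = "Bit0"                     # rule (iii), src1 matches dest
    elif src2 == dest and not pending(src2) and pending(src1):
        tag = "Bit1"                     # rule (iii), src2 matches dest
    elif pending(src1) or pending(src2):
        tag = None                       # rule (ii)
    else:
        return False, None               # no source is waiting
    p_bit[dest] = True                   # dest will be written by the queued atom
    return True, tag
```

For example, after `LD R8, (R5)` misses, `ADD R10, R8, R9` falls under rule (ii) and `ADD R6, R6, R8` falls under rule (iii) with Bit0.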

[0092] In case (iii), the tag field of the Pending Queue entry is also recorded as follows:
• If src1 matches dest, Bit0 is set.
• If src2 matches dest, Bit1 is set.

[0093] The register P bits and atom tags set as above are reset when the corresponding atom is taken out of the Pending Queue. The P bit of a register is also reset when the load instruction that missed on it completes.
Reset when fetched from ing Queue. Also, when the load instruction causing the miss is completed, the P bit of the corresponding register is reset.

[0094] The tag processing of case (iii) exists for instructions whose src register is the same as their dest register, such as add R2, R2, R3. Without it, such an instruction could not be issued correctly from the Pending Queue.

[0095] Take add R2, R2, R3. Setting the P bit of R2 as dest also sets the P bit of R2 as src1. Without the tag, the atom would stay unexecutable until R2's P bit is cleared. That P bit is cleared only when this atom itself leaves the Pending Queue, so it would never be cleared, and the atom would remain in the queue indefinitely. The tag must therefore be checked to tell whether the P bit of R2 as a src was set by another atom or by the atom itself.
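The deadlock described in [0095] can be reproduced with a few scoreboard entries. This is a minimal sketch that assumes add R2, R2, R3 was queued under case (iii) while an earlier load to R3 was outstanding. The functions are hypothetical. The tag-aware variant anticipates the P count test given later in [0121].

```python
NUM_REGS = 32
p_bit = [False] * NUM_REGS
p_count = [0] * NUM_REGS

# add R2, R2, R3 is queued while R3 is still outstanding (case (iii)):
p_bit[3] = True            # R3 waits for an earlier load
p_bit[2] = True            # dest R2 marked by the atom itself
p_count[2] += 1            # R2 is also read by the atom (src1)
p_count[3] += 1
bit0 = True                # src1 == dest, recorded in the tag

p_bit[3] = False           # the load producing R3 completes


def ready_without_tag():
    # Naive test: every source P bit clear and no queued reader of dest
    return not p_bit[2] and not p_bit[3] and p_count[2] == 0


def ready_with_tag():
    # Bit0 says src1's P bit, and one of its P counts, belong to this atom
    if bit0:
        return not p_bit[3] and p_count[2] == 1
    return ready_without_tag()
```

`ready_without_tag()` stays False no matter how long the atom waits, because only this atom's own departure clears R2's P bit. `ready_with_tag()` becomes True once R3 is available.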

[0096] Besides the P bit, the scoreboard holds a counter for each register. The counter shows how many instructions in the Pending Queue refer to that register. It is updated by the following rules.
(i) When an atom enters the Pending Queue, the counter of each of its src registers is incremented.
(ii) When an atom leaves the Pending Queue, the counter of each of its src registers is decremented.

[0097] This counter is called the Pending Count (P count), and there is one for each register. It protects the sources of pending atoms from being overwritten.

[0098] When an atom is placed in the Pending Queue, the following operations are performed on the scoreboard and the Pending Queue:
• The P bit of the corresponding dest is set.
• The P count of each corresponding src is incremented.
• Bit0 or Bit1 is set, depending on the conditions.

[0099] When an atom is issued from the Pending Queue, the following operations are performed on the Pending Queue and the scoreboard:
• The P bit of the corresponding dest is reset.
• The P count of each corresponding src is decremented.
• Any Bit0 or Bit1 that was set is reset.
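The enqueue and dequeue bookkeeping of [0098] and [0099] can be sketched as two methods. The class `SlotState` is hypothetical. For illustration it keeps one slot's queue together with the scoreboard arrays, even though a real design may share the scoreboard across slots. `srcs` lists only register sources, so immediates are omitted.

```python
from collections import deque

NUM_REGS = 32


class SlotState:
    """Hypothetical state for one slot: scoreboard arrays plus its Pending Queue."""

    def __init__(self):
        self.p_bit = [False] * NUM_REGS
        self.p_count = [0] * NUM_REGS
        self.queue = deque()  # entries are (dest, srcs, tag) tuples

    def enqueue(self, dest, srcs, tag=None):
        """[0098]: set dest P bit, increment src P counts, record the tag."""
        self.p_bit[dest] = True
        for s in srcs:
            self.p_count[s] += 1
        self.queue.append((dest, srcs, tag))

    def dequeue(self):
        """[0099]: reset dest P bit, decrement src P counts, drop the tag."""
        dest, srcs, tag = self.queue.popleft()
        self.p_bit[dest] = False
        for s in srcs:
            self.p_count[s] -= 1
        return dest, srcs, tag
```

Paragraph [0100] describes a cycle that both issues from the queue and adds to it. In that case this model would call `dequeue()` before `enqueue()`.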

[0100] Sometimes one cycle both puts a fetched instruction into the Pending Queue and executes an instruction from the Pending Queue. In that case, the scoreboard and tags are updated first for the instruction taken out of the Pending Queue, and then for the fetched instruction. This order keeps the later instruction from affecting the execution of the earlier one.

[0101] Next, the determination of executability is described.

[0102] First, executability of the atoms of a fetched VLIW instruction is described.

[0103] The executability of a fetched atom is decided from the scoreboard by checking several conditions. If the atom is judged unexecutable, one of two kinds of processing is performed: the atom is put into the Pending Queue, or the fetching of new instructions is suspended (stopped). Otherwise, the atom is executable.

[0104] When a fetch-suspension condition holds, suspending the fetch takes priority. An atom can be put into the Pending Queue only when fetching is not suspended.

[0105] FIG. 9 shows an example procedure for deciding, in each cycle and each slot, whether a fetched atom is executable.

[0106] First, if any of the following conditions holds (steps S11 to S15), the atom is judged unexecutable and fetching is suspended (step S19).

[0107] (i) On the scoreboard, the P bit is set for the dest register of the fetched atom.
Executing the atom now would write the register before an atom in the Pending Queue does, which would break the dependency. The atom therefore cannot execute, and no new instruction is fetched.

[0108] (ii) The fetched atom is an LD (load) or ST (store) instruction, and on the scoreboard the P bit is set for the src register that supplies its address.
The atom is not put into the Pending Queue, and no new instruction is fetched.
(iii) The fetched atom is a branch instruction, and on the scoreboard the P bit is set for the register that supplies the branch condition.
The atom is not put into the Pending Queue, and no new instruction is fetched.

[0109] An alternative is also conceivable for this case: put the atom into the Pending Queue and keep fetching new instructions, so that execution proceeds speculatively.
(iv) On the scoreboard, the P count of the dest register of the fetched atom is not 0, that is, it is 1 or more.
A pending atom still has to read this register, so the register must not be overwritten. No new instruction is fetched.

[0110] (v) An atom in another slot, fetched at the same time, has caused fetching to be suspended.
If none of the above conditions holds (steps S11 to S15), the procedure then decides whether the atom is executable or is put into the Pending Queue.

[0111] If the following condition holds (step S16), the atom is judged unexecutable and is put into the Pending Queue (step S18).

[0112] (vi) On the scoreboard, at least one of the P bits of the fetched atom's src registers is set.
The atom is judged executable only if the condition of step S16 does not hold (step S17).
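The FIG. 9 checks (i) to (vi) can be sketched as one classification function over the P bit and P count arrays. The function name, the `kind` strings and the `key_src` parameter are hypothetical. The checks are applied in the order the text lists them, and suspension takes priority over queuing as in [0104].

```python
NUM_REGS = 32


def classify_fetched(p_bit, p_count, dest, srcs, kind="alu", key_src=None,
                     other_slot_suspended=False):
    """Classify a fetched atom as "suspend", "pending" or "execute" (FIG. 9 sketch).

    dest may be None (e.g. a store); srcs may contain None for immediates.
    key_src is the address register of a load/store or the condition
    register of a branch.
    """
    def pending(reg):
        return reg is not None and p_bit[reg]

    # Steps S11-S15: any of conditions (i)-(v) suspends fetching (S19)
    if dest is not None and p_bit[dest]:
        return "suspend"                    # (i) a queued atom has yet to write dest
    if kind in ("load", "store") and pending(key_src):
        return "suspend"                    # (ii) address not yet available
    if kind == "branch" and pending(key_src):
        return "suspend"                    # (iii) branch condition not yet available
    if dest is not None and p_count[dest] != 0:
        return "suspend"                    # (iv) a queued atom has yet to read dest
    if other_slot_suspended:
        return "suspend"                    # (v) decided by another slot
    # Step S16: condition (vi) sends the atom to the Pending Queue (S18)
    if any(pending(s) for s in srcs):
        return "pending"
    return "execute"                        # step S17
```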

[0113] Suppose an atom is judged executable and fetching is not suspended. If the atom is not a NOP, it is executed. If it is a NOP, the atom in the Pending Queue gets an opportunity to execute instead.

[0114] Next, the executability of an atom from the Pending Queue is described.

[0115] The executability of an atom from the Pending Queue is decided from the scoreboard by checking several conditions. An atom judged executable is selected for execution.

[0116] FIG. 10 shows an example procedure for deciding, in each cycle and each slot, whether an atom from the Pending Queue is executable.

[0117] First, the atom in this Pending Queue gets an opportunity to execute (see FIG. 6) if any of the following conditions holds (step S21):
- An instruction cache miss is outstanding.
- The fetched atom itself cannot execute because it meets one of conditions (i) to (iv) or (vi) above.
- Fetching is suspended under (v), because an atom in another slot meets one of conditions (i) to (iv) or (vi).
- The fetched atom is a NOP instruction and fetching is not suspended.

[0118] In this case, the executability of the atom from the Pending Queue is decided by the following conditions.

[0119] If the Pending Queue is organized as a FIFO, executability only needs to be checked for the atom at its head.

[0120] (i) The atom under test is executable (step S26) if the P bits of all its src registers are reset and the P count of its dest register is 0 (step S22).

[0121] (ii) If Bit0 is set in the entry's tag field, the atom is executable (step S26) when both of the following hold on the scoreboard (step S23):
- the P bit of the atom's src2 register is not set
- the P count of its dest register is 1

(iii) If Bit1 is set in the entry's tag field, the atom is executable (step S26) when both of the following hold on the scoreboard (step S24):
- the P bit of the atom's src1 register is not set
- the P count of its dest register is 1
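The FIG. 10 test for the head of the Pending Queue can be sketched as follows. Conditions (i) to (iii) are applied in the order of steps S22 to S24, and anything else falls through to step S25. The function name and argument layout are hypothetical.

```python
NUM_REGS = 32


def pending_head_ready(p_bit, p_count, dest, src1, src2=None,
                       bit0=False, bit1=False):
    """Can the atom at the head of the Pending Queue issue? (FIG. 10 sketch)"""
    def pending(reg):
        return reg is not None and p_bit[reg]

    if not pending(src1) and not pending(src2) and p_count[dest] == 0:
        return True                      # (i), step S22
    if bit0 and not pending(src2) and p_count[dest] == 1:
        return True                      # (ii), step S23: src1 is dest itself
    if bit1 and not pending(src1) and p_count[dest] == 1:
        return True                      # (iii), step S24: src2 is dest itself
    return False                         # otherwise, step S25
```

In the tagged cases, the atom's own reference to its dest-as-source counts once in that register's P count. That is why the threshold is 1 rather than 0.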

[0122] In (ii) and (iii), the atom may execute when the P count is 1 rather than 0 for a specific reason. A set tag bit means one of the atom's srcs is the same register as its dest. The atom's own src reference therefore already contributes 1 to that register's P count.

[0123] (iv) In all other cases, the atom cannot be executed (step S25).

[0124] Several examples follow that show how the scoreboard and the Pending Queue tags work. For simplicity, FIG. 11 shows only atoms that have dependencies. Dependencies can also span slots; for example, the result of an FP operation may be used in INT. Slot interactions are not described here, so the examples assume that other slots neither affect the scoreboard nor cause fetching to be suspended.

[0125] (i) First, the example of FIG. 11(a) shows an atom being put into the Pending Queue because the P bit of its src register is set.

[0126] In this example, LD R8, (R5) is executed in some slot, and then ADD R10, R8, R9 is executed.

[0127] FIG. 12 shows how the contents of the scoreboard and the Pending Queue change in this case.

[0128] In FIG. 12, zeros are omitted from the scoreboard and the Pending Queue tags. For the P bits and the Queue tags, 1 means set and 0 means reset. FIGS. 13 and 14, referred to later, use the same convention.

[0129] Assume the scoreboard starts as shown in FIG. 12(a) and the Pending Queue as shown in FIG. 12(b).

[0130] First, LD R8, (R5) is fetched. The scoreboard shows it is executable, so it is executed.

[0131] Assume a cache miss occurs. The P bit of R8, this atom's dest register, is set, as shown in FIG. 12(c). The Pending Queue does not change, as shown in FIG. 12(d).

[0132] Next, ADD R10, R8, R9 is fetched. The scoreboard shows that the P bit of its src R8 is set, so the atom is put into the Pending Queue (FIG. 12(f)). As a result, the P bit of R10, the atom's dest register, is set. The P counts (Pending Counts) of its src registers R8 and R9 are each incremented to 1 (FIG. 12(e)).

[0133] In this way, an atom that tries to use a register not yet holding a value is put into the Pending Queue.

[0134] When the LD instruction completes, the scoreboard and the Pending Queue become as shown in FIGS. 12(g) and 12(h), and ADD R10, R8, R9 becomes executable. When ADD R10, R8, R9 is then taken out of the Pending Queue, both return to the states of FIGS. 12(a) and 12(b), assuming nothing else has changed them.
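The FIG. 12 sequence can be traced by applying the update rules step by step and recording the non-zero entries, as the figure does. This is an illustrative sketch. The `snapshot` helper and the state labels are hypothetical, and the Pending Queue holds plain strings.

```python
from collections import deque

p_bit = [0] * 32
p_count = [0] * 32
queue = deque()


def snapshot():
    """Non-zero scoreboard entries plus queue contents (zeros omitted, as in FIG. 12)."""
    return ({r for r in range(32) if p_bit[r]},
            {r: n for r, n in enumerate(p_count) if n},
            list(queue))


states = {"a": snapshot()}           # (a)/(b): initial state, all clear
p_bit[8] = 1                         # LD R8, (R5) misses: P bit of dest R8 set
states["c"] = snapshot()             # (c)/(d)
p_bit[10] = 1                        # ADD R10, R8, R9 enqueued: dest P bit set
p_count[8] += 1                      # src P counts incremented
p_count[9] += 1
queue.append("ADD R10, R8, R9")
states["e"] = snapshot()             # (e)/(f)
p_bit[8] = 0                         # the LD completes
states["g"] = snapshot()             # (g)/(h)
queue.popleft()                      # ADD issues from the queue
p_bit[10] = 0
p_count[8] -= 1
p_count[9] -= 1
states["final"] = snapshot()         # back to (a)/(b)
```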

[0135] (ii) Next, the example of FIG. 11(b) shows fetching of new instructions being stopped because the P bit of the dest is set.

[0136] In this example, SUB R10, R2, R3 is executed after the two atoms of FIG. 12(a).

[0137] Assume the scoreboard and the Pending Queue start as shown in FIGS. 12(a) and 12(b), and LD R8, (R5) causes a cache miss. Once ADD R10, R8, R9 has been put into the Pending Queue, the scoreboard and the Pending Queue are as shown in FIGS. 12(e) and 12(f).

[0138] Next, SUB R10, R2, R3 is fetched. The scoreboard shows that the P bit of its dest R10 is set. Executing SUB R10, R2, R3 now would overwrite R10 before ADD R10, R8, R9 has written it. Fetching is therefore suspended, and the scoreboard and the Pending Queue do not change.

[0139] In this way, an atom whose dest register has its P bit set is not put into the Pending Queue, and new fetching is stopped.

[0140] When the LD instruction completes and ADD R10, R8, R9 is taken out of the Pending Queue, the scoreboard and the Pending Queue return to the states of FIGS. 12(a) and 12(b). SUB R10, R2, R3 becomes executable at that point.
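The FIG. 11(b) case can be checked against condition (i) alone. This is a minimal sketch with hypothetical names, starting from the state of FIGS. 12(e) and 12(f). It shows that SUB R10, R2, R3 stays blocked even after the LD completes, until ADD R10, R8, R9 leaves the queue.

```python
NUM_REGS = 32
p_bit = [False] * NUM_REGS
p_count = [0] * NUM_REGS

# FIG. 12(e)/(f): LD R8, (R5) missed and ADD R10, R8, R9 sits in the queue
p_bit[8] = p_bit[10] = True
p_count[8] = p_count[9] = 1


def dest_p_bit_blocks(dest):
    """Condition (i): would writing dest overtake a queued producer?"""
    return p_bit[dest]


history = [dest_p_bit_blocks(10)]    # SUB R10, R2, R3 fetched now
p_bit[8] = False                     # the LD completes (FIG. 12(g))
history.append(dest_p_bit_blocks(10))
p_bit[10] = False                    # ADD leaves the queue (FIG. 12(a))
p_count[8] = p_count[9] = 0
history.append(dest_p_bit_blocks(10))
```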

[0141] (iii) Next, the example of FIG. 11(c) shows fetching of new instructions being stopped because the Pending Count of the dest register is not 0.

[0142] In this example, SUB R8, R7, R6 is executed following the two atoms of FIG. 12(a).

[0143] Suppose that the scoreboard and the Pending Queue start in the states shown in FIGS. 12(a) and 12(b), and that LD R8, (R5) causes a cache miss. When ADD R10, R8, R9 is then placed in the Pending Queue, the scoreboard and the Pending Queue are as shown in FIGS. 12(e) and 12(f).

[0144] Next, SUB R8, R7, R6 is fetched. Consulting the scoreboard shows that the P count of R8, its dest, is not 0. If SUB R8, R7, R6 were executed now, it would update R8 before ADD R10, R8, R9 has read it. The fetch is therefore suspended (the contents of the scoreboard and the Pending Queue do not change).

[0145] Thus, an atom that would use a register whose P count is not 0 as its dest is not placed in the Pending Queue; instead, new fetches are stopped.

[0146] When the LD instruction completes and ADD R10, R8, R9 is taken out of the Pending Queue, the scoreboard and the Pending Queue return to the states shown in FIGS. 12(a) and 12(b), and SUB R8, R7, R6 becomes executable at that point.
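The write-after-read check of paragraphs [0144] to [0146] uses the P count rather than the P bit. The sketch below makes the same assumptions as a minimal dictionary-based scoreboard (one P bit and one P count per register) and uses a hypothetical helper name. In the patent's trace R8 also has its P bit set by the LD miss, so the hypothetical SUB R9, R7, R6 is added to show a pure WAR case in which only the P count blocks the fetch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entry:
    p_bit: bool = False  # an older write to this register is still outstanding
    p_count: int = 0     # queued atoms that have not yet read this register


def war_blocks(sb, dest):
    """WAR check: some older queued atom still has to read `dest`."""
    return sb[dest].p_count != 0


sb = defaultdict(Entry)
sb["R8"].p_bit = True           # LD R8,(R5) misses in the cache
sb["R10"].p_bit = True          # ADD R10,R8,R9 enters the Pending Queue:
sb["R8"].p_count += 1           #   its dest gets a P bit and
sb["R9"].p_count += 1           #   each source gains one pending reader

blocked_r8 = war_blocks(sb, "R8")   # SUB R8,R7,R6 is fetched -> True
blocked_r9 = war_blocks(sb, "R9")   # hypothetical SUB R9,R7,R6 -> True, R9's P bit is clear

sb["R8"].p_bit = False          # the LD completes
sb["R10"].p_bit = False         # ADD R10,R8,R9 issues from the queue
sb["R8"].p_count -= 1
sb["R9"].p_count -= 1
after_r8 = war_blocks(sb, "R8")     # -> False
print(blocked_r8, blocked_r9, after_r8)   # True True False
```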

[0147] (iv) Next, using the example of FIG. 11(d), the case in which a tag in the Pending Queue is set will be described.

[0148] In this example, LD R8, (R5) is executed in a certain slot, followed by ADD R6, R6, R8.

[0149] FIG. 13 shows the transitions of the contents of the scoreboard and the Pending Queue in this case.

[0150] In the initial state, the scoreboard is as shown in FIG. 13(a) and the Pending Queue is as shown in FIG. 13(b).

[0151] First, LD R8, (R5) is fetched; the scoreboard shows that it is executable, so it is executed.

[0152] Assume that a cache miss occurs while the LD atom executes. As shown in FIG. 13(c), the P bit of R8, the dest register of this atom, is set. The Pending Queue is unchanged, as shown in FIG. 13(d).

[0153] Next, ADD R6, R6, R8 is fetched. The scoreboard shows that the P bit of R8, its src2, is set, so this atom is placed in the Pending Queue; because the atom's dest and src1 are the same register, Bit0 of its tag is set to 1 (FIG. 13(f)). In addition, the P bit of R6, the atom's dest, is set, and the P counts of its src registers R6 and R8 are each incremented to 1 (FIG. 13(e)).

[0154] Because ADD R6, R6, R8 is waiting only for R8 to become valid, it can execute even though the P bit of R6 (set on its own behalf) is 1. Since Bit0 is 1, the atom becomes executable as soon as the P bit of R8, its src2, returns to 0.

[0155] When the LD instruction completes, the scoreboard and the Pending Queue are as shown in FIGS. 13(g) and 13(h), and ADD R6, R6, R8 becomes executable. When ADD R6, R6, R8 is taken out of the Pending Queue, the scoreboard and the Pending Queue return to the states shown in FIGS. 13(a) and 13(b).
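The effect of Bit0 in paragraphs [0153] to [0155] can be shown with a small readiness check. This sketch assumes that a queued atom records its dest, src1, src2 and a Bit0 flag meaning "src1 is my own dest, ignore its P bit"; the class and function names are hypothetical. The text describes only Bit0, so a dest that coincides with src2 is not modelled.

```python
from dataclasses import dataclass


@dataclass
class PendingAtom:
    op: str
    dest: str
    src1: str
    src2: str
    bit0: bool = False  # tag bit: src1 is the same register as dest


def make_pending(op, dest, src1, src2):
    """Create a Pending Queue entry, setting Bit0 when src1 == dest."""
    return PendingAtom(op, dest, src1, src2, bit0=(src1 == dest))


def ready(atom, p_bit):
    """True once every source the atom really waits for has a clear P bit.

    With Bit0 set, src1's P bit was set on this atom's own behalf (as its
    dest), so only src2 is checked.
    """
    srcs = [atom.src2] if atom.bit0 else [atom.src1, atom.src2]
    return not any(p_bit.get(r, False) for r in srcs)


p_bit = {"R8": True}                         # LD R8,(R5) missed (FIG. 13(c))
add = make_pending("ADD", "R6", "R6", "R8")  # queued with Bit0 = 1 (FIG. 13(f))
p_bit["R6"] = True                           # dest P bit set for the ADD itself
before = ready(add, p_bit)                   # False: still waiting for R8

p_bit["R8"] = False                          # the LD completes (FIG. 13(g))
after = ready(add, p_bit)                    # True

naive = PendingAtom("ADD", "R6", "R6", "R8") # same atom with Bit0 left clear
stuck = ready(naive, p_bit)                  # False: it waits on its own P bit
print(add.bit0, before, after, stuck)        # True False True False
```

Without Bit0, the P bit of R6 would be cleared only when this very ADD completes, so the atom could never leave the queue.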

[0156] FIG. 14 shows the transitions of the contents of the scoreboard and the Pending Queue when this configuration example and procedure example are applied to the example of FIG. 5. (a) and (b) show the initial state; (c) and (d) the state when the LD causes an error in cycle 1; (e) and (f) the state when SUB R11, R5, R8 is placed in the Pending Queue in cycle 3; and (g) and (h) the state when the LD completes in cycle 5 (at which point SUB R11, R5, R8 becomes executable). In cycle 6, SUB R11, R5, R8 is taken out of the Pending Queue, and the state returns to that shown in (a) and (b).

[0157] Next, the instructions that are not placed in the Pending Queue will be described.

[0158] As mentioned earlier, when certain instructions are not executable, new fetches are suspended instead of placing those instructions in the Pending Queue. An outline of this processing was given in the description of executability; these atoms are described in more detail here.

[0159] First, if the P bit is set on a register that supplies the address of an LD or ST, or on a register that supplies the data of an ST, the atom is not placed in the Pending Queue. This simplifies processing and preserves the ordering between LDs and STs. For example, if an LD is executed while an ST is in the Pending Queue and the LD accesses the same address as that ST, the correct ordering cannot be established. Likewise, if a subsequent ST to the same address is executed while an LD is in the Pending Queue, the LD leaving the Pending Queue reads a value different from the intended one, because the ST has already overwritten it.
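Both ordering failures described in paragraph [0159] can be reproduced with a toy memory. The address 0x100 and the stored values are hypothetical; the point is only that the LD observes a different value once its order relative to the ST is swapped.

```python
def run(order):
    """Execute memory atoms in `order`; return the value each LD observed."""
    mem = {0x100: 1}            # hypothetical initial memory contents
    loaded = {}
    for name, kind, addr, value in order:
        if kind == "ST":
            mem[addr] = value
        else:                   # "LD"
            loaded[name] = mem[addr]
    return loaded


st = ("ST1", "ST", 0x100, 7)
ld = ("LD1", "LD", 0x100, None)

# Case A: the ST waits in a queue and the LD overtakes it.
case_a = (run([st, ld])["LD1"], run([ld, st])["LD1"])   # (intended, actual)
# Case B: the LD waits in a queue and a later ST overtakes it.
case_b = (run([ld, st])["LD1"], run([st, ld])["LD1"])   # (intended, actual)
print(case_a, case_b)           # (7, 1) (1, 7)
```

Detecting such conflicts would require comparing addresses against every queued memory atom, which is the complexity the patent avoids by keeping LD and ST out of the Pending Queue.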

[0160] Also, if an LD or ST in the Pending Queue causes a trap, a complicated mechanism would be needed to re-execute it; LD and ST atoms are therefore never placed in the Pending Queue.

[0161] Arithmetic instructions can also cause traps. However, when a trap such as an overflow occurs, execution of the process is generally terminated rather than re-executed, so arithmetic instructions are allowed into the Pending Queue.

[0162] Instructions that need multiple cycles in the Execute stage, such as DIV (division), are also kept out of the Pending Queue. If such an instruction were queued, atom issue from the fetch side would have to be stopped for several cycles when it later left the Pending Queue. Because implementing this complicates the mechanism, an atom that needs multiple Execute cycles is not placed in the Pending Queue; instead, the fetch of new instructions is stopped.

[0163] Alternatively, instructions needing multiple Execute cycles may also be placed in the Pending Queue, with execution in the other pipes stalled when such an instruction leaves the Pending Queue and executes.

[0164] Conditional branches are not placed in the Pending Queue either, because doing so would complicate the hardware. If a conditional branch were placed in the Pending Queue, the instructions following it would be executed speculatively regardless of the branch outcome. When the speculation turned out to be wrong, the results would have to be canceled, and the shadow registers or similar structures needed for that are likely to increase hardware cost. However, if the resulting performance gain justifies the added hardware cost, a scheme that places conditional branches in the Pending Queue is also possible.
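The policy of paragraphs [0158] to [0164] can be summarized as a decision applied to a fetched atom that cannot issue this cycle. This is a sketch under stated assumptions: only DIV (text) and BRZ (FIG. 17) are named in the patent, so the opcode sets are illustrative, and the function name is hypothetical.

```python
from enum import Enum


class Action(Enum):
    QUEUE = "place in the Pending Queue"
    STOP_FETCH = "do not queue; stop fetching new instructions"


MEMORY = {"LD", "ST"}
MULTI_CYCLE = {"DIV"}    # DIV is the text's example; the full set is an assumption
COND_BRANCH = {"BRZ"}    # BRZ appears in FIG. 17; other conditional branches assumed alike


def on_not_executable(opcode, dest_p_bit=False, dest_p_count=0):
    """What the D stage does with a fetched atom that cannot issue this cycle.

    For atoms without a destination register (e.g. ST) keep the defaults.
    """
    if dest_p_bit or dest_p_count != 0:
        return Action.STOP_FETCH   # WAW / WAR hazard on dest ([0139], [0145])
    if opcode in MEMORY:
        return Action.STOP_FETCH   # preserves LD/ST order and re-executable traps
    if opcode in MULTI_CYCLE:
        return Action.STOP_FETCH   # avoids a multi-cycle issue stall on dequeue
    if opcode in COND_BRANCH:
        return Action.STOP_FETCH   # outcome unknown; no speculative recovery hardware
    return Action.QUEUE            # ordinary arithmetic: an overflow kills the process


for op, pb, pc in [("ADD", False, 0), ("SUB", True, 0), ("LD", False, 0), ("DIV", False, 0)]:
    print(op, on_not_executable(op, pb, pc).name)
```

The alternative of paragraph [0163] would move DIV from the stop-fetch branch to the queue branch and instead stall the other pipes when it issues.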

[0165] An example configuration of a processor to which the dynamic VLIW method is applied is described below.

[0166] FIG. 15 shows a schematic configuration example of the processor.

[0167] This example is a processor with three pipes in total: one FP (floating-point) pipe (16-1) and two INT (integer) pipes (16-2, 16-3).

[0168] As described above, the dynamic VLIW method is based on the conventional VLIW method, so the pipe that each atom enters is determined by the atom's position (slot) within the VLIW instruction. In the example of FIG. 15, the atoms of a VLIW instruction, starting from the leftmost, enter the FP, INT0, and INT1 pipes, respectively.

[0169] A Pending Queue (12-1 to 12-3) is provided for each pipe. For example, the atom in the slot corresponding to the FP pipe is always placed in the FP Pending Queue (12-1).

[0170] Fixing the relationship between pipes and Pending Queues in this way simplifies the hardware.

[0171] Each Pending Queue is a FIFO: a pending atom is always issued from the head, and an atom entering the Pending Queue is always appended at the tail.

[0172] When any of the Pending Queues becomes full, no new fetch is performed, which prevents the Pending Queues from overflowing. Atoms continue to issue from the Pending Queues in the meantime, so their lengths decrease.
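The fixed slot-to-queue mapping and FIFO discipline of paragraphs [0168] to [0172] map naturally onto one `collections.deque` per pipe. The queue depth of 4 and the class name are assumptions for illustration; the patent does not state a depth.

```python
from collections import deque

SLOTS = ("FP", "INT0", "INT1")   # slot order within a VLIW instruction (FIG. 15)
DEPTH = 4                        # hypothetical queue depth


class PendingQueues:
    def __init__(self):
        # one FIFO per pipe; slot i always feeds queue i
        self.q = {slot: deque() for slot in SLOTS}

    def fetch_allowed(self):
        """New VLIW fetches stop while any queue is full."""
        return all(len(q) < DEPTH for q in self.q.values())

    def push(self, slot, atom):
        assert len(self.q[slot]) < DEPTH, "fetch should have been stopped"
        self.q[slot].append(atom)        # enter at the tail

    def head(self, slot):
        return self.q[slot][0] if self.q[slot] else None

    def pop(self, slot):
        return self.q[slot].popleft()    # always issue from the head


pq = PendingQueues()
for i in range(DEPTH):
    pq.push("INT0", f"a{i}")
print(pq.fetch_allowed())   # False: the INT0 queue is full
print(pq.pop("INT0"))       # a0 (oldest first)
print(pq.fetch_allowed())   # True
```

Checking fullness before each fetch, rather than on push, mirrors the text: the whole VLIW fetch is withheld, so no slot ever has to reject an atom.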

[0173] Next, an example pipeline configuration for the dynamic VLIW method will be described.

[0174] One way to implement the dynamic VLIW method as a pipeline is to take the five-stage pipeline commonly used in conventional RISC processors, namely F (Fetch), D (Decode), E (Execute), M (Memory), and W (Writeback), and add an R (Register Read) stage, giving the six-stage FDREMW pipeline. This six-stage pipeline is only an example; the same approach applies if, for instance, the F, E, or M stage is split into multiple stages.

[0175] FIG. 16 is a schematic block diagram showing this pipeline configuration applied to the processor of FIG. 15. In the figure, 21 is an instruction cache, 22-1 to 22-3 are the Pending Queues of the respective slots, 23-1 to 23-3 are the selectors of the respective slots, 24 is a scoreboard, 25-1 is the register file (Register File) for floating-point operations, 25-2 is the Register File for integer operations, 26-1 is a floating-point unit, 26-2 and 26-3 are the integer units of the respective slots, and 27 is a data cache. The labels F, D, R, E, M, and W indicate the stage in which the blocks beneath them operate.

[0176] The selectors 23-1 to 23-3 are switches that choose whether the atom entering the R stage comes from the fetch side or from the Pending Queue side. Neither side may be selected, in which case a NOP atom is fed into the R stage.

[0177] Register file 25-1 has three ports in total (2 read / 1 write), and register file 25-2 has six ports in total (4 read / 2 write).

[0178] Although FIG. 16 shows an example with three pipes, the present invention is of course applicable to processors with four or more pipes. For example, the configuration of FIG. 16 may include another floating-point unit that shares register file 25-1 (in which case register file 25-1 has six ports in total, 4 read / 2 write).

[0179] An example of the processing in each stage of the above pipeline is described below.

[0180] First, in the F stage, a VLIW instruction is fetched.

[0181] Next, in the D stage, the scoreboard is consulted to determine whether the fetched atom is executable. If it is executable (and the fetch is not suspended), the fetched atom is sent to the R stage. If it is not executable (and the fetch is not suspended), the atom is appended to the tail of the Pending Queue, or else the fetch is suspended. When the fetched atom is not executed, the head atom of the Pending Queue is sent to the R stage if it is executable. If neither the fetched atom nor the head of the Pending Queue is executable, a NOP is sent to the R stage.
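The D-stage decision of paragraphs [0176] and [0181] for a single slot can be sketched as one function returning what the selector feeds into the R stage. This sketch assumes the executability checks and the stop-fetch decision have already been made against the scoreboard and are passed in as booleans; the function and parameter names are hypothetical.

```python
NOP = "NOP"


def d_stage(fetched, fetched_ok, stop_fetch, queue, head_ok):
    """One slot's D-stage decision.

    fetched    -- atom fetched this cycle (None if nothing was fetched)
    fetched_ok -- scoreboard says `fetched` may execute now
    stop_fetch -- `fetched` may neither execute nor be queued (e.g. a WAW hazard)
    queue      -- this slot's Pending Queue as a list, head at index 0
    head_ok    -- scoreboard says the current queue head may execute now
    Returns (atom sent to the R stage, whether `fetched` was consumed).
    """
    if fetched is not None and fetched_ok and not stop_fetch:
        return fetched, True                    # fetch side selected
    # The fetched atom is not executed, so the queue head may go instead.
    # The head is examined before appending, so an atom just queued is never chosen.
    chosen = queue.pop(0) if queue and head_ok else NOP
    if fetched is None:
        return chosen, True
    if stop_fetch:
        return chosen, False                    # hold it; present it again next cycle
    queue.append(fetched)                       # not executable: enter at the tail
    return chosen, True


q = ["Z"]
print(d_stage("X", False, False, q, head_ok=True), q)    # ('Z', True) ['X']
q = []
print(d_stage("SUB", False, True, q, head_ok=False), q)  # ('NOP', False) []
```

The first call is the situation of FIG. 17: X is queued while the older Z is issued from the queue in the same cycle.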

[0182] The scoreboard and the Pending Queue are updated in this stage.

[0183] In the R stage, the atom sent from the D stage reads the Register File, and the result is sent to the E stage.

[0184] In the E stage, an operation is performed on the data sent from the R stage, and the result is sent to the M stage.

[0185] In the M stage, if the atom accesses memory, the data cache and the TLB are accessed.

[0186] Cache misses are detected in this stage.

[0187] In the W stage, the result of the memory access, or the operation result that has passed through the E and M stages, is written to the Register File.

[0188] The executability check is deliberately given its own pipeline stage, D, in order to reduce the number of register file ports. Adding register ports generally increases chip area and often raises cost.

[0189] If, however, the number of register ports does not matter for cost, the D and R stages can be merged into a single stage. Instead of determining executability in the D stage and letting only the one executable atom access the Register File, the merged stage reads the Register File for both the fetched atom and the head atom of the Pending Queue while consulting the scoreboard at the same time, and sends whichever atom is executable to the E stage. This yields the same five-stage pipeline as in the related art. With five stages, the branch penalty is reduced, so a further performance improvement can be expected.
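The port counts of paragraphs [0177] and [0178], and the cost of the merged stage in paragraph [0189], can be estimated with simple arithmetic. We assume each atom reads at most two source registers and writes one result, which is inferred from the 2 read / 1 write figure for the single FP pipe. Under that assumption, merging D and R doubles the read ports, because both the fetched atom and the Pending Queue head read the Register File before one of them is chosen; the 8-read figure below is our estimate, not a number stated in the patent.

```python
def register_ports(pipes, merged_dr=False, reads_per_atom=2, writes_per_atom=1):
    """Return (read_ports, write_ports) for a register file shared by `pipes` pipes."""
    candidates = 2 if merged_dr else 1   # atoms per pipe that read the file each cycle
    return pipes * candidates * reads_per_atom, pipes * writes_per_atom


print(register_ports(1))                  # (2, 1)  FP file 25-1
print(register_ports(2))                  # (4, 2)  INT file 25-2, or 25-1 with a second FP unit
print(register_ports(2, merged_dr=True))  # (8, 2)  INT file with D and R merged
```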

[0190] Next, the operation when a trap occurs will be described.

[0191] A trap is caused by the execution of an atom and interrupts that atom's execution; this distinguishes it from an interrupt.

[0192] Traps are either re-executable or non-re-executable. Re-executable traps involve memory access, such as a TLB miss caused by an LD or ST instruction. Non-re-executable traps include overflows caused by arithmetic operations.

[0193] As described above, atoms that can cause re-executable traps are never placed in the Pending Queue. Therefore, if the issue of new VLIW instructions is suspended when the trap occurs and atoms are issued only from the Pending Queue, no out-of-order atoms remain once the Pending Queue is empty. Because the trap is raised in the M stage, atoms issued from the fetch side at the same time as the trapping atom are canceled before they write their values to the registers in the W stage. Among the atoms issued from the fetch side at the same time as the trapping atom, those that entered the Pending Queue are also canceled. Further, atoms issued from the fetch side after the trapping atom that are still in the pipeline are canceled. The PC value at this point is saved in a register called EPC. As a result, every atom issued from the fetch side at the same time as or after the trapping atom is canceled, so no atom is executed twice when execution resumes.

[0194] Control is then transferred to the trap handling routine, the trap is processed, and finally control is transferred to the location indicated by the EPC, so that execution resumes from the interrupted VLIW instruction containing the atom that caused the trap.
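The recovery sequence of paragraphs [0193] and [0194] can be sketched as a filter over the atoms in flight. The model assumes each atom carries the index of the VLIW instruction it was fetched in and whether it currently sits in the pipeline or the Pending Queue; these fields, the opcodes and the function name are hypothetical, and the EPC is modelled as a VLIW index rather than an address.

```python
from dataclasses import dataclass


@dataclass
class InFlight:
    name: str
    vliw: int    # index of the VLIW instruction the atom was fetched in
    where: str   # "pipe" or "queue"


def recover_from_trap(atoms, trap_vliw):
    """Re-executable trap raised in M by an atom of VLIW instruction `trap_vliw`.

    Every atom fetched with or after that instruction is canceled, wherever it
    is; older atoms survive, and those still queued are drained before the
    handler runs. Returns (epc, survivors, names of atoms that must drain first).
    """
    epc = trap_vliw                      # restart point: the trapping VLIW instruction
    survivors = [a for a in atoms if a.vliw < trap_vliw]
    must_drain = [a.name for a in survivors if a.where == "queue"]
    return epc, survivors, must_drain


atoms = [
    InFlight("ADD (older, queued)", 3, "queue"),
    InFlight("LD (traps)", 5, "pipe"),
    InFlight("SUB (same VLIW, queued)", 5, "queue"),
    InFlight("AND (later)", 6, "pipe"),
]
epc, survivors, must_drain = recover_from_trap(atoms, 5)
print(epc, [a.name for a in survivors], must_drain)
```

The trapping LD itself is dropped as well, since the whole VLIW instruction is fetched again from the EPC after the handler returns.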

[0195] For an atom that cannot be re-executed, other atoms may already have overtaken it by the time it issues from the Pending Queue. The processor therefore cannot stop, when the trap occurs, in the same state that in-order execution would have produced. Because instructions that have overtaken the trapping atom cannot be canceled in hardware, re-executing from the VLIW instruction containing the trapping atom could execute some atoms twice.

[0196] However, for a non-re-executable trap, such as an overflow in an arithmetic atom, the process is generally killed when the trap occurs, so re-execution is not needed and its impossibility causes no problem.

[0197] Because the process is killed, an overflow in an arithmetic atom may affect a process other than the one the overflowing atom originally belonged to. For example, suppose an arithmetic atom that overflows belongs to process PN. If, for some reason, it enters the Pending Queue, the next process PM starts before the atom has been issued, and the atom is then issued from the Pending Queue while PM is running, the resulting overflow trap kills the new process PM.

[0198] To prevent this, it must be guaranteed that the Pending Queue is empty whenever a process starts or terminates. Since starting or terminating a process generally involves an interrupt, it suffices that interrupts, like traps, are taken only after the Pending Queue has become empty.

[0199] Next, interrupt processing will be described.

[0200] Interrupt processing occurs independently of the execution of any particular atom. The question it raises is when to execute the atoms that are in the Pending Queue at the moment the interrupt arrives. If an atom in the Pending Queue were issued during the interrupt handler and raised an exception, the interrupt handler would become complicated. On the other hand, many atoms can raise exceptions such as arithmetic overflow, so a policy of keeping every exception-raising instruction out of the Pending Queue would reduce the effectiveness of the Pending Queue.

[0201] Furthermore, in view of the process start and termination handling described above, it is appropriate to handle interrupts in the same way as traps, for example by raising the interrupt only after the Pending Queue has become empty.
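The interrupt policy of paragraphs [0198] and [0201] amounts to: on an interrupt request, stop fetching, keep issuing from the Pending Queues, and take the interrupt only when every queue is empty. The cycle-level sketch below assumes hypothetical per-atom ready cycles and at most one issue per slot per cycle; the function name is also hypothetical.

```python
from collections import deque


def cycles_until_interrupt(queues, ready_at):
    """Cycles from an interrupt request until the interrupt may be taken.

    queues   -- dict: slot -> deque of queued atom names (head on the left)
    ready_at -- dict: atom -> first cycle its operands are valid (hypothetical)
    Fetch is stopped as soon as the request arrives, so nothing new is queued;
    each slot issues at most its queue head per cycle.
    """
    cycle = 0
    while any(queues.values()):          # take the interrupt only when all are empty
        for q in queues.values():
            if q and ready_at[q[0]] <= cycle:
                q.popleft()
        cycle += 1
    return cycle


queues = {"FP": deque(["f1"]), "INT0": deque(["i1", "i2"]), "INT1": deque()}
ready_at = {"f1": 3, "i1": 0, "i2": 1}
print(cycles_until_interrupt(queues, ready_at))   # 4
```

The interrupt latency is thus bounded by the slowest queued atom, which is the price paid for never running queued atoms inside the handler.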

[0202] Next, the execution of conditional branch atoms will be described.

[0203] A conditional branch atom is never placed in the Pending Queue. If the P bit is set on the register holding the branch condition, it is not known whether the branch will be taken, so the atom is not queued and no new fetch is performed.

[0204] High-performance processors generally include branch prediction logic that, when a conditional branch is fetched, predicts the outcome of the condition and thereby decides whether subsequent fetches come from the taken or the not-taken path. When the prediction is correct, the pipeline penalty of the branch is small, which reduces the average branch penalty.

[0205] This branch prediction mechanism can be applied to dynamic VLIW as follows.

[0206] When the prediction succeeds, processing proceeds as usual.

[0207] When the prediction fails, on the other hand, the branch outcome is resolved only in the E stage, so the atoms fetched along the mispredicted path that occupy the stages from F up to E must be canceled. Atoms placed in the Pending Queue on the basis of the failed prediction must also be canceled.

[0208] Atoms fetched on the basis of the failed prediction must be canceled, but atoms issued from the Pending Queue must not be, because they precede the conditional branch in program order. To distinguish fetched atoms from atoms issued from the Pending Queue, an atom in the pipeline that was issued from the Pending Queue can, for example, carry a Q tag. When a conditional branch is mispredicted, the atoms without a Q tag in the stages before the E stage (the F, D, and R stages) are canceled.

[0209] Atoms placed in the Pending Queue on the basis of the failed prediction must also be canceled; they can be identified, for example, as follows.

[0210] When the misprediction is detected in the E stage, the atom that would otherwise be in the R stage may have been appended to the tail of the Pending Queue, in which case the tail of the Pending Queue must be canceled. To detect this case, whenever the D stage appends an atom to the Pending Queue, the atom entering the R stage at the same time is given a C tag. A C tag indicates that the atom occupies the pipe in place of the instruction that was placed in the Pending Queue.

[0211] If the misprediction is detected in the E stage and the atom in the R stage carries a C tag, canceling the tail of the Pending Queue removes the atom that was placed there on the basis of the failed prediction.

[0212] The processing when a branch prediction fails is now described using the concrete example of FIG. 17.

[0213] FIG. 17 shows the pipeline state when the BRZ (branch) instruction in an example instruction sequence is predicted not taken but is actually taken. For simplicity, FIG. 17 shows only one of the VLIW slots.

[0214] FIG. 17(a) shows an example of the instruction sequence in one slot: BRZ is followed by atoms X and Y, and atom Z precedes BRZ (the specific contents of atoms X, Y, and Z are not defined here). FIG. 17(b) shows the state of each stage at a certain point in time, and FIG. 17(c) shows atom X held at the tail of the Pending Queue.

[0215] In the example of FIG. 17, atom X, which follows the BRZ atom, was not executable and was placed in the Pending Queue; in its place, atom Z was issued from the Pending Queue into the R stage. If the branch prediction fails in this state, both X and Y must be canceled.

[0216] Because the atom in the R stage carries a C tag, the atom that should be in the R stage is known to be in the Pending Queue, so the tail of the Pending Queue is canceled. Y is canceled from the D stage. Z, on the other hand, carries a Q tag even though it is in the R stage, so it is known to have been issued from the Pending Queue and is not canceled.
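The cancellation rules of paragraphs [0208] to [0216] can be sketched for one slot and checked against FIG. 17. The class and function names are hypothetical, and atom W in the F stage is a hypothetical wrong-path atom added for illustration; FIG. 17 as described does not name the F-stage contents.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    q_tag: bool = False   # issued from the Pending Queue (older than the branch)
    c_tag: bool = False   # entered R in the cycle an atom was appended to the queue


def squash(stages, queue):
    """Branch found mispredicted in E: cancel the wrong-path atoms of one slot.

    stages -- dict with keys "F", "D", "R" (atom or None)
    queue  -- this slot's Pending Queue as a list, tail at the end
    Returns the names of the canceled atoms.
    """
    canceled = []
    r = stages.get("R")
    if r is not None and r.c_tag and queue:
        canceled.append(queue.pop())      # the wrong-path atom parked at the tail
    for st in ("F", "D", "R"):
        atom = stages.get(st)
        if atom is not None and not atom.q_tag:
            canceled.append(atom.name)
            stages[st] = None             # becomes a bubble
    return canceled


stages = {"F": Slot("W"), "D": Slot("Y"), "R": Slot("Z", q_tag=True, c_tag=True)}
queue = ["X"]
print(squash(stages, queue), stages["R"].name)   # ['X', 'W', 'Y'] Z
```

At most one wrong-path atom can be in the queue at this point, because the branch itself is never queued; checking only the C tag of the R-stage atom is therefore enough.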

[0217] The present invention is not limited to the embodiments described above and can be implemented with various modifications within its technical scope.

[0218]

EFFECTS OF THE INVENTION According to the present invention, when a preceding instruction cannot be executed immediately, it is temporarily set aside so that subsequent instructions can be executed first, which speeds up processing.

[0219] Moreover, because the hardware can be kept simple, an effective speedup can be expected.

[Brief description of the drawings]

FIG. 1 is a diagram showing an example of a VLIW instruction according to an embodiment of the present invention.

FIG. 2 is a diagram for explaining the dynamic VLIW method according to the embodiment.

FIG. 3 is a diagram showing an example of an instruction sequence.

FIG. 4 is a diagram for explaining the execution of the instruction sequence of FIG. 3 by the conventional VLIW method.

FIG. 5 is a diagram for explaining the execution of the instruction sequence of FIG. 3 by the dynamic VLIW method according to the embodiment.

FIG. 6 is a diagram showing an example of the procedure from fetch to execution according to the embodiment.

FIG. 7 is a diagram showing a configuration example of the Pending Queue according to the embodiment.

FIG. 8 is a diagram showing a configuration example of the scoreboard according to the embodiment.

FIG. 9 is a flowchart showing an example of a procedure for determining whether a fetched atom is executable, according to the embodiment.

FIG. 10 is a flowchart showing an example of a procedure for determining whether an atom from the Pending Queue is executable, according to the embodiment.

FIG. 11 is a diagram showing several examples of instruction sequences.

FIG. 12 is a diagram showing an example of the transitions of the contents of the scoreboard and the Pending Queue.

FIG. 13 is a diagram showing another example of the transitions of the contents of the scoreboard and the Pending Queue.

FIG. 14 is a diagram showing still another example of the transitions of the contents of the scoreboard and the Pending Queue.

FIG. 15 is a diagram showing a schematic configuration example of the processor according to the embodiment.

FIG. 16 is a diagram showing a schematic configuration example of the processor according to the embodiment.

FIG. 17 is a diagram for explaining the execution of a conditional branch atom.

[Explanation of symbols]

2-1, 2-2, 12-1 to 12-3, 22-1 to 22-3 ... Pending Queue
4, 24 ... scoreboard
6-1, 6-2 ... pipeline unit
16-1 to 16-3 ... pipe
21 ... instruction cache
23-1 to 23-3 ... selector
25-1, 25-2 ... register file
26-1 to 26-3 ... arithmetic unit
27 ... data cache

Claims


1. An arithmetic processing unit comprising: instruction storage means for temporarily saving an instruction that has been fetched but cannot be executed, so that a subsequent instruction can be executed ahead of it; storage means for storing information on the use status of each register; and determining means for determining, based on the information stored in the storage means, whether the fetched instruction or an instruction stored in the instruction storage means can be executed.

2. The arithmetic processing unit according to claim 1, wherein the determining means includes means for determining whether an instruction that has been fetched but cannot be executed is to be saved in the instruction storage means or whether fetching is to be interrupted.

3. The arithmetic processing unit according to claim 1, wherein the determining means includes means for determining whether an instruction stored in the instruction storage means is to be given an opportunity for execution.

4. The arithmetic processing unit according to claim 1, wherein the determining means determines that the fetched instruction is to be input to the instruction storage means at least when the following condition is satisfied: if the instruction has a register to which its execution result is to be written, that register can be overwritten, and the value of a register referred to by the instruction has not yet been determined.

5. The arithmetic processing unit according to any one of the preceding claims, wherein the determining means determines that fetching is to be interrupted when the register to which the execution result of the fetched instruction is to be written cannot be overwritten.

6. The arithmetic processing unit according to claim 1, wherein specific instructions that are never to be input to the instruction storage means are determined in advance.

7. The arithmetic processing unit according to claim 1, wherein, when determining whether an instruction stored in the instruction storage means is executable, the determining means determines that the instruction can be executed if, when the instruction has a register to which its execution result is to be written, that register can be overwritten, and the value of each register referred to by the instruction has been determined.

8. An arithmetic processing unit that has a plurality of arithmetic units for executing given instructions and a plurality of registers used for executing the instructions, and that simultaneously fetches a plurality of instructions, each associated in advance with one of the arithmetic units, and processes them in parallel on the corresponding arithmetic units, the arithmetic processing unit comprising: instruction storage means, provided for each arithmetic unit, for temporarily saving any of the simultaneously fetched instructions that cannot be executed immediately, so that subsequent instructions can be executed ahead of it; storage means for storing, for each register, first information indicating whether there is a preceding instruction that writes to the register and has not yet completed execution, and second information indicating the number of preceding instructions that refer to the register and have not yet been executed; and determining means for determining, for each of the simultaneously fetched instructions, based on the type of the instruction, the registers it uses, and the corresponding first and second information, whether the instruction can be executed immediately, is to be input to the instruction storage means, or requires fetching to be interrupted, and, when the instruction determined to be executable immediately is a NOP instruction or an instruction equivalent to a NOP instruction, or when the instruction is not determined to be executable immediately, and an instruction is present in the corresponding instruction storage means, for determining whether that stored instruction can be executed immediately based on the registers it uses and the corresponding first and second information.

9. The arithmetic processing unit according to claim 8, further comprising: first information updating means for, when a fetched instruction is input to the corresponding instruction storage means, setting the first information of the register serving as the destination of the instruction and incrementing the second information of the registers serving as its sources, among the information stored in the storage means, and, when the instruction is taken out of the corresponding instruction storage means, resetting the first information of the destination register and decrementing the second information of the source registers; and second information updating means for, when a load instruction being executed causes a fetch miss, setting the first information of the register to which the data loaded by that instruction is to be written, and, when execution of the load instruction that caused the fetch miss is completed, resetting the first information of that register.
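The counter maintenance in claim 9 can be sketched as a small Python model. This is an illustration under assumptions, not the circuit: the names `Atom`, `Scoreboard`, `on_enqueue`, `on_load_miss` and so on are hypothetical, registers are plain integer indices, and each source operand increments the second information once, following a literal reading of the claim.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Atom:
    """One slot instruction ("atom"); hypothetical model, registers are ints."""
    op: str
    dest: Optional[int]
    srcs: Tuple[int, ...] = ()


class Scoreboard:
    """Per-register first information (busy) and second information (pending_reads)."""

    def __init__(self, num_regs: int) -> None:
        self.busy: List[bool] = [False] * num_regs       # unfinished preceding writer?
        self.pending_reads: List[int] = [0] * num_regs   # unexecuted preceding readers

    def on_enqueue(self, atom: Atom) -> None:
        # Atom parked in its Pending Queue: its destination now has an
        # unfinished writer, and each source gains one unexecuted reader.
        if atom.dest is not None:
            self.busy[atom.dest] = True
        for r in atom.srcs:
            self.pending_reads[r] += 1

    def on_dequeue(self, atom: Atom) -> None:
        # Atom taken out of the queue for execution: undo the enqueue marks.
        if atom.dest is not None:
            self.busy[atom.dest] = False
        for r in atom.srcs:
            self.pending_reads[r] -= 1

    def on_load_miss(self, dest: int) -> None:
        # The loaded value will arrive late, so mark its register as busy.
        self.busy[dest] = True

    def on_load_complete(self, dest: int) -> None:
        self.busy[dest] = False
```

Claim 17 places these updates in the decode and memory stages; the sketch ignores pipeline timing.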

10. The arithmetic processing unit according to claim 9, wherein the instruction storage means stores, for each stored instruction, a first tag indicating that the register serving as the destination of the instruction is the same as the register serving as its first source, and a second tag indicating that the destination register is the same as the register serving as its second source, the arithmetic processing unit further comprising third information updating means for setting the first or second tag, where applicable, when a fetched instruction is stored in the corresponding instruction storage means, and resetting the first or second tag when the instruction is taken out of the corresponding instruction storage means.

11. The arithmetic processing unit according to claim 10, wherein, when determining whether an instruction stored in the instruction storage means is executable, the determining means determines that the instruction can be executed if its first tag is set, the first information of the register serving as its second source is reset, and the second information of the register serving as its destination is 0, and likewise determines that the instruction can be executed if its second tag is set, the first information of the register serving as its first source is reset, and the second information of the register serving as its destination is 0.
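The readiness test for an atom waiting in a Pending Queue (claims 7, 10 and 11) can be written as a pure function over a scoreboard snapshot. This is a sketch under assumptions: `QueuedAtom` and `queued_atom_ready` are hypothetical names, the tags are derived by comparing register numbers rather than stored, and the untagged case applies claim 7 by reading "can be overwritten" as "second information is 0", by analogy with claim 11. The claims do not say how the counters treat a source that equals the destination, so the function only evaluates the condition on the state it is given.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class QueuedAtom:
    """Hypothetical queued atom with a destination and up to two sources."""
    dest: int
    src1: Optional[int] = None
    src2: Optional[int] = None

    @property
    def tag1(self) -> bool:
        # First tag: the destination is also the first source.
        return self.src1 is not None and self.src1 == self.dest

    @property
    def tag2(self) -> bool:
        # Second tag: the destination is also the second source.
        return self.src2 is not None and self.src2 == self.dest


def queued_atom_ready(
    atom: QueuedAtom, busy: Sequence[bool], pending_reads: Sequence[int]
) -> bool:
    # The destination may be overwritten only if no unexecuted reader remains.
    if pending_reads[atom.dest] != 0:
        return False
    # Every source must have a determined value (no unfinished writer). A source
    # that equals the destination is skipped: presumably its busy flag was set
    # by this atom's own enqueue, which is the case the tags single out.
    for src, tagged in ((atom.src1, atom.tag1), (atom.src2, atom.tag2)):
        if src is not None and not tagged and busy[src]:
            return False
    return True
```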

12. The arithmetic processing unit according to claim 8, wherein, when determining whether a fetched instruction is executable, the determining means determines that fetching is to be interrupted when, among the information stored in the storage means, the first information of the register serving as the destination of the fetched instruction is set and/or the second information of that register is not 0.

13. The arithmetic processing unit according to claim 12, wherein, when determining whether a fetched instruction is executable, the determining means determines that the instruction is to be input to the instruction storage means if fetching is not to be interrupted and, among the information stored in the storage means, the first information of a register serving as a source of the fetched instruction is set.

14. The arithmetic processing unit according to any one of claims 8 to 13, wherein, when the fetched instruction is a load instruction, a store instruction or a conditional branch instruction, the determining means determines that the instruction is never to be input to the instruction storage means.
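Claims 12 to 14 amount to a three-way classification of each fetched atom against the scoreboard. A minimal Python sketch follows; the opcode strings, `Decision` and `classify_fetched` are hypothetical, and it assumes that an atom of a never-queued type (claim 14) whose sources are not ready simply stalls fetch, which the claims imply but do not state outright.

```python
from enum import Enum, auto
from typing import Optional, Sequence

# Claim 14: these kinds of atom are never parked in a Pending Queue.
# The opcode names are hypothetical.
NEVER_QUEUED = frozenset({"load", "store", "cond_branch"})


class Decision(Enum):
    EXECUTE = auto()
    QUEUE = auto()
    INTERRUPT_FETCH = auto()


def classify_fetched(
    op: str,
    dest: Optional[int],
    srcs: Sequence[int],
    busy: Sequence[bool],
    pending_reads: Sequence[int],
) -> Decision:
    # Claim 12: an unfinished writer of, or an unexecuted reader of, the
    # destination means the result cannot be written yet, so stop fetching.
    if dest is not None and (busy[dest] or pending_reads[dest] != 0):
        return Decision.INTERRUPT_FETCH
    # Claim 13: the destination is free but some source value is undetermined.
    if any(busy[r] for r in srcs):
        if op in NEVER_QUEUED:
            # Assumption: a never-queued atom that must wait stalls fetch.
            return Decision.INTERRUPT_FETCH
        return Decision.QUEUE
    return Decision.EXECUTE
```

Parking only the atoms whose sources are pending, while stalling on destination conflicts, means the queue never has to resolve write-after-write or write-after-read hazards itself.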

15. The arithmetic processing unit according to claim 1, further comprising means for canceling, when the condition prediction of a conditional branch instruction being executed fails, the instructions fetched on the basis of the failed prediction, both those being executed and those that have been input to and stored in the instruction storage means.
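A simplified model of the cancellation in claim 15 follows. It swaps the embodiment's reasoning from stage positions and tags for hypothetical fetch sequence numbers (`seq`), keeps atoms that carry the Q tag as in the example of atom Z, and assumes atoms from the same VLIW word as the branch survive. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class InFlightAtom:
    name: str
    seq: int                  # hypothetical fetch sequence number (VLIW word index)
    from_queue: bool = False  # the "Q tag": issued from a Pending Queue


def cancel_on_mispredict(
    branch_seq: int, pipeline: List[InFlightAtom], queues: List[Deque[InFlightAtom]]
) -> List[InFlightAtom]:
    """Squash work fetched after a mispredicted branch; return surviving pipeline atoms.

    Each Pending Queue is trimmed in place. Entries are appended in fetch
    order, so everything fetched after the branch sits at the tail.
    """
    for q in queues:
        while q and q[-1].seq > branch_seq:
            q.pop()
    # Atoms with the Q tag are kept, as with atom Z in the text.
    return [a for a in pipeline if a.from_queue or a.seq <= branch_seq]
```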

16. The arithmetic processing unit according to claim 1, wherein the instruction storage means is constituted by a FIFO buffer.

17. The arithmetic processing unit according to claim 1, wherein instructions are executed by pipeline processing, the executability determination is made in the decode stage, and the information is updated in the decode stage and the memory stage.

18. An instruction sequence control method comprising: fetching an instruction and determining whether it can be executed; when the fetched instruction is determined not to be executable and fetching has not been interrupted, storing the instruction in instruction storage means for temporary saving; when the fetched instruction is determined to be stored in the instruction storage means, determining whether the earliest-stored other instruction in the instruction storage means is executable; and executing that instruction if it is determined to be executable.
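The per-slot flow of claim 18, together with the rule in claim 8 that a queued atom gets a chance only when the fetched atom is a NOP or cannot execute, can be sketched as one cycle of a single slot. This is a hedged illustration: the classification is passed in (for example, from a check like the one in claims 12 to 14), readiness of the queue head is a caller-supplied predicate, scoreboard updates are left to the caller, and `slot_cycle` and the `"nop"` marker are hypothetical.

```python
from collections import deque
from enum import Enum, auto
from typing import Callable, Deque, Tuple

NOP = "nop"  # hypothetical marker for an empty slot


class Decision(Enum):
    EXECUTE = auto()
    QUEUE = auto()
    INTERRUPT_FETCH = auto()


def slot_cycle(
    fetched: str,
    decision: Decision,
    queue: Deque[str],
    head_ready: Callable[[str], bool],
) -> Tuple[str, bool]:
    """Return (atom issued this cycle, whether fetch must stall)."""
    # A fetched atom that can run immediately goes ahead of anything queued.
    if decision is Decision.EXECUTE and fetched != NOP:
        return fetched, False
    # Otherwise the oldest queued atom (FIFO head) gets a chance to issue.
    issued = NOP
    if queue and head_ready(queue[0]):
        issued = queue.popleft()
    # Park the fetched atom after the head check, so it cannot issue this cycle.
    if decision is Decision.QUEUE:
        queue.append(fetched)
    return issued, decision is Decision.INTERRUPT_FETCH
```

When fetch stalls, the same fetched atom would be presented again on the next cycle.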

19. An instruction sequence control method for an arithmetic processing unit that comprises a plurality of arithmetic units for executing given instructions, a plurality of registers used for executing the instructions, storage means for storing information on the use status of each register, and instruction storage means, provided for each arithmetic unit, for saving a fetched instruction that cannot be executed immediately so that subsequent instructions can be executed ahead of it, the arithmetic processing unit simultaneously fetching a plurality of instructions, each associated in advance with one of the arithmetic units, and processing them while changing their order by means of the instruction storage means, the method comprising: fetching a plurality of instructions simultaneously; determining for each arithmetic unit, based on the information stored in the storage means, whether to execute the fetched instruction, execute an instruction stored in the instruction storage means, or execute a NOP instruction, and, when the fetched instruction is not to be executed, whether to store it in the instruction storage means; when it is determined that at least one of executing an instruction stored in the instruction storage means and storing the fetched instruction in the instruction storage means is to be performed, updating the information stored in the storage means, taking the instruction out of the instruction storage means and storing the fetched instruction in the instruction storage means, as applicable; and executing the determined instructions.