JPH0778736B2

JPH0778736B2 - Electronic computer

Info

Publication number: JPH0778736B2
Application number: JP2320916A
Authority: JP
Inventors: 博文村谷
Original assignee: 工業技術院長 (Director-General, Agency of Industrial Science and Technology)
Priority date: 1990-11-27
Filing date: 1990-11-27
Publication date: 1995-08-23
Anticipated expiration: 2010-08-23
Also published as: JPH04191931A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Object of the Invention]

(Field of Industrial Application) The present invention relates to an electronic computer that executes instructions efficiently by using delayed branch instructions with different delay slot lengths.

(Prior Art) Microprocessors with a RISC (Reduced Instruction Set Computer) architecture have recently attracted attention. In this type of microprocessor, executing instructions without disturbing the flow of pipeline processing is essential to realizing its full performance.

One cause of such pipeline disturbance is the branch instruction. In pipeline processing, the decoding of an instruction fetched in the previous phase and the fetching of the next instruction are carried out in parallel in the same phase. If decoding reveals that the instruction is a branch, the next instruction, which has already been fetched, must be cancelled and the instruction at the branch target must be fetched anew. This repeated fetch costs clock cycles and disturbs the pipeline.

As means of overcoming the problems caused by such branch-induced pipeline disturbance, techniques such as static branch prediction, dynamic branch prediction, delayed branches and loop buffers have been considered, as described, for example, in D. J. Lilja, "Reducing the Branch Penalty in Pipelined Processors," Computer, July 1988, pp. 47-55.

A delayed branch instruction, as used in the delayed branch technique above, sets up a delay slot of fixed length immediately after the delayed branch instruction itself; the instructions in this delay slot period are executed before control is transferred to the branch target. The delay slot length is determined by the number of pipeline stages and the processing performed in each stage. In Berkeley RISC II, for example, delayed branch instructions generally have a delay slot length of [1].

When a branch instruction is decoded, the instruction immediately following it has already been fetched. In general, therefore, once decoding establishes that the branch is taken, execution of that already-fetched instruction must be cancelled. With a delayed branch instruction, instead of cancelling the next instruction, a nop (no operation) instruction is placed after the delayed branch instruction and is simply executed. The result is processing substantially equivalent to cancelling the next instruction, achieved with simpler hardware and without disturbing the pipeline.
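To make the delay-slot behaviour concrete, the following is a minimal Python sketch of a fixed delay slot of length [1]. The string encoding (`"brd L"` for a delayed unconditional branch to label `L`), the label table and the `run` helper are illustrative assumptions rather than anything defined in the patent, and the sketch assumes a delay slot never contains another branch.

```python
def run(program, labels, max_steps=50):
    """Execute `program` with a fixed delay slot of length 1.

    program: list of instruction strings; "brd L" is a delayed unconditional
    branch to label L, anything else is an ordinary instruction.
    labels: dict mapping a label to an index in `program`.
    Returns the ordinary instructions in the order they execute.
    """
    trace = []
    pc, steps = 0, 0
    pending = None                      # branch target waiting for its slot
    while pc < len(program) and steps < max_steps:
        steps += 1
        instr = program[pc]
        next_pc = pc + 1
        if instr.startswith("brd "):
            pending = labels[instr.split()[1]]   # branch decided, not yet taken
        else:
            trace.append(instr)
            if pending is not None:     # this was the delay-slot instruction
                next_pc, pending = pending, None
        pc = next_pc
    return trace


labels = {"L": 4}
# Useful work in the slot: ope1 runs before control reaches L.
print(run(["ope0", "brd L", "ope1", "skipped", "target"], labels))
# ['ope0', 'ope1', 'target']
# A nop in the slot behaves like a cancelled next instruction.
print(run(["ope0", "brd L", "nop", "skipped", "target"], labels))
# ['ope0', 'nop', 'target']
```

In both cases `skipped` never executes; with the nop, one clock is spent doing nothing, which is the waste the optimizations below try to recover.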

In an electronic computer with this kind of delayed branch instruction, the delay slot period set by the delayed branch instruction can be filled with, for example, the instruction immediately preceding the branch, an instruction from the branch target, or an instruction from the not-taken (fall-through) path. Such changes (shifts) in execution order make it possible to optimize the instruction sequence (code).
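Of the three sources of delay-slot instructions just listed, the simplest to sketch is the instruction immediately preceding the branch. The Python below is a minimal sketch under assumptions of its own: a hypothetical `(label, text)` encoding, only unconditional delayed branches `brd` with a one-instruction slot, and no register analysis (an unconditional branch reads no result of the preceding instruction, so moving that instruction after it does not change its effect). Filling from the branch target or the fall-through path needs further conditions and is not shown.

```python
def is_branch(text):
    return text.split()[0] in ("br", "brd", "cbrd")


def fill_slots_from_above(code):
    """Hoist the instruction preceding an unconditional `brd` into its nop slot.

    code: list of (label, text) pairs; label is None or a string, and
    "brd X" is an unconditional delayed branch with one delay slot.
    """
    out = list(code)
    i = 1
    while i + 1 < len(out):
        prev_label, prev = out[i - 1]
        br_label, br = out[i]
        slot_label, slot = out[i + 1]
        movable = (
            br.startswith("brd ") and slot == "nop"
            and br_label is None and slot_label is None   # nothing jumps into the pair
            and prev != "nop" and not is_branch(prev)
            and not (i >= 2 and is_branch(out[i - 2][1]))  # prev is not itself in a slot
        )
        if movable:
            # prev's label moves onto brd, so entering at that label still
            # executes prev and then branches; the nop disappears.
            out[i - 1 : i + 2] = [(prev_label, br), (None, prev)]
        i += 1
    return out


code = [(None, "add r1,r2,r3"), (None, "brd L"), (None, "nop"),
        (None, "sub r4,r5,r6"), ("L", "or r7,r8,r9")]
print(fill_slots_from_above(code))
# [(None, 'brd L'), (None, 'add r1,r2,r3'), (None, 'sub r4,r5,r6'), ('L', 'or r7,r8,r9')]
```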

Techniques for this code optimization are described in detail in, for example, T. R. Gross and J. L. Hennessy, "Optimizing Delayed Branches," Proc. IEEE Micro-15, Oct. 1982, pp. 114-120. Such optimization makes it possible to execute an instruction sequence efficiently, for example by putting to use clock cycles in which no processing would otherwise be performed.

Two representative code optimization techniques involving the delayed branch instruction are described here. The first applies when the branch target a of a branch instruction A is itself a branch instruction B: the branch target a of the source branch instruction A is replaced with the branch target b of branch instruction B. The second eliminates wasted clock cycles by moving a branch instruction into the position of a nop instruction placed between two instructions that would otherwise cause an interlock. The meaning of "interlock" as used here is explained later.

First, consider the constraints on actually applying these optimization techniques.

Consider first the optimization that reduces the number of branches when the branch target of branch instruction A is branch instruction B, by replacing the branch target a of the source branch instruction A with the branch target b of branch instruction B. Both the source and the target branch instructions are assumed here to be delayed branch instructions, and for simplicity their delay slot length is taken to be [1].

For example, when both the source and the target delayed branch instructions are unconditional branch instructions (hereinafter written brd), as in the instruction sequence of FIG. 11(a), and at least one of the instructions <ope1> and <ope2> that follow brd A and brd B, respectively, is a nop instruction, the sequence can be optimized by rewriting the source instruction brd A as brd B, as shown in FIG. 11(b).
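The FIG. 11(b) rule can be sketched in Python as follows. The `(label, text)` encoding and the `chain_unconditional` name are hypothetical, chosen only for illustration, and only the case described above is handled: unconditional delayed branches with a delay slot length of [1].

```python
def chain_unconditional(code):
    """FIG. 11(b)-style rewrite for unconditional delayed branches, slot length 1.

    code: list of (label, text) pairs; "brd X" is an unconditional delayed
    branch to label X whose single delay-slot instruction follows it.
    """
    where = {label: k for k, (label, _) in enumerate(code) if label}
    out = list(code)
    for i, (label, text) in enumerate(code):
        parts = text.split()
        # the source slot must not be a jump target, since its text may change
        if parts[0] != "brd" or i + 1 >= len(code) or code[i + 1][0] is not None:
            continue
        j = where.get(parts[1])
        if j is None or j + 1 >= len(code):
            continue
        target = code[j][1].split()
        if target[0] != "brd":
            continue                          # target is not a branch: nothing to chain
        ope1, ope2 = code[i + 1][1], code[j + 1][1]
        if ope1 != "nop" and ope2 != "nop":
            continue                          # FIG. 11(b) does not apply
        ope = ope2 if ope1 == "nop" else ope1 # the non-nop one (nop if both are)
        out[i] = (label, "brd " + target[1])
        out[i + 1] = (None, ope)
    return out


code = [(None, "brd A"), (None, "nop"), (None, "x"),
        ("A", "brd B"), (None, "add"), ("B", "y")]
print(chain_unconditional(code))
# [(None, 'brd B'), (None, 'add'), (None, 'x'), ('A', 'brd B'), (None, 'add'), ('B', 'y')]
```

The original `A: brd B` stays in place because other branches may still reach it through label `A`.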

Here, <ope> denotes whichever of <ope1> and <ope2> is not a nop instruction. If <ope1> and <ope2> are both nop instructions, <ope> is also a nop instruction.

If neither <ope1> nor <ope2> is a nop instruction, the sequence can be optimized, as shown for example in FIG. 11(c), by rewriting the branch target of the source unconditional branch instruction (brd A → brd B) and moving (shifting) <ope1> ahead of that branch instruction. However, moving <ope1> ahead of the unconditional branch instruction brd can cause an interlock with the instruction that originally preceded brd, in which case a nop instruction, for example, must be inserted between them. The optimization of FIG. 11(c) is therefore not always effective.

On the other hand, when the source is a conditional branch instruction (hereinafter written cbrd) and its target is an unconditional branch instruction brd, as shown in FIG. 12(a), the optimization can be performed as follows. Assuming that cbrd executes the instruction in its delay slot period regardless of whether the condition holds, the sequence can be optimized as shown in FIG. 12(b) by rewriting the branch target of cbrd (cbrd cc,A → cbrd cc,B) and moving <ope1> ahead of cbrd. This optimization is possible only if executing <ope1> does not change the condition code.

If executing <ope1> does change the condition code, <ope1> cannot be moved ahead of the conditional branch instruction cbrd. In that case, however, the sequence can be optimized as shown in FIG. 12(c) by rewriting the branch target of cbrd (cbrd cc,A → cbrd cc,C) so that it branches to the instruction <ope2>. If, however, the basic block immediately preceding the block that begins with label B: is a predecessor of it, and the last instruction of that preceding block is neither a branch instruction nor an instruction in the delay slot period of a delayed branch instruction (so that control falls through into B:), a branch instruction br B to label B: must be inserted before label C: after this optimization. There is therefore no guarantee that the optimization shortens the execution time of the program.

If, however, the basic block beginning with label B: is entered only by branches, no new branch instruction needs to be inserted, so the optimization above cannot be dismissed as harmful in every case.

Next, consider the case of FIG. 13(a), in which the source conditional branch instruction cancels the execution of the instruction in its delay slot period when the condition is not satisfied (hereinafter written cbrd*). Here the optimization of FIG. 12(b) is not possible, but, in the same manner as the optimization of FIG. 12(c), the sequence can be optimized by changing the branch target of the conditional branch instruction (cbrd* cc,A → cbrd* cc,B), as shown in FIG. 13(b).

In this case too, if the basic block immediately preceding the block that begins with label B: is its predecessor, and the last instruction of that block is neither a branch instruction nor an instruction in the delay slot period of a delayed branch instruction, a branch instruction br B to label B: must be inserted before label C: after the optimization. So here as well there is no guarantee that the optimization shortens program execution time. If, however, the block beginning with label B: is entered only by branches, as noted above, no new branch instruction is needed, and the optimization is then very effective.

Finally, consider the optimization that eliminates wasted clock cycles by moving a branch instruction into the position of a nop instruction placed between two instructions that would otherwise cause an interlock.

FIG. 14(a) shows an instruction sequence with a nop instruction between two instructions <ope1> and <ope2> that would otherwise cause an interlock. The nop instruction inserted between them serves to prevent that interlock.

An interlock, as used here, is the following phenomenon: some phase of <ope1> and some phase of <ope2> are dependent on each other, and executing <ope1> and <ope2> back to back would violate the causal order of that dependency, so correct execution would no longer be guaranteed.

For example, as shown in FIG. 15(a), if phase E of <ope2> cannot be executed until <ope1> has finished phase M, executing <ope1> and <ope2> consecutively causes an interlock. In that case the interlock must be prevented by placing a nop instruction between the two consecutive instructions <ope1> and <ope2>, as shown in FIG. 15(b).
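A short calculation shows why one nop suffices in FIG. 15(b). The sketch below assumes a five-phase pipeline F, D, E, M, W with one phase per clock and one instruction issued per clock; only the phase names E and M come from the text, and the rest of the pipeline (and the `nops_needed` helper) is an assumption for illustration.

```python
PHASES = ("F", "D", "E", "M", "W")   # assumed: fetch, decode, execute, memory, write-back


def nops_needed(producer_phase, consumer_phase, phases=PHASES):
    """Minimum number of nops between a producer and the next consumer.

    The consumer's `consumer_phase` may start only after the producer's
    `producer_phase` has finished. An instruction issued d clocks after the
    producer reaches phase index c at clock d + c, while the producer is in
    phase index p at clock p, so we need d + c > p.
    """
    p = phases.index(producer_phase)
    c = phases.index(consumer_phase)
    return max(0, p - c)   # smallest d is p - c + 1, and nops = d - 1


print(nops_needed("M", "E"))   # 1, the single nop of FIG. 15(b)
```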

If, in this case, the instruction following <ope2> is an unconditional branch instruction br, the clock cycle wasted on the nop instruction can be eliminated by changing br into a delayed branch instruction brd and moving it between <ope1> and <ope2>, as shown in FIG. 15(c). This optimization is possible, however, only if the nop instruction is two instructions before the br instruction (that is, delay slot length + 1 slots earlier). If this condition is not met, the nop instruction must be moved to that position, for example by instruction scheduling, and there is no guarantee that such a move is possible.
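The positional constraint just described can be expressed as a small Python sketch. It assumes a straight-line sequence with no labels or other branches, a hypothetical string encoding (`"br X"`, `"brd X"`), and a fixed delay slot length; the function name is illustrative only.

```python
def fold_branch_fixed(seq, slot_len=1):
    """FIG. 15(c)-style rewrite with a fixed delay slot length.

    seq: straight-line list of instruction strings ending in "br X".
    Returns the rewritten list, or None when the nop is not exactly
    slot_len + 1 positions before the br.
    """
    if not seq or not seq[-1].startswith("br "):
        return None
    target = seq[-1].split()[1]
    k = len(seq) - 1 - (slot_len + 1)      # position the nop must occupy
    if k < 0 or seq[k] != "nop":
        return None
    # brd takes the nop's place; the slot_len instructions after it fill the slot
    return seq[:k] + ["brd " + target] + seq[k + 1:-1]


print(fold_branch_fixed(["ope1", "nop", "ope2", "br L"]))
# ['ope1', 'brd L', 'ope2']
print(fold_branch_fixed(["ope1", "nop", "ope2", "ope3", "br L"]))
# None: the nop is three positions before br
```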

(Problems to be Solved by the Invention) As described above, in a microprocessor with a RISC-type architecture, when the branch target of a delayed branch instruction is another delayed branch instruction, the instruction code can be optimized by replacing the branch target of the source branch instruction with the branch target of the target branch instruction. Depending on the type of delayed branch instruction, however, various constraints mean that this optimization cannot always be carried out. The optimization may also require an additional branch instruction, with no guarantee that execution time is reduced. Taking these problems into account, the optimization inevitably becomes unnecessarily complicated.

Likewise, in the optimization that eliminates wasted clock cycles by moving a branch instruction between two instructions that would cause an interlock, instruction scheduling cannot always place the nop instruction at a position where the optimization can be applied.

The present invention has been made in view of these circumstances. Its object is to provide an electronic computer that removes the complexity of the instruction-sequence optimization procedure and relaxes the constraints on performing it, so that real instruction sequences can be optimized easily and executed efficiently without waste.

[Configuration of the Invention]

(Means for Solving the Problem) An electronic computer according to the present invention comprises: an instruction prefetch register that sequentially fetches instructions; a decoder that interprets the fetched instructions; a branch target address register that, when the interpreted instruction is a delayed branch instruction with a variable delay slot length or one of a plurality of delayed branch instructions with different delay slot lengths, stores the branch target address of that branch instruction; a branch counter that, for such an instruction, is loaded with the delay slot length of the branch instruction and decrements the stored value in response to the clock signal that synchronizes instructions; a comparator that compares the value held in the branch counter with a preset value for determining the end of the delay slot period and, when they match, detects the end of the delay slot period; a program counter that is loaded with the branch target address of the branch instruction when the decoded instruction is a branch instruction other than a delayed branch instruction, with the address following the currently stored address when the decoded instruction is not a branch instruction, and with the address held in the branch target address register when the end of the delay slot period is detected; and a selector that controls the next instruction to be fetched by selecting the value held in the branch target address register when the end of the delay slot period is detected, and the value of the program counter otherwise.

The computer is characterized in that the following optimizations are performed:
(1) When the branch target of an unconditional branch instruction is another unconditional branch instruction, the source unconditional branch instruction is changed into a delayed branch instruction whose branch target is the branch target of the target unconditional branch instruction and whose delay slot length is [2], and the instruction immediately following the target unconditional branch instruction is copied after the instruction immediately following the source unconditional branch instruction.
(2) When an instruction sequence consists, in order, of a first instruction, a nop instruction that prevents an interlock, a second instruction, ..., an (M+1)-th instruction (M being an integer), and an unconditional branch instruction, the nop instruction is changed into a delayed branch instruction whose branch target is the branch target of the unconditional branch instruction and whose delay slot length is [M], and the unconditional branch instruction is deleted.
(3) When the branch target of a delayed branch instruction with delay slot length [M] is a delayed branch instruction with delay slot length [N] (N being an integer), the source delayed branch instruction is changed into a delayed branch instruction whose branch target is the branch target of the target delayed branch instruction and whose delay slot length is the sum of the two delay slot lengths, and the N instructions following the target delayed branch instruction are copied after the M instructions following the source delayed branch instruction.

The computer is further characterized by having a pipeline function.
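Optimization (2) removes the positional constraint of the prior art: however many instructions M lie between the interlock nop and the branch, the delay slot length is simply set to M. The Python below is a minimal sketch assuming a hypothetical string encoding in which `brd(M) X` denotes a delayed branch to label X with delay slot length [M]; it checks only the exact shape stated in (2) and assumes no labels or other branches inside the sequence.

```python
def fold_branch_variable(seq):
    """Rewrite (2): first, nop, 2nd, ..., (M+1)-th, br X
    -> first, brd(M) X, 2nd, ..., (M+1)-th.

    seq: straight-line list of instruction strings ending in "br X".
    Returns None if the sequence does not have that shape.
    """
    if len(seq) < 4 or seq[1] != "nop" or not seq[-1].startswith("br "):
        return None
    target = seq[-1].split()[1]
    m = len(seq) - 3            # number of instructions 2nd .. (M+1)-th
    return [seq[0], f"brd({m}) {target}"] + seq[2:-1]


print(fold_branch_variable(["ope1", "nop", "ope2", "ope3", "br L"]))
# ['ope1', 'brd(2) L', 'ope2', 'ope3']
```

With M = 2 the branch is placed where the nop was and its two slot instructions are exactly the ones that followed the nop, so no scheduling step is needed to move the nop first.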

(Operation) According to the present invention, the delay slot length of a delayed branch instruction is variable, and branch execution is controlled according to that length. This makes it possible, for example, to lengthen the delay slot period and execute other instructions efficiently within it, which greatly relaxes the constraints on optimizing an instruction sequence. As a result, instruction sequences can be optimized simply and efficiently, and instruction execution time can be shortened without wasting clock cycles.

(Embodiment) The electronic computer according to the present invention includes an instruction fetch unit configured as shown in FIG. 1. This instruction fetch unit comprises: an instruction prefetch register that sequentially fetches instructions; a decoder that interprets the fetched instructions; a branch target address register that, when the interpreted instruction is a delayed branch instruction with a variable delay slot length or one of a plurality of delayed branch instructions with different delay slot lengths, stores the branch target address of that branch instruction; a branch counter that, for such an instruction, is loaded with the delay slot length of the branch instruction and decrements the stored value in response to the clock signal that synchronizes instructions; a comparator that compares the value held in the branch counter with a preset value for determining the end of the delay slot period and, when they match, detects the end of the delay slot period; a program counter that is loaded with the branch target address of the branch instruction when the decoded instruction is a branch instruction other than a delayed branch instruction, with the address following the currently stored address when the decoded instruction is not a branch instruction, and with the address held in the branch target address register when the end of the delay slot period is detected; and a selector that controls the next instruction to be fetched by selecting the value held in the branch target address register when the end of the delay slot period is detected, and the value of the program counter otherwise. The electronic computer used further has a pipeline function.
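The interplay of the branch counter, comparator and selector can be illustrated with a small Python model. It works at the level of one instruction per clock and ignores the overlap of pipeline stages; the tuple encoding, the `FetchUnit` and `fetch_order` names, and the choice of 0 as the comparator's preset value are assumptions for illustration, not details taken from FIG. 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchUnit:
    """Instruction-level model of the fetch unit (hypothetical encoding).

    program maps an address to an instruction tuple:
      ("op", name)           ordinary instruction
      ("br", target)         branch that is not a delayed branch
      ("brd", n, target)     delayed branch with delay slot length n >= 1
    """
    program: dict
    pc: int = 0                  # program counter
    bta: Optional[int] = None    # branch target address register
    counter: int = 0             # branch counter
    end_value: int = 0           # preset value the comparator checks against

    def step(self) -> int:
        """Fetch, decode and retire one instruction; return its address."""
        addr = self.pc
        instr = self.program[addr]               # instruction prefetch register
        next_pc = addr + 1                       # default: sequential address
        if self.bta is not None:                 # inside a delay slot period
            assert instr[0] == "op", "sketch assumes no branches in delay slots"
            self.counter -= 1                    # one decrement per clock
            if self.counter == self.end_value:   # comparator: slot period is over
                next_pc, self.bta = self.bta, None   # selector picks the BTA register
        elif instr[0] == "brd":                  # decoder: delayed branch brd(n)
            _, n, target = instr
            assert n >= 1
            self.bta, self.counter = target, n
        elif instr[0] == "br":                   # ordinary (non-delayed) branch
            next_pc = instr[1]
        self.pc = next_pc                        # program counter update
        return addr


def fetch_order(program, start=0, max_cycles=50):
    """Addresses fetched, in order, until control leaves the program."""
    unit = FetchUnit(program, pc=start)
    order = []
    while unit.pc in program and len(order) < max_cycles:
        order.append(unit.step())
    return order


# brd(3) to address 10, followed by three delay-slot instructions
fig2 = {0: ("brd", 3, 10), 1: ("op", "ope1"), 2: ("op", "ope2"),
        3: ("op", "ope3"), 4: ("op", "skipped"), 10: ("op", "ope")}
print(fetch_order(fig2))   # [0, 1, 2, 3, 10]
```

The delay slot length is read from the instruction itself and held in the counter, so the same hardware serves brd(1), brd(2) or any other length without change.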

A delayed branch instruction with a variable delay slot length [N] is written brd(N) when it is an unconditional branch and cbrd*(N) when it is a conditional branch. For an instruction sequence such as that of FIG. 2, the instructions <ope1>, <ope2>, ..., <opeN> following the delayed branch instruction brd(N) with delay slot length [N] are executed in order within the delay slot period set by brd(N), and once the delay slot [N] has elapsed, <ope>, the instruction at the branch target of brd(N), is executed.

Because the present invention handles delayed branch instructions with variable delay slot lengths, the branch of a delayed branch instruction can be deliberately postponed by several slots, as shown in FIG. 3(a), so that several other instructions can be executed during its delay slot period. In the prior art, by contrast, the delay slot length is fixed at [1], so that, as shown for comparison in FIG. 3(b), at most one instruction can be executed in the delay slot period. As described above, this imposes various constraints on optimizing an instruction code sequence and makes the optimization difficult.

The following describes how delayed branch instructions with such variable delay slot lengths solve the problems described above.

First, for an instruction sequence such as that of FIG. 11(a), the problem can be solved by using, for example, an unconditional branch instruction brd(2) with a delay slot length of [2]. As shown in FIG. 4(a), the branch target of the source unconditional branch instruction brd is changed to the branch target of the target unconditional branch instruction brd (brd A → brd B) and its delay slot length is set to [2] (brd B → brd(2) B); in addition, the instruction immediately following the target unconditional branch instruction is copied after the instruction immediately following the source unconditional branch instruction.
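The FIG. 4(a) rewrite, which is the M = N = 1 case of optimization (3) in the claims, can be checked with a short Python sketch. The string encoding (`brd X` for slot length [1], `brd(n) X` for slot length [n]), the `run` interpreter and the `chain` helper are hypothetical, and the interpreter assumes no branch appears inside a delay slot.

```python
import re

BRD = re.compile(r"brd(?:\((\d+)\))?\s+(\w+)$")   # "brd X" or "brd(n) X"


def parse_brd(text):
    """Return (slot_length, label) for a delayed branch, else None."""
    m = BRD.match(text)
    return None if m is None else (int(m.group(1) or 1), m.group(2))


def run(code, max_steps=100):
    """Execute (label, text) pairs; return (non-nop trace, instructions fetched)."""
    where = {label: k for k, (label, _) in enumerate(code) if label}
    trace, pc, fetched = [], 0, 0
    target, left = None, 0
    while pc < len(code) and fetched < max_steps:
        fetched += 1
        text, nxt = code[pc][1], pc + 1
        br = parse_brd(text)
        if br is not None and target is None:
            left, target = br[0], where[br[1]]   # start a delay slot period
        else:
            if text != "nop":
                trace.append(text)
            if target is not None:
                left -= 1
                if left == 0:                    # slot period over: take the branch
                    nxt, target = target, None
        pc = nxt
    return trace, fetched


def chain(code, i):
    """brd(M) A; s1..sM  with  A: brd(N) B; t1..tN
    ->  brd(M+N) B; s1..sM; copies of t1..tN"""
    src = parse_brd(code[i][1])
    where = {label: k for k, (label, _) in enumerate(code) if label}
    j = where[src[1]]
    dst = parse_brd(code[j][1])
    if dst is None:
        return code                      # target is not a delayed branch
    m, n = src[0], dst[0]
    copies = [(None, t) for _, t in code[j + 1 : j + 1 + n]]
    new_branch = (code[i][0], f"brd({m + n}) {dst[1]}")
    return code[:i] + [new_branch] + code[i + 1 : i + 1 + m] + copies + code[i + 1 + m :]


fig11a = [(None, "brd A"), (None, "ope1"),
          ("C", "x"),                    # reached only through label C
          ("A", "brd B"), (None, "ope2"),
          ("B", "ope")]
fig4a = chain(fig11a, 0)
print(fig4a[:3])      # [(None, 'brd(2) B'), (None, 'ope1'), (None, 'ope2')]
print(run(fig11a))    # (['ope1', 'ope2', 'ope'], 5)
print(run(fig4a))     # (['ope1', 'ope2', 'ope'], 4)
```

Both layouts execute <ope1>, <ope2> and <ope> in the same order, but the rewritten one fetches one instruction fewer because the second branch is skipped; neither slot instruction had to be a nop, and <ope1> stayed where it was.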

In this case, unlike the optimization of FIG. 11(b), neither <ope1> nor <ope2> needs to be a nop instruction. And unlike the optimization of FIG. 11(c), <ope1> does not have to be moved ahead of the branch instruction brd(2) B, so there is no need to consider an interlock between <ope1> and the instruction immediately preceding it. The possibility of an interlock between <ope1> and <ope2>, however, must still be carefully considered. This new optimization is very simple and very likely to succeed.

Similarly, for an instruction sequence such as that of FIG. 12(a), a conditional branch instruction cbrd(2) with a delay slot length of [2] is provided. Using cbrd(2), the sequence can be optimized as shown in FIG. 4(b) by changing the source branch instruction cbrd cc,A to cbrd(2) cc,B, in the same way as for the unconditional branch above.

Here too, unlike the optimization of FIG. 12(b), <ope1> does not have to be moved ahead of the branch instruction cbrd, so there is no need to check whether <ope1> changes the condition code. And unlike the optimization of FIG. 12(c), no new branch instruction has to be inserted, so the execution time of the program is reliably shortened.

Similarly, for an instruction sequence such as that of FIG. 13(a), a conditional branch instruction cbrd*(2) with delay slot length [2] is prepared. By using cbrd*(2) to change cbrd* cc,A to cbrd*(2) cc,B, the sequence can be optimized as shown, for example, in FIG. 4(c). The conditional branch instruction cbrd* cancels the execution of the instructions in its delay slot when its condition is not satisfied. Here too, unlike the optimization of FIG. 13(b), no new branch instruction has to be inserted, so the execution time of the program is reliably shortened.
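The annulling behaviour of cbrd* is what makes the FIG. 4(c) rewrite safe on the not-taken path. The sketch below is a toy interpreter with a condition flag `cc`: `cbrd*` runs its delay slot instructions only when the branch is taken and otherwise skips them and falls through. The layout assumed for FIG. 13(a) (`cbrd* cc,A; <ope1>; <opeF>` with `A: brd B; <ope2>`) and the names `opeF`, `opeB` and `halt` are illustrative assumptions, not taken from the figure.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical representation: plain instructions are str; branches are
# ("brd" | "cbrd*", slots, label). "halt" stops the interpreter.
Insn = Union[str, Tuple[str, int, str]]


def executed_cc(program: List[Insn], labels: Dict[str, int], cc: bool,
                limit: int = 50) -> List[str]:
    """Executed plain instructions, given the value of the tested condition."""
    trace: List[str] = []
    pc, remaining, annul = 0, 0, False
    dest: Optional[int] = None
    for _ in range(limit):
        insn = program[pc]
        if insn == "halt":
            break
        in_slot = remaining > 0
        next_pc = pc + 1
        if isinstance(insn, tuple):
            if in_slot:
                raise ValueError("a branch inside a delay slot is not modelled")
            kind, slots, label = insn
            taken = kind == "brd" or cc          # cbrd* branches only when cc holds
            if slots == 0:
                if taken:
                    next_pc = labels[label]
            else:
                remaining = slots
                dest = labels[label] if taken else None
                annul = kind == "cbrd*" and not taken   # cancel the slots on fall-through
        elif not annul:
            trace.append(insn)
        if in_slot:
            remaining -= 1
            if remaining == 0:
                if dest is not None:
                    next_pc = dest
                annul = False
        pc = next_pc
    return trace


before = [("cbrd*", 1, "A"), "ope1", "opeF", "halt",
          ("brd", 1, "B"), "ope2", "halt",
          "opeB", "halt"]
after = [("cbrd*", 2, "B"), "ope1", "ope2", "opeF", "halt",
         "opeB", "halt"]

for cond in (True, False):
    assert executed_cc(before, {"A": 4, "B": 7}, cond) == executed_cc(after, {"B": 5}, cond)
```

On the taken path both programs run `<ope1>`, `<ope2>`, `<opeB>`; on the not-taken path both slots are cancelled, so the copied `<ope2>` never leaks into the fall-through code.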

Next, consider the case shown in FIG. 5(a), in which an instruction sequence consists, in order, of a first instruction, a nop instruction that prevents an interlock, a second instruction, ..., an (M+1)-th instruction (M being an integer), and an unconditional branch instruction. Conventionally, such a sequence was not a target for optimization unless M in the instruction <opeM+1> was [1].

Such an instruction sequence can generally be rearranged by instruction scheduling, for example as shown in FIG. 6(a). For a sequence scheduled in this way, an optimization such as that of FIG. 6(b) is possible only when the nop instruction sits two slots before the branch instruction br (the delay slot length plus one slot before it), that is, only when the sequence takes the form M = 1. Here br denotes an unconditional branch instruction with delay slot length [0]. The same optimization can also be applied to a delayed branch instruction brd whose delay slot period is filled entirely with nop instructions.

With a delayed branch instruction of delay slot length [M], however, the sequence of FIG. 5(a) can be optimized directly, even when it cannot be scheduled into the form of FIG. 6(a): as shown in FIG. 5(b), the nop instruction that prevents the interlock is replaced by a delayed branch instruction whose destination is that of the original branch instruction and whose delay slot length is [M], and the original branch instruction is deleted. There is thus no need to go to the trouble of scheduling the sequence of FIG. 5(a) into the form of FIG. 6(a), and the optimization becomes very simple.
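The FIG. 5 rewrite is purely local, so it can be sketched as a function over a short instruction list. In this hypothetical Python representation a plain instruction is a string and a branch is a `(mnemonic, slots, label)` tuple, with `("br", 0, L)` standing for the delay-slot-length-[0] branch br; interlock analysis is left out.

```python
from typing import List, Tuple, Union

Insn = Union[str, Tuple[str, int, str]]


def fold_branch_into_nop(seq: List[Insn]) -> List[Insn]:
    """Sketch of the FIG. 5 rewrite:
    [<ope1>, nop, <ope2>, ..., <opeM+1>, br L]  ->  [<ope1>, brd(M) L, <ope2>, ..., <opeM+1>]
    """
    if len(seq) < 4 or seq[1] != "nop":
        raise ValueError("expected <ope1>, nop, <ope2>, ..., <opeM+1>, br")
    last = seq[-1]
    if not (isinstance(last, tuple) and last[0] == "br"):
        raise ValueError("the sequence must end in an unconditional branch br")
    body = seq[2:-1]                      # <ope2> ... <opeM+1>: exactly M instructions
    if any(isinstance(x, tuple) for x in body):
        raise ValueError("branches between the nop and br are not handled")
    # The new delayed branch takes the nop's slot, and br is deleted.
    return [seq[0], ("brd", len(body), last[2])] + body


print(fold_branch_into_nop(["ope1", "nop", "ope2", "ope3", "ope4", ("br", 0, "L")]))
```

Because the new brd occupies exactly the slot the nop used to fill, `<ope1>` and `<ope2>` stay one slot apart, so the interlock the nop was guarding against is still avoided, and the sketch works for any M rather than only M = 1.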

Introducing branch instructions with variable delay slot lengths in this way lets the electronic computer of the present invention overcome the complexity of optimization that has long been a problem, and also lets it optimize effectively instruction sequences that were previously not optimization targets.

It is generally said that optimization becomes harder as the delay slot length grows, because there are restrictions on the kinds of instructions that can be moved into the delay slot period. That argument, however, assumes a fixed delay slot length. When the delay slot length (period) is variable, as described above, the availability of delayed branch instructions with long delay slots actually relaxes the restrictions on the instructions that can be moved into those slots, which yields several advantages from the standpoint of optimization.

One advantage, as described above, is that optimization can be performed simply and can be applied even to instruction sequences that were previously not optimization targets. Another is that the new optimization technique described above can be generalized and applied to instruction sequences that themselves contain the newly introduced delayed branch instructions with variable delay slot lengths.

The generalized application of this new optimization technique is described next. Consider generalizing the optimization of FIG. 4(a). As shown in FIG. 7(a), the general case is one in which the destination of a branch-source delayed branch instruction brd(M) with delay slot length [M] is a delayed branch instruction brd(N) with delay slot length [N].

In this case, the branch-source delayed branch instruction brd(M) is changed into a delayed branch instruction whose destination is the destination of the branch-destination delayed branch instruction and whose delay slot length is the sum of the two delay slot lengths (brd(M) A → brd(M+N) B). In addition, the N instructions following the branch-destination delayed branch instruction are copied after the M instructions following the branch-source delayed branch instruction, giving the optimization shown in FIG. 7(b). This optimization does not require any of the instructions moved into the delay slot period to be a nop instruction. Attention must be paid, however, to a possible interlock between <opeM> and <opeM+1>, which now execute consecutively.
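The generalized FIG. 7 rewrite can be sketched on a whole program with a label table, since copying the N destination slots shifts every later instruction. The Python below uses a hypothetical tuple representation (`("brd", slots, label)` for a delayed branch, strings for everything else), assumes the destination `A` lies outside the source's own delay slots, and leaves the original `A: brd(N) B` in place because other code may still branch to it.

```python
from typing import Dict, List, Tuple, Union

Insn = Union[str, Tuple[str, int, str]]


def merge_chain(program: List[Insn], labels: Dict[str, int],
                i: int) -> Tuple[List[Insn], Dict[str, int]]:
    """Sketch of the FIG. 7 rewrite: brd(M) A, where A holds brd(N) B, becomes
    brd(M+N) B followed by the M source slots and a copy of the N destination slots."""
    src = program[i]
    if not (isinstance(src, tuple) and src[0] == "brd"):
        raise ValueError("expected a delayed branch at index i")
    _, m, a = src
    j = labels[a]
    target = program[j]
    if not (isinstance(target, tuple) and target[0] == "brd"):
        return list(program), dict(labels)        # destination is not a delayed branch
    _, n, b = target
    src_slots = program[i + 1:i + 1 + m]          # <ope1> ... <opeM>
    dst_slots = program[j + 1:j + 1 + n]          # <opeM+1> ... <opeM+N>, copied
    new_program = (program[:i] + [("brd", m + n, b)] + src_slots + dst_slots
                   + program[i + 1 + m:])
    # The copied instructions push every later instruction down by n positions.
    new_labels = {k: (v + n if v > i + m else v) for k, v in labels.items()}
    return new_program, new_labels


prog = [("brd", 2, "A"), "a1", "a2", "x",
        ("brd", 3, "B"), "b1", "b2", "b3", "y", "z"]
new_prog, new_labels = merge_chain(prog, {"A": 4, "B": 9}, 0)
```

Here M = 2 and N = 3, so the source becomes `("brd", 5, "B")` followed by `a1, a2, b1, b2, b3`; `<opeM>` and `<opeM+1>` correspond to `a2` and `b1`, the pair whose interlock has to be checked.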

Likewise, an instruction sequence containing a conditional delayed branch instruction cbrd(M) cc,A with a variable delay slot length, such as that of FIG. 8(a), can be optimized as shown in FIG. 8(b), and an instruction sequence containing a conditional delayed branch instruction cbrd*(M) cc,A with a variable delay slot length, such as that of FIG. 9(a), can be optimized as shown in FIG. 9(b).

In every case, close attention must be paid to interlocks between the consecutive instructions <opeM> and <opeM+1>. Because no instruction is moved ahead of the conditional branch instruction, however, there is no need to check whether the condition code changes.

Advancing the optimization in this way does, however, tend to lengthen the delay slots of branch instructions.

Even so, the instruction sequence <ope1>, <ope2>, ..., <opeM+N> placed in the delay slot period can be scheduled, and instructions that can be taken out of the delay slot period can be moved, for example, to positions ahead of the delayed branch instruction, which shortens the required delay slot length to some extent. A long instruction sequence in the delay slot period can also be treated directly as a new optimization target, or converted by scheduling into a sequence that becomes one. Therefore, even with branch instructions of variable delay slot length, the conventional optimization techniques can be generalized as they are, the complexity of optimization is eliminated, and a wide variety of instruction sequences can be optimized effectively and executed efficiently.
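Moving instructions out of a long delay slot, as described above, can be sketched as hoisting a prefix of the slot contents ahead of the branch. The Python sketch below assumes a caller-supplied predicate `writes_cc`, and the mnemonics `mov`, `ld` and `add.cc` (a condition-code-setting variant) are hypothetical; interlocks, and the possibility that other code branches straight to the branch instruction, are not checked.

```python
from typing import Callable, List, Tuple

Branch = Tuple[str, int, str]          # (kind, delay slot length, label)


def hoist_from_slots(branch: Branch, slot_insns: List[str],
                     writes_cc: Callable[[str], bool]) -> Tuple[List[str], Branch, List[str]]:
    """Move leading delay-slot instructions ahead of a delayed branch (sketch).
    Returns (instructions placed before the branch, new branch, remaining slots)."""
    kind, slots, label = branch
    if slots != len(slot_insns):
        raise ValueError("slot contents do not match the delay slot length")
    if kind == "cbrd*":
        # Annulled slots run only on the taken path; hoisting would make them unconditional.
        return [], branch, list(slot_insns)
    hoisted: List[str] = []
    rest = list(slot_insns)
    # Only a prefix moves, so execution order is unchanged. For a conditional
    # cbrd the moved instruction must also leave the condition code alone.
    while rest and (kind == "brd" or not writes_cc(rest[0])):
        hoisted.append(rest.pop(0))
    return hoisted, (kind, len(rest), label), rest


sets_cc = lambda mnemonic: mnemonic in {"cmp", "add.cc"}
print(hoist_from_slots(("cbrd", 3, "B"), ["mov", "add.cc", "ld"], sets_cc))
```

Under these rules a non-annulling cbrd can only shed instructions up to the first condition-code writer, which is where arithmetic and logical instructions that leave the condition code untouched would widen the window.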

FIG. 10(a) shows an example of an instruction sequence in which a nop instruction is placed between <ope1> and <ope2> to prevent an interlock and a conditional branch instruction appears later. In this case, as shown for example in FIG. 10(b), the conditional branch instruction can be moved into the position of the nop instruction, eliminating the nop that was inserted solely to prevent the interlock between <ope1> and <ope2> and thus the wasted slot. Such a wasted slot cannot always be removed, however; this is possible only when none of the instructions <ope2>, ..., <opeM+1> changes the condition code. And when the conditional branch instruction is moved into the position of the nop instruction in this way, care must of course be taken that no new interlock arises between the consecutive instructions <opeM+1> and <opeM+2>.
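The FIG. 10 move can be sketched as a legality check plus a rewrite. Since the figure is not reproduced here, the sketch assumes the conditional branch is a non-annulling `("cbrd", k, L)` with k delay slots (k = 0 for an ordinary conditional branch), and that moving it into the nop's position turns <ope2>, ..., <opeM+1> into additional delay slots, giving a delay slot length of M + k. `writes_cc` and all mnemonics are hypothetical.

```python
from typing import Callable, List, Optional, Tuple, Union

Insn = Union[str, Tuple[str, int, str]]


def move_cond_branch_into_nop(seq: List[Insn],
                              writes_cc: Callable[[str], bool]) -> Optional[List[Insn]]:
    """Sketch of the FIG. 10 rewrite. Input: <ope1>, nop, <ope2>, ..., <opeM+1>,
    ("cbrd", k, L), rest... Returns None when the move would be illegal."""
    if len(seq) < 3 or seq[1] != "nop":
        raise ValueError("expected <ope1>, nop, ... before the branch")
    idx = next((p for p, x in enumerate(seq) if isinstance(x, tuple)), None)
    if idx is None or idx < 2:
        raise ValueError("no conditional branch after the nop")
    kind, k, label = seq[idx]
    if kind != "cbrd":
        return None      # an annulling cbrd* would cancel <ope2>... on the not-taken path
    between = seq[2:idx]                 # <ope2> ... <opeM+1>
    if any(writes_cc(x) for x in between):
        return None      # the branch would test a different condition code
    # The branch takes the nop's slot; the instructions it jumps over become delay slots.
    return [seq[0], ("cbrd", len(between) + k, label)] + between + seq[idx + 1:]


sets_cc = lambda mnemonic: mnemonic.startswith("cmp")
print(move_cond_branch_into_nop(["cmp", "nop", "ld", "add", ("cbrd", 0, "L"), "st"], sets_cc))
```

The interlock between <opeM+1> and <opeM+2> (here `add` and `st`, which become adjacent) still has to be checked separately.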

Next, the instruction fetch unit of an electronic computer (microprocessor) that executes instruction sequences containing the delayed branch instructions with variable delay slot lengths described above will be explained. As noted earlier, this instruction fetch unit is configured as shown in FIG. 1.

In FIG. 1, the instruction prefetch register 1 sequentially prefetches instructions from a program memory (not shown) via the cache bus 2, under the control of the program counter 3 described below. Each instruction fetched into the instruction prefetch register 1 is decoded and interpreted by the decoder 4. If the instruction is a delayed branch instruction with a variable delay slot length, its delay slot length is stored in the branch counter 5 and its branch destination address is stored in the branch destination address register 6. If the instruction decoded by the decoder 4 is an ordinary branch instruction, its branch destination address is stored directly in the program counter 3.

If the instruction decoded by the decoder 4 is an ordinary instruction other than the branch instructions described above, the program counter 3 is simply incremented as the instruction is executed.

If the value stored in the branch counter 5, which indicates the delay slot length, is [1] or more, the branch counter 5 is decremented on each clock that accompanies instruction execution until it reaches [0]. That is, once decoding a delayed branch instruction has stored its delay slot length in the branch counter 5, the stored value is decremented as execution advances slot by slot, so that it indicates the number of slots remaining in the delay slot period set by the delayed branch instruction. The comparator 7 compares the value of the branch counter 5 with the set value [1], and a match is taken as the end of the delay slot period. At that point, the value (branch destination address) held in the branch destination address register 6 is transferred to the program counter 3.

The selector 8 normally selects the value of the program counter 3 to control the reading of instructions from the program memory; when the end of the delay slot period is detected as described above, it selects the value (branch destination address) held in the branch destination address register 6, thereby carrying out the branch.

In other words, when a delayed branch instruction is given, this instruction fetch unit counts out its delay slot length with the branch counter 5 and so deliberately delays the effect of the delayed branch instruction by that number of slots.
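The fetch-unit behaviour of FIG. 1 can be approximated with a small cycle model in which one call to `step` is one slot. In this Python sketch branch targets are absolute addresses, a delay slot length of 0 is treated as an ordinary branch, branches inside delay slots are not modelled, and the class and field names are invented for illustration; only the roles of the program counter 3, branch counter 5, branch destination address register 6, comparator 7 and selector 8 come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Plain instructions are strings; branches are (mnemonic, delay_slots, target_address).
Insn = Union[str, Tuple[str, int, int]]


@dataclass
class FetchUnit:
    memory: List[Insn]
    pc: int = 0                       # program counter 3
    branch_counter: int = 0           # branch counter 5
    dest_reg: Optional[int] = None    # branch destination address register 6

    def step(self) -> int:
        """Fetch and decode one instruction (one slot); return its address."""
        addr = self.pc
        insn = self.memory[addr]
        end_of_slots = self.branch_counter == 1      # comparator 7 against the set value [1]
        if self.branch_counter >= 1:
            self.branch_counter -= 1                  # decrement while the count is [1] or more
        if isinstance(insn, tuple) and insn[0] == "brd" and insn[1] > 0:
            # Delayed branch: latch the slot length and the destination address.
            self.branch_counter, self.dest_reg = insn[1], insn[2]
            next_pc = addr + 1
        elif isinstance(insn, tuple):
            next_pc = insn[2]                         # ordinary branch: straight into the PC
        else:
            next_pc = addr + 1                        # ordinary instruction: increment
        # Selector 8: destination register at the end of the delay slots, PC otherwise.
        self.pc = self.dest_reg if end_of_slots and self.dest_reg is not None else next_pc
        return addr


fu = FetchUnit(["i0", ("brd", 2, 6), "s1", "s2", "x", "x", "t0", "t1"])
fetched = [fu.step() for _ in range(6)]
```

With `brd(2)` at address 1, the fetched addresses are 0, 1, 2, 3, 6, 7: the comparator fires while the counter still holds [1], that is, while the last delay slot instruction is being fetched, so the very next fetch already comes from the destination.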

With the instruction fetch unit configured in this way, a delay period spanning several slots is set between the decoding of a delayed branch instruction and the execution of the instruction at its destination, as shown for example in FIG. 3(a), and this delay slot period can be used to execute several other instructions smoothly.

By contrast, in the prior art the delay slot length of a delayed branch instruction is fixed at [1], so at most one instruction can be executed within the delay slot period (one slot), as shown in FIG. 3(b). This restriction causes various problems when instruction code is optimized as described above; for example, an instruction has to be moved so that it executes before the delayed branch instruction. The computer of this embodiment, which uses delayed branch instructions of variable length, can instead deliberately delay the phase in which the instruction at the destination of a delayed branch instruction is executed, so the delay slot period can be used to execute the required instructions efficiently. As a result, instruction sequences can be optimized easily.

When optimizing instruction code as described above, it is also useful to provide, in addition to the arithmetic and logical instructions that change the condition code, arithmetic and logical instructions that do not change it. Then, for example, for an operation that does not require a condition test, using an instruction that leaves the condition code unchanged further increases the opportunities for optimization.

Providing separate arithmetic and logical instructions that do not change the condition code also makes it more likely that an instruction in the delay slot of a conditional branch instruction can be moved ahead of that conditional branch instruction, and so further increases the freedom of optimization.

The present invention is not limited to the embodiment described above. For example, delay slot length information can be given to a delayed branch instruction by providing a field in its instruction code that directly specifies the delay slot length, or by providing several delayed branch instructions with different delay slot lengths as independent instructions. In this case, the branch counter 5 for storing the delay slot length information can be omitted; when it is, the delay slot length information may, for example, be built into the logic of the processor's control unit. Alternatively, the delay slot length information may be stored in a register or memory, and control performed by specifying a register number or memory address. The present invention can also be modified in various other ways without departing from its gist.
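To make the first alternative concrete, a delay slot length field could be packed into a branch instruction word as sketched below. The 32-bit layout (6-bit opcode, 3-bit slot field, 23-bit signed displacement) and the opcode value are entirely hypothetical and are not taken from the patent.

```python
from typing import Tuple

OPCODE_BRD = 0b010011              # hypothetical opcode for a delayed branch
SLOT_BITS, DISP_BITS = 3, 23       # hypothetical field widths; 6 + 3 + 23 = 32 bits


def encode_brd(slots: int, disp: int) -> int:
    """Pack opcode | delay slot length | signed displacement into one 32-bit word."""
    if not 0 <= slots < (1 << SLOT_BITS):
        raise ValueError("delay slot length does not fit the field")
    if not -(1 << (DISP_BITS - 1)) <= disp < (1 << (DISP_BITS - 1)):
        raise ValueError("displacement out of range")
    return (OPCODE_BRD << (SLOT_BITS + DISP_BITS)) | (slots << DISP_BITS) | (disp & ((1 << DISP_BITS) - 1))


def decode_brd(word: int) -> Tuple[int, int, int]:
    """Return (opcode, delay slot length, displacement) from a packed word."""
    opcode = word >> (SLOT_BITS + DISP_BITS)
    slots = (word >> DISP_BITS) & ((1 << SLOT_BITS) - 1)
    disp = word & ((1 << DISP_BITS) - 1)
    if disp >= 1 << (DISP_BITS - 1):                # sign-extend the displacement
        disp -= 1 << DISP_BITS
    return opcode, slots, disp
```

On decode, the slot field would be what gets loaded into the branch counter 5; the second alternative (one opcode per slot length) would move that information into the opcode itself instead.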

[Effects of the Invention] As described in detail above, the present invention makes it possible to optimize simply, and thus execute efficiently, instruction sequences whose optimization was previously complicated, such as when the destination of a delayed branch instruction is itself a delayed branch instruction or when a nop instruction appears a few steps before a delayed branch instruction. It also greatly relaxes the constraints on optimization and makes it possible to optimize some instruction sequences that were previously not optimization targets, which is of great practical benefit.

[Brief description of drawings]

FIG. 1 shows an example configuration of an instruction fetch unit that executes delayed branch instructions according to an embodiment of the present invention; FIG. 2 shows the execution order of an instruction sequence containing a delayed branch instruction according to the present invention; FIG. 3 shows the pipeline processing of a delayed branch instruction; FIGS. 4 to 10 each show a technique for optimizing an instruction sequence containing delayed branch instructions according to the present invention; FIGS. 11 to 14 each show a conventional optimization technique; and FIG. 15 shows the occurrence of an interlock and how it is avoided.
1 ... Instruction prefetch register, 2 ... Cache bus, 3 ... Program counter, 4 ... Decoder, 5 ... Branch counter, 6 ... Branch destination address register, 7 ... Comparator, 8 ... Selector.

Claims

[Claims]

1. An electronic computer having a pipeline mechanism, comprising: an instruction prefetch register that sequentially acquires instructions; a decoder that interprets the acquired instructions; a branch destination address register that, when an interpreted instruction is a delayed branch instruction with a variable delay slot length or one of a plurality of delayed branch instructions with different delay slot lengths, stores the branch destination address of that branch instruction; a branch counter that, in the same case, stores the delay slot length of that branch instruction and decrements the stored value in accordance with a clock signal that synchronizes the instructions; a comparator that compares a set value for determining the end of the delay slot period with the value stored in the branch counter and, when they match, detects the end of the delay slot period; a program counter that stores the branch destination address of a branch instruction when the instruction decoded by the decoder is a branch instruction other than the delayed branch instruction, stores the address following the stored address when the decoded instruction is not a branch instruction, and stores the address held in the branch destination address register when the end of the delay slot period is detected; and a selector that selects the value stored in the branch destination address register when the end of the delay slot period is detected and the value of the program counter when it is not detected; the pipeline mechanism being characterized in that optimization is performed such that: when the instruction to which an unconditional branch instruction branches is itself an unconditional branch instruction, the branch-source unconditional branch instruction is changed into a delayed branch instruction whose destination is the destination of the branch-destination unconditional branch instruction and whose delay slot length is [2], and the instruction immediately following the branch-destination unconditional branch instruction is copied after the instruction immediately following the branch-source unconditional branch instruction; when an instruction sequence consists, in order, of a first instruction, a nop instruction for preventing an interlock, a second instruction, ..., an (M+1)-th instruction (M being an integer), and an unconditional branch instruction, the nop instruction is changed into a delayed branch instruction whose destination is that of the unconditional branch instruction and whose delay slot length is [M], and the unconditional branch instruction is deleted; and when the destination of a delayed branch instruction with delay slot length [M] is a delayed branch instruction with delay slot length [N] (N being an integer), the branch-source delayed branch instruction is changed into a delayed branch instruction whose destination is that of the branch-destination delayed branch instruction and whose delay slot length is the sum of the delay slot lengths of the branch-source and branch-destination delayed branch instructions, and the N instructions following the branch-destination delayed branch instruction are copied after the M instructions following the branch-source delayed branch instruction.