JP3182438B2 - Data processor

Info

Publication number: JP3182438B2
Application number: JP28103091A
Authority: JP
Inventors: 進成田; 文男荒川; 邦男内山; 郭和青木
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1991-10-28
Filing date: 1991-10-28
Publication date: 2001-07-03
Anticipated expiration: 2016-07-03
Also published as: KR100259306B1; US5454087A; KR930008615A; JPH05120013A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a data processor that has both a function for executing branch instructions and a function for prefetching instructions, and in particular to a data processor that executes branch processing at high speed by linking the instruction prefetch function to branch history information.

[0002]

[Prior Art] Japanese Patent Laid-Open No. Hei 1-240931 describes a conventional data processing device that, in order to speed up branch processing, stores the address of a branch instruction and the address of the branch target instruction in a buffer as branch history information, searches the history information at instruction prefetch time using the prefetch address as input, and branches ahead of execution.

[0003] Japanese Patent Laid-Open No. Hei 2-166520 describes another conventional data processor that, likewise in order to speed up branch processing, stores in a buffer, as branch history information, the address of the instruction preceding a branch instruction and the address of the branch target instruction, searches the history information at instruction decode time using the instruction address as input, and branches at high speed by eliminating the execution of unconditional branch instructions.

[0004]

[Problems to Be Solved by the Invention] Both of the above prior arts mainly speed up the branch processing of unconditional branch instructions. However, a study by the present inventors has revealed that they cannot handle the branch processing of the return-from-subroutine instruction (rts instruction), which is one kind of unconditional branch instruction. The reasons are as follows.

[0005] Both of the above prior arts assume that each branch instruction always has the same branch target address, and on that assumption they treat the branch history information as useful for the next execution of the branch. For a return instruction from a subroutine, however, that assumption does not hold. A return instruction is used to return from a subroutine to the calling routine, so if the caller differs, the return address naturally differs as well.

[0006] Subroutine call and return are basically processed as follows.

[0007] First, the calling routine executes a subroutine call instruction. The subroutine call instruction (bsr instruction) first calculates the return address and stores it in a LIFO queue in memory called the stack (which is created by software). The subroutine call instruction (bsr instruction) then branches to the subroutine. Next, the processing of the subroutine is executed, and finally a return instruction is executed. The return instruction first reads the return address from the stack and then branches to that return address, thereby returning to the calling routine.

[0008] The return address is thus determined by the address of the subroutine call instruction (bsr instruction). When several subroutine call instructions (bsr instructions) call the same subroutine, the return address corresponding to the return instruction (rts instruction) is not uniquely determined.
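The consequence of paragraph [0008] can be illustrated with a minimal sketch, assuming a history table keyed only by the branch instruction address, as in the prior art. All addresses, the `BSR_LENGTH` value, and the function name `predict_and_update` are hypothetical and chosen for illustration only.

```python
# Minimal sketch (hypothetical addresses): a history table that maps a branch
# instruction address to the last target observed for it.
history = {}

def predict_and_update(branch_addr, actual_target):
    """Return True if the stored target matched, then record the new target."""
    hit = history.get(branch_addr) == actual_target
    history[branch_addr] = actual_target
    return hit

RTS_ADDR = 0x2010               # the single rts that ends the subroutine
CALL_SITES = [0x1000, 0x1400]   # two bsr instructions calling the same subroutine
BSR_LENGTH = 4                  # assumed bsr length in bytes, for illustration

# The two callers alternate, so the stored target for the rts is always stale.
rts_results = [predict_and_update(RTS_ADDR, site + BSR_LENGTH)
               for site in CALL_SITES * 2]

# A bra with a fixed target is predicted correctly from its second execution on.
bra_results = [predict_and_update(0x3000, 0x5000) for _ in range(2)]
```

In this sketch the rts never hits while its callers alternate, whereas the fixed-target branch hits as soon as its history has been recorded.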

[0009] Accordingly, an object of the present invention is to provide a data processor that can execute branch processing at high speed and that can also handle the branch processing of the return instruction, which is one kind of unconditional branch instruction.

[0010]

[Means for Solving the Problems] To achieve the above object, the following two means are employed.

[0011] (1) A first buffer stores branch history information indicating the address of a branch instruction that the data processor has executed once, its branch target address, and the type of the branch instruction.

[0012] (2) A second buffer stores the return address from a subroutine.

[0013]

[Operation] The first buffer stores branch history information indicating the address of a branch instruction that the data processor has executed once, its branch target address, and the type of the branch instruction. Therefore, when the data processor executes the same branch instruction again, the branch target address can be read from the first buffer at high speed and the next instruction prefetch address can be generated.

[0014] The second buffer, which stores return addresses, is a LIFO (last-in, first-out) queue that is written when a subroutine call instruction (bsr instruction) is executed and read when a return instruction (rts instruction) is prefetched. The pointer of the second buffer is not updated until the return instruction (rts instruction) is executed. In other words, the second buffer is a cache memory that stores a copy of part of the stack in memory (the return addresses).
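The behaviour of the second buffer can be sketched as follows. This is only an illustrative model under stated assumptions: the class and method names are hypothetical, the default of 16 entries, the order of pointer advance and write, and the clearing of an entry when the rts executes are all assumptions rather than details given in this paragraph.

```python
class ReturnBuffer:
    """Illustrative LIFO model of the second buffer (names are hypothetical)."""

    def __init__(self, entries=16):          # entry count is an assumption
        self.slots = [None] * entries
        self.pointer = 0

    def on_bsr_executed(self, return_address):
        # Written when a subroutine call instruction is executed.
        self.pointer = (self.pointer + 1) % len(self.slots)
        self.slots[self.pointer] = return_address

    def on_rts_prefetched(self):
        # Read when a return instruction is prefetched; the pointer is untouched.
        return self.slots[self.pointer]

    def on_rts_executed(self):
        # The pointer is updated only once the return instruction executes.
        self.slots[self.pointer] = None      # assumed invalidation of the entry
        self.pointer = (self.pointer - 1) % len(self.slots)


rb = ReturnBuffer()
rb.on_bsr_executed(0x1004)   # outer call
rb.on_bsr_executed(0x2008)   # nested call
inner = rb.on_rts_prefetched()
rb.on_rts_executed()
outer = rb.on_rts_prefetched()
```

Separating the read at prefetch time from the pointer update at execution time means that prefetching an rts which is later not executed leaves the buffer unchanged.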

[0015] The fact that a return instruction has been prefetched is detected using the first buffer, which stores the branch history information. That is, the return instruction (rts instruction) also registers its branch history information in the first buffer, just like the other branch instructions (bra and bsr instructions). For a return instruction, however, the first buffer is used only for detection, and its branch target address field is not used. Instead, the branch target address is obtained from the second buffer, which stores the return address.

[0016] Because the return instruction is registered together with the other branch instructions in the same first buffer, information is needed to decide which buffer's value to use as the branch target address. To solve this problem, information indicating the type of the branch instruction (the branch instruction type) is stored in the first buffer.
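The selection implied by the branch instruction type can be sketched as below. The enum `BranchType` and the function `select_target` are hypothetical names introduced for illustration; only the rule itself (rts takes its target from the second buffer, other branches from the first) comes from the text.

```python
from enum import Enum

class BranchType(Enum):
    BRA = "bra"   # unconditional branch
    BSR = "bsr"   # subroutine call
    RTS = "rts"   # return from subroutine

def select_target(branch_type, first_buffer_target, second_buffer_target):
    """Pick the branch target source according to the stored branch type."""
    if branch_type is BranchType.RTS:
        return second_buffer_target   # return address from the second buffer
    return first_buffer_target        # recorded target from the first buffer
```

For example, `select_target(BranchType.RTS, 0x5000, 0x1004)` yields the return address `0x1004`, while the same call with `BranchType.BRA` yields `0x5000`.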

[0017] As an alternative solution, information that simply indicates which of the first and second buffers supplies the branch target address (1 bit when there are two candidates) may be stored instead.

[0018] The information indicating which of the first and second buffers supplies the branch target address need not be held in the first buffer that stores the branch history information. The same effect can be expected by storing equivalent information in the control circuit of the data processor.

[0019] It is also possible to search the branch history information with the instruction address rather than the prefetch address. With this method, however, the branch speed-up shown in the embodiment below cannot be expected to the same degree.

[0020] It is also possible to search the branch history information with the address of an instruction preceding the branch instruction, as in Japanese Patent Laid-Open No. Hei 2-166520. Depending on the architecture of the microprocessor, that address may precede the branch by one instruction or more.

[0021]

[Embodiment] FIG. 1 is a block diagram of a microprocessor that is one embodiment of the present invention. Because the present invention is a technique for speeding up branch processing at instruction prefetch time, the description here focuses on the instruction prefetch unit.

[0022] 1. Internal structure of the microprocessor

The internal structure of the microprocessor is described with reference to FIG. 1. The function of each component of the microprocessor in FIG. 1 is as follows.

[0023]
PAG 101: Prefetch address generator (adder).
BW 102: Branch instruction buffer. Holds branch target addresses.
RB 103: Return buffer. Holds return addresses.
IC 104: Instruction cache.
PFQ 111: Prefetch queue.
PCQ 121: Instruction address queue. Holds the instruction addresses generated by the PAG.
ID 113: Instruction decoder.
RF 114: Register file.
ALU 117: Integer logic unit.
OC 119: Operand cache.

Of these components, the PAG, BW, RB, IC, PFQ, and PCQ belong to the instruction prefetch unit. Each component is described in turn below.

[0024] PAG 101, a 29-bit adder, generates the prefetch address. When the program executes sequentially, that is, when no branch is taken, a fixed value is added to the prefetch address after each prefetch to produce the next prefetch address. The fixed value equals the byte width of the instructions prefetched at one time; when the data line between external memory and the IC (instruction cache) is 8 bytes wide, the value added is 8. One input of the adder is signal line 110, which carries the current prefetch address. The other input is the fixed value, which is 8 in this embodiment (the value 8 can be represented by a 1-bit carry into the LSB). The sum is output on signal line 105.
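The remark that the value 8 can be represented by a 1-bit carry into the LSB can be made concrete with a short sketch. It assumes that the 29-bit prefetch address is a byte address with its lower 3 bits dropped (one unit per 8-byte fetch) and that the adder discards any carry out of bit 28; both points, and the function names, are assumptions for illustration.

```python
PREFETCH_BITS = 29   # width of the PAG adder stated in the text
FETCH_BYTES = 8      # bytes fetched per prefetch in this embodiment

def next_prefetch_address(prefetch_addr29):
    """Sequential case: a carry of 1 into the LSB advances by one 8-byte fetch."""
    return (prefetch_addr29 + 1) & ((1 << PREFETCH_BITS) - 1)  # assumed wraparound

def to_byte_address(prefetch_addr29):
    """Assumed mapping from the 29-bit prefetch address to a byte address."""
    return prefetch_addr29 * FETCH_BYTES

current = 0x100
step = to_byte_address(next_prefetch_address(current)) - to_byte_address(current)
```

Under these assumptions, incrementing the 29-bit value by one corresponds to advancing the byte address by 8, which is why no full-width constant is needed at the adder input.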

[0025] The branch instruction buffer BW 102 stores the history of branch instructions, that is, the address of a branch instruction, its branch target address, and its branch instruction type, as one set of history information. At instruction prefetch time, the prefetch address is compared with the branch instruction address in the history information, and if they match, the branch target address and the branch instruction type are output.

[0026] The return buffer RB 103 is a LIFO (last-in, first-out) queue that holds return addresses for the return instruction, which is one kind of branch instruction.

[0027] The instruction cache IC 104 receives the prefetch address (29 bits) through signal line 110, reads the corresponding instructions (64 bits) from the cache, and outputs them on signal line 108. If the instructions corresponding to the input address are not in the cache IC 104, an external memory access is started, and the instructions are read from external memory through signal line 120 (64 bits) and written to the cache IC 104.

[0028] The prefetch queue PFQ 111 functions as a FIFO (first-in, first-out) queue that holds prefetched instructions and also aligns instructions (from 64 bits to 16 bits in this embodiment). Its input is signal line 108 (64 bits wide), and its output is signal line 112 (16 bits wide).
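The alignment from 64 bits to 16 bits can be pictured with a minimal sketch. The function name `split_fetch_word` is hypothetical, and the ordering of the four 16-bit units (most significant first) is an assumption, since the text does not state it.

```python
def split_fetch_word(word64):
    """Split one 64-bit fetch into four 16-bit units, most significant first.

    The ordering is an assumption made for illustration.
    """
    return [(word64 >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]

units = split_fetch_word(0x1111222233334444)
```

Each element of `units` corresponds to one 16-bit transfer on the queue output.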

[0029] The instruction address queue PCQ 121 is a FIFO queue that holds the prefetch addresses generated by PAG 101. Its input is the output signal 110 (29 bits) of the three-input selector 109, and its output is signal line 122 (29 bits).

[0030] Instructions are input to the instruction decoder ID 113 from PFQ 111 through signal line 112 (16 bits). Each input instruction has been aligned in 16-bit units, and the decoded result is delivered to each component through control lines. The control signal lines are omitted from FIG. 1.

[0031] In this embodiment, the register file RF 114 consists of sixteen 32-bit registers and has one input port and two output ports. These three ports can operate simultaneously. Each port is 32 bits wide, and the ports are connected to signal lines 121, 115, and 116, respectively.

[0032] The inputs of the 32-bit integer logic unit ALU 117 are signal lines 115 and 116, which are output from the register file, and it outputs its result on signal line 118. In this embodiment, ALU 117 performs both data operations and address calculations, so signal line 118 carries either data or an address.

[0033] For an operand fetch, the operand cache OC 119 is accessed with the address on signal line 118 as input, and the resulting data is output on signal line 121 and transferred to the register file 114. For an operand store, the store address is transferred from ALU 117 to OC 119 over signal line 118 in the first cycle and is held in OC 119. In the next cycle, the store data is transferred over signal line 118, and the operand is stored to OC 119 and to external memory. If the data accessed during an operand fetch is not in the operand cache 119, an external memory access is started and the operand is transferred from external memory to OC 119 through signal line 120.

[0034] 2. Flow of pipeline processing

The flow of pipeline processing in the microprocessor of this embodiment shown in FIG. 1 is described with reference to FIGS. 2 to 6.

[0035] 2.1 Pipeline flow without a branch

FIG. 2 shows the flow of pipeline processing when there is no branch. The horizontal axis is time, and each of t0, t1, ... is one cycle. The vertical axis shows the processing of each pipeline stage.

[0036] Stage i is the instruction prefetch stage and includes the processing of PAG 101, BW 102, RB 103, IC 104, PFQ 111, and PCQ 121 in FIG. 1. In FIG. 2, for example, instruction 1 is prefetched at time t0 and is transferred from PFQ 111 through signal line 112 to the next processing stage (instruction decode). All instruction reads in FIG. 2 are assumed to be served from the instruction cache IC 104.

[0037] Stage d is the instruction decode stage and includes the processing of ID 113 and RF 114 in FIG. 1. At time t1 in FIG. 2, instruction 1 is decoded by the instruction decoder ID 113, and the register file RF 114 is then read according to the decoded result. The data read out is output on signal lines 115 and 116.

[0038] Stage e is the operation and address calculation stage and includes the processing of ALU 117 in FIG. 1. At time t2 in FIG. 2, instruction 1 performs an operation using ALU 117 under the control of the instruction decoder ID 113. The inputs are signal lines 115 and 116, and the result is output on signal line 118.

[0039] Stage a is the operand access stage and includes the processing of OC 119 in FIG. 1. At time t3 in FIG. 2, instruction 1 performs operand access processing under the control of the instruction decoder 113. One of the following three kinds of processing is performed.

[0040]
(1) Operand fetch. (If the data to be fetched is not in OC 119, the data is transferred from external memory to OC 119 and the fetch is then executed again from OC 119.)
(2) Operand store. (The data is stored in both OC 119 and external memory.)
(3) Data transfer. (Processing for storing the result of an ALU 117 operation in RF 114.)

In all three cases the input is signal line 118, and in processes (1) and (3) the output goes to signal line 121. Signal line 120 is used for data input and output with external memory. The address output lines to external memory are omitted from FIG. 1.

[0041] Stage s is the register store stage and includes the processing of RF 114 in FIG. 1. At time t4 in FIG. 2, instruction 1 performs a register store under the control of the instruction decoder 113. The input is signal line 121.

[0042] 2.2 Pipeline flow at a branch

FIG. 3 shows the pipeline flow when the unconditional branch instruction bra is executed. That the instruction being executed is an unconditional branch instruction becomes known when the instruction finishes the instruction decode stage d, that is, at time t0. The branch target address, however, becomes known only when the address calculation stage e finishes, that is, at time t1. The instruction prefetch of instructions 11 and 12 at the branch target address is therefore performed at time t2. As a result, each branch incurs an overhead of two cycles (idle time in pipeline processing).

[0043] FIG. 4 shows the pipeline flow when the branch processing of the bra instruction is accelerated using the branch instruction buffer BW 102 of FIG. 1. Compared with FIG. 3, the overhead is 0 cycles. This is because, in the i stage at time t0, the fact that a bra instruction has been prefetched is detected using the history information in BW 102, so that at time t1 instruction 11, the branch target instruction, can already be prefetched. Furthermore, since the bra instruction itself involves no processing other than the branch, the bra instruction is deleted in the i stage and is not transferred to the d stage.

[0044] FIG. 5, like FIG. 4, is an example in which branch processing is accelerated using the branch instruction buffer BW 102 of FIG. 1. Here, however, the branch instruction is the subroutine call instruction bsr. Unlike the bra instruction for unconditional branching, the bsr instruction for subroutine calls has processing to perform besides the branch, so it cannot be deleted as the bra instruction is. The branch therefore requires two cycles of execution time. As in FIG. 4, the overhead due to the branch is 0 cycles.

[0045] FIG. 6, again like FIG. 4, is an example in which branch processing is accelerated using the branch instruction buffer BW 102. Here the branch instruction is the return instruction rts. For the same reason as the bsr instruction, the rts instruction cannot be deleted, so the branch takes one cycle. Thanks to the branch instruction buffer BW 102, however, the overhead is 0 cycles.

[0046] As described above, the unconditional branch instruction bra and the subroutine call instruction bsr are processed using the branch instruction buffer BW 102, and the return buffer RB 103 is not used. The return instruction rts, however, is processed using both BW 102 and RB 103. Before the operation flow is described, the structures of the BW and the RB are described next.

3. Structure and operation of the BW and the RB

3.1 Structure and operation of the BW

FIG. 7 is a block diagram showing the branch instruction buffer BW 102 of FIG. 1 in more detail. The branch instruction buffer BW consists of an address decoder 201, an address tag section BWA 202, a data section BWD 203, and a match comparator CMP 209. The address decoder 201 is a 5-bit decoder whose decoded output is a pointer that designates one of the 32 entries of BWA 202 and BWD 203. BWA 202 and BWD 203 are both built from RAM, although they could also be built from associative memory. Each has 32 entries (words), and their bit widths are 32 and 33 bits, respectively.

[0047] Of these bits, BWA 202 holds a 31-bit branch instruction address (input line 122, output lines 206 and 213) and a 1-bit valid bit (input line 204, output line 207). BWD 203 holds a 31-bit branch target address (input line 118, output line 106) and a 2-bit branch instruction type (input line 205, output line 208). The branch instruction type is information that distinguishes among the unconditional branch instruction bra, the subroutine call instruction bsr, and the return instruction rts described above. CMP 209 is a 24-bit match comparator. The signal 210 indicating the comparison result (1 on a match, 0 on a mismatch) is ANDed with the valid signal 207 in the AND circuit 211, and the output of the AND circuit 211 is sent as the hit signal 212 to the BW control circuit in the instruction prefetch unit.

[0048] The write operation of the branch instruction buffer BW 102 is as follows.

[0049] First, the lower 5 bits of the 29-bit prefetch address signal line 110 are input to the address decoder 201. The address decoder 201 then decodes them and selects one of the 32 entries. At this point, the values on signal lines 122, 204, 118, and 205, which are input to BWA 202 and BWD 203, are written simultaneously to the selected entry. The branch instruction address 122 is the output signal line of the instruction address queue PCQ 121 in FIG. 1; the valid bit 204 is an output of the control circuit of the instruction prefetch unit; the branch target address 118 is the output signal line of ALU 117 in FIG. 1 (the result of address calculation); and the branch instruction type is a signal obtained by adjusting the timing of the output of the instruction decoder ID 113 in the control circuit of the instruction prefetch unit.

[0050] The read operation of the branch instruction buffer BW is as follows.

[0051] As in the write operation, the lower 5 bits of the 29-bit prefetch address signal line 110 are input to the address decoder 201, and the remaining 24 bits are input to the match comparator CMP 209. The address decoder 201 then decodes the address and selects one of the 32 entries. The data of the selected entry in the BWA and the BWD is read out and output on signal lines 206, 213, 207, 106, and 208, respectively.

[0052] The branch instruction address output on signal line 206 is the upper 24 bits of the address of the branch instruction registered in the selected entry. Because only the lower 5 bits of the address are used to select the entry, CMP 209 compares the upper 24 bits of the addresses to check whether the registered branch instruction address is the same as the address of the branch instruction contained in the instructions being prefetched.

[0053] The lower 7 bits of the address, which follow these 24 bits, are output on signal line 213. The upper 5 bits of signal line 213 match the 5 bits of the prefetch address 110, so these 5 bits could also be omitted from the BWA. The lower 2 bits of signal line 213 indicate the position of the branch instruction (2 bytes long) within the 8 bytes of prefetched instructions and are used as control information for the prefetch queue PFQ 111.
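The field split of the 31-bit branch instruction address described above can be sketched as follows. It assumes that the 31-bit value is a byte address divided by 2 (one unit per 2-byte instruction slot); that interpretation and the function name `split_branch_address` are assumptions for illustration.

```python
def split_branch_address(addr31):
    """Split a 31-bit branch instruction address into (tag, index, position).

    tag:      upper 24 bits, compared by CMP 209
    index:    next 5 bits, the same bits that select one of the 32 entries
    position: lower 2 bits, the 2-byte slot within the 8-byte fetch
    """
    position = addr31 & 0b11
    index = (addr31 >> 2) & 0b11111
    tag = addr31 >> 7
    return tag, index, position

# Example with an assumed byte address of 0x1236 (so addr31 = 0x1236 >> 1).
fields = split_branch_address(0x1236 >> 1)
```

With this example address the branch sits in the last of the four 2-byte slots of its 8-byte fetch, since byte offset 6 within the fetch corresponds to position 3.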

[0054] The valid bit 207 indicates whether the information in the entry read out is valid. It is "1" when the information in the entry is valid and "0" when it is invalid. ANDing the valid signal 207 with the comparison result signal 210 therefore produces the hit signal 212, which indicates whether the data read from BWA 202 and BWD 203 is valid information corresponding to the prefetch address 110.
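The indexed lookup, tag comparison, and hit generation described for the BW can be modelled with the following sketch. It is a simplification under stated assumptions: the class and field names are hypothetical, the entry is indexed on write by the prefetch address of the fetch containing the branch, and signal timing is ignored.

```python
from dataclasses import dataclass

INDEX_BITS = 5
ENTRIES = 1 << INDEX_BITS   # 32 entries

@dataclass
class BWEntry:
    valid: bool = False
    tag: int = 0            # upper 24 bits of the 29-bit prefetch address
    position: int = 0       # 2-bit slot of the branch within the 8-byte fetch
    target: int = 0         # branch target address
    branch_type: str = ""   # "bra", "bsr", or "rts"

def split(prefetch_addr29):
    """Return (tag, index) for a 29-bit prefetch address."""
    return prefetch_addr29 >> INDEX_BITS, prefetch_addr29 & (ENTRIES - 1)

class BranchBuffer:
    def __init__(self):
        self.entries = [BWEntry() for _ in range(ENTRIES)]

    def write(self, prefetch_addr29, position, target, branch_type):
        tag, index = split(prefetch_addr29)
        self.entries[index] = BWEntry(True, tag, position, target, branch_type)

    def lookup(self, prefetch_addr29):
        tag, index = split(prefetch_addr29)
        entry = self.entries[index]
        hit = entry.valid and entry.tag == tag   # valid bit AND tag match
        return entry if hit else None

bw = BranchBuffer()
bw.write(0x12345, 2, 0x4000, "bra")
```

In this model, an address that maps to the same entry but has a different tag misses, as does any address that maps to an entry that has never been written.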

[0055] The branch target address 106 read from BWD 203 is used for branch processing within the instruction prefetch unit. That is, in FIG. 1 the branch target address 106 output from BW 102 passes through the selector 109 and signal line 110 and is input as the prefetch address to the instruction cache IC 104, the branch instruction buffer BW 102, and the prefetch address generator PAG 101.

[0056] The branch instruction type 208 indicates whether the branch instruction of the entry that was read is an unconditional branch instruction bra, a subroutine call instruction bsr, or a return instruction rts. The control circuit in the instruction prefetch unit receives this information and executes one of the processes shown in FIGS. 4 to 6.

[0057] 3.2 Structure and Operation of the Return Buffer RB
FIG. 8 shows the configuration of the return buffer RB 103 of FIG. 1 in more detail. The RB consists of a decoder 302 and a RAM memory RBD 303. The decoder 302 is a 4-bit decoder that takes the 4-bit pointer 301 as input and selects one of the 16 entries of the RBD 303. The RBD 303 is a RAM with 16 entries, each 32 bits wide. The 32 bits comprise a 31-bit return address (input line 118, output line 107) and 1 valid bit (input line 304, output line 305). In the RBD 303, the valid bit that is read out serves directly as the hit signal.

[0058] A write to the return buffer RB 103 proceeds as follows.

[0059] The 4-bit pointer 301 is supplied by the control circuit of the instruction prefetch unit. The pointer 301 is decoded by the decoder 302 and selects one of the entries of the RBD 303. At this time, the return address 118 and the value of the valid signal 304 that are input to the RBD 303 are written into the selected entry. The valid signal 304 is supplied by the control circuit of the instruction prefetch unit. A write to the RB 103 is started when a subroutine call instruction (for example, the bsr instruction) is executed, and the pointer is then advanced by one.

[0060] A read from the return buffer RB 103 proceeds as follows.

[0061] As with a write, the 4-bit pointer 301 is supplied by the control circuit of the instruction prefetch unit. The pointer 301 is decoded by the decoder 302 and selects one of the entries of the RBD 303. The data of the selected entry is output simultaneously on signal lines 107 and 305. If the type of the branch instruction read from the BW 102 is the return instruction rts and the hit (valid) signal read from the return buffer RB 103 has the value 1, the return address 107 is transferred as the next prefetch address, through the selector 109 and signal line 110 of FIG. 1, to the instruction cache IC 104, the branch instruction buffer BW 102, and the prefetch address generator PAG 101. The pointer of the return buffer RB 103 is moved back by one when a subroutine return instruction (for example, the rts instruction) is executed.
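The write and read behaviour of the RB can be modelled with a small sketch like the one below. Several details are assumptions rather than statements from the text: the pointer is taken to wrap modulo 16, the read is taken to use the entry just below the write pointer (last-in, first-out order), and entries are not cleared after a read. The class name `ReturnBuffer` and its methods are hypothetical.

```python
class ReturnBuffer:
    ENTRIES = 16  # selected by the 4-bit pointer 301

    def __init__(self) -> None:
        self.addr = [0] * self.ENTRIES
        self.valid = [False] * self.ENTRIES
        self.ptr = 0

    def push(self, return_addr: int) -> None:
        """Called on a subroutine call (bsr): write, then advance the pointer."""
        self.addr[self.ptr] = return_addr
        self.valid[self.ptr] = True
        self.ptr = (self.ptr + 1) % self.ENTRIES

    def pop(self):
        """Called on a return (rts): move the pointer back and read the entry.

        Returns (hit, return_addr); the valid bit serves directly as the hit.
        """
        self.ptr = (self.ptr - 1) % self.ENTRIES
        return self.valid[self.ptr], self.addr[self.ptr]


rb = ReturnBuffer()
rb.push(0x100)   # outer call
rb.push(0x200)   # nested call
print(rb.pop())  # innermost return address comes back first
print(rb.pop())
print(rb.pop())  # nothing left: the selected entry was never written
```

With this ordering, nested calls up to 16 deep return to the correct callers; deeper nesting would overwrite the oldest entries in this model.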

[0062] The reason why, unlike the unconditional branch instruction bra and the subroutine call instruction bsr, the branch destination address of the subroutine return instruction rts cannot be held in the branch instruction buffer BW 102 is as follows.

[0063] The return instruction rts is used to return from a subroutine. That is, as the last instruction of a subroutine, it branches to the instruction following the call instruction in the routine that called the subroutine. The return address is saved on the stack at call time and restored from the stack at return time. Consequently, the return destination address of one and the same subroutine return instruction rts differs depending on which routine the subroutine was called from.

[0064] The branch instruction buffer BW 102, by contrast, holds history information that pairs the address of a branch instruction with its branch destination address. As described above, for the subroutine return instruction rts the relationship between the branch instruction address and the branch destination address is not unique, so the return address to which rts branches cannot be held in the branch instruction buffer BW 102. This problem is solved by providing the RB 103, a buffer that holds copies of the return addresses on the stack, and obtaining the return address from this buffer RB 103.

[0065] 4. Operation Flow
FIGS. 9 and 10 show the basic control flow of the instruction prefetch unit of the data processor according to the embodiment described above. State transitions on branches, resets, and exceptions are omitted. The control flow is described below with reference to FIGS. 9 and 10.

[0066] Step 900 is the state of waiting for free space in the prefetch queue PFQ 111. Because the PFQ 111 is a FIFO queue, once it is full no further data can be written until a data transfer to the instruction decoder ID 113 frees a slot. The decision in step 901 (whether the PFQ is full) therefore determines whether to return to the wait state of step 900 or to proceed to step 902.

[0067] Step 902 is the state in which the instruction cache IC 104, the branch instruction buffer BW 102, and the return buffer RB 103 are read. The results of these reads are evaluated in steps 903 to 906, and the next operating state is chosen as one of steps 907 to 910.

[0068] Step 903 determines whether the read of the instruction cache IC 104 succeeded (the hit signal has the value 1). On success the flow proceeds to step 904; on failure it proceeds to step 910.

[0069] Step 904 determines whether the read of the branch instruction buffer BW 102 succeeded, that is, whether the hit signal 212 of FIG. 7 has the value 1. On success the flow proceeds to step 905; on failure it proceeds to step 909.

[0070] Step 905 determines whether the branch instruction type information read from the branch instruction buffer BW 102 (signal 208 of FIG. 7) indicates a return instruction rts. For a return instruction rts the flow proceeds to step 906; otherwise it proceeds to step 907.

[0071] Step 906 determines whether the read of the return buffer RB 103 succeeded, that is, whether the hit signal 305 of FIG. 8 has the value 1. On success the flow proceeds to step 908; on failure it proceeds to step 909.

[0072] Step 907 is the state entered when the branch instruction buffer BW 102 hits and the branch instruction is an unconditional branch instruction bra or a subroutine call instruction bsr. The branch destination address signal 106 output from the BW 102 is used as the prefetch address for the next cycle. Specifically, the selector 109 of FIG. 1 selects signal 106 and outputs its value on signal 110.

[0073] Step 908 is the state entered when both the branch instruction buffer BW 102 and the return buffer RB 103 hit and the branch instruction is a return instruction rts. The return address signal 107 output from the RB 103 is used as the prefetch address for the next cycle. Specifically, the selector 109 of FIG. 1 selects signal 107 and outputs its value on signal 110.

[0074] Step 909 is the state entered when the instruction cache IC hits but the conditions for entering neither step 907 nor step 908 are satisfied. The value of signal 105 output from the PAG 101 of FIG. 1 is used as the prefetch address for the next cycle. Specifically, the selector 109 of FIG. 1 selects signal 105 and outputs its value on signal 110.

[0075] Step 910 is the state entered when the read of the instruction cache IC 104 fails, and it starts an external memory access. The address used for the access is the prefetch address (signal 110) that was used to read the instruction cache. This address signal line is omitted from FIG. 1. When the instruction transfer from the external memory completes, the state returns to step 902.
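The decisions of steps 903 to 906 and the resulting selector setting can be summarized in a short sketch. It is a simplified model under stated assumptions: the function names are hypothetical, the branch type is represented as a string, and keeping the current prefetch address in step 910 is a reading of the text (the same address is used for the external access before the flow returns to step 902).

```python
def select_next_state(ic_hit: bool, bw_hit: bool, branch_type: str, rb_hit: bool) -> int:
    """Return the step number (907 to 910) chosen after the reads of step 902."""
    if not ic_hit:                  # step 903: instruction cache miss
        return 910
    if not bw_hit:                  # step 904: no branch history for this address
        return 909
    if branch_type != "rts":        # step 905: bra or bsr
        return 907
    return 908 if rb_hit else 909   # step 906: rts needs a valid RB entry


def next_prefetch_address(state: int, bw_target: int, rb_return: int,
                          pag_next: int, current: int) -> int:
    """Model of selector 109 driving signal 110 for the next cycle."""
    if state == 907:
        return bw_target   # signal 106 from the BW
    if state == 908:
        return rb_return   # signal 107 from the RB
    if state == 909:
        return pag_next    # signal 105 from the PAG
    return current         # step 910: address reused for the external access


state = select_next_state(ic_hit=True, bw_hit=True, branch_type="rts", rb_hit=True)
print(state, hex(next_prefetch_address(state, 0x400, 0x208, 0x108, 0x100)))
```

Note that an rts entry that hits in the BW but misses in the RB falls through to the sequential address from the PAG, exactly as a BW miss does.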

[0076] The next state after steps 907, 908, and 909 is step 911.

[0077] Step 911 decides whether to start a write operation to the BW 102 and the RB 103. Basically, the write is triggered by decode information indicating that the instruction decoder 113 has decoded one of the unconditional branch instruction bra, the subroutine call instruction bsr, or the return instruction rts. The trigger conditions differ slightly, however, for each of these three branch instructions.

[0078] (1) Unconditional branch instruction bra
When a bra instruction is decoded, a write to the BW 102 is always started, provided the BW 102 and the RB 103 are enabled. When a bra instruction causes a branch using the BW 102, the bra instruction is deleted within the instruction prefetch unit and is not transferred to the instruction decode unit. Conversely, if a bra instruction is decoded, that bra instruction has not undergone branch processing using the BW 102.

[0079] (2) Subroutine call instruction bsr and return instruction rts
When a bsr or rts instruction undergoes branch processing using the BW 102, the instruction prefetch unit transfers the branch instruction to the instruction decode unit over signal 112 and also sends the instruction decode unit tag information indicating that the branch instruction has already been branch-processed using the BW 102. Based on this tag information, the instruction decode unit refrains from sending to the instruction prefetch unit, for a branch instruction that has already branched using the BW 102, both the branch indication signal and the decode information indicating that a bsr or rts instruction was decoded.

[0080] If a write to the BW 102 is to be started, the flow proceeds to step 912; if not, it proceeds to step 901.
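The decision of step 911 can be condensed into a sketch such as the following. It is an interpretation, not the patent's circuit: the function name and parameters are hypothetical, the tag information is modelled as a boolean, and applying the "buffers enabled" condition to bsr and rts as well as bra is an assumption (the text states it only for bra).

```python
def should_start_bw_write(decoded: str, branched_via_bw: bool,
                          buffers_enabled: bool = True) -> bool:
    """Decide whether step 911 proceeds to step 912 (True) or step 901 (False).

    decoded          -- mnemonic seen by the instruction decoder
    branched_via_bw  -- tag sent with bsr/rts that already branched using the BW
    """
    if not buffers_enabled:
        return False
    if decoded == "bra":
        # A bra that hit in the BW never reaches the decoder, so a decoded
        # bra always corresponds to a BW miss.
        return True
    if decoded in ("bsr", "rts"):
        # Tagged instructions already branched; their decode info is withheld.
        return not branched_via_bw
    return False


print(should_start_bw_write("bra", branched_via_bw=False))
print(should_start_bw_write("bsr", branched_via_bw=True))
```

The asymmetry reflects the text: bra is removed in the prefetch unit on a hit, whereas bsr and rts must still reach the execution stages, so a tag is needed to tell the two cases apart.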

[0081] Step 912 is the state of waiting for the branch indication signal. Whether the branch indication signal has been asserted is determined in step 913. In step 912, the address of the branch instruction is output on signal line 122 using the instruction address queue PCQ 121 of FIG. 1, the signals on lines 204 and 205 of FIG. 7 are generated, and the unit waits for the branch indication signal to be asserted. When the branch indication signal is asserted, the flow proceeds to step 914.

[0082] In step 914 the entry is registered in the BW 102. In the microprocessor of this embodiment shown in FIG. 1, the branch indication signal and the branch destination address are generated at the same time. The branch destination address is generated by the ALU 117 of FIG. 1 and transferred to the BW over signal line 118. When the branch indication signal is asserted, all input data for the BW 102 is available, and the entry selected by the prefetch address 110 is written simultaneously in the BWA 202 and the BWD 203 of FIG. 7. In this algorithm, the write to the BW 102 and the read of the IC 104 are performed at the same time in step 914, so the BW 102 cannot be read while it is being written. A simultaneous read would become possible, however, if the BW 102 were implemented as a two-port memory.

[0083] After the write to the BW 102 completes in step 914, the flow returns to step 903.

[0084]

[Effects of the Invention] As shown above, the embodiment using the present invention speeds up branch processing, and unconditional branch processing in particular, as illustrated in FIGS. 4 to 6. Its features are as follows.

1) Branch history information that pairs a branch instruction address with a branch destination address is held, and this history information is searched using the prefetch address, so a branch can be taken at an early point.

[0085] 2) Operating the BW (the buffer holding branch history information) and the RB (the buffer holding return addresses) in conjunction also speeds up branch processing for return instructions.

[0086] 3) Holding branch instruction type information in the BW makes the linked operation of 2) above possible and also allows fine-grained control per branch instruction. For example, this information is used to control processing such that execution is eliminated for the unconditional branch instruction bra but not for the instruction bsr.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing the overall configuration of a microprocessor according to an embodiment of the present invention.

FIG. 2 is a diagram showing the pipeline flow of the microprocessor of FIG. 1 when there is no branch.

FIG. 3 is a diagram showing the pipeline flow on a miss in the branch instruction buffer BW 102 of the microprocessor of FIG. 1.

FIG. 4 is a diagram showing the pipeline flow on a hit in the branch instruction buffer BW 102 of the microprocessor of FIG. 1.

FIG. 5 is a diagram showing the pipeline flow on a hit in the branch instruction buffer BW 102 of the microprocessor of FIG. 1.

FIG. 6 is a diagram showing the pipeline flow on a hit in both the branch instruction buffer BW 102 and the return buffer RB 103 of the microprocessor of FIG. 1.

FIG. 7 is a diagram showing the configuration of the branch instruction buffer BW 102 in the microprocessor of FIG. 1 in more detail.

FIG. 8 is a diagram showing the configuration of the return buffer RB 103 in the microprocessor of FIG. 1 in more detail.

FIG. 9 is a diagram showing the control flow of the instruction prefetch unit of the microprocessor of FIG. 1.

FIG. 10 is a diagram showing the control flow of the instruction prefetch unit of the microprocessor of FIG. 1 (continuation of FIG. 9).

[Explanation of symbols]

100: Microprocessor.
101: Prefetch address generator; a 29-bit adder.
102: Branch instruction buffer; holds branch destination addresses.
103: Return buffer; holds return addresses.
104: Instruction cache.
201: 5-bit address decoder.
202: Address tag section of the branch instruction buffer; 32 entries, 32 bits wide.
203: Data section of the branch instruction buffer; 32 entries, 33 bits wide.
209: 24-bit match comparator.
302: 4-bit address decoder.
303: Data section of the return buffer; 16 entries, 32 bits wide.

Continuation of front page

(72) Inventor: 青木郭和, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo

(56) References: JP-A-2-155038 (JP, A); JP-A-2-255917 (JP, A); JP-A-61-196332 (JP, A); JP-A-2-121034 (JP, A); JP-A-3-31933 (JP, A); JP-A-62-151936 (JP, A)

(58) Fields searched (Int. Cl.7, DB name): G06F 9/30 - 9/42

Claims

(57) [Claims]

1. A data processor comprising: a prefetch address generator for generating a prefetch address; a prefetch queue for prefetching an instruction from a memory according to the prefetch address; an instruction decoder for decoding an instruction stored in the prefetch queue; an operation unit capable of generating, by address operation in response to an output of the instruction decoder, a branch destination address of a call instruction and a return address of the call instruction; a first buffer for storing the addresses of a call instruction and a return instruction generated by the prefetch address generator, the branch destination address of the call instruction generated by the operation unit, and information, output from the instruction decoder, indicating whether the instruction type is a call instruction or a return instruction; a second buffer for storing the return address generated by the operation unit; and a comparator for comparing the prefetch address with the address of a branch instruction stored in the first buffer, wherein, when the comparison by the comparator results in a match and the information indicating the corresponding type read from the first buffer indicates a call instruction, the corresponding branch destination address is read from the first buffer, when the comparison by the comparator results in a match and the information indicating the corresponding type read from the first buffer indicates a return instruction, a return address is read from the second buffer, and the next prefetch address is generated according to the address thus read.

2. The data processor according to claim 1, wherein the memory is a cache memory, instructions are stored into the cache memory from an external memory, and instructions are prefetched from the cache memory into the prefetch queue according to the prefetch address.

3. The data processor according to claim 2, wherein the prefetch queue, the instruction decoder, the operation unit, the comparator, the first buffer, the second buffer, and the cache memory are formed on a semiconductor substrate.

4. The data processor according to claim 3, wherein the prefetch queue, the instruction decoder, and the operation unit perform pipeline processing.

5. The data processor according to claim 4, wherein said second buffer is a last-in first-out buffer.