JP3493110B2

JP3493110B2 - High-speed branch processing unit

Info

Publication number: JP3493110B2
Application number: JP01564297A
Authority: JP
Inventors: 美次荒木
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1997-01-29
Filing date: 1997-01-29
Publication date: 2004-02-03
Anticipated expiration: 2017-01-29
Also published as: JPH10214187A

Description

Detailed Description of the Invention

【０００１】[0001]

【Technical Field of the Invention】The present invention relates to a high-speed branch processing device that enables a higher operating frequency.

【０００２】[0002]

【Prior Art】In recent years, with the arrival of the multimedia era, real-time image processing has become very important in moving-image processing, as typified by MPEG (Moving Picture Experts Group) processing. Because such processing handles image data, the amount of data to be processed is enormous, and the microprocessor therefore needs very high processing capability. Moreover, implementing all of the processing in hardware inevitably raises cost, so it is preferable, where possible, to perform part or all of the processing in software on a general-purpose microprocessor. Improvements in microprocessor processing capability are therefore increasingly in demand. High-speed processing is likewise required in portable devices such as PDAs (Personal Digital Assistants), and in these devices reducing power consumption is also important.

【０００３】The main factors that determine the processing capability of a microprocessor fall broadly into three: the operating frequency, the number of instructions executed, and the number of cycles per instruction. The operating frequency, for example, is generally determined by the delay of the system's critical path. In a microprocessor, the instruction fetch performed when the condition of a conditional branch instruction is satisfied is often the critical path.

【０００４】The case in which this instruction fetch becomes the critical path is described below, taking as an example the R3000 of MIPS Comp. Systems, Inc., a representative RISC (Reduced Instruction Set Computer) processor.

【０００５】When the condition of a conditional branch instruction (bne, beq, etc.) is satisfied, the instruction fetch proceeds as follows: the registers are compared in the execution stage of the conditional branch instruction and, once the comparison result is settled, the instruction at the branch destination address is read from the instruction memory and latched into the instruction register. Because all of this is performed within a single cycle, it often forms the critical path. The flow of data and control signals in this instruction fetch is described below with reference to the drawings.

【０００６】FIG. 11 is a block diagram of a conventional branch processing device. This device uses pipelining, a technique for shortening instruction execution time, and comprises: a program counter 1 that stores the address of the instruction currently being executed; an instruction memory 3 that stores instructions; an instruction register 5 into which an instruction read from the instruction memory 3 is latched; an instruction decoder 7 that decodes the instruction set in the instruction register 5 and generates various control signals; an adder 9; a branch destination address register 11 that latches the branch destination address output by the adder 9; a register file 13 made up of a plurality of registers (not shown) that hold the contents compared by a conditional branch instruction; an ALU (Arithmetic and Logic Unit) 15; input registers 17a and 17b of the ALU 15; and a selector 19.

【０００７】In a conventional branch processing device configured in this way, an instruction (a conditional branch instruction) is first read from the instruction memory 3 in the F stage (fetch stage) and latched into the instruction register 5.

【０００８】Next, in the D stage (decode stage), the instruction decoder 7 decodes the instruction in the instruction register 5 and outputs the control signals needed for address calculation. The adder 9 adds the index value specified in the instruction to the address stored in the program counter 1 to obtain the branch destination address, which is latched into the branch destination address register 11. At the same time, under a control signal from the instruction decoder 7, the values held in the two registers to be compared are read from the register file 13 and latched into the input registers 17A and 17B of the ALU 15.

【０００９】Next, in the E stage (execution stage), the ALU 15 takes the two values stored in the input registers 17A and 17B and produces an operation result. This result serves as the control signal for the branch decision. Based on it, the selector 19 selects either the address indicated by the program counter 1, which is the read address for the instruction memory 3 when the branch is not taken, or the branch destination address stored in the branch destination address register 11, which is the read address when the branch is taken. When the branch is taken, the instruction memory 3 is accessed at the branch destination address, and the branch-destination instruction is read out and stored in the instruction register 5 in the same way as above.

【００１０】As described above, when a conditional branch instruction is executed in the conventional branch processing device, the ALU operation for the branch decision and, if the branch is taken, the cache read of the branch-destination instruction must be performed serially within the same cycle. The delay of this path is therefore large, and it often becomes the critical path. Consequently, the instruction fetch performed when a conditional branch is taken has been a bottleneck, and raising the operating frequency has been very difficult.

【００１１】[0011]

【Problems to Be Solved by the Invention】As described above, when a conventional branch processing device has a pipeline configuration in which the branch decision and the subsequent fetch of the instruction at the branch destination address must both be completed within the same cycle, this path becomes the critical path, and the operating frequency of the microprocessor as a whole cannot be raised sufficiently.

【００１２】The present invention has been made in view of the above circumstances, and its object is to provide a high-speed branch processing device that enables a higher operating frequency and can also reduce power consumption.

【００１３】[0013]

【Means for Solving the Problems】To achieve the above object, a first feature of the present invention is a high-speed branch processing device comprising an instruction memory that stores instructions, a program counter that holds an address into the instruction memory, and an instruction register that stores an instruction read from the instruction memory for decoding, the device further comprising: read control means for reading a predetermined number of instructions at consecutive addresses from the instruction memory at one time; two instruction buffers that temporarily store the plurality of instructions read from the instruction memory by the read control means; and instruction buffer selection means for selecting one of the two instruction buffers, selecting one of the plurality of instructions stored in the selected instruction buffer, and outputting that instruction to the instruction register. When the program counter designates the head address of a plurality of instructions, the read control means reads that first plurality of instructions from the instruction memory, the first plurality of instructions is stored in whichever of the two instruction buffers was last accessed longer ago, and the instruction buffer selection means outputs the first plurality of instructions to the instruction register in address order. When a decoded instruction is a conditional branch instruction, the read control means reads from the instruction memory, before the conditional branch decision is complete, a second plurality of instructions at consecutive addresses that include the branch destination address, the second plurality of instructions is stored in whichever of the two instruction buffers was last accessed longer ago, and the instruction buffer selection means outputs the second plurality of instructions to the instruction register in address order. In addition, the instructions are arranged at addresses in the instruction memory such that the read of the first plurality of instructions and the read of the second plurality of instructions do not coincide.

【００１４】With this configuration, the fetch from the branch destination address when a conditional branch is taken, which is often the critical path, can be sped up without increasing cost. The operating frequency of the microprocessor can therefore be raised.

【００１５】[0015]

【Embodiments of the Invention】Embodiments of the present invention are described below with reference to the drawings.

【００１６】The high-speed branch processing device according to this embodiment basically has the five-stage pipeline configuration described below, which is the configuration used in common RISC processors such as the R3000 of MIPS Comp. Systems, Inc.

【００１７】
(1) F stage (fetch stage): the stage that inputs an instruction; the instruction is fetched from memory (cache) and set in the instruction register.
(2) D stage (decode stage): the stage that decodes the instruction in the instruction register and generates the control signals.
(3) E stage (execution stage): the stage that executes an operation, or generates the address for an access to memory (cache).
(4) M stage (memory stage): the stage that accesses memory (cache).
(5) WB stage (write-back stage): the stage that writes data to a register.

【００１８】First embodiment: FIG. 1 is a block diagram of the high-speed branch processing device according to this embodiment. In FIG. 1, the device comprises: a program counter 1 that stores the address of the instruction currently being executed; an instruction memory 3 that stores instructions; an instruction register 5 into which an instruction read from the instruction memory 3 is latched; an instruction decoder 7 that decodes the instruction set in the instruction register 5 and generates various control signals; an adder 9; a branch destination address register 11 that latches the branch destination address output by the adder 9; a register file 13 made up of a plurality of registers (not shown) that hold the contents compared by a conditional branch instruction; an ALU (Arithmetic and Logic Unit) 15; input registers 17a and 17b of the ALU 15; a selector 19 that selects either the address indicated by the program counter 1 or the branch destination address stored in the branch destination address register 11; instruction buffers 21A and 21B that latch a plurality of consecutive instructions read from the instruction memory 3; selectors 23A and 23B that each select one of the plurality of instructions latched in instruction buffers 21A and 21B; a selector 25 that selects one of the two instructions selected by selectors 23A and 23B; a control circuit 27 that controls the selector 25; and a branch instruction flag register 29 that latches the branch instruction flag output from the instruction decoder 7.

【００１９】Instructions are loaded from the instruction memory 3 into the instruction buffers 21A and 21B according to an LRU (Least Recently Used) policy: of the instruction buffers 21A and 21B, the one whose last load is older receives the new instructions.
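
As an illustration only, the LRU choice between the two instruction buffers can be sketched in Python as follows. The class, its fields, and the assumption that buffer 21A starts out as the older one are hypothetical and are not taken from the specification.

```python
class InstructionBuffers:
    """Toy model of instruction buffers 21A and 21B with LRU replacement."""

    def __init__(self):
        self.groups = {"21A": None, "21B": None}
        # Assumption: at reset, 21A counts as the less recently loaded buffer.
        self.last_loaded = {"21A": -2, "21B": -1}
        self.clock = 0

    def victim(self):
        # The buffer whose last load is older receives the next group.
        return min(self.last_loaded, key=self.last_loaded.get)

    def load(self, group):
        name = self.victim()
        self.groups[name] = list(group)
        self.last_loaded[name] = self.clock
        self.clock += 1
        return name
```

With only two buffers, successive loads simply alternate between 21A and 21B.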

【００２０】The operation of the high-speed branch processing device according to this embodiment is described next. First, reading of the instruction memory 3 shown in FIG. 1 is described with reference to FIG. 2. FIG. 2 is a conceptual diagram of the read operation of the instruction memory 3 shown in FIG. 1. When the read signal is in the Enable state, a plurality of instructions at consecutive addresses, including the specified address, are read at one time. The number of instructions read at one time is arbitrary; in FIG. 2 it is four.

【００２１】The read signal enters the Enable state in two cases: when the address stored in the program counter 1 of FIG. 1 reaches an address boundary of the group of instructions read at one time, and when the instruction at the branch destination address of a branch instruction is fetched.

【００２２】As an example of the first case, consider an instruction sequence that contains no branch instruction, as shown in FIG. 3(a). When the address in the program counter 1 is "80008000", the read signal enters the Enable state and the four instructions at addresses "80008000", "80008004", "80008008", and "8000800c" are read at one time. The pipeline then advances, and when the address in the program counter 1 reaches "80008010", the read signal again enters the Enable state and the four instructions at addresses "80008010", "80008014", "80008018", and "8000801c" are likewise read at one time. Because the instruction memory 3 is read only when the address in the program counter 1 reaches an address boundary of the four instructions read at one time, accesses to the instruction memory 3 are fewer than in the conventional device.
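
The boundary condition for a branch-free sequence can be sketched as follows. This is a minimal model, assuming 4-byte instructions (consistent with the address step of 4 in FIG. 3) and groups of four; the function names are hypothetical.

```python
GROUP_SIZE = 4   # instructions read at one time, as in FIG. 2
INSN_BYTES = 4   # assumed instruction width in bytes

def at_group_boundary(pc):
    """True when pc is the head address of a four-instruction group."""
    return pc % (GROUP_SIZE * INSN_BYTES) == 0

def count_reads(start_pc, n_instructions):
    """Count instruction-memory reads for a straight-line sequence."""
    reads = 0
    for i in range(n_instructions):
        if at_group_boundary(start_pc + i * INSN_BYTES):
            reads += 1
    return reads
```

Under these assumptions, the eight instructions from 0x80008000 to 0x8000801c need two reads of the instruction memory, where a device that reads one instruction per fetch would need eight.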

【００２３】The second case, fetching the instruction at the branch destination address of a branch instruction, arises as follows. Consider an instruction sequence that contains a branch instruction, as shown in FIG. 3(b). When the address in the program counter 1 is "80008000", the read signal enters the Enable state as in FIG. 3(a), and the four instructions at addresses "80008000", "80008004", "80008008", and "8000800c" are read at one time. If the instructions read include a branch instruction (here, the instruction at address "80008004"), that branch instruction is executed and the branch-destination instruction (the instruction at address "80008104", marked target in the figure) is read; in this case too the read signal enters the Enable state. When the branch-destination instruction does not lie on an address boundary of the kind shown in FIG. 3(a), the four instructions that include the branch-destination instruction at "80008104", namely those at addresses "80008100", "80008104", "80008108", and "8000810c", are read at one time. The four instructions read are those of the aligned group containing the branch destination, rather than the four consecutive instructions starting at the branch-destination address itself ("80008104", "80008108", "8000810c", and "80008110"), because this simplifies the implementation of the instruction memory 3.
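
The aligned read for a branch destination can be sketched in the same way. The code below is an illustration under the same assumptions (4-byte instructions, groups of four, so the group size in bytes is a power of two); the function names are hypothetical.

```python
GROUP_SIZE = 4   # instructions read at one time
INSN_BYTES = 4   # assumed instruction width in bytes

def aligned_group(addr):
    """Addresses of the aligned four-instruction group that contains addr."""
    group_bytes = GROUP_SIZE * INSN_BYTES      # 16, a power of two
    base = addr & ~(group_bytes - 1)           # clear the low address bits
    return [base + i * INSN_BYTES for i in range(GROUP_SIZE)]

def slot_in_group(addr):
    """Position (0 to 3) of addr within its group, as a selector would need."""
    return (addr % (GROUP_SIZE * INSN_BYTES)) // INSN_BYTES
```

For the branch target 0x80008104 of FIG. 3(b), the group begins at 0x80008100 and the target occupies slot 1, so the read address needs only its low bits masked rather than an unaligned access.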

【００２４】Next, the flow of data and control signals in the high-speed branch processing device according to this embodiment, from the fetch of a conditional branch instruction through its execution to the fetch of the branch-destination instruction, is described with reference to FIG. 1. As preconditions, the program counter 1 is assumed to currently indicate the head address of a group of four instructions read at one time, and the instruction stored at that head address is assumed to be a conditional branch instruction. The two preceding instructions are assumed not to be branch instructions. Virtual and physical addresses are treated as identical here, but the substance of the invention would not change even if address translation were required.

【００２５】First, in the F stage (fetch stage), the address indicated by the program counter 1 and the branch destination address stored in the branch destination address register 11 are both input to the selector 19, and one of them is selected by control signal a. Control signal a indicates whether the preceding instruction, that is, the instruction in the E stage (execution stage), is a branch instruction. The selector 19 selects the branch destination address stored in the branch destination address register 11 when control signal a indicates that the preceding instruction is a branch instruction, and selects the address indicated by the program counter 1 when it indicates that it is not. Here, because the preconditions state that the two preceding instructions are not branch instructions, control signal a causes the address indicated by the program counter 1 to be supplied to the instruction memory 3 as the read address. Furthermore, because the preconditions also make that address the head address of an address boundary, the read signal of the instruction memory 3 enters the Enable state, and the four instructions starting at the address indicated by the program counter 1 are read from the instruction memory 3. These four instructions are loaded into instruction buffer 21A or 21B, the buffer being chosen by the LRU policy described above. For example, if as of the previous cycle the last load of instruction buffer 21A is the older, load signal c input to instruction buffer 21A enters the Enable state and load signal d input to instruction buffer 21B enters the Disable state, so the instructions read are loaded into instruction buffer 21A.
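
The behaviour of the selector 19 under control signal a can be sketched as a two-input multiplexer. The function and parameter names are hypothetical; only the selection rule comes from the description.

```python
def select_read_address(pc, branch_target, e_stage_is_branch):
    """Model of selector 19: control signal a is e_stage_is_branch.

    When the instruction in the E stage is a branch, the branch destination
    address register 11 supplies the read address; otherwise program
    counter 1 does.
    """
    return branch_target if e_stage_is_branch else pc
```

Note that the selection depends only on whether the E-stage instruction is a branch, not on the outcome of the comparison.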

【００２６】The four instructions stored in instruction buffer 21A are input to selector 23A, and the four instructions stored in instruction buffer 21B are input to selector 23B; in each selector, the one instruction at the address specified by control signals e and f is selected. The two instructions selected by these selectors are output to the selector 25, which selects one of them according to control signal g from the control circuit 27 and outputs it to the instruction register 5. The control circuit 27 receives conditional branch decision signal h, which is the operation result of the ALU 15 in the E stage (execution stage); control signal a, which indicates that the instruction in the E stage is a branch instruction; and control signal i, which indicates which of instruction buffers 21A and 21B was loaded less recently. It generates control signal g and outputs it to the selector 25 as follows. If the instruction in the E stage is a non-branch instruction, or is a branch instruction judged taken, g selects whichever of the two instruction buffers has the more recent load as of the current cycle. If the instruction in the E stage is a branch instruction judged not taken, g selects the buffer with the older load as of the current cycle. Here, the preconditions assume that there is no branch instruction in the E stage, so the instruction output from instruction buffer 21A, whose last load is the more recent as of the current cycle, is selected by the selector 25 and latched into the instruction register 5.
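
The rule by which the control circuit 27 derives control signal g can be sketched as follows. The function name and the representation of the buffers as plain labels are assumptions made for illustration.

```python
def select_buffer(e_is_branch, branch_taken, newer, older):
    """Model of control circuit 27 choosing the source for selector 25.

    e_is_branch  : control signal a (E-stage instruction is a branch)
    branch_taken : decision signal h from the ALU
    newer, older : the more and less recently loaded buffers (from signal i)
    """
    if e_is_branch and not branch_taken:
        # The branch-destination group was loaded into the newer buffer but
        # is not needed, so execution continues from the older buffer.
        return older
    # Non-branch instruction, or branch judged taken.
    return newer
```

In this sketch the ALU result only steers the final selection between two already-loaded buffers, which is why the memory read itself does not have to wait for it.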

【００２７】Next, in the D stage (decode stage), the instruction decoder 7 decodes the instruction held in the instruction register 5 and generates various control signals. The instruction decoder 7 outputs the index value in the instruction to the adder 9, which adds it to the address indicated by the program counter 1 to determine the branch destination address. The branch destination address is latched into the branch destination address register 11, and at the same time a branch instruction flag indicating that the instruction is a branch instruction is latched into the branch instruction flag register 29. Meanwhile, the instruction decoder 7 inputs control signal j to the register file 13 so that the conditional branch decision can be made in the following E stage (execution stage). The register file 13 reads out the values of the two relevant registers and stores them in the input registers 17A and 17B of the ALU 15.
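
The address calculation performed by the adder 9 can be sketched as a 32-bit addition. This is a simplified illustration: it assumes that the program counter holds the address of the branch instruction itself and that the index is a signed byte offset, neither of which is stated in the description, and it omits the offset scaling and delay-slot adjustment that a real MIPS processor applies.

```python
def branch_target(pc, index):
    """Model of adder 9: branch destination = PC + index, wrapped to 32 bits."""
    return (pc + index) & 0xFFFFFFFF
```

Under these assumptions, the branch at 0x80008004 in FIG. 3(b) reaches its target 0x80008104 with an index of 0x100.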

【００２８】Next, in the E stage (execution stage), the ALU 15 performs its operation and outputs operation result h. The feature of the present invention is that the processing described below is carried out at the same time as this operation.

【００２９】上述したように命令デコーダ７は入力され
た命令が分岐命令であることを示す分岐命令フラグをレ
ジスタ２９に出力するが、この分岐命令フラグを受けて
命令メモリ３の読み出し信号ｂはEnable状態となる。す
なわち、図７に示す従来技術では、Ｅステージ（実行ス
テージ）におけるＡＬＵ１５の演算が終了するのを待
As described above, the instruction decoder 7 outputs to the register 29 a branch instruction flag indicating that the input instruction is a branch instruction, and in response to this flag the read signal b of the instruction memory 3 is set to the Enable state. In the conventional technique shown in FIG. 11, the device waited in the E stage (execution stage) for the ALU 15 to finish its operation and enabled this read signal only when the result showed the branch to be taken. In the present embodiment, by contrast, whenever the instruction in the E stage (execution stage) is a branch instruction, the read signal b is enabled regardless of whether the branch is taken. The instruction at the branch destination address is therefore read without waiting for the ALU 15 to finish. Meanwhile, the branch destination address held in the branch destination address register 11 and the address indicated by the program counter are both input to the selector 19. The control signal a, generated from the branch instruction flag stored in the branch instruction flag register 29, selects the branch destination address, which is output to the instruction memory 3 and becomes its read address. Four consecutive instructions including the one at the branch destination address are then read from the instruction memory 3 and loaded into the instruction buffer 21A or 21B. The buffer that receives them is chosen by the LRU method described above. If, as of the previous cycle, the instruction buffer 21B was last filled longer ago than the instruction buffer 21A, the fetch signal input to the instruction buffer 21B is set to Enable and the fetch signal input to the instruction buffer 21A is set to Disable, so the instructions read are loaded into the instruction buffer 21B.
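
The read-enable rule and the LRU choice of instruction buffer described above can be sketched as follows. This is a minimal illustration, not the patented circuit: the class `InstructionBuffer`, the functions `read_enable`, `select_fill_buffer` and `fill`, and the use of a cycle counter as the "age" of a buffer are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InstructionBuffer:
    """Hypothetical model of instruction buffer 21A or 21B."""
    name: str
    last_fill_cycle: int = -1            # -1 means "never filled" (assumption)
    instructions: list = field(default_factory=list)


def read_enable(pc_at_block_boundary: bool, e_stage_is_branch: bool) -> bool:
    """Read signal b: enabled at an address boundary, or whenever a branch
    instruction is in the E stage, whether or not the branch is taken."""
    return pc_at_block_boundary or e_stage_is_branch


def select_fill_buffer(buf_a: InstructionBuffer, buf_b: InstructionBuffer) -> InstructionBuffer:
    """LRU choice: the buffer whose last fill is older receives the new block."""
    return buf_a if buf_a.last_fill_cycle <= buf_b.last_fill_cycle else buf_b


def fill(buf_a, buf_b, block, cycle):
    """Load a block of four instructions into the least recently filled buffer."""
    target = select_fill_buffer(buf_a, buf_b)
    target.instructions = list(block)
    target.last_fill_cycle = cycle
    return target


buf_21a = InstructionBuffer("21A", last_fill_cycle=5)
buf_21b = InstructionBuffer("21B", last_fill_cycle=2)
first = fill(buf_21a, buf_21b, ["t0", "t1", "t2", "t3"], cycle=6)
second = fill(buf_21a, buf_21b, ["u0", "u1", "u2", "u3"], cycle=7)
```

In this sketch the first block goes to 21B, whose last fill is older, and the next block goes to 21A, so the two buffers alternate as long as every read fills exactly one of them.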

[0030] The four instructions stored in the instruction buffer 21A are input to the selector 23A, and the four instructions stored in the instruction buffer 21B are input to the selector 23B. In each selector, the one instruction at the address specified by the control signals e and f is selected. The two instructions thus selected are output to the selector 25, which selects one of them according to the control signal g output from the control circuit 27 and outputs it to the instruction register 5. When the conditional branch is taken, the control circuit 27 generates a control signal g that selects the more recently filled of the two instruction buffers, so the branch destination instruction is selected and output to the instruction register 5.
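
The two-level selection described above (one instruction per buffer, then one of the two) might be modeled as below. The function name, the dictionary layout, and the offsets are hypothetical. The text only states which buffer is chosen when the branch is taken; choosing the older buffer when the branch is not taken is an assumption of this sketch.

```python
def select_for_instruction_register(buf_a, buf_b, offset_a, offset_b, branch_taken):
    """Return the instruction that would reach instruction register 5.

    buf_a / buf_b: dicts with "instructions" (list of 4) and "last_fill_cycle".
    offset_a / offset_b: positions picked by selectors 23A and 23B
    (driven by control signals e and f in the text).
    """
    candidate_a = buf_a["instructions"][offset_a]   # output of selector 23A
    candidate_b = buf_b["instructions"][offset_b]   # output of selector 23B
    a_is_newer = buf_a["last_fill_cycle"] > buf_b["last_fill_cycle"]
    # Control signal g: the newer buffer when the branch is taken (from the text);
    # the older buffer otherwise (assumption of this sketch).
    pick_a = a_is_newer if branch_taken else not a_is_newer
    return candidate_a if pick_a else candidate_b


sequential = {"instructions": ["s0", "s1", "s2", "s3"], "last_fill_cycle": 3}
target = {"instructions": ["t0", "t1", "t2", "t3"], "last_fill_cycle": 7}
taken_choice = select_for_instruction_register(sequential, target, 3, 0, True)
not_taken_choice = select_for_instruction_register(sequential, target, 3, 0, False)
```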

[0031] Conventionally, the instruction at the branch destination address of a conditional branch instruction whose condition holds was fetched only after waiting for the ALU operation for branch determination in the E stage (execution stage) to finish and for its result to show that the branch is taken. In the present embodiment, when the instruction in the D stage (decode stage) is a conditional branch instruction, this fetch is carried out independently, without waiting for the result of the ALU operation in the following E stage (execution stage), and the instruction at the branch destination address is read from the instruction memory 3 whether or not the condition holds. Because the ALU operation is hidden behind the instruction memory read, the branch destination instruction can be fetched quickly. Consequently, in a microprocessor, the fetch of the branch destination instruction when a conditional branch is taken, which tends to be the critical path, can be executed at high speed without increasing cost, and the operating frequency of the microprocessor can thereby be raised.

[0032] FIG. 4 is a timing chart of the fetch of the branch destination instruction: (a) shows the operation of the conventional branch processing device shown in FIG. 11, and (b) shows the operation of the branch processing device according to the present embodiment shown in FIG. 1. As shown in FIG. 4(a), the conventional branch processing device performs, in sequence, the operation by the ALU 15 from time t₁ to t₂, the selection by the selector 19 from time t₂ to t₃, the read of instructions from the instruction memory 3 from time t₃ to t₄, and the loading of an instruction into the instruction register 5 from time t₄ to t₅. In the branch processing device according to the present embodiment, as shown in FIG. 4(b), the read of instructions from the instruction memory 3 from time T₁ to T₂, the selection by the selector 25 from time T₂ to T₃, and the loading of an instruction into the instruction register 5 from time T₃ to T₄ are likewise performed in sequence, but the operation by the ALU 15 (the period marked A in the figure) is executed while the instruction memory 3 is being read and is in effect hidden behind that read. As is clear from FIGS. 4(a) and 4(b), the present embodiment therefore allows a higher operating frequency.
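
The comparison in FIG. 4 can be made concrete with a small worked calculation. The delay values below are invented purely for illustration (the patent gives no numbers), and the function names are hypothetical. The sketch assumes the critical path is simply the sum of the serialized stages, with the ALU operation overlapping the memory read in the present embodiment.

```python
def conventional_fetch_latency(alu, selector, mem_read, reg_capture):
    """FIG. 4(a): ALU 15, then selector 19, then memory read, then register load."""
    return alu + selector + mem_read + reg_capture


def overlapped_fetch_latency(alu, selector, mem_read, reg_capture):
    """FIG. 4(b): the ALU operation runs during the memory read, followed by
    selector 25 and the register load. max() covers the case where the ALU
    operation would be slower than the read (assumption of this sketch)."""
    return max(mem_read, alu) + selector + reg_capture


# Illustrative delays in nanoseconds (assumed values, not from the patent).
delays = dict(alu=3.0, selector=1.0, mem_read=4.0, reg_capture=1.0)
before = conventional_fetch_latency(**delays)   # 3 + 1 + 4 + 1 = 9.0
after = overlapped_fetch_latency(**delays)      # max(4, 3) + 1 + 1 = 6.0
```

With these assumed delays the serialized path takes 9 ns and the overlapped path 6 ns, which is the sense in which the ALU operation no longer contributes to the fetch path.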

[0033] Second Embodiment
In the high-speed branch processing device according to the present embodiment, the conditional branch instructions stored in the instruction memory 3 shown in FIG. 1 are placed at addresses such that the two cases in which the read signal of the instruction memory 3 is enabled never coincide. These two cases are the case in which the address held in the program counter 1 of FIG. 1 reaches the address boundary of the group of instructions read at one time, and the case in which the instruction at the branch destination address of a branch instruction is fetched.

[0034] Arranging the instructions of the program stored in the instruction memory 3 in this way avoids the conflict, which arises in the first embodiment, between a read triggered by an address boundary and a read triggered by execution of a branch instruction. High-speed branch processing can therefore be achieved without increasing hardware cost.

[0035] The reason is as follows. As shown in FIG. 5(a), when a branch instruction (the instruction at address "80008008", labeled branch in the figure) is located second from the end of the address boundary of the group of instructions read at one time, the address indicated by the program counter in the F stage (fetch stage) reaches the first address of an address boundary just as the branch instruction reaches the E stage (execution stage). In the branch processing device according to the first embodiment, the read triggered by the address boundary and the read triggered by execution of the branch instruction then conflict, as described above. As shown in FIGS. 5(b) and 5(c), however, no such conflict arises if the branch instruction is placed anywhere other than second from the end of the address boundary.
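
The placement rule above reduces to a check on the branch instruction's position inside its block. The sketch below assumes 4-byte instructions (an assumption, suggested by but not stated with the address in FIG. 5) and the four-instruction block of the first embodiment; the names are hypothetical.

```python
INSTRUCTION_BYTES = 4    # assumed instruction width
BLOCK_INSTRUCTIONS = 4   # instructions read from the instruction memory at one time


def slot_in_block(address: int) -> int:
    """Position (0..3) of the instruction within its block."""
    return (address // INSTRUCTION_BYTES) % BLOCK_INSTRUCTIONS


def causes_read_conflict(branch_address: int) -> bool:
    """True if the branch sits second from the end of its block, so that the
    address-boundary read and the branch-destination read fall in the same cycle."""
    return slot_in_block(branch_address) == BLOCK_INSTRUCTIONS - 2


conflict_at_fig5a = causes_read_conflict(0x80008008)
```

Under these assumptions the address 80008008 from FIG. 5(a) falls in slot 2 of its 16-byte block, that is, second from the end, which agrees with the conflict described in the text.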

[0036] Third Embodiment
The high-speed branch processing device according to the second embodiment described above avoids the conflict between the address-boundary read and the branch-execution read by placing each branch instruction anywhere other than second from the end of the address boundary. FIG. 6 shows the address arrangement of FIG. 5(a) after a nop (No OPeration) instruction has been inserted so that the branch instruction no longer falls second from the end of the address boundary and the two reads do not conflict. This avoids the conflict, but it may increase the code size of the program.
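
A software pass that performs this kind of nop insertion could look like the sketch below. It is only an illustration: the text does not say where FIG. 6 places the nop, so inserting it immediately before the branch is an assumption, as are the mnemonic convention in `is_branch`, the function name `insert_nops`, and the premise that the program starts on a block boundary. A real assembler would also have to re-resolve branch targets after the shift.

```python
BLOCK_INSTRUCTIONS = 4   # instructions read from the instruction memory at one time


def is_branch(op: str) -> bool:
    """Hypothetical convention: branch mnemonics start with 'branch'."""
    return op.startswith("branch")


def insert_nops(program):
    """Return a copy of the program in which no branch occupies the
    second-from-last slot of a block (program assumed block-aligned)."""
    padded = []
    for op in program:
        slot = len(padded) % BLOCK_INSTRUCTIONS
        if is_branch(op) and slot == BLOCK_INSTRUCTIONS - 2:
            padded.append("nop")          # pushes the branch into the last slot
        padded.append(op)
    return padded


original = ["inst0", "inst1", "branch", "inst3", "inst4"]
padded = insert_nops(original)
```

Here the branch moves from slot 2 to slot 3 at the price of one extra instruction, which is the code-size cost the paragraph mentions.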

[0037] In the present embodiment, therefore, hardware detects the conflict between the two reads and handles it. This makes it possible to avoid the conflict without the increase in program code size that can occur in the second embodiment.

[0038] FIG. 7 shows an example of the address arrangement of executed instructions in the high-speed branch processing device according to the present embodiment. FIG. 7(a) shows the case where the branch is determined to be taken, and FIG. 7(b) shows the case where the branch is determined not to be taken. In both cases, the conflict between the read of the branch destination instruction (insta) and the read of the first instruction of the address boundary (inst4) is first detected in the D stage (decode stage) of the branch instruction (inst2). This detection is easily done by checking the address of the instruction and whether the corresponding instruction code is a branch instruction. When a conflict is detected, in the next stage the read of the first instruction of the address boundary (inst4), which is the instruction to be executed if the branch is not taken, is suspended, and the branch destination instruction (insta) is read. If the branch is determined to be taken, processing simply continues, as shown in FIG. 7(a). If the branch is determined not to be taken, then, as shown in FIG. 7(b), in the stage after that the previously read branch destination instruction (insta) is invalidated, the suspended read of the first instruction of the address boundary (inst4) is carried out, and processing continues.

[0039] FIG. 8 is a block diagram of part of a high-speed branch processing device that performs the processing described above. This device adds a selector 31 and an adder 33 between the program counter 1 and the selector 19 of the high-speed branch processing device shown in FIG. 1. The remaining parts are identical to those of the device of FIG. 1. In FIG. 8, in the E stage (execution stage) of the branch instruction (inst2), the address held in the program counter 1 and the branch destination address held in the branch destination address register 11 are input to the selector 19. If, as described above, a conflict between the read of the branch destination instruction (insta) and the read of the first instruction of the address boundary (inst4) was detected in the preceding D stage (decode stage), the control signal l selects the branch destination address with priority, and the branch destination address is input to the instruction memory (not shown) as its read address. The branch destination address is also input to the adder 33, which adds a constant (4 in this example) to it. The sum and the address held in the program counter 1 are input to the selector 31, and the control signal k selects one of them. The conditional branch determination is made in this same E stage (execution stage). If the branch is determined to be taken (the case of FIG. 7(a)), the control signal k selects the sum, which is input to the program counter 1 so that the address it indicates changes to the sum. If the branch is determined not to be taken (the case of FIG. 7(b)), the address already held in the program counter 1 is selected and likewise input to the program counter 1; that is, the program counter 1 keeps its address. In addition, the branch destination instruction held in the instruction register (not shown) is invalidated.
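
The behavior of the selector 19, adder 33 and selector 31 in the conflict case can be summarized in a few lines. This is a behavioral sketch only, under the assumption that the adder constant is 4 as in the example given in the text; the function name and the returned tuple are hypothetical.

```python
def resolve_conflict_target_first(pc: int, branch_target: int, taken: bool, step: int = 4):
    """Third embodiment, conflict case.

    Returns (fetch_address, next_pc, invalidate_fetched), where
    invalidate_fetched says whether the instruction just fetched
    (the branch destination instruction) must be invalidated.
    """
    fetch_address = branch_target            # selector 19, control signal l
    if taken:
        next_pc = branch_target + step       # adder 33 output, chosen by selector 31
        invalidate_fetched = False           # FIG. 7(a): processing continues
    else:
        next_pc = pc                         # program counter 1 keeps its address
        invalidate_fetched = True            # FIG. 7(b): insta is invalidated
    return fetch_address, next_pc, invalidate_fetched


taken_case = resolve_conflict_target_first(0x80008010, 0x80009000, taken=True)
not_taken_case = resolve_conflict_target_first(0x80008010, 0x80009000, taken=False)
```

In the not-taken case the unchanged program counter still points at the first instruction of the address boundary, so the suspended read can be carried out in the following stage.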

[0040] Fourth Embodiment
In the third embodiment, when the read of the branch destination instruction (insta) conflicts with the read of the first instruction of the address boundary (inst4), it is the read of the first instruction of the address boundary (inst4) that is temporarily suspended. In the present embodiment, the read of the branch destination instruction (insta) is suspended instead.

[0041] FIG. 9 shows an example of the address arrangement of executed instructions in the high-speed branch processing device according to the present embodiment. FIG. 9(a) shows the case where the branch is determined to be taken, and FIG. 9(b) shows the case where the branch is determined not to be taken. In both cases, the conflict between the read of the branch destination instruction (insta) and the read of the first instruction of the address boundary (inst4) is first detected in the D stage (decode stage) of the branch instruction (inst2). This detection is easily done by checking the address of the instruction and whether the corresponding instruction code is a branch instruction. When a conflict is detected, in the next stage the read of the branch destination instruction (insta) is suspended, and the first instruction of the address boundary (inst4), which is the instruction to be executed if the branch is not taken, is read. If the branch is determined to be taken, then, as shown in FIG. 9(a), in the stage after that the previously read first instruction of the address boundary (inst4) is invalidated, the suspended read of the branch destination instruction (insta) is carried out, and processing continues. If the branch is determined not to be taken, processing simply continues, as shown in FIG. 9(b).

[0042] FIG. 10 is a block diagram of part of a high-speed branch processing device that performs the processing described above. This device adds a selector 35 and an adder 37 between the program counter 1 and the selector 19 of the high-speed branch processing device shown in FIG. 1. The remaining parts are identical to those of the device of FIG. 1. In FIG. 10, in the E stage (execution stage) of the branch instruction (inst2), the address held in the program counter 1 and the branch destination address held in the branch destination address register 11 are input to the selector 19. If, as described above, a conflict between the read of the branch destination instruction (insta) and the read of the first instruction of the address boundary (inst4) was detected in the preceding D stage (decode stage), the control signal n selects the address held in the program counter 1 with priority, and that address is input to the instruction memory (not shown) as its read address. The address held in the program counter 1 is also input to the adder 37, which adds a constant (4 in this example) to it. The sum and the branch destination address are input to the selector 35, and the control signal m selects one of them. The conditional branch determination is made in this same E stage (execution stage). If the branch is determined to be taken (the case of FIG. 9(a)), the control signal m selects the branch destination address, which is input to the program counter 1 so that the address it indicates changes to the branch destination address. In addition, the first instruction of the address boundary (inst4) held in the instruction register (not shown) is invalidated. If the branch is determined not to be taken (the case of FIG. 9(b)), the sum is selected and likewise input to the program counter 1.
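
The fourth embodiment mirrors the third, and its conflict case can be sketched in the same way. As before, this is a behavioral illustration under the assumption that the adder constant is 4; the function name and the returned tuple are hypothetical.

```python
def resolve_conflict_sequential_first(pc: int, branch_target: int, taken: bool, step: int = 4):
    """Fourth embodiment, conflict case.

    Returns (fetch_address, next_pc, invalidate_fetched), where
    invalidate_fetched says whether the instruction just fetched
    (the first instruction of the address boundary) must be invalidated.
    """
    fetch_address = pc                       # selector 19, control signal n
    if taken:
        next_pc = branch_target              # selector 35, control signal m
        invalidate_fetched = True            # FIG. 9(a): inst4 is invalidated
    else:
        next_pc = pc + step                  # adder 37 output
        invalidate_fetched = False           # FIG. 9(b): processing continues
    return fetch_address, next_pc, invalidate_fetched


taken_case = resolve_conflict_sequential_first(0x80008010, 0x80009000, taken=True)
not_taken_case = resolve_conflict_sequential_first(0x80008010, 0x80009000, taken=False)
```

The design difference from the third embodiment is which outcome pays the extra stage: here a taken branch incurs the invalidation and re-read, whereas a branch that is not taken proceeds without penalty.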

[0043]

[Effects of the Invention] As described above, according to the present invention, the fetch of the branch destination address when a conditional branch instruction is taken, which is the critical path limiting the operating frequency of a microprocessor, can be performed at high speed. Reading several instructions at once also reduces the number of accesses to the instruction memory and thereby reduces power consumption. Furthermore, providing two instruction buffers that temporarily store the instructions read allows the instruction memory to use one-port reads rather than two-port reads, so cost does not increase.

[Brief description of drawings]

FIG. 1 is a block diagram of a high-speed branch processing device according to a first embodiment of the present invention.

FIG. 2 is a conceptual diagram showing the read operation of the instruction memory shown in FIG. 1.

FIG. 3 is a diagram for explaining the read operation of the instruction memory shown in FIG. 1.

FIG. 4 is a timing chart showing the fetch operation of a branch destination instruction; (a) shows the operation of the conventional branch processing device shown in FIG. 11, and (b) shows the operation of the high-speed branch processing device according to the first embodiment shown in FIG. 1.

FIG. 5 is a diagram for explaining a high-speed branch processing device according to a second embodiment of the present invention.

FIG. 6 is a diagram for explaining a high-speed branch processing device according to a third embodiment of the present invention.

FIG. 7 is another diagram for explaining the high-speed branch processing device according to the third embodiment of the present invention.

FIG. 8 is a block diagram showing part of the high-speed branch processing device according to the third embodiment of the present invention.

FIG. 9 is a diagram for explaining a high-speed branch processing device according to a fourth embodiment of the present invention.

FIG. 10 is a block diagram showing part of the high-speed branch processing device according to the fourth embodiment of the present invention.

FIG. 11 is a block diagram of a conventional branch processing device.

[Explanation of symbols]

1 Program counter
3 Instruction memory
5 Instruction register
7 Instruction decoder
9, 33, 37 Adder
11 Branch destination address register
13 Register file
15 ALU (Arithmetic and Logic Unit)
17A, 17B Input register
19, 23A, 23B, 25, 31, 35 Selector
21A, 21B Instruction buffer
27 Control circuit
29 Branch instruction flag register

Continuation of front page
(56) References: JP-A-6-161751 (JP, A); JP-A-7-121371 (JP, A); JP-A-4-162134 (JP, A); JP-A-3-156534 (JP, A); JP-A-2-105938 (JP, A); JP-A-63-318634 (JP, A); JP-A-5-224927 (JP, A)
(58) Field searched (Int. Cl.⁷, DB name): G06F 9/38

Claims

(57) [Claims]

1. A high-speed branch processing device having an instruction memory for storing instructions, a program counter for holding an address into the instruction memory, and an instruction register for storing an instruction read from the instruction memory for decoding, the device comprising:
read control means for reading, at one time, a predetermined plurality of instructions at consecutive addresses from the instruction memory;
two instruction buffers for temporarily storing the plurality of instructions read from the instruction memory by the read control means; and
instruction buffer selecting means for selecting one of the two instruction buffers, selecting one of the plurality of instructions stored in the selected instruction buffer, and outputting it to the instruction register,
wherein, when the program counter designates the head address of a plurality of instructions, the read control means reads a first plurality of instructions from the instruction memory, the first plurality of instructions is stored in whichever of the two instruction buffers was accessed least recently, and the instruction buffer selecting means outputs the first plurality of instructions to the instruction register in address order,
wherein, when a decoded instruction is a conditional branch instruction, the read control means reads from the instruction memory, before the conditional branch determination is completed, a second plurality of instructions at consecutive addresses including the branch destination address, the second plurality of instructions is stored in whichever of the two instruction buffers was accessed least recently, and the instruction buffer selecting means outputs the second plurality of instructions to the instruction register in address order, and
wherein the instructions stored in the instruction memory are arranged at addresses such that the reading of the first plurality of instructions and the reading of the second plurality of instructions do not coincide.

2. A high-speed branch processing device having an instruction memory for storing instructions, a program counter for holding an address into the instruction memory, and an instruction register for storing an instruction read from the instruction memory for decoding, the device comprising:
read control means for reading, at one time, a predetermined plurality of instructions at consecutive addresses from the instruction memory;
two instruction buffers for temporarily storing the plurality of instructions read from the instruction memory by the read control means; and
instruction buffer selecting means for selecting one of the two instruction buffers, selecting one of the plurality of instructions stored in the selected instruction buffer, and outputting it to the instruction register,
wherein, when the program counter designates the head address of a plurality of instructions, the read control means reads a first plurality of instructions from the instruction memory, the first plurality of instructions is stored in whichever of the two instruction buffers was accessed least recently, and the instruction buffer selecting means outputs the first plurality of instructions to the instruction register in address order,
wherein, when a decoded instruction is a conditional branch instruction, the read control means reads from the instruction memory, before the conditional branch determination is completed, a second plurality of instructions at consecutive addresses including the branch destination address, the second plurality of instructions is stored in whichever of the two instruction buffers was accessed least recently, and the instruction buffer selecting means outputs the second plurality of instructions to the instruction register in address order,
the device further comprising conflict detecting means for detecting a conflict between the reading of the first plurality of instructions and the reading of the second plurality of instructions,
wherein, when the conflict detecting means detects a read conflict, the reading of the first plurality of instructions is suspended and the reading of the second plurality of instructions is performed with priority before the conditional branch determination ends, and
wherein, when the conditional branch determination shows that the branch is not taken, the second plurality of instructions that was read is invalidated and the suspended first plurality of instructions is read.

3. A high-speed branch processing device having an instruction memory for storing instructions, a program counter for holding an address into the instruction memory, and an instruction register for storing an instruction read from the instruction memory for decoding, the device comprising:
read control means for reading, at one time, a predetermined plurality of instructions at consecutive addresses from the instruction memory;
two instruction buffers for temporarily storing the plurality of instructions read from the instruction memory by the read control means; and
instruction buffer selecting means for selecting one of the two instruction buffers, selecting one of the plurality of instructions stored in the selected instruction buffer, and outputting it to the instruction register,
wherein, when the program counter designates the head address of a plurality of instructions, the read control means reads a first plurality of instructions from the instruction memory, the first plurality of instructions is stored in whichever of the two instruction buffers was accessed least recently, and the instruction buffer selecting means outputs the first plurality of instructions to the instruction register in address order,
wherein, when a decoded instruction is a conditional branch instruction, the read control means reads from the instruction memory, before the conditional branch determination is completed, a second plurality of instructions at consecutive addresses including the branch destination address, the second plurality of instructions is stored in whichever of the two instruction buffers was accessed least recently, and the instruction buffer selecting means outputs the second plurality of instructions to the instruction register in address order,
the device further comprising conflict detecting means for detecting a conflict between the reading of the first plurality of instructions and the reading of the second plurality of instructions,
wherein, when the conflict detecting means detects a read conflict, the reading of the second plurality of instructions is suspended and the reading of the first plurality of instructions is performed with priority before the conditional branch determination ends, and
wherein, when the conditional branch determination shows that the branch is taken, the first plurality of instructions that was read is invalidated and the suspended second plurality of instructions is read.