JPH0769807B2

JPH0769807B2 - Data processing device

Info

Publication number: JPH0769807B2
Application number: JP63205565A
Authority: JP
Inventors: 達也上田; 雅仁松尾
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1988-08-18
Filing date: 1988-08-18
Publication date: 1995-07-31
Anticipated expiration: 2010-07-31
Also published as: JPH0254336A

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a data processing device that performs multi-stage pipeline processing, and more particularly to a data processing device that processes branch instructions in a multi-stage pipeline.

[Conventional technology]

FIG. 5 is a block diagram illustrating the pipeline configuration of a conventional data processing device. Its configuration and operation are described below.

The instruction fetch stage (IF stage) 51 fetches an instruction code from memory and outputs it to the instruction decode stage (D stage) 52. The D stage 52 decodes the instruction code input from the IF stage 51 and outputs the decoded result to the operand address calculation stage (A stage) 53. The A stage 53 calculates the effective address of the operand specified in the instruction code and outputs the calculated operand address to the operand fetch stage (F stage) 54. The F stage 54 fetches the operand from memory according to the operand address input from the A stage 53 and outputs the fetched operand to the instruction execution stage (E stage) 55. The E stage 55 performs the operation specified in the instruction code on the operand input from the F stage 54 and, if necessary, stores the result in memory.

With this pipeline mechanism, the processing specified by each instruction is broken into five parts, and the specified processing is completed by executing the five parts in order. The five parts can run in parallel for different instructions, so ideally the five-stage pipeline processes five instructions simultaneously and achieves up to five times the throughput of a device without pipelining.
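
The "up to five times" figure can be checked with a small cycle count. The following is a minimal sketch, assuming every stage takes exactly one time slot and no stalls occur; the function names are illustrative and do not come from the patent.

```python
def cycles_without_pipeline(n_instructions: int, n_stages: int = 5) -> int:
    """Each instruction occupies all stages before the next one starts."""
    return n_instructions * n_stages


def cycles_with_pipeline(n_instructions: int, n_stages: int = 5) -> int:
    """The first instruction fills the pipeline; each later one adds one slot."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def speedup(n_instructions: int, n_stages: int = 5) -> float:
    """Ratio of unpipelined to pipelined time; approaches n_stages for long runs."""
    return (cycles_without_pipeline(n_instructions, n_stages)
            / cycles_with_pipeline(n_instructions, n_stages))
```

For a long instruction stream the ratio approaches, but never reaches, the number of stages, which is why five is an upper bound.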

As described above, conventional pipeline processing can greatly improve throughput by executing, for example, five stages of processing, and it contributes substantially to high-speed data processing. For branch instructions, which disturb the instruction sequence, however, ideal pipeline processing cannot always be achieved.

In particular, in a data processing device that performs pipeline processing as shown in FIG. 5, if the IF stage 51 begins processing the branch-target instruction only after the branch instruction has been processed in the E stage 55, pipeline processing is severely disturbed.

FIG. 6 is a schematic diagram illustrating the instruction execution cycles of each stage during branch instruction processing. The vertical axis shows the stages and the horizontal axis shows time. Instructions 3 and 12 are branch instructions.

When instruction 3 is executed, instructions 4 through 7, which are already in the pipeline, are canceled, and processing restarts with instruction 11 from the IF stage 51.

Between the execution of instruction 3 in the E stage 55 and the execution of instruction 11 in the E stage 55, time equivalent to processing four instructions is wasted, and the same four-instruction loss occurs for instruction 12. This dead time arises because the fetch of the instruction to be processed after the branch is performed only after all pipeline processing for the branch instruction has finished. The more pipeline stages there are, the longer the dead time becomes.
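
The dead time described above can be sketched as follows, assuming one time slot per stage and a full flush on every taken branch resolved in the last stage. The function names are illustrative, not taken from the patent.

```python
def branch_penalty(n_stages: int = 5) -> int:
    """Slots lost per taken branch when the branch resolves in the last stage.

    All instructions behind the branch (one per earlier stage) are canceled.
    """
    return n_stages - 1


def e_stage_slots(n_instructions: int, n_taken_branches: int,
                  n_stages: int = 5) -> int:
    """E-stage time slots needed to execute the stream, including dead time."""
    return n_instructions + n_taken_branches * branch_penalty(n_stages)
```

With five stages the penalty is four slots per taken branch, matching the four canceled instructions (4 through 7), and it grows linearly with pipeline depth.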

To reduce this pipeline disturbance caused by branch instructions, branch prediction has conventionally been used: the D stage 52 predicts the outcome of a conditional branch instruction while decoding it, and the fetch destination of the IF stage 51 is changed to the branch target in advance, before the branch instruction is executed in the E stage 55.

That is, by predicting conditional branches at the decode step in the D stage 52 and switching instruction fetch to the branch target in advance, pipeline efficiency is generally improved.

[Problems to be Solved by the Invention]

Conventional branch prediction is performed using the address of the instruction decoded immediately before the conditional branch instruction. When a branch instruction is executed in the E stage 55 and the branch is actually taken, all instructions being processed in the pipeline are canceled. Just before that cancellation, the branch prediction mechanism was predicting with the address of an instruction that sequentially follows the branch instruction, so its prediction has nothing to do with the instruction at the branch target. This was the problem.

In other words, when branch prediction uses the address of the previously decoded instruction, the prediction for the instruction decoded immediately after a branch in the E stage 55 is unreliable.

The present invention was made to solve the above problem. Its object is to obtain a data processing device that, by holding information indicating that a branch has just occurred in the instruction execution stage, always predicts "not taken" for the instruction decoded immediately after a branch in the instruction execution stage.

[Means for Solving the Problems]

The data processing device according to the present invention is provided with holding means that holds occurrence information indicating that the second branch processing has occurred, and branch prediction control means that, based on the occurrence information held in the holding means, forces the branch prediction result in the first pipeline processing stage immediately after the occurrence of the second branch processing to non-branch.

[Action]

In the present invention, when the second branch processing is executed in the second pipeline processing stage based on the unit processing code transferred from the first pipeline stage, occurrence information indicating that the second branch processing has occurred is held in the holding means. Then, based on the occurrence information held in the holding means, the branch prediction control means forces to non-branch the first branch prediction result predicted in the first pipeline stage immediately after the occurrence of the second branch processing, so that the branch prediction mechanism causes the instruction decoding mechanism to predict "not taken".
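
The interplay of the holding means and the branch prediction control means can be sketched in software. This is only an illustrative model under stated assumptions: the class and method names are hypothetical, the history table is modeled as a dictionary keyed by the previously decoded instruction's address, and the occurrence flag is assumed to be cleared after it has overridden exactly one prediction.

```python
class DecodeStagePredictor:
    """Toy model of decode-stage branch prediction with an E-stage branch flag."""

    def __init__(self) -> None:
        # Hypothetical history table: address of previously decoded
        # instruction -> predicted-taken bit.
        self.history: dict[int, bool] = {}
        # Occurrence information: a branch has just happened in the E stage.
        self.branched_in_e_stage = False

    def notify_e_stage_branch(self) -> None:
        """Record that the execution stage took a branch (pipeline flushed)."""
        self.branched_in_e_stage = True

    def predict(self, previous_decoded_address: int) -> bool:
        """Return True for 'taken', False for 'not taken'."""
        if self.branched_in_e_stage:
            # The address available now belongs to a canceled instruction,
            # so the table lookup is meaningless: force 'not taken'.
            self.branched_in_e_stage = False  # assumed: one-shot override
            return False
        return self.history.get(previous_decoded_address, False)
```

A short usage example: after `notify_e_stage_branch()`, the next call to `predict()` returns `False` even if the table holds a taken bit for that address, and the call after that consults the table again.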

[Example]

FIG. 1 is a block diagram illustrating the configuration of a data processing device according to one embodiment of the present invention. Reference numeral 1 denotes an instruction fetch unit, which comprises a branch buffer, an instruction queue, a control section, and so on. It determines the address of the next instruction to fetch and fetches the instruction from the branch buffer or from memory external to the CPU. It also registers instructions in the branch buffer, as described, for example, in Japanese Patent Application No. 61-202041.

Because the branch buffer is small, it operates as a selective cache.

When fetching an instruction from memory external to the CPU, the instruction fetch unit 1 outputs the fetch address to the external bus interface unit 7 and fetches the instruction code through the data input/output circuit 9. From the buffered instruction codes, it outputs to the instruction decoding unit 2 the code that the instruction decoding unit 2 should decode next.

The instruction decoding unit 2 (included in the first pipeline stage) basically decodes instruction codes in 16-bit (halfword) units. This block contains an FHW decoder that decodes the opcode in the first halfword, an NFHW decoder that decodes the opcodes in the second and third halfwords, and an addressing mode decoder that decodes the addressing mode. The instruction decoding unit 2 also includes the holding means and the branch prediction control means of the present invention. When the second branch processing is executed in the second pipeline processing stage (the E stage described later) based on the unit processing code transferred from the first pipeline stage (the D stage described later), occurrence information indicating that the second branch processing has occurred is held in the holding means. Based on the occurrence information held in the branch prediction table (holding means) shown in FIG. 2, the branch prediction control means then forces to non-branch the first branch prediction result predicted in the first pipeline stage immediately after the occurrence of the second branch processing, so that the branch prediction mechanism causes the instruction decoding mechanism to predict "not taken".

The unit further includes a decoder that decodes the outputs of the FHW and NFHW decoders again to calculate the entry address of the micro ROM, a branch prediction mechanism that predicts conditional branch instructions, and an address calculation conflict check mechanism that checks for pipeline conflicts during operand address calculation.

The instruction decoding unit 2 decodes 0 to 6 bytes of the instruction code input from the instruction fetch unit 1 every two clocks.

Of the decoded results, information about operations in the data operation unit 6 is output to the micro ROM unit 5, information about operand address calculation is output to the operand address calculation unit 4, and information about program counter calculation is output to the PC calculation unit 3.

The PC calculation unit 3 is hardwired-controlled by the PC-calculation information output from the instruction decoding unit 2 and calculates the PC value of each instruction. The data processing device of this embodiment has a variable-length instruction set, described later, so the length of an instruction is not known until the instruction has been decoded. The PC calculation unit 3 therefore produces the PC value of the next instruction by adding the instruction length output from the instruction decoding unit 2 to the PC value of the instruction being decoded. When the instruction decoding unit 2 decodes a branch instruction and directs a branch at the decode step, the unit adds the branch displacement, instead of the instruction length, to the PC value of the branch instruction to calculate the PC value of the branch-target instruction. Branching at the instruction decode step is called a pre-branch. Pre-branch processing is described in detail in Japanese Patent Application Nos. 61-204500 and 61-200557, among others, so its description is omitted here.
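
The PC calculation just described reduces to choosing one of two addends. The sketch below is illustrative only: the function name and parameters are hypothetical, and the 32-bit wraparound is an assumption, since the address width is not stated in this passage.

```python
ADDRESS_MASK = 0xFFFFFFFF  # assumed 32-bit program counter


def next_pc(pc: int, instruction_length: int,
            pre_branch: bool = False, displacement: int = 0) -> int:
    """PC of the next instruction to decode.

    Sequential case: PC + length of the instruction just decoded.
    Pre-branch case: PC of the branch instruction + branch displacement.
    """
    addend = displacement if pre_branch else instruction_length
    return (pc + addend) & ADDRESS_MASK
```

For example, a 4-byte instruction at 0x1000 is followed by 0x1004, while a pre-branch with displacement -0x20 from the same address yields 0xFE0.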

The result of the PC calculation unit 3 is output together with the decoded result as the PC value of each instruction and, on a pre-branch, is also output to the instruction fetch unit 1 as the address of the next instruction to be decoded.

It is also used as the address for branch prediction of the instruction to be decoded next by the instruction decoding unit 2. This branch prediction processing is described in detail in Japanese Patent Application No. 62-8394 and elsewhere.

The operand address calculation unit 4 is hardwired-controlled by the operand-address-calculation information output from, for example, the address decoder of the instruction decoding unit 2. Most of the processing for operand address calculation is performed in this block. The address calculation result is sent to the external bus interface unit 7. The values of general-purpose registers and the program counter needed for address calculation are input from the data operation unit 6. For memory indirect addressing, the memory address to be referenced is passed straight through the external bus interface unit 7 and output from the address output circuit 8 to outside the CPU, and the indirect address value input from the data input/output circuit 9 is fetched through the instruction decoding unit 2.

The micro ROM unit 5 contains a micro ROM that stores microprograms mainly controlling the data operation unit 6, a microsequencer, a microinstruction decoder, and so on. A microinstruction is read from the micro ROM once every two clocks. In addition to the sequencing indicated by the microprogram, the microsequencer accepts exceptions, interrupts, and traps (collectively called EIT processing) in hardware.

The micro ROM unit 5 also manages the store buffer. Its inputs are interrupts that do not depend on the instruction code, flag information from operation results, and outputs of the instruction decoding unit 2 such as decoder outputs. The output of the microdecoder goes mainly to the data operation unit 6, but some information, such as the information that aborts other preceding instructions when a jump instruction is executed, is output to other blocks.

The data operation unit 6 is controlled by microprograms and, according to the output of the micro ROM unit 5, executes the operations needed to implement each instruction using its registers and arithmetic units. When the operand is an address or an immediate value, the address or immediate value calculated by the operand address calculation unit 4 is obtained by passing it through the external bus interface unit 7. When the operand is data in memory external to the CPU, the external bus interface unit 7 outputs the address calculated by the operand address calculation unit 4 from the address output circuit 8, and the operand fetched from external memory is obtained from the data input/output circuit 9.

The arithmetic units include an ALU, a barrel shifter, a priority encoder, a counter, and a shift register. The registers and the main arithmetic units are connected by three buses, and one microinstruction specifying one register-to-register operation is processed in two clock cycles.

When memory external to the CPU must be accessed during a data operation, the address is output to outside the CPU from the address output circuit 8 through the external bus interface unit 7 under microprogram direction, and the target data is fetched through the data input/output circuit 9.

When reading data from memory external to the CPU, the address is set in the AA1 register (described later) and output from the address output circuit 8 through the external bus interface unit 7, and the data is taken from the data input/output circuit 9 into the DDR1 register 6b via the DD bus 18 (described later). When writing data to memory external to the CPU, the address is set in the AA1 register and output from the address output circuit 8 through the external bus interface unit 7, and the data set in the DD2 register 6c is output to outside the CPU from the data input/output circuit 9 via the DD bus 18.

When the data operation unit 6 obtains a new instruction address through jump instruction processing, exception processing, or the like, it outputs that address to the instruction fetch unit 1 and the PC calculation unit 3.

The external bus interface unit 7 controls communication on the external bus. In particular, all memory accesses are clock-synchronous and can complete in a minimum of two clock cycles.

Memory access requests arise independently from the instruction fetch unit 1, the operand address calculation unit 4, and the data operation unit 6, and access requests for operand prefetch also arise. The external bus interface unit 7 arbitrates among these requests. In addition, when an access targets data at a memory address that straddles an aligned 32-bit (word) boundary, 32 bits being the width of the data bus connecting the memory and the CPU, this block automatically detects the crossing and splits the access into two memory accesses.
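
The boundary-crossing check can be sketched as follows. This is a minimal model, assuming byte addressing and accesses of 1 to 4 bytes (so that at most one word boundary can be crossed); the function name and the tuple representation are illustrative.

```python
WORD_BYTES = 4  # 32-bit data bus


def split_access(address: int, length: int) -> list[tuple[int, int]]:
    """Return the bus accesses, as (address, length) pairs, for one request.

    An access that stays inside one aligned word is issued unchanged; an
    access that straddles a word boundary is split at that boundary.
    """
    if not 1 <= length <= WORD_BYTES:
        raise ValueError("this sketch handles accesses of 1 to 4 bytes only")
    first_word = address // WORD_BYTES
    last_word = (address + length - 1) // WORD_BYTES
    if first_word == last_word:
        return [(address, length)]
    boundary = last_word * WORD_BYTES
    return [(address, boundary - address),
            (boundary, address + length - boundary)]
```

A 4-byte access at 0x102, for instance, becomes a 2-byte access at 0x102 followed by a 2-byte access at 0x104.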

It also performs conflict prevention when an operand to be prefetched overlaps an operand to be stored, as well as bypassing from the store operand to the fetch operand.

When there is an access request from the instruction fetch unit 1, the address is set in the CAA register (described later). When there is an access request from the operand address calculation unit 4, the address is set in the IA register (described later). When there is an access request from the data operation unit 6, the address is set in the AA1 register (described later).

When there is an access request for operand prefetch, the address set in the FA register (described later) is output to the AA bus, and the operand data is fetched from memory external to the CPU. The fetched operand data is input to the latch SDATA (see FIG. 2) via the DD bus, and the address on the AA bus used for the access is input to the latch SCAM. The latches SCAM and SDATA are connected by a match indication line. The latch SDATA holds up to two aligned 4-byte data items, and the latch SCAM holds the addresses corresponding to the data in SDATA. Data enters SDATA aligned, but when the data operation unit 6 takes it out for use, it can be extracted from any address with any data length (up to 4 bytes).

Next, the data processing of branch instructions according to the present invention is described with reference to FIGS. 2 and 3.

First, the branch instruction types in this embodiment are described.

In the data processing device of the present invention, an instruction subject to dynamic branch prediction is called a pre-branch instruction. Pre-branch instructions also include instructions, such as unconditional branch instructions, that always branch regardless of the dynamic prediction. Branch instructions can be classified into four types according to whether the branch condition is static or dynamic and whether the branch target is static or dynamic. In this embodiment, the instructions in the following two classes are pre-branch instructions.

The first type of branch instruction has both a static branch condition and a static branch target. Instructions of this type are the unconditional branch instruction (BRA instruction) and the subroutine call instruction (BSR instruction).

The second type of branch instruction has a dynamic branch condition and a static branch target. Instructions of this type are the conditional branch instruction (Bcc instruction) and the loop control instruction (ACB instruction).
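
The four-way classification and the two pre-branch classes can be written down as a small table. The sketch below is illustrative: the function names and the string labels are hypothetical, and only the mnemonics BRA, BSR, Bcc, and ACB come from the text.

```python
def classify(condition_dynamic: bool, target_dynamic: bool) -> tuple[str, str]:
    """Return (condition kind, target kind) for a branch instruction."""
    condition = "dynamic" if condition_dynamic else "static"
    target = "dynamic" if target_dynamic else "static"
    return (condition, target)


# The two classes treated as pre-branch instructions in this embodiment,
# with the example mnemonics named in the text.
PRE_BRANCH_CLASSES = {
    ("static", "static"): ["BRA", "BSR"],
    ("dynamic", "static"): ["Bcc", "ACB"],
}


def is_pre_branch(condition_dynamic: bool, target_dynamic: bool) -> bool:
    """True if the instruction's class is handled by pre-branch processing."""
    return classify(condition_dynamic, target_dynamic) in PRE_BRANCH_CLASSES
```

Both pre-branch classes share a static branch target, which is what allows the target address to be computed at the decode step.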

FIG. 2 is a detailed block diagram showing the configuration of the data processing device shown in FIG. 1. Its configuration and operation are described below.

The instruction decoder 2a, the input side of the PC adder 3b, and the input side of the address adder 4d are connected by the DISP bus 10, which transfers displacement values and branch displacement values.

The instruction decoder 2a and the input side of the address adder 4d are also connected by the correction value bus 12, which transfers the instruction code length used to generate a step code, the pre-decrement value in stack push mode, and so on. The instruction decoder 2a and the input side of the PC adder 3b are also connected by the instruction length bus 11, which transfers the instruction code length used to generate a step code. The register file 6d and the input side of the address adder 4d are connected by the A bus 13, which transfers address values stored in the register file 6d.

The instruction decoder 2a receives instruction codes from the instruction queue 2b and branch prediction bits from the branch prediction table 2c. The output section of the instruction decoder 2a has a branch condition generation circuit 2d which, according to the branch prediction result, selects whether the branch condition field of a conditional branch instruction is output to the instruction execution stage (the second pipeline processing stage) unchanged or with the condition inverted.
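
The selection made by the branch condition generation circuit 2d can be modeled as below. The passage states only that the condition is passed unchanged or inverted depending on the prediction; which prediction selects inversion is an assumption here. The sketch assumes inversion on a "taken" prediction, so that a true condition in the execution stage always means the fetch direction must be corrected. The function name and the flag dictionary are hypothetical.

```python
from typing import Callable, Mapping

Flags = Mapping[str, bool]
Condition = Callable[[Flags], bool]


def condition_for_e_stage(condition: Condition,
                          predicted_taken: bool) -> Condition:
    """Return the condition the execution stage should evaluate.

    Assumed polarity: if the decode stage already branched (predicted
    taken), the E stage checks the inverted condition; otherwise it checks
    the original condition.
    """
    if predicted_taken:
        return lambda flags: not condition(flags)
    return condition


def branch_if_zero(flags: Flags) -> bool:
    """Example condition: branch when the (hypothetical) Z flag is set."""
    return flags["Z"]
```

Under this assumption the execution stage needs only one kind of action, "branch when the received condition holds", for both prediction outcomes.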

The PC adder 3b receives the output of the addend selection circuit 3a, which selects either the value on the instruction length bus 11 or the value on the DISP bus 10, together with the value of either the DPC latch 3e or the TPC latch 3d. The DPC latch 3e holds the PC value (program counter value) of the instruction decoded in the decode stage (described later), which serves as the first pipeline processing stage, and the TPC latch 3d holds the working PC value at each step code boundary. The output of the PC adder 3b is output to the CA bus 14 and the PO bus 15 through the PC adder output latch 3c. The PO bus 15 is connected to the TPC latch 3d, the DPC latch 3e, the APC latch 3f, which holds the PC value of the instruction being processed in the operand address calculation stage, and also the branch prediction table 2c. The TPC latch 3d also has an input path from the CA bus 14 for receiving a new instruction address when a branch or jump occurs in the instruction execution stage.

The output of the correction value bus 12 and the output of the DISP bus 10 are input to the displacement selection circuit 4b, and one of them is input to the address adder 4d. The output of the DISP bus 10 and the output of the A bus 13 are input to the base address selection circuit 4c, and one of them is input to the address adder 4d. The address adder 4d adds three inputs: the output of the displacement selection circuit 4b, the output of the base address selection circuit 4c, and the output of the index value generation circuit 4a, which shifts the value input from the A bus 13 to produce 1, 2, 4, or 8 times that value. The sum is output to the AO bus 16 through the address adder output latch 4e. The AO bus 16 is connected to the IA latch 7a, which holds the address value when, for memory indirect addressing, that value is output to outside the CPU from the address output circuit 8 via the AA bus 17, and to the FA latch 7b, which holds the operand address when, for an operand fetch in the operand fetch stage, that address is output to outside the CPU from the address output circuit 8 via the AA bus 17.
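
The three-input addition performed by the address adder 4d can be sketched as follows. The function names are illustrative, and the 32-bit wraparound is an assumption, since the adder width is not stated in this passage.

```python
ADDRESS_MASK = 0xFFFFFFFF  # assumed 32-bit address adder


def index_value(register_value: int, shift: int) -> int:
    """Model of the index value generation circuit: scale by 1, 2, 4, or 8."""
    if shift not in (0, 1, 2, 3):
        raise ValueError("shift must select a scale of 1, 2, 4, or 8")
    return (register_value << shift) & ADDRESS_MASK


def address_adder(displacement: int, base: int, index: int) -> int:
    """Sum of displacement, base address, and scaled index."""
    return (displacement + base + index) & ADDRESS_MASK
```

For example, a base of 0x1000, a displacement of 8, and a register value of 3 scaled by 4 give the operand address 0x1014.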

The FA latch 7b has an output path to the SA latch 7c, which holds the operand address value so that the operand address calculated by the address adder 4d can be used in the instruction execution stage. The SA latch 7c also has an output path to the S bus 19, the general-purpose data bus of the data arithmetic unit 6e. The CA bus 14, which transfers instruction addresses, is connected to the PC adder output latch 3c, the TPC latch 3d, the counter QINPC, which manages the address of the instruction code prefetched by the instruction fetch stage, the CAA latch 7e, which holds the instruction fetch address when it is output to outside the CPU from the address output circuit 8 via the AA bus 17, and the EB latch 6a, which receives a new instruction address from the S bus 19 when a branch or jump occurs in the instruction execution stage. The APC latch 3f has output paths to the A bus 13 and to the FPC latch 3g, which holds the PC value of the instruction being processed in the operand fetch stage. Reference numeral 7d denotes the AA1 latch.

FCPラッチ3gは命令実行ステージで処理中の命令のPC値
を保持するCPCラッチ3hへの出力経路を持つ。CPCラッチ
3hは、Ｓバス19と、分岐履歴書換えのためにPC値の最下
位バイトの値を保持するOPCラッチ3iとに出力経路を持
つ。レジスタファイル6dは、汎用レジスタや作用用レジ
スタからなり、Ｓバス19とＡバス13への出力経路を持
ち、Ｄバス20からの入力経路を持つ。The FPC latch 3g has an output path to the CPC latch 3h, which holds the PC value of the instruction being processed in the instruction execution stage. The CPC latch 3h has output paths to the S bus 19 and to the OPC latch 3i, which holds the least significant byte of the PC value for rewriting the branch history. The register file 6d consists of general-purpose registers and working registers; it has output paths to the S bus 19 and the A bus 13 and an input path from the D bus 20.

データ演算部６の演算機構であるデータ演算器6eはＳバ
ス19から入力経路を持ち、Ｄバス20への出力経路を持
つ。The data calculator 6e, the arithmetic mechanism of the data operation unit 6, has an input path from the S bus 19 and an output path to the D bus 20.

なお、この実施例におけるデータ処理装置においては、
無条件分岐命令BRA,サブルーチン分岐命令BSR,ループ制
御命令ACB等の３つの命令については、分岐予測テーブ
ルの出力である分岐予測ビットにかかわらず、必ず分岐
すると予測する。無条件分岐命令BRA,サブルーチン分岐
命令BSRに対してはこの予測は必ず正しい。In the data processing device of this embodiment, three kinds of instructions, namely the unconditional branch instruction BRA, the subroutine branch instruction BSR, and the loop control instruction ACB, are always predicted to branch regardless of the branch prediction bit output by the branch prediction table. For BRA and BSR this prediction is always correct.

ループ制御命令ACBは、ループ制御変数に指定された値
を加えて、その結果がループ終了条件を満たすかどうか
を判定し、ループ終了条件を満たさなければ分岐し、満
たせば分岐しない命令である。従って、大多数のソフト
ウエアではループ制御命令ACBについてもこの予測の確
立が正しい。また、ループ制御命令ACBに対する処理を
考慮してソフトウエアを作成すれば、意識しない場合に
比べて効率的なプログラムを作成可能となる。The loop control instruction ACB adds a specified value to a loop control variable and tests whether the result satisfies the loop end condition; it branches if the condition is not satisfied and does not branch if it is. In the great majority of software, therefore, this prediction is also very likely to be correct for ACB. Moreover, software written with this handling of ACB in mind can be made more efficient than software written without regard to it.
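
To see why an "always branch" prediction suits ACB, the following minimal sketch lists the branch outcomes of one loop. The end condition used here (counter greater than or equal to a limit, with a positive step) is an assumption chosen for illustration; the description does not specify the comparison.

```python
from typing import List


def acb_outcomes(start: int, step: int, limit: int) -> List[bool]:
    """Return the taken/not-taken outcome of each ACB execution in a loop.

    Hypothetical end condition: the loop ends once counter >= limit.
    """
    if step <= 0:
        raise ValueError("this sketch assumes a positive step")
    outcomes = []
    counter = start
    while True:
        counter += step          # add the specified value to the loop variable
        taken = counter < limit  # branch only while the end condition is not met
        outcomes.append(taken)
        if not taken:
            return outcomes
```

With `acb_outcomes(0, 1, 5)` the instruction branches four times and falls through once, so a fixed "branch" prediction is wrong only on the final iteration.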

また、条件分岐命令BCCについては、分岐するかしない
かを過去の履歴に従って判断する。すなわち、履歴は、
条件分岐命令BCCの１つ前に実行した命令のアドレスの
下位８ビットのアドレスに基づいて行い、分岐予測は過
去１回の分岐履歴のみに従い、１ビットで示される。For the conditional branch instruction BCC, whether it branches is predicted from past history. The history is looked up using the lower 8 bits of the address of the instruction executed immediately before the BCC, and the prediction follows only the single most recent branch outcome, represented by 1 bit.
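
The prediction policy described so far can be summarized in a few lines. This is a sketch only: the function name is hypothetical, and the treatment of instructions other than BRA, BSR, ACB, and BCC (never predicted to branch) is an assumption, since the description does not cover them here.

```python
ALWAYS_PREDICTED_TAKEN = {"BRA", "BSR", "ACB"}


def predict_taken(mnemonic: str, history_bit: int) -> bool:
    """Return True if the instruction is predicted to branch."""
    if mnemonic in ALWAYS_PREDICTED_TAKEN:
        return True               # the history bit is ignored for these
    if mnemonic == "BCC":
        return bool(history_bit)  # 1-bit history: the most recent outcome
    return False                  # assumption: other instructions are not predicted
```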

第３図は、第２図に示した分岐予測テーブル2cの構成を
説明する詳細ブロック図であり、以下、構成ならびに動
作について説明する。FIG. 3 is a detailed block diagram for explaining the configuration of the branch prediction table 2c shown in FIG. 2, and the configuration and operation will be described below.

POバス15からの入力７ビットとOPCラッチ3iからの入力
７ビットはセレクタ21を通して、デコーダ22に入力され
る。デコーダ22では、７ビットを128ビットにデコード
して128ビットの分岐履歴ラッチ23のうち１つを分岐予
測値として分岐予測信号線29を通して分岐予測出力ラッ
チ24に出力する。128ビットの分岐履歴ラッチ23は、ク
リア信号27が入力されると、一斉に値をゼロにして「分
岐しない」を示す状態となる。The 7-bit input from the PO bus 15 and the 7-bit input from the OPC latch 3i pass through the selector 21 to the decoder 22. The decoder 22 decodes the 7 bits into 128 bits to select one of the 128 branch history latches 23, whose value is output as the branch prediction value to the branch prediction output latch 24 through the branch prediction signal line 29. When the clear signal 27 is input, all 128 branch history latches 23 are set to zero at once, indicating "no branch".

分岐予測出力ラッチ24は予測反転回路25によりその内容
を反転して分岐予測更新ラッチ26に結合されている。分
岐予測制御フラグ28は、SR型のフリップフロップで構成
され、出力は分岐予測許可信号線33としてアンドゲート
30に接続される。分岐予測制御フラグ28は、分岐発生信
号31により出力が「０」となり、１命令デコード完了信
号32により出力が「１」となる。34は分岐予測信号線
で、分岐予測出力ラッチ24に読み出された分岐履歴情報
をアンドゲート30に出力する。The contents of the branch prediction output latch 24 are inverted by the prediction inversion circuit 25 and passed to the branch prediction update latch 26. The branch prediction control flag 28 is an SR flip-flop whose output is connected to the AND gate 30 as the branch prediction permission signal line 33. The output of the flag becomes "0" on the branch occurrence signal 31 and "1" on the one-instruction decode completion signal 32. The branch prediction signal line 34 carries the branch history information read into the branch prediction output latch 24 to the AND gate 30.

なお、この実施例においては、Ｄステージ52でデコード
しようとする命令の１つ前にＤステージ52でデコードさ
れた命令のアドレスの下位８ビット（命令アドレスの最
下位ビット（右端のビット）は必ず０」であるため分岐
予測テーブル2cは128ビットで構成される）をもとに分
岐予測テーブル2cを引いて分岐予測を行う。分岐予測は
過去１回の履歴のみに従ったダイレクトマッピング方式
で登録されている。In this embodiment, branch prediction is performed by looking up the branch prediction table 2c with the lower 8 bits of the address of the instruction decoded in the D stage 52 immediately before the instruction about to be decoded (because the least significant, rightmost bit of an instruction address is always "0", the branch prediction table 2c consists of 128 bits). Predictions are registered in a direct-mapped fashion and follow only the single most recent outcome.

次に動作について説明する。Next, the operation will be described.

分岐予測テーブル2cの分岐履歴は、クリア信号27により
初期値をすべて「分岐しない」とする。分岐予測の更新
は、条件分岐命令BCCが命令実行ステージで分岐したと
きに行われる。条件分岐命令BCCが命令実行ステージで
分岐を起こしたとき、それは命令デコードステージでの
分岐予測が間違っていたことを意味する。そこで、命令
実行ステージでは、OPCラッチ3iの内容をデコーダ22に
転送し、そのデコード結果で対応する分岐履歴ラッチ23
の内容を分岐予測出力ラッチ24に読み出す。次いで、分
岐予測出力ラッチ24の内容が反転された分岐予測更新ラ
ッチ26の内容を、同じくOPCラッチ3iで示された分岐履
歴ラッチ23に書き戻す。The clear signal 27 initializes every entry of the branch history in the branch prediction table 2c to "no branch". The prediction is updated when a conditional branch instruction BCC branches in the instruction execution stage, since a branch at that stage means that the prediction made in the instruction decode stage was wrong. The instruction execution stage therefore transfers the contents of the OPC latch 3i to the decoder 22 and reads the branch history latch 23 selected by the decode result into the branch prediction output latch 24. The inverted value, held in the branch prediction update latch 26, is then written back to the branch history latch 23 selected by the same OPC latch 3i value.
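
The lookup and read-invert-write-back update can be modeled in software roughly as follows. The class and method names are hypothetical; the sketch assumes the index is formed from bits 7 to 1 of the previous instruction's PC, which matches the statement that the lowest address bit is always 0 and the table has 128 entries.

```python
class BranchHistoryTable:
    """Sketch of the 128-entry, 1-bit, direct-mapped branch history."""

    ENTRIES = 128

    def __init__(self) -> None:
        # Equivalent of the clear signal 27: every entry says "no branch".
        self.bits = [0] * self.ENTRIES

    @staticmethod
    def index(prev_pc: int) -> int:
        # Lower 8 bits of the previous instruction's PC, dropping bit 0.
        return (prev_pc >> 1) & 0x7F

    def predict(self, prev_pc: int) -> bool:
        return bool(self.bits[self.index(prev_pc)])

    def on_execute_stage_branch(self, prev_pc: int) -> None:
        # A branch in the execution stage means the prediction was wrong:
        # read the entry, invert it, and write it back.
        i = self.index(prev_pc)
        self.bits[i] ^= 1
```

Because only 7 address bits select an entry, two conditional branches whose preceding instructions share the same low address byte use the same entry; this aliasing is inherent in the direct-mapped organization.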

分岐予測は対象となる条件分岐命令BCCがデコードされ
る１つ前にデコードされた命令のPC値のもとに行われる
ため、分岐予測テーブル2cの更新も命令実行ステージで
条件分岐命令BCCの１つ前に実行された命令のPC値をも
とに行う。このため、命令実行ステージでは現在実行中
の命令の１つ前に実行した命令のPC値の下位１バイト
（最下位ビットは不要）を記憶しておくOPCラッチ3iが
あり、分岐予測テーブル2cの更新はこの値を用いて行
う。分岐履歴の更新は、命令実行ステージで条件分岐命
令BCCが分岐を起こしたときだけしか行われないため、
命令デコードステージの分岐予測テーブル2cの参照動作
が命令実行ステージの更新に妨げられることはない。ま
た、命令実行ステージで分岐が起きた直後は、命令デコ
ードステージは、命令フェッチステージからの命令コー
ド待ち状態となる。分岐履歴の書換えは、この命令コー
ド待ち状態の間に行われる。Because a prediction is made from the PC value of the instruction decoded immediately before the target BCC, the branch prediction table 2c is likewise updated in the instruction execution stage using the PC value of the instruction executed immediately before the BCC. For this purpose the instruction execution stage has the OPC latch 3i, which stores the lower 1 byte (the least significant bit is not needed) of the PC value of the instruction executed immediately before the one currently executing, and the table is updated with this value. Since the branch history is updated only when a BCC causes a branch in the instruction execution stage, lookups of the branch prediction table 2c by the instruction decode stage are never blocked by updates from the instruction execution stage. Immediately after a branch occurs in the instruction execution stage, the instruction decode stage is waiting for instruction code from the instruction fetch stage, and the branch history is rewritten during this wait.

一方、命令実行ステージでの分岐が起こると、分岐発生
信号31が「１」となり、分岐予測制御フラグ28の出力で
ある分岐予測許可信号線33が「０」となる。このとき、
分岐予測出力ラッチ24から出力された分岐予測信号線34
は分岐予測許可信号線33とのアンドがとられて、そのア
ンド出力が命令デコーダ2aに入力されるから、分岐予測
許可信号線33が「０」の間は、「分岐しない」と予測す
ることとなる。１命令デコード完了信号32は、１つの命
令のデコードが完了すると分岐予測制御フラグ28に対し
て出力される。分岐発生信号31が出力された後、分岐先
命令がデコードされ、１命令デコード完了信号32が分岐
予測制御フラグ28に入力されると、分岐予測許可信号線
33は「１」となり、分岐予測信号線34を介して分岐予測
出力ラッチ24に読み出された分岐予測値が命令デコーダ
2aに出力される。When a branch occurs in the instruction execution stage, the branch occurrence signal 31 becomes "1" and the branch prediction permission signal line 33, the output of the branch prediction control flag 28, becomes "0". The branch prediction signal line 34 from the branch prediction output latch 24 is ANDed with the branch prediction permission signal line 33, and the AND output is input to the instruction decoder 2a, so "no branch" is predicted for as long as the permission signal line 33 is "0". The one-instruction decode completion signal 32 is sent to the branch prediction control flag 28 each time decoding of one instruction completes. Once the branch target instruction has been decoded after the branch occurrence signal 31 and the decode completion signal 32 reaches the flag, the permission signal line 33 returns to "1", and the prediction value read into the branch prediction output latch 24 is again passed to the instruction decoder 2a via the branch prediction signal line 34.
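
The behavior of the branch prediction control flag 28 and the AND gate 30 can be sketched as a small state machine. The class name is hypothetical, and the initial state of the flag (prediction enabled) is an assumption; the description does not state it.

```python
class PredictionGate:
    """Sketch of the SR flag 28 gating the history bit through AND gate 30."""

    def __init__(self) -> None:
        self.enable = 1  # assumption: prediction is enabled initially

    def branch_occurred(self) -> None:
        # Branch occurrence signal 31 resets the flag.
        self.enable = 0

    def decode_completed(self) -> None:
        # One-instruction decode completion signal 32 sets the flag.
        self.enable = 1

    def gated_prediction(self, history_bit: int) -> int:
        # AND of the history bit (line 34) and the permission signal (line 33).
        return history_bit & self.enable
```

While the flag is reset, the gated output is 0 regardless of the stored history, which corresponds to forcing a "no branch" prediction for the instruction decoded right after an execution-stage branch.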

以上のようにして、命令実行ステージにおける分岐直後
の命令に対しては、分岐予測は常に「分岐しない」と予
測させることができる。In this way, the instruction immediately following a branch in the instruction execution stage is always predicted as "no branch".

次に第４図を参照しながらこの発明によるパイプライン
処理動作について説明する。Next, the pipeline processing operation according to the present invention will be described with reference to FIG.

第４図はこの発明によるパイプライン処理動作を説明す
るブロック図である。FIG. 4 is a block diagram for explaining the pipeline processing operation according to the present invention.

この図において、35は命令フェッチステージ（IFステー
ジ）で、命令のプリフェッチを行い、命令コードを後段
のステージに出力する。36はデコードステージ（Ｄステ
ージ）で、命令コードをデコードし、オペランドアドレ
ス計算を実行し、オペコードの中間デコード結果となる
Ｄコード42,アドレス計算情報となるＡコード43を後段
のステージに出力する。37はオペランドアドレス計算ス
テージ（Ａステージ）で、レジスタやメモリの書込み予
約およびマイクロプログラムのエントリ番地とマイクロ
プログラムに対するパラメータ等からなるＲコード44お
よびアドレス計算結果となるＦコード45を出力する。38
はオペランドフェッチステージ（Ｆステージ）で、マイ
クロROMアクセスを行うＲステージ40,オペランドのプリ
フェッチを行うOFステージ41から構成され、Ｒステージ
40は、実行制御コードとなるＥコード46を生成し、OFス
テージ41はフェッチしたオペランドとなるＳコード47は
それぞれ実行ステージ（Ｅステージ）39へ出力するとい
った、例えば５段構成のパイプライン処理を実行する。
なお、Ｅステージ39では、１段のストアバッファがある
ほか、高機能命令の一部は命令実行自体をパイプライン
化するため、実際には５段以上のパイプライン処理と同
等の処理を行える。In this figure, 35 is the instruction fetch stage (IF stage), which prefetches instructions and outputs instruction code to the following stage. 36 is the decode stage (D stage), which decodes the instruction code, performs operand address calculation, and outputs to the following stage the D code 42, the intermediate decode result of the operation code, and the A code 43, the address calculation information. 37 is the operand address calculation stage (A stage), which outputs the R code 44, consisting of register and memory write reservations, the microprogram entry address, parameters for the microprogram, and the like, and the F code 45, the address calculation result. 38 is the operand fetch stage (F stage), made up of the R stage 40, which accesses the micro ROM, and the OF stage 41, which prefetches operands. The R stage 40 generates the E code 46, the execution control code, and the OF stage 41 generates the S code 47, the fetched operand; each is output to the execution stage (E stage) 39. Together these form, for example, a five-stage pipeline. Because the E stage 39 has a one-stage store buffer and the execution of some high-function instructions is itself pipelined, the processing is in practice equivalent to a pipeline of five or more stages.

また、各ステージは、他のステージとは独立に動作し、
理論上は５つのステージが完全に独立動作する。各ステ
ージは１回の処理を最小２クロックで実行可能となって
いる。さらに、この実施例におけるデータ処理装置にお
いては、メモリ−メモリ間演算や、メモリ間接アドレッ
シング等、基本パイプライン処理１回だけでは処理が行
えない命令があるが、これらの処理に対してもなるべく
均衡したパイプライン処理が行えるように設計されてい
る。複数のメモリオペランドをもつ命令に対してはメモ
リオペランドの数をもとに、デコード段階で複数のパイ
プライン処理単位（ステップコード）に分解してパイプ
ライン処理を行うのである。特にパイプライン処理単位
の分解方法については、特願昭61-236456号に詳しく記
載されており、ここでの説明は省略する。Each stage operates independently of the others, and in theory all five stages operate completely independently. Each stage can perform one unit of processing in a minimum of 2 clocks. In the data processing device of this embodiment, some instructions, such as memory-to-memory operations and memory indirect addressing, cannot be handled in a single pass through the basic pipeline, but the design keeps the pipeline as balanced as possible for these as well. An instruction with several memory operands is decomposed at the decode stage into several pipeline processing units (step codes) according to the number of memory operands, and these are then pipelined. The decomposition method is described in detail in Japanese Patent Application No. 61-236456 and is not repeated here.

次に各ステージの処理動作について説明する。Next, the processing operation of each stage will be described.

IFステージ35からＤステージ36に渡される情報は、命令
コードそのものである。Ｄステージ36からＡステージ37
に渡される情報は、命令で指定された演算に関するＤコ
ード42と、オペランドのアドレス計算に関係するＡコー
ド43との２つである。Ａステージ37からＦステージ38に
渡される情報はマイクロプログラムルーチンのエントリ
番地やマイクロプログラムへのパラメータなどを含むＦ
コード45の２つである。Ｆステージ38からＥステージ39
に渡される情報は、演算制御情報とリテラル等を含むＥ
コード46と、オペランドやオペランドアドレスを含むＳ
コード47との２つである。The information passed from the IF stage 35 to the D stage 36 is the instruction code itself. Two items are passed from the D stage 36 to the A stage 37: the D code 42, which concerns the operation specified by the instruction, and the A code 43, which concerns operand address calculation. Two items are passed from the A stage 37 to the F stage 38: the R code 44, which includes the entry address of the microprogram routine and parameters for the microprogram, and the F code 45, which carries the address calculation result. Two items are passed from the F stage 38 to the E stage 39: the E code 46, which includes operation control information and literals, and the S code 47, which includes operands and operand addresses.

Ｅステージ39以外のステージで検出されたEIT処理は、
そのコードがＥステージ39に到達されるまではEIT処理
は起動されない。すなわち、Ｅステージ39で処理されて
いる命令のみが実行段階の命令であり、IFステージ35〜
Ｆステージ38で処理されている命令はまだ実行段階に至
っていない。従って、Ｅステージ39以外で検出されたEI
Tは、検出したことをステップコード中に記憶して次の
ステージに伝えられる。EIT processing for an EIT detected in a stage other than the E stage 39 is not started until the corresponding code reaches the E stage 39. Only the instruction being processed in the E stage 39 is in the execution phase; instructions in the IF stage 35 through the F stage 38 have not yet reached it. An EIT detected outside the E stage 39 is therefore recorded in the step code and passed on to the next stage.

次にパイプラインに処理単位について説明する。Next, the processing unit of the pipeline will be described.

この実施例におけるデータ処理装置では、命令フォーマ
ットの特徴を生かしたパイプライン処理を行うため、Ｄ
ステージ36では、２バイトの命令基本部＋０〜４バイト
のアドレッシング修飾部，多段間接モード指定部＋アド
レッシング修飾部または命令固有の拡張部を１つのデコ
ード単位として処理する。各回のデコード結果をステッ
プコードと呼び、Ａステージ37以降ではこのステップコ
ードをパイプライン処理の単位としている。なお、ステ
ップコードの数は命令毎に固有であり、多段間接モード
指定を行わないとき、１つの命令は最小１個、最大３個
のステップコードに分かれる。また、多段間接モード指
定があればそれだけ、ステップコードが増える。To exploit the characteristics of the instruction format, the D stage 36 of the data processing device of this embodiment treats each of the following as one decode unit: a 2-byte instruction base part plus a 0- to 4-byte addressing modification part; a multistage indirect mode specifier plus an addressing modification part; or an instruction-specific extension part. The result of each decode is called a step code, and from the A stage 37 onward the step code is the unit of pipeline processing. The number of step codes is specific to each instruction: without multistage indirect mode, one instruction is split into at least one and at most three step codes, and each multistage indirect mode specification adds further step codes.

また、パイプライン上に存在するステップコードはすべ
て別命令に対するものである可能性があり、プログラム
カウンタの値はステップコード毎に管理する。すべての
ステップコードはそのステップコードのもとになった命
令のプログラムカウンタ値を持つ。ステップコードに付
属してパイプラインの各ステージを流れるプログラムカ
ウンタ値はステッププログラムカウンタ（SPC）と呼ば
れ、ステッププログラムカウンタ（SPC）はパイプライ
ンステージを次々と受け渡されて行く。Because the step codes in the pipeline may all belong to different instructions, the program counter value is managed per step code. Every step code carries the program counter value of the instruction from which it was derived. The program counter value that accompanies a step code through the pipeline stages is called the step program counter (SPC), and it is handed from one pipeline stage to the next.

各パイプラインステージの入出力ステップコードには第
４図に示したように便宜上名前が付けられている。ま
た、ステップコードはオペコードに関する処理を行い、
マイクロROM部５のエントリ番地やＥステージ39に対す
るパラメータなどになる系列とＥステージ39のマイクロ
命令に対するオペランドになる系列の２系列がある。For convenience, the input and output step codes of each pipeline stage are named as shown in FIG. 4. Step codes form two series: one that handles the operation code and becomes the entry address of the micro ROM section 5 and the parameters for the E stage 39, and one that becomes the operands for the microinstructions of the E stage 39.

命令フェッチステージ35は命令をメモリやブランチバッ
ファからフェッチし、命令キュー2bに入力して、Ｄステ
ージ36に対して命令コードを出力する。命令キュー2bの
入力は、整置された４バイト単位で行う。メモリから命
令をフェッチする時は、整置された４バイトにつき最小
２クロックを要する。ブランチバッファがヒットした時
は整置された４バイトにつき１クロックでフェッチ可能
である。The instruction fetch stage 35 fetches instructions from memory or from the branch buffer, places them in the instruction queue 2b, and outputs instruction code to the D stage 36. Input to the instruction queue 2b is in aligned 4-byte units. Fetching from memory takes a minimum of 2 clocks per aligned 4 bytes; when the branch buffer hits, an aligned 4 bytes can be fetched in 1 clock.
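
The fetch timing stated above gives a simple lower bound on the clocks needed to bring in a run of instruction bytes. The helper names below are hypothetical, and the sketch assumes every aligned 4-byte unit in the range is either all from memory or all from the branch buffer.

```python
def aligned_units(addr: int, nbytes: int) -> int:
    """Number of aligned 4-byte units that cover [addr, addr + nbytes)."""
    if nbytes <= 0:
        return 0
    first = addr // 4
    last = (addr + nbytes - 1) // 4
    return last - first + 1


def min_fetch_clocks(addr: int, nbytes: int, branch_buffer_hit: bool) -> int:
    """Minimum clocks: 2 per aligned unit from memory, 1 on a branch buffer hit."""
    clocks_per_unit = 1 if branch_buffer_hit else 2
    return aligned_units(addr, nbytes) * clocks_per_unit
```

For instance, 6 bytes starting at address 0x1002 span two aligned units, so the minimum is 4 clocks from memory or 2 clocks when the branch buffer hits.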

命令キュー2bの出力単位は、２バイト毎に可変であり、
２クロックの間に最大６バイトまで出力できる。また、
分岐の直後には命令キュー2bをバイパスして命令基本部
２バイトを直接命令デコーダに転送することも出来るよ
うに構成されている。The output of the instruction queue 2b is variable in 2-byte units, and up to 6 bytes can be output in 2 clocks. Immediately after a branch, the 2-byte instruction base part can also bypass the instruction queue 2b and be transferred directly to the instruction decoder.

ブランチバッファへの命令の登録やクリアなどの制御，
プリフェッチ先命令アドレスの管理や命令キューの制御
もIFステージ35で行う。The IF stage 35 also handles registration and clearing of instructions in the branch buffer, management of the prefetch target instruction address, and control of the instruction queue.

なお、IFステージ35で検出するEIT処理には命令をメモ
リからフェッチするときのバスアクセス例外，メモリ保
護違反等によるアドレス変換例外がある。The EITs detected by the IF stage 35 are the bus access exception raised when fetching an instruction from memory and the address translation exception caused by, for example, a memory protection violation.

命令デコードステージ36では、IFステージ35から入力さ
れた命令コードをデコードする。デコードは、命令デコ
ード部２のFHWデコーダ,NFHWデコーダ，アドレッシング
モードデコーダを使用して、２クロック単位に１度行
い、１回のデコード処理で、０〜６バイトの命令コード
を消費する（RET命令等の復帰先アドレスを含むステッ
プコードの出力処理などでは命令コードを消費しな
い）。１回のデコードで、Ａステージ37に対してアドレ
ス計算情報であるＡコード43である約35ビットの制御コ
ードと最大32ビットアドレス修飾情報と、オペコードの
中間デコード結果であるＤコード42である約50ビットの
制御コードと８ビットのリテラル情報とをＡステージ37
に対して出力する。The instruction decode stage 36 decodes the instruction code input from the IF stage 35. Decoding is performed once every 2 clocks using the FHW decoder, the NFHW decoder, and the addressing mode decoder of the instruction decode unit 2, and each decode consumes 0 to 6 bytes of instruction code (no instruction code is consumed when, for example, outputting a step code containing the return address of a RET instruction or the like). Each decode outputs to the A stage 37 the A code 43, the address calculation information, consisting of a control code of about 35 bits and up to 32 bits of address modification information, and the D code 42, the intermediate decode result of the operation code, consisting of a control code of about 50 bits and 8 bits of literal information.

なお、Ｄステージ36では、各命令のPC計算部の制御，分
岐予測処理，プリブランチ命令に対するプリブランチ処
理，命令キューからの命令コード出力処理等も行う。ま
た、Ｄステージ36で検出するEIT処理には、予約命令例
外，プリブランチ時の奇数アドレスジャンプトラップが
ある。また、IFステージ35より転送されてきた各種EIT
処理はステップコード内にエンコードする処理をしてＡ
ステージ37に転送する。The D stage 36 also controls the PC calculation unit for each instruction, performs branch prediction, performs pre-branch processing for pre-branch instructions, and outputs instruction code from the instruction queue. The EITs detected in the D stage 36 are the reserved instruction exception and the odd address jump trap at pre-branch time. EITs passed on from the IF stage 35 are encoded into the step code and transferred to the A stage 37.

Ａステージ37では、処理が大きく分けて２つに分かれ、
１つは命令コード部２のデコーダを使用して、オペコー
ドの後段デコード処理で、他方はオペランドアドレス計
算部４でオペランドアドレスの計算を行う処理である。Processing in the A stage 37 divides broadly into two parts: second-stage decoding of the operation code using the decoder of the instruction decode unit 2, and calculation of the operand address in the operand address calculation unit 4.

オペコードの後段デコード処理は、Ｄコード42を入力と
し、レジスタやメモリの書込み予約およびマイクロプロ
グラムのエントリ番地とマイクロプログラムに対するパ
ラメータ等を含むＲ個−ド44の出力を行う。なお、レジ
スタやメモリの書込み予約は、アドレス計算で参照した
レジスタやメモリの内容が、パイプライン上を先行する
命令で書き換えられ、誤ったアドレス計算が行われるの
を防ぐためである。レジスタやメモリの書き込み予約
は、デッドロックを避けるため、ステップコード毎に行
うのではなく命令毎に行う。レジスタやメモリの書き込
み予約については、特願昭62-144394号等に詳しく記載
されているため説明は省略する。The second-stage decoding of the operation code takes the D code 42 as input and outputs the R code 44, which includes register and memory write reservations, the microprogram entry address, and parameters for the microprogram. The write reservations prevent the incorrect address calculation that would result if a register or memory location referenced in address calculation were rewritten by an instruction ahead of it in the pipeline. To avoid deadlock, write reservations are made per instruction rather than per step code. They are described in detail in Japanese Patent Application No. 62-144394 and elsewhere, so the description is omitted here.

オペランドアドレス計算処理は、Ａコード43を入力と
し、Ａコード43に従いオペランドアドレス計算部４で加
算やメモリ間接参照を組み合わせてアドレス計算を行
い、その計算結果をＦコード45として出力する。この
際、アドレス計算に伴うレジスタやメモリの読み出し時
にコンフリクトチェックを行い、先行命令がレジスタや
メモリ書き込み処理を終了していないためコンフリクト
が指示されれば、先行命令がＥステージ39で書き込み処
理を終了するまで待つ。Operand address calculation takes the A code 43 as input; following the A code 43, the operand address calculation unit 4 combines additions and memory indirect references to calculate the address and outputs the result as the F code 45. Registers and memory read during address calculation are checked for conflicts, and if a conflict is indicated because a preceding instruction has not yet finished writing to the register or memory, the stage waits until that instruction completes the write in the E stage 39.
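
The interplay between write reservations and the conflict check can be illustrated with a small bookkeeping model. Everything here is a hypothetical sketch: the class, its methods, and the use of string resource names are assumptions, and the actual mechanism is the one described in the cited application.

```python
from collections import Counter
from typing import Iterable


class WriteReservations:
    """Sketch of per-instruction write reservations checked by the A stage."""

    def __init__(self) -> None:
        self._pending = Counter()  # resource name -> outstanding reservations

    def reserve(self, resource: str) -> None:
        # Made in the A stage for a register or memory location that the
        # instruction will write.
        self._pending[resource] += 1

    def release(self, resource: str) -> None:
        # Done in the E stage after the operand has been written.
        if self._pending[resource] == 0:
            raise KeyError(f"no reservation outstanding for {resource}")
        self._pending[resource] -= 1

    def conflicts(self, reads: Iterable[str]) -> bool:
        # True if address calculation would read a resource that a
        # preceding instruction has reserved but not yet written.
        return any(self._pending[r] > 0 for r in reads)
```

In this model the A stage would stall while `conflicts(...)` returns True for the registers or memory it needs and proceed once the E stage has released them.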

また、オペランドアドレスやメモリ間接参照のアドレス
がメモリにマップされたI/O領域に入るかどうかのチェ
ックも行う。It also checks whether the operand address or the address of a memory indirect reference falls in the memory-mapped I/O area.

Ａステージ37で検出するEIT処理には、予約命令例外，
特権命令例外，バスアクセス例外，アドレス変換例外，
メモリ間接アドレッシングのときのオペランドブレイク
ポイントヒットによるデバッグトラップがある。Ｄコー
ド42,Aコード43自体がEIT処理を起こしたことを示して
おれば、Ａステージ37はそのコードに対してアドレス計
算処理をせず、そのEIT処理をＲコード44,Fコード45に
伝える。The EITs detected in the A stage 37 are the reserved instruction exception, the privileged instruction exception, the bus access exception, the address translation exception, and the debug trap caused by an operand breakpoint hit during memory indirect addressing. If the D code 42 or the A code 43 itself indicates that an EIT has occurred, the A stage 37 does not perform address calculation for that code and passes the EIT on in the R code 44 and the F code 45.

Ｆステージ38も処理が大きく２つに分れ、１つはマイク
ロROMのアクセス処理を実行するＲステージ40と、オペ
ランドプリフェッチ処理を行うOFステージ41から構成さ
れ、Ｒステージ40とOFステージ41は必ずしも同時に動作
するわけではなく、メモリアクセス権が獲得できるかど
うかに依存して独立して動作する。Processing in the F stage 38 also divides broadly into two parts, performed by the R stage 40, which accesses the micro ROM, and the OF stage 41, which prefetches operands. The R stage 40 and the OF stage 41 do not necessarily operate at the same time; they operate independently, depending on whether the memory access right can be obtained.

Ｒステージ40では、マイクロROMアクセス処理、すなわ
ちＲコード44に対して次のＥステージ39での実行に使用
する実行制御コードであるＥコード46を作り出すための
マイクロROMアクセスとマイクロ命令デコード処理を行
う。なお、１つのＲコード44に対する処理が２つ以上の
マイクロプログラムステップに分解される場合、マイク
ロROMはＥステージ39で使用され、次のＲコード44はマ
イクロROMアクセス待ちになる。Ｒコード44に対するマ
イクロROMアクセスが行われるのは、その前のＥステー
ジ39での最後のマイクロ命令実行の時である。また、こ
の実施例におけるデータ処理装置では、殆どの基本命令
は１マイクロプログラムステップで行われるため、実際
にはＲコード44に対するマイクロROMアクセスが次々に
行われることが多い。The R stage 40 performs micro ROM access and microinstruction decoding for an R code 44 in order to produce the E code 46, the execution control code used in the following E stage 39. When the processing for one R code 44 is broken into two or more microprogram steps, the micro ROM is in use by the E stage 39 and the next R code 44 must wait for micro ROM access. The micro ROM access for that R code 44 takes place when the E stage 39 executes the last microinstruction of the preceding processing. Because most basic instructions in the data processing device of this embodiment complete in one microprogram step, micro ROM accesses for successive R codes 44 in practice often follow one after another.

Ｒステージ40で新たに検出するEIT処理はなく、Ｒコー
ド44に命令処理再実行型のEITを示している時には、そ
のEIT処理に対するマイクロプログラムが実行される。The R stage 40 detects no new EITs. When the R code 44 indicates an instruction-reexecution-type EIT, the microprogram for that EIT processing is executed.

Ｒコード44が奇数アドレスジャンプトラップを示してい
るとき、Ｒステージ40はそれをＥコード46に伝える。こ
れはプリブランチに対するもので、Ｅステージ39ではそ
のＥコード46で分岐が生じなければ、そのプリブランチ
を有効として奇数アドレスジャンプトラップを発生す
る。When the R code 44 indicates an odd address jump trap, the R stage 40 passes it on in the E code 46. This trap relates to a pre-branch: if no branch occurs for that E code 46 in the E stage 39, the pre-branch is treated as valid and the odd address jump trap is raised.

OFステージ41では、Ｆステージ38で行う上記２つの処理
のうち、オペランドフェッチ処理を行う。The OF stage 41 performs the operand fetch, the second of the two kinds of processing carried out in the F stage 38.

オペランドフェッチはＦコード45を入力とし、フェッチ
したオペランドとアドレスをＳコード47として出力す
る。１つのＦコード45ではワード境界をまたいでもよい
が、４バイト以下のオペランドフェッチを指定する。Ｆ
コード45にはオペランドのアクセスを行うかどうかの指
定も含まれており、Ａステージ37で計算したオペランド
アドレス自体や即値をＥステージ39に転送する場合に
は、オペランドプリフェッチは行わず、Ｆコード45の内
容をＳコード47として転送する。The operand fetch takes the F code 45 as input and outputs the fetched operand and its address as the S code 47. One F code 45 specifies an operand fetch of 4 bytes or less, which may straddle a word boundary. The F code 45 also specifies whether the operand is to be accessed at all: when the operand address itself or an immediate value calculated in the A stage 37 is to be passed to the E stage 39, no operand prefetch is performed and the contents of the F code 45 are transferred as the S code 47.

プリフェッチしようとするオペランドとＥステージ39が
書き込み処理を行おうとするオペランドが一致するとき
は、オペランドプリフェッチはメモリから行わず、バイ
パスして行う。When the operand to be prefetched coincides with an operand that the E stage 39 is about to write, the operand is not prefetched from memory but is obtained through a bypass.
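
The bypass can be pictured as forwarding the pending store data instead of reading memory. The sketch below is a simplification under stated assumptions: the function name is hypothetical, memory is modeled as a dictionary keyed by operand address, and a match means the addresses are exactly equal (partial overlaps are not modeled).

```python
from typing import Dict, Optional, Tuple


def prefetch_operand(
    addr: int,
    memory: Dict[int, int],
    pending_store: Optional[Tuple[int, int]],
) -> int:
    """Return the operand for addr, bypassing memory on a pending store.

    pending_store is (address, data) for the write the E stage is about
    to perform, or None if there is no such write.
    """
    if pending_store is not None and pending_store[0] == addr:
        return pending_store[1]  # bypass: use the data about to be written
    return memory[addr]          # normal operand prefetch from memory
```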

また、I/O領域に対してはオペランドプリフェッチを遅
延させ、先行命令がすべて完了するまで待ってオペラン
ドフェッチを行う。For the I/O area, operand prefetch is delayed, and the operand is fetched only after all preceding instructions have completed.

OFステージ41で検出するEIT処理には、バイアクセス例
外，アドレス変換例外，オペランドプリフェッチに対す
るブレイクポイントヒットによるデバックトラップがあ
る。Ｆコード45がデバックトラップ以外のEIT処理を示
しているときは、それをＳコード47として転送し、オペ
ランドプリフェッチは行わない。Ｆコード45がデバッグ
トラップを示しているときはそのＦコード45に対してEI
T処理を示していないときと同じ処理をするとともに、
デバックトラップをＳコード47としてＥステージ39に伝
える。The EITs detected in the OF stage 41 are the bus access exception, the address translation exception, and the debug trap caused by a breakpoint hit on an operand prefetch. When the F code 45 indicates an EIT other than a debug trap, it is transferred as the S code 47 and no operand prefetch is performed. When the F code 45 indicates a debug trap, the F code 45 is processed in the same way as one that indicates no EIT, and the debug trap is passed to the E stage 39 in the S code 47.

実行ステージ39では、Ｅコード46,Sコード47を入力とし
て動作し、命令を実行するステージとなり、Ｆステージ
38以前の各ステージで行われた処理は、すべてＥステー
ジ39のための前処理ステージとして機能する。The execution stage 39 operates on the E code 46 and the S code 47 and is the stage that actually executes instructions; all processing in the F stage 38 and earlier stages is preprocessing for the E stage 39.

Ｅステージ39でジャンプ命令が実行されたり、EIT処理
が起動されたりしたときは、IFステージ35〜Ｆステージ
38までの処理はすべて無効化される。Ｅステージ39に
は、マイクロプログラムにより制御され、Ｒコード44に
示されたマイクロプログラムのエントリ番地から一連の
マイクロプログラムを実行することにより命令を実行す
る。When a jump instruction is executed or EIT processing is started in the E stage 39, all processing in the IF stage 35 through the F stage 38 is invalidated. The E stage 39 is controlled by microprogram and executes an instruction by running a sequence of microprogram steps starting from the microprogram entry address indicated in the R code 44.

マイクロROMの読み出しとマイクロ命令の実行は、パイ
プライン化されて行われる。従って、マイクロプログラ
ムで分岐が起きたときは１マイクロステップの空きがで
きる。また、Ｅステージ39はデータ演算部６にあるスト
アバッファを利用して、４バイト以内のオペランドスト
アと次のマイクロ命令実行をパイプライン処理すること
もできる。Micro ROM reads and microinstruction execution are pipelined, so a branch in the microprogram costs one empty microstep. The E stage 39 can also use the store buffer in the data operation unit 6 to pipeline an operand store of 4 bytes or less with the execution of the next microinstruction.

Ｅステージ39ではＡステージ37で行ったレジスタやメモ
リに対する書き込み予約をオペランド書き込みの後、解
除する。The E stage 39 releases the register and memory write reservations made in the A stage 37 after the operand has been written.

また、条件分岐命令がＥステージ39で分岐を起こしたと
きは、その条件分岐命令に対する分岐予測が誤っていた
のであるから分岐履歴の書き換えを行う。When a conditional branch instruction causes a branch in the E stage 39, the branch prediction for that instruction was wrong, so the branch history is rewritten.

Ｅステージ39で検出されるEITには、バスアクセス例
外，アドレス変換例外，デバッグトラップ，奇数アドレ
スジャンプトラップ，予約機能例外，不正オペランド例
外，予約スタックフォーマット例外，ゼロ除算トラッ
プ，無条件トラップ，条件トラップ，遅延コンテキスト
トラップ，外部割込み，遅延割込み，リセット割込み，
システム障害等がある。The EITs detected in the E stage 39 include the bus access exception, address translation exception, debug trap, odd address jump trap, reserved function exception, illegal operand exception, reserved stack format exception, zero divide trap, unconditional trap, conditional trap, delayed context trap, external interrupt, delayed interrupt, reset interrupt, and system failure.

Ｅステージ39で検出されたEITは、すべてEIT処理される
がＥステージ39以前のIFステージ35〜Ｆステージ38の間
で検出され、Ｒコード44やＳコード47に反映されている
EITは、必ずEIT処理されるとは限らない。IFステージ35
〜Ｆステージ38の間で検出したが、先行の命令がＥステ
ージ39でジャンプ命令が実行されたなどの原因でＥステ
ージ39まで到達しなかったEITはすべてキャンセルされ
る。そのEITを起こした命令は、そもそも実行されなか
ったこととなる。Every EIT detected in the E stage 39 is processed, but an EIT detected earlier, in the IF stage 35 through the F stage 38, and recorded in the R code 44 or the S code 47 is not necessarily processed. An EIT that was detected in the IF stage 35 through the F stage 38 but never reached the E stage 39, for example because a preceding instruction executed a jump in the E stage 39, is cancelled, and the instruction that caused it is treated as never having been executed.

External interrupts and delayed interrupts are accepted directly by the E stage 39 at instruction boundaries, and the necessary processing is executed by the microprogram. The other EITs are likewise processed by the microprogram.

Each stage of the pipeline has an input latch and an output latch and basically operates independently of the other stages. A stage starts its next process once it has finished the previous process, transferred that result from its output latch to the input latch of the next stage, and received in its own input latch all the input signals needed for the next process. In other words, each stage starts its next process when all the input signals for that process output from the preceding stage have become valid and its output latch has been emptied by transferring the current result to the input latch of the following stage.

All input signals must be ready at the clock timing immediately before a stage operates. If the input signals are not all ready, the stage enters a wait state (input wait). To transfer from an output latch to the input latch of the next stage, that input latch must be empty; if it is not, the pipeline stage likewise enters a wait state (output wait). In addition, if the required memory access right cannot be acquired, if wait states are inserted into a memory access in progress, or if some other pipeline conflict occurs, the processing of the stage itself is delayed.
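The latch handshake described above can be illustrated with a minimal Python sketch. The `Stage` class, the `stalled` flag (standing in for a memory wait or other pipeline conflict), and the `step` function are hypothetical names introduced only for illustration; the patent describes hardware latches, not this software model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    name: str
    input_latch: Optional[str] = None    # item waiting to be processed
    output_latch: Optional[str] = None   # result waiting to be handed on
    stalled: bool = False                # assumed stand-in for a memory wait


def step(stages: List[Stage]) -> dict:
    """Advance the modelled pipeline by one clock; return each stage's state."""
    states = {}
    # Walk from the last stage to the first so that an input latch freed in
    # this clock can accept data from the preceding stage in the same clock.
    for i in reversed(range(len(stages))):
        stage = stages[i]
        nxt = stages[i + 1] if i + 1 < len(stages) else None
        if stage.stalled:
            states[stage.name] = "stalled"
            continue
        # 1. Hand the previous result to the next stage if its latch is empty.
        if stage.output_latch is not None:
            if nxt is None:
                stage.output_latch = None            # last stage retires it
            elif nxt.input_latch is None:
                nxt.input_latch = stage.output_latch
                stage.output_latch = None
        # 2. Start the next process only if output is free and input is ready.
        if stage.output_latch is not None:
            states[stage.name] = "output-wait"
        elif stage.input_latch is None:
            states[stage.name] = "input-wait"
        else:
            stage.output_latch = stage.input_latch   # "process" the item
            stage.input_latch = None
            states[stage.name] = "run"
    return states


d = Stage("D", input_latch="i3", output_latch="i2")
a = Stage("A", input_latch="i1", stalled=True)
first = step([d, a])    # A is stalled, so D cannot hand over its result
a.stalled = False
second = step([d, a])   # A resumes; D transfers and starts its next process
```

In this model an item placed in an input latch during one clock is processed in the following clock, which corresponds to the requirement that inputs be ready one clock before a stage operates.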

Next, the operation of the PC calculation unit 3 for branch instructions is described.

When an instruction code is decoded in the D stage 36, the PC calculation unit 3 calculates the start address of the instruction code being decoded from the length information of the previously decoded instruction code and the start address of that previously decoded instruction code. The PC calculation unit 3 holds the PC value of the instruction, that is, the address of the instruction boundary, in the DPC latch 3e, and manages the address of the step code boundary in the TPC latch 3d. The DPC latch 3e is rewritten only when an instruction boundary address is calculated. The TPC latch 3d is rewritten with each step code boundary address, that is, at every instruction decoding process. Because the PC value of a step code processed in the pipeline must be the PC value of the instruction from which that step code originated, the value of the DPC latch 3e shown in FIG. 2 is transferred successively to the APC latch 3f, the FPC latch 3g, and the CPC latch 3h.

As described above, instruction decoding is performed in units of step codes, and 0 to 6 bytes of instruction code are consumed by one decoding process. The length of the instruction code consumed, which is determined at each decoding process, is output to the instruction length bus 11 of the instruction decoder 2a.

When no pre-branch is performed, the D stage 36 decodes the following instruction and, at the same time, the PC calculation unit 3 calculates the PC value of that following instruction: it adds the value of the TPC latch 3d to the length of the instruction code consumed by decoding, transferred over the instruction length bus 11, and writes the sum back to the TPC latch 3d. That is, the start address of a step code is calculated when that step code is generated by the decoding process.

Except in the case of a pre-branch, the instruction codes to be decoded are output one after another from the instruction queue 2b, so the start address of a code need not be known when its decoding begins. When the step code generated in the D stage 36 is the last step code of instruction A, the output of the PC adder 3b calculated during the decoding of the next instruction B is the start address of instruction B, that is, the PC value of instruction B. This output is therefore written from the PO bus 15 to both the TPC latch 3d and the DPC latch 3e. If, at this time, the A stage 37 is waiting for an input code and the APC latch 3f value is needed immediately, the PC value of instruction B is also written from the PO bus 15 to the APC latch 3f.
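The sequential update of the TPC latch 3d and the DPC latch 3e can be sketched as follows. This is a simplified Python model under stated assumptions: the class `PCCalculator` and its method names are hypothetical, and the model collapses the hardware timing (the sum is actually produced while the next step code is being decoded) into a single call per decoding process.

```python
class PCCalculator:
    """Simplified model of the TPC/DPC update when no pre-branch occurs."""

    def __init__(self, start_pc: int) -> None:
        self.tpc = start_pc  # TPC latch 3d: step code boundary address
        self.dpc = start_pc  # DPC latch 3e: instruction boundary (PC value)

    def decode_step(self, consumed_bytes: int, last_step_of_instruction: bool) -> int:
        """Account for one decoding process; return the current DPC value."""
        if not 0 <= consumed_bytes <= 6:
            raise ValueError("one decoding process consumes 0 to 6 bytes")
        # The TPC latch is rewritten at every decoding process.
        self.tpc += consumed_bytes
        # The DPC latch is rewritten only at an instruction boundary.
        if last_step_of_instruction:
            self.dpc = self.tpc
        return self.dpc


pc = PCCalculator(0x1000)
after_first_step = pc.decode_step(4, last_step_of_instruction=False)
after_last_step = pc.decode_step(2, last_step_of_instruction=True)
```

Here a 6-byte instruction decoded as two step codes leaves the DPC value unchanged after the first step code and advances it to the next instruction only after the last one.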

When a pre-branch is performed, the D stage 36 outputs the last step code of the pre-branch instruction, then halts the processing of the instruction decoder 2a and, in order to calculate the PC value of the branch target instruction, adds the value of the DPC latch 3e to the branch displacement transferred over the DISP bus 10.

Further, it issues an initialization instruction to the IF stage 35, writes the sum, which is the PC value of the branch target instruction, to the TPC latch 3d and the DPC latch 3e, and also outputs it to the CA bus 14 so that it is written to the counter QINPC and the CAA latch 7e.

During the branch target instruction address calculation for a pre-branch, detection of the odd address jump trap is also performed, and the result is indicated as a parameter in the D code 42. The E stage 39 activates the odd address jump trap when the pre-branch proves to be correct. If the pre-branch was wrong and a branch occurs again in the E stage 39, the odd address jump trap detected at the pre-branch is ignored. For this reason, the odd address jump trap detected in the D stage 36 is handled separately from the other EITs. In addition, the E stage 39 needs the odd instruction address value in order to activate the odd address jump trap. Therefore, when the D stage 36 detects an odd address jump trap, it generates a special step code (the OAJT step code) whose PC value is the odd address. The A stage 37 and the F stage 38 simply pass the OAJT step code on to the next stage. When the E stage 39 judges the pre-branch to be correct and that pre-branch has detected an odd address jump trap, it activates the odd address jump trap using the PC value of the OAJT step code that is transferred next through the CPC latch 3h.
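The pre-branch target calculation and the deferred handling of the odd address jump trap can be illustrated with the following Python sketch. The function names, the 32-bit address width, and the tuple used to represent the trap are assumptions made for illustration and are not taken from the patent.

```python
from typing import Optional, Tuple

ADDRESS_MASK = 0xFFFFFFFF  # assumed 32-bit address space


def prebranch_target(dpc: int, displacement: int) -> Tuple[int, bool]:
    """D stage: add the branch displacement to the DPC value.

    Returns the branch target and whether it is odd, which is the condition
    reported as a parameter in the D code.
    """
    target = (dpc + displacement) & ADDRESS_MASK
    return target, bool(target & 1)


def e_stage_resolve(prebranch_correct: bool,
                    odd_detected: bool,
                    oajt_pc: int) -> Optional[Tuple[str, int]]:
    """E stage: activate the trap only if the pre-branch proved correct."""
    if prebranch_correct and odd_detected:
        # The odd address arrives as the PC value of the OAJT step code.
        return ("odd_address_jump_trap", oajt_pc)
    # A wrong pre-branch causes a new branch, so the detected trap is ignored.
    return None


target, odd = prebranch_target(0x1000, 0x21)
trap_if_correct = e_stage_resolve(True, odd, target)
trap_if_wrong = e_stage_resolve(False, odd, target)
```

Deferring the decision to the E stage in this way reflects the text above: a trap detected on a mispredicted path never takes effect.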

When a branch occurs in the E stage 39, the branch target address is transferred from the EB latch 6 to the TPC latch 3d over the CA bus 14. The PC calculation unit 3 adds zero to this value and writes the result from the PO bus 15 to the TPC latch 3d and the DPC latch 3e. This completes the initialization of the PC calculation unit 3. The initialization is overlapped with the first unit decode following the branch in the E stage 39. The same value is set in the counter QINPC and the CAA latch 7e when the value is taken into the TPC latch 3d from the A bus 13.

When the D stage 36 has not performed pre-branch processing for a pre-branch instruction, the operand address calculation unit 4 calculates the branch target address of the pre-branch instruction. The branch target address is calculated by adding, in the address adder 4d, the value of the APC latch 3f transferred over the A bus 13 and the branch displacement value transferred over the DISP bus 10. The calculated branch target address is passed to the E stage 39. When the branch target address is calculated in the A stage 37 using the operand address calculation unit 4, no odd address jump trap detection is performed. Instead, the odd address jump trap information is conveyed by the fact that the branch target address transferred to the E stage 39 is odd.

When the D stage 36 has performed pre-branch processing, for the BCC and ACB instructions the A stage 37 calculates the PC value of the next instruction, located at the address following the pre-branch instruction. The result is passed to the E stage 39 and used as the branch target address for the renewed branch if the pre-branch proves to be wrong. For instructions that are decoded into a single step code in the D stage 36, such as the BCC instruction, the instruction length of the BCC instruction transferred over the correction value bus 12 is added to the value of the APC latch 3f transferred over the A bus 13, and the sum is written from the AO bus 16 to the FA latch 7b.
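The recovery address calculated in the A stage 37 can be sketched in Python as follows. The function names and the returned pair are hypothetical; the sketch only shows how the address following the pre-branch instruction serves as the renewed branch target when the pre-branch proves wrong.

```python
from typing import Tuple


def fallthrough_address(apc: int, instruction_length: int) -> int:
    """A stage: PC value of the instruction following the pre-branch instruction."""
    return apc + instruction_length


def e_stage_next_pc(prebranch_target: int,
                    fallthrough: int,
                    branch_actually_taken: bool) -> Tuple[int, bool]:
    """E stage: choose the next PC after a pre-branch.

    Returns (next_pc, rebranch). `rebranch` is True when the pre-branch was
    wrong and the E stage must branch again to the fall-through address.
    """
    if branch_actually_taken:
        return prebranch_target, False
    return fallthrough, True


# A 2-byte conditional branch at 0x1000 that was pre-branched to 0x1040.
fall = fallthrough_address(0x1000, 2)
kept = e_stage_next_pc(0x1040, fall, branch_actually_taken=True)
redone = e_stage_next_pc(0x1040, fall, branch_actually_taken=False)
```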

[Effect of the invention]

As described above, the present invention provides holding means for holding occurrence information indicating that the second branch processing has occurred, and branch prediction control means for setting the branch prediction result in the first pipeline processing stage immediately after the occurrence of the second branch processing to non-branch on the basis of the occurrence information held in the holding means. Branch prediction for an instruction decoded immediately after a branch in the instruction execution stage can therefore be predicted as "not branching", which has the excellent effect of preventing the loss of data processing efficiency caused by ambiguous prediction processing.
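The combination of the holding means and the branch prediction control means can be illustrated with a minimal Python sketch. The class name, the `history_lookup` callback standing in for the branch prediction table, and the choice to clear the held flag after a single prediction are all assumptions made for illustration; the text above does not specify when the occurrence information is cleared.

```python
from typing import Callable


class BranchPredictionControl:
    """Forces "not branching" for the prediction made right after an E-stage branch."""

    def __init__(self, history_lookup: Callable[[int], bool]) -> None:
        self.history_lookup = history_lookup  # stand-in for the prediction table
        self.e_branch_occurred = False        # holding means (occurrence information)

    def notify_e_stage_branch(self) -> None:
        """Record that the second branch processing has occurred."""
        self.e_branch_occurred = True

    def predict(self, pc: int) -> bool:
        """Return True to predict "branch", False to predict "not branching"."""
        if self.e_branch_occurred:
            # Assumption: the flag covers only the first prediction afterwards.
            self.e_branch_occurred = False
            return False
        return self.history_lookup(pc)


control = BranchPredictionControl(history_lookup=lambda pc: True)
normal = control.predict(0x2000)            # follows the table
control.notify_e_stage_branch()
right_after_branch = control.predict(0x3000)  # forced to "not branching"
later = control.predict(0x3004)             # follows the table again
```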

[Brief description of drawings]

FIG. 1 is a block diagram illustrating the configuration of a data processing device according to an embodiment of the present invention. FIG. 2 is a detailed block diagram illustrating the configuration of the data processing device shown in FIG. 1. FIG. 3 is a detailed block diagram illustrating the configuration of the branch prediction table shown in FIG. 2. FIG. 4 is a block diagram illustrating the pipeline processing operation according to the present invention. FIG. 5 is a block diagram illustrating the pipeline processing configuration of a conventional data processing device. FIG. 6 is a schematic diagram illustrating the instruction execution cycles of each stage involved in branch instruction processing.

In the figures, 1 is an instruction fetch unit, 2 is an instruction decode unit, 3 is a PC calculation unit, 4 is an operand address calculation unit, 5 is a micro ROM unit, 6 is a data operation unit, 7 is an external bus interface unit, 8 is an address output circuit, and 9 is a data input/output circuit. The same reference numerals denote the same or corresponding parts throughout the figures.

Claims

[Claims]

1. A data processing device having: a first pipeline processing stage that comprises a decoding mechanism for decoding instructions and a branch prediction mechanism for executing branch prediction processing that predicts whether a conditional branch instruction will branch or not, that performs first branch processing based on that prediction, and that transfers the instruction decoding result and the result of the first branch processing as a unit processing code to a subsequent pipeline processing stage; and a second pipeline processing stage that performs second branch processing based on the unit processing code from the first pipeline processing stage and executes the decoded instruction; the data processing device being characterized by comprising: holding means for holding occurrence information indicating the occurrence of the second branch processing; and branch prediction control means for setting the branch prediction result in the first pipeline processing stage immediately after the occurrence of the second branch processing to non-branch based on the occurrence information held in the holding means.