JPH0769808B2 - Data processing device

Info

Publication number: JPH0769808B2
Application number: JP63040024A
Authority: JP
Inventors: 豊彦吉田; 雅仁松尾
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1988-02-23
Filing date: 1988-02-23
Publication date: 1995-07-31
Anticipated expiration: 2010-07-31
Also published as: JPH01214931A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Field of Application]

The present invention relates to a data processing device that achieves high processing capability by operating a multi-stage pipeline efficiently, using a branch instruction processing mechanism that reduces pipeline disturbance.

[Conventional technology]

FIG. 5 shows an example of the pipeline processing mechanism used in a conventional data processing device. (11) is the instruction fetch stage (IF stage), (12) is the instruction decode stage (D stage), (13) is the operand address calculation stage (A stage), (14) is the operand fetch stage (F stage), and (15) is the instruction execution stage (E stage).

The IF stage (11) fetches an instruction code from memory and outputs it to the D stage (12). The D stage (12) decodes the instruction code input from the IF stage (11) and outputs the decoding result to the A stage (13). The A stage (13) calculates the effective address of the operand specified in the instruction code and outputs the calculated operand address to the F stage (14). The F stage (14) fetches the operand from memory according to the operand address input from the A stage (13) and outputs the fetched operand to the E stage (15). The E stage (15) executes the operation specified in the instruction code on the operand input from the F stage (14) and, if necessary, stores the result in memory.

With this pipeline processing mechanism, the processing specified by each instruction is divided into five steps, and the specified processing is completed by executing the five steps in order. The five steps can operate in parallel on different instructions. Ideally, the five-stage pipeline processes five instructions simultaneously, giving a data processing device with up to five times the processing capability of one that does not use pipeline processing.

[Problems to be Solved by the Invention]

As described above, pipeline processing can greatly improve the processing capability of a data processing device, and it is widely used in high-speed data processing devices.

However, pipeline processing has several drawbacks, and instructions are not always processed under ideal conditions. One of the problems is the execution of branch instructions, which disturb the instruction sequence.

In a conventional data processing device that has the pipeline processing mechanism shown in FIG. 5, and in which the IF stage (11) processes the branch target instruction only after the branch instruction has been processed in the E stage (15), executing a branch instruction disturbs the pipeline considerably. FIG. 6 shows how instructions flow through the pipeline when branch instructions are executed in the conventional device. In FIG. 6, instruction 3 and instruction 12 are branch instructions. When instruction 3 is executed, instructions 4, 5, 6, and 7, which are already in the pipeline, are canceled, and processing starts anew with instruction 11 at the IF stage (11). Between the execution of instruction 3 in the E stage (15) and the execution of instruction 11 in the E stage (15), the time needed to process four instructions is wasted. The same four-instruction delay occurs for instruction 12. This dead time arises because the instruction to be processed after the branch is fetched only after all pipeline processing of the branch instruction has finished; the more pipeline stages there are, the longer the dead time.
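As a rough illustration of the dead time described above, the following Python sketch counts cycles for an idealized pipeline in which each taken branch discards the instructions behind it. The function name `total_cycles` and the one-cycle-per-stage timing are assumptions made for this example; they do not come from the patent.

```python
def total_cycles(num_instructions, num_taken_branches, stages=5):
    """Estimate cycles for an idealized pipeline that resolves branches in its last stage.

    Assumptions (not from the source): one cycle per stage, no other stalls.
    """
    fill = stages - 1      # cycles before the first instruction reaches the last stage
    penalty = stages - 1   # instruction slots discarded after each taken branch
    return fill + num_instructions + penalty * num_taken_branches


# Ten executed instructions on the five-stage pipeline of FIG. 5.
print(total_cycles(10, 0))  # no taken branches
print(total_cycles(10, 2))  # two taken branches, four wasted slots each
```

Under these assumptions the penalty per taken branch grows linearly with the number of stages, which matches the observation that deeper pipelines suffer longer dead time.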

It has long been pointed out that branch instruction processing is a key factor in improving the performance of pipelined data processing devices, and various techniques have already been devised. Such techniques are introduced, for example, in J. K. F. Lee and A. J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design", IEEE Computer, Vol. 17, No. 1, January 1984. However, each of these techniques still has significant drawbacks: some require a large amount of hardware, and others are effective for only some branch instructions.

[Means for Solving the Problems]

To solve the above drawbacks, the data processing device of the present invention has an instruction decode mechanism that can predict branches (based on history for conditional branch instructions and based on the instruction code for other instructions), a program counter value calculation mechanism that can calculate the branch target address, and an operand address calculation mechanism that can add the instruction length of a branch instruction to the program counter value of that branch instruction.

[Action]

In the data processing device of the present invention, three mechanisms work together: the instruction decode mechanism, which predicts branches based on history for conditional branch instructions and based on the instruction code for other instructions; the program counter value calculation mechanism, which calculates the branch target address; and the operand address calculation mechanism, which adds the instruction length of a branch instruction to the program counter value of that branch instruction. With these, branch processing is performed at the instruction decode stage for unconditional branch instructions, conditional branch instructions, subroutine branch instructions, and loop control instructions, which reduces pipeline disturbance.

[Embodiment of the Invention]

(1) Instruction format of the data processing device of the present invention

Instructions of the data processing device of the present invention have a variable length in units of 16 bits; there are no instructions with an odd number of bytes.

The data processing device of the present invention has an instruction format system specially designed so that frequently used instructions have short formats. For example, two-operand instructions have two formats: a general format, which basically consists of 4 bytes plus extension parts and can use all addressing modes, and a short format, which can use only frequently used instructions and addressing modes.

The symbols that appear in the instruction formats of the data processing device of the present invention, shown in FIGS. 8 to 17, have the following meanings.

-: field containing the operation code
#: field containing a literal or immediate value
Ea: field specifying an operand in the 8-bit general addressing mode
Sh: field specifying an operand in the 6-bit short addressing mode
Rn: field specifying an operand in a register by its register number

As shown in FIG. 8, the formats are drawn with the LSB side, which is also the higher address, on the right. The instruction format cannot be determined without examining the two bytes at address N and address N+1; this is because instructions are assumed to be always fetched and decoded in units of 16 bits (2 bytes).

In the data processing device of the present invention, in every format the extension part of the Ea or Sh of each operand is always placed immediately after the halfword that contains the base part of that Ea or Sh. This placement takes precedence over immediate data implicitly specified by the instruction and over the extension part of the instruction itself. Therefore, in instructions of 4 bytes or more, the operation code may be split by the extension part of an Ea.

As described later, when the multi-stage indirect mode attaches a further extension part to the extension part of an Ea, that extension also takes precedence over the next instruction operation code. For example, consider a 6-byte instruction that contains Ea1 in its first halfword, Ea2 in its second halfword, and ends with a third halfword. Suppose Ea1 uses the multi-stage indirect mode, so that a multi-stage indirect mode extension part is attached in addition to the ordinary extension part. The actual instruction bit pattern is then, in order: the first halfword of the instruction (containing the base part of Ea1), the extension part of Ea1, the multi-stage indirect mode extension part of Ea1, the second halfword of the instruction (containing the base part of Ea2), the extension part of Ea2, and the third halfword of the instruction.
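The ordering rule in this example can be sketched in Python as follows. The function `layout_instruction` and the string labels are hypothetical names chosen for illustration; the sketch only models the order of 16-bit units, not their bit contents.

```python
def layout_instruction(halfwords):
    """Return the memory order of an instruction's 16-bit units.

    `halfwords` is a list of (base_halfword, extension_parts) pairs in
    instruction order; each extension list belongs to the operand field
    (Ea or Sh) whose base part sits in that halfword.
    """
    order = []
    for base, extensions in halfwords:
        order.append(base)         # halfword holding opcode bits and the base part
        order.extend(extensions)   # its extension parts follow immediately
    return order


example = layout_instruction([
    ("halfword 1 (Ea1 base)", ["Ea1 extension", "Ea1 multi-stage extension"]),
    ("halfword 2 (Ea2 base)", ["Ea2 extension"]),
    ("halfword 3", []),
])
for unit in example:
    print(unit)
```

Because each extension is emitted directly after its own base halfword, the opcode bits in later halfwords end up separated from those in the first halfword, which is the splitting of the operation code that the text describes.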

(1.1) Short-format two-operand instructions

FIGS. 9 to 12 show the short formats of two-operand instructions.

FIG. 9 shows the format of memory-register operation instructions. This format has two variants: the L-format, in which the source operand is in memory, and the S-format, in which the destination operand is in memory.

In the L-format, Sh is the field specifying the source operand, Rn is the field specifying the register of the destination operand, and RR specifies the operand size of Sh. The destination operand, held in a register, has a fixed size of 32 bits. When the register side and the memory side differ in size and the source is smaller, sign extension is performed.

In the S-format, Sh is the field specifying the destination operand, Rn is the field specifying the register of the source operand, and RR specifies the operand size of Sh. The source operand, held in a register, has a fixed size of 32 bits. When the register side and the memory side differ in size and the source is larger, the excess part is truncated and an overflow check is performed.
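The truncation and overflow check for an S-format store might be modeled as in the Python sketch below. The text does not say how overflow is defined, so the sketch assumes a signed two's-complement interpretation; `store_truncated` is a hypothetical name.

```python
def store_truncated(reg32, dest_bits):
    """Truncate a 32-bit register value to dest_bits and report overflow.

    Assumption (not stated in the source): overflow means the signed
    two's-complement value changes when the upper bits are cut off.
    """
    reg32 &= 0xFFFFFFFF
    stored = reg32 & ((1 << dest_bits) - 1)

    def as_signed(value, bits):
        # Reinterpret an unsigned bit pattern as a signed integer.
        return value - (1 << bits) if value & (1 << (bits - 1)) else value

    overflow = as_signed(reg32, 32) != as_signed(stored, dest_bits)
    return stored, overflow


print(store_truncated(0x0000007F, 8))   # fits in 8 bits
print(store_truncated(0x00000080, 8))   # +128 does not fit in a signed byte
print(store_truncated(0xFFFFFFFF, 8))   # -1 fits in any size
```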

FIG. 10 shows the format of register-register operation instructions (R-format). Rn is the field specifying the destination register, and Rm is the field specifying the source register. The operand size is 32 bits only.

FIG. 11 shows the format of literal-memory operation instructions (Q-format). MM is the field specifying the destination operand size, # is the field specifying the source operand as a literal, and Sh is the field specifying the destination operand.

FIG. 12 shows the format of immediate-memory operation instructions (I-format). MM is the field specifying the operand size (common to source and destination), and Sh is the field specifying the destination operand. The immediate value in the I-format has the same size as the destination operand (8, 16, or 32 bits), so neither zero extension nor sign extension is performed.

(1.2) General-format one-operand instructions

FIG. 13 shows the general format of one-operand instructions (G1-format). MM is the field specifying the operand size. Some G1-format instructions have extension parts in addition to the extension part of Ea. Some instructions do not use MM.

(1.3) General-format two-operand instructions

FIGS. 14 to 16 show the general formats of two-operand instructions. These formats cover instructions that have at most two operands in the general addressing mode, each specified by 8 bits. The total number of operands may be three or more.

FIG. 14 shows the format of instructions whose first operand requires a memory read (G-format). EaM is the field specifying the destination operand, MM is the field specifying the destination operand size, EaR is the field specifying the source operand, and RR is the field specifying the source operand size. Some G-format instructions have extension parts in addition to the extension parts of EaM and EaR.

FIG. 15 shows the format of instructions whose first operand is an 8-bit immediate value (E-format). EaM is the field specifying the destination operand, MM is the field specifying the destination operand size, and # is the source operand value.

The E-format and the I-format are functionally similar but conceptually quite different. The E-format is strictly a derivative of the general two-operand format (G-format): the source operand size is fixed at 8 bits, and the destination operand size is selected from 8, 16, or 32 bits. In other words, it presupposes operations between operands of different sizes, and the 8-bit source operand is zero-extended or sign-extended to match the destination operand size. The I-format, by contrast, is a short form for immediate value patterns that occur frequently, especially in transfer and comparison instructions, and its source and destination operands have the same size.
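The extension applied to an E-format source operand can be sketched in Python as follows. The function name `extend_source8` and the `signed` flag are hypothetical; which instructions use zero extension and which use sign extension is not specified in this passage.

```python
def extend_source8(value, dest_bits, signed):
    """Extend an 8-bit source operand to dest_bits (8, 16, or 32)."""
    value &= 0xFF
    if signed and value & 0x80:
        value -= 0x100                      # reinterpret as a negative number
    return value & ((1 << dest_bits) - 1)   # wrap to the destination width


print(hex(extend_source8(0xFF, 32, signed=True)))   # sign-extended
print(hex(extend_source8(0xFF, 32, signed=False)))  # zero-extended
```

An I-format immediate needs no such step, because it already has the destination operand size.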

FIG. 16 shows the format of instructions whose first operand involves address calculation only (GA-format). EaW is the field specifying the destination operand, WW is the field specifying the destination operand size, and EaA is the field specifying the source operand. The calculated effective address itself is used as the source operand.

FIG. 17 shows the format of the short branch instruction. cccc is the field specifying the branch condition, and disp:8 is the field specifying the displacement to the jump destination. In the data processing device of the present invention, when the displacement is specified in 8 bits, the value given by the bit pattern is doubled to obtain the displacement.
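The doubling of the 8-bit displacement can be illustrated with the Python sketch below. Two points are assumptions made for the example and are not stated in this passage: that disp:8 is a signed field, and that the displacement is added to the address of the branch instruction itself. The name `short_branch_target` is hypothetical.

```python
def short_branch_target(branch_address, disp8):
    """Compute a short-branch target from the 8-bit disp:8 field.

    Assumptions (not from the source): disp:8 is signed, and the
    displacement is relative to the address of the branch instruction.
    """
    field = disp8 & 0xFF
    if field & 0x80:
        field -= 0x100               # interpret the field as signed
    displacement = 2 * field         # the specified value is doubled
    return (branch_address + displacement) & 0xFFFFFFFF


print(hex(short_branch_target(0x1000, 0x04)))  # forward branch
print(hex(short_branch_target(0x1000, 0xFE)))  # backward branch
```

Doubling costs nothing in reach, since instructions are always a multiple of 16 bits long and a branch target is therefore never at an odd address.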

(1.4) Addressing modes

The data processing device of the present invention specifies addressing modes in one of two ways: a short form, which uses 6 bits including the register, and a general form, which uses 8 bits.

If an undefined addressing mode is specified, or a combination of addressing modes that is clearly meaningless is specified, a reserved instruction exception is raised and exception processing is started, just as when an undefined instruction is executed.

Examples include specifying the immediate mode for a destination, and using the immediate mode in an addressing mode field that should involve address calculation.

The symbols used in the format diagrams shown in FIGS. 18 to 28 are as follows.

Rn: register specification
(Sh): specification in the 6-bit short addressing mode
(Ea): specification in the 8-bit general addressing mode

In the format diagrams, the parts enclosed by dotted lines are extension parts.

(1.4.1) Basic addressing modes

The data processing device of the present invention supports a variety of addressing modes. The basic addressing modes are the register direct mode, register indirect mode, register relative indirect mode, immediate mode, absolute mode, PC relative indirect mode, stack pop mode, and stack push mode.

In the register direct mode, the content of a register is used directly as the operand. The format is shown in FIG. 18. Rn denotes the number of a general-purpose register.

In the register indirect mode, the operand is the content of the memory location whose address is the content of a register. The format is shown in FIG. 19. Rn denotes the number of a general-purpose register.

The register relative indirect mode has two variants, depending on whether the displacement value is 16 bits or 32 bits. In each case the operand is the content of the memory location whose address is the register content plus the 16-bit or 32-bit displacement value. The format is shown in FIG. 20. Rn denotes the number of a general-purpose register, and disp:16 and disp:32 denote the 16-bit and 32-bit displacement values, respectively. The displacement value is treated as signed.

In the immediate mode, the bit pattern specified in the instruction code is taken as a binary number and used directly as the operand. The format is shown in FIG. 21. imm-data denotes the immediate value. The size of imm-data is specified in the instruction as the operand size.

The absolute mode has two variants, depending on whether the address value is given in 16 bits or 32 bits. In each case the operand is the content of the memory location whose address is the 16-bit or 32-bit bit pattern specified in the instruction code. The format is shown in FIG. 22. abs:16 and abs:32 denote the 16-bit and 32-bit address values, respectively. When the address is given by abs:16, the specified address value is sign-extended to 32 bits.
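A minimal Python sketch of the abs:16 sign extension follows; the function name `abs16_to_address` is hypothetical.

```python
def abs16_to_address(abs16):
    """Sign-extend a 16-bit absolute address field to a 32-bit address."""
    abs16 &= 0xFFFF
    if abs16 & 0x8000:
        return abs16 | 0xFFFF0000   # upper half filled with copies of the sign bit
    return abs16


print(hex(abs16_to_address(0x7FFF)))  # lowest 32 KB of the address space
print(hex(abs16_to_address(0x8000)))  # highest 32 KB of the address space
```

A consequence of sign extension is that abs:16 reaches both the bottom and the top 32 KB of the 32-bit address space, rather than only the bottom 64 KB.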

The PC relative indirect mode has two variants, depending on whether the displacement value is 16 bits or 32 bits. In each case the operand is the content of the memory location whose address is the program counter content plus the 16-bit or 32-bit displacement value. The format is shown in FIG. 23.

disp:16 and disp:32 denote the 16-bit and 32-bit displacement values, respectively. The displacement value is treated as signed. The program counter value referenced in the PC relative indirect mode is the start address of the instruction that contains the operand. When the program counter value is referenced in the multi-stage indirect addressing mode, the start address of the instruction is likewise used as the PC-relative reference value.

In the stack pop mode, the operand is the content of the memory location whose address is the content of the stack pointer (SP). After the operand is accessed, SP is incremented by the operand size. For example, when 32-bit data is handled, SP is updated by +4 after the operand access. The stack pop mode can also be specified for operands of size B and H, in which case SP is updated by +1 and +2, respectively. The format is shown in FIG. 24. If the stack pop mode is specified for an operand for which it is meaningless, a reserved instruction exception is raised. Specifically, specifying the stack pop mode for a write operand or a read-modify-write operand raises a reserved instruction exception.

In the stack push mode, the operand is the content of the memory location whose address is the content of SP decremented by the operand size. In the stack push mode, SP is decremented before the operand is accessed. For example, when 32-bit data is handled, SP is updated by -4 before the operand access. The stack push mode can also be specified for operands of size B and H, in which case SP is updated by -1 and -2, respectively. The format is shown in FIG. 25. If the stack push mode is specified for an operand for which it is meaningless, a reserved instruction exception is raised. Specifically, specifying the stack push mode for a read operand or a read-modify-write operand raises a reserved instruction exception.
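The SP updates of the stack pop and stack push modes can be sketched in Python as follows. The memory model (a dictionary keyed by operand address rather than a byte array), the function names, and the size label "W" for 32-bit operands are all simplifications assumed for this example.

```python
OPERAND_BYTES = {"B": 1, "H": 2, "W": 4}  # "W" for 32 bits is a label assumed here


def stack_push(memory, sp, value, size):
    """Stack push mode: decrement SP by the operand size, then write at SP."""
    nbytes = OPERAND_BYTES[size]
    sp = (sp - nbytes) & 0xFFFFFFFF
    memory[sp] = value & ((1 << (8 * nbytes)) - 1)
    return sp


def stack_pop(memory, sp, size):
    """Stack pop mode: read at SP, then increment SP by the operand size."""
    nbytes = OPERAND_BYTES[size]
    value = memory[sp]
    return value, (sp + nbytes) & 0xFFFFFFFF


memory = {}
sp = stack_push(memory, 0x1000, 0x11223344, "W")
print(hex(sp))                    # SP after pushing 32-bit data
value, sp = stack_pop(memory, sp, "W")
print(hex(value), hex(sp))        # popped value and restored SP
```

The asymmetry (decrement before access on push, increment after access on pop) is what makes a push followed by a pop of the same size return SP to its starting value.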

(1.4.2) Multi-stage indirect addressing mode

Even complex addressing can basically be decomposed into combinations of addition and indirect reference. Therefore, if addition and indirect reference are provided as addressing primitives that can be combined arbitrarily, any complex addressing mode can be realized. The multi-stage indirect addressing mode of the data processing device of the present invention is based on this idea. Complex addressing modes are particularly useful for data references between modules and for implementations of AI languages.

When the multi-stage indirect addressing mode is specified, the basic addressing mode field specifies one of three modes: the register-based multi-stage indirect mode, the PC-based multi-stage indirect mode, or the absolute-based multi-stage indirect mode.

The register-based multi-stage indirect mode is an addressing mode in which the value of a register is used as the base value of the multi-stage indirect addressing that extends it. The format is shown in FIG. 26. Rn denotes the number of a general-purpose register.

The PC-based multi-stage indirect mode is an addressing mode in which the value of the program counter is used as the base value of the multi-stage indirect addressing that extends it. The format is shown in FIG. 27.

The absolute-based multi-stage indirect mode is an addressing mode in which zero is used as the base value of the multi-stage indirect addressing that extends it. The format is shown in FIG. 28.

The multi-stage indirect mode field that extends the base is in units of 16 bits and may be repeated any number of times. Each stage of the multi-stage indirect mode performs addition of a displacement, scaling (×1, ×2, ×4, ×8) and addition of an index register, and an indirect memory reference. The format of the multi-stage indirect mode is shown in FIG. 29. The fields have the following meanings.

E = 0: the multi-stage indirect mode continues
E = 1: address calculation ends; tmp ==> address of operand
I = 0: no memory indirect reference; tmp + disp + Rx * Scale ==> tmp
I = 1: memory indirect reference; mem[tmp + disp + Rx * Scale] ==> tmp
M = 1: <Rx> is used as the index
M = 2: special index
  <Rx> = 0: no index value is added (Rx = 0)
  <Rx> = 1: the program counter is used as the index value (Rx = PC)
  <Rx> = 2 and above: reserved
D = 0: the value of the 4-bit field d4 in the multi-stage indirect mode is multiplied by 4 to give the displacement value, which is added. d4 is treated as signed and is always multiplied by 4, regardless of the operand size.
D = 1: the dispx (16/32 bits) specified in the extension part of the multi-stage indirect mode is used as the displacement value, which is added. The size of the extension part is specified in the d4 field:
  d4 = 0001: dispx is 16 bits
  d4 = 0010: dispx is 32 bits
XX: index scale (scale = 1/2/4/8)

If scaling of ×2, ×4, or ×8 is applied to the program counter, an undefined value is entered as the intermediate value (tmp) after that stage is processed. The effective address obtained by this multi-stage indirect mode is then unpredictable, but no exception is raised. Scaling must not be specified for the program counter.
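The stage-by-stage calculation defined by the E, I, and D fields can be sketched in Python as follows. This is an illustrative model, not the device's decoder: stages are given as already-decoded dictionaries, the last list element stands in for the stage with E = 1, and the names `d4_displacement`, `multistage_address`, and `read_memory` are hypothetical.

```python
def d4_displacement(d4):
    """D = 0 case: treat the 4-bit field d4 as signed and multiply it by 4."""
    d4 &= 0xF
    if d4 & 0x8:
        d4 -= 0x10
    return d4 * 4


def multistage_address(base, stages, read_memory):
    """Evaluate a chain of multi-stage indirect stages and return the operand address.

    Each stage is a dict with optional keys:
      "disp"     displacement already decoded to an integer (default 0)
      "index"    index register value, or 0 when no index is added
      "scale"    1, 2, 4, or 8 (default 1)
      "indirect" True for I = 1 (memory indirect reference)
    """
    tmp = base
    for stage in stages:
        tmp = (tmp + stage.get("disp", 0)
               + stage.get("index", 0) * stage.get("scale", 1)) & 0xFFFFFFFF
        if stage.get("indirect", False):
            tmp = read_memory(tmp)   # mem[tmp + disp + Rx * Scale] ==> tmp
    return tmp


# Stage 1 follows a pointer stored at base + 0x10; stage 2 indexes into the target.
memory = {0x1010: 0x2000}
address = multistage_address(
    0x1000,
    [{"disp": 0x10, "indirect": True},
     {"disp": d4_displacement(0b0001), "index": 3, "scale": 4}],
    memory.__getitem__,
)
print(hex(address))
```

Passing a different `base` (a register value, the instruction start address, or zero) corresponds to the register-based, PC-based, and absolute-based variants described earlier.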

FIGS. 30 and 31 show variations of the instruction format in the multi-stage indirect mode. FIG. 30 shows the variation in whether the multi-stage indirect mode continues or ends. FIG. 31 shows the variation in displacement size.

If a multi-stage indirect mode with an arbitrary number of stages is available, the compiler does not need to handle different numbers of stages as separate cases, which reduces the burden on the compiler. Even if multi-stage indirect references occur very rarely, the compiler must always be able to generate correct code. For this reason, the format allows an arbitrary number of stages.

(1.5) Exception processing

The data processing device of the present invention has rich exception processing functions to reduce the software load. Exception processing is divided into three named types: exceptions, which re-execute the instruction; traps, which complete the instruction; and interrupts. These three types of exception processing, together with system failures, are collectively called EIT.

(2) Functional Block Configuration

FIG. 2 shows a block diagram of the data processing device of the present invention. Functionally, the inside of the device is broadly divided into an instruction fetch unit (51), an instruction decoding unit (52), a PC calculation unit (53), an operand address calculation unit (54), a micro ROM unit (55), a data operation unit (56), and an external bus interface unit (57). In addition, FIG. 2 shows an address output circuit (58), which outputs addresses to outside the CPU, and a data input/output circuit (59), which exchanges data with the outside of the CPU, separately from the other functional blocks.

(2.1) Instruction Fetch Unit

The instruction fetch unit (51) contains a branch buffer, an instruction queue, and their control logic. It determines the address of the next instruction to fetch and fetches the instruction from the branch buffer or from memory external to the CPU. It also registers instructions in the branch buffer.

Because the branch buffer is small, it operates as a selective cache. The operation of the branch buffer is described in detail in Japanese Patent Application No. 61-202041.

The address of the next instruction to fetch is calculated by a dedicated counter as the address of the instruction to be entered into the instruction queue. When a branch or jump occurs, the address of the new instruction is transferred from the PC calculation unit (53) or the data operation unit (56).

When an instruction is fetched from memory external to the CPU, the address of the instruction to fetch is output from the address output circuit (58) to outside the CPU through the external bus interface unit (57), and the instruction code is fetched through the data input/output circuit (59).

Of the buffered instruction codes, the one to be decoded next by the instruction decoding unit (52) is output to the instruction decoding unit (52).

(2.2) Instruction Decoding Unit

The instruction decoding unit (52) basically decodes instruction codes in 16-bit (halfword) units. This block contains an FHW decoder that decodes the opcode contained in the first halfword, an NFHW decoder that decodes the opcodes contained in the second and third halfwords, and an addressing mode decoder that decodes the addressing mode.

The block also contains decoder 2, which further decodes the outputs of the FHW and NFHW decoders to calculate the entry address of the micro ROM; a branch prediction mechanism, which predicts conditional branch instructions; and an address calculation conflict check mechanism, which checks for pipeline conflicts during operand address calculation.

The instruction code input from the instruction fetch unit is decoded 0 to 6 bytes at a time every two clocks. Of the decoding results, information about operations in the data operation unit (56) is output to the micro ROM unit (55), information about operand address calculation is output to the operand address calculation unit (54), and information about PC calculation is output to the PC calculation unit (53).

(2.3) Micro ROM Unit

The micro ROM unit (55) contains a micro ROM that stores the microprograms that mainly control the data operation unit (56), a microsequencer, a microinstruction decoder, and so on. A microinstruction is read from the micro ROM once every two clocks. In addition to the sequencing indicated by the microprogram, the microsequencer accepts exceptions, interrupts, and traps (collectively called EIT) in hardware. The micro ROM unit also manages the store buffer. Its inputs are interrupts that do not depend on the instruction code, flag information resulting from operation execution, and outputs of the instruction decoding unit such as the output of decoder 2. The output of the microinstruction decoder goes mainly to the data operation unit (56), but some information, such as the signal that cancels other preceding processing when a jump instruction is executed, is also output to other blocks.

(2.4) Operand Address Calculation Unit

The operand address calculation unit (54) is under hard-wired control by the operand address information output from the address decoder and other parts of the instruction decoding unit (52). Almost all processing related to operand address calculation is performed in this block. It also checks whether the address of a memory access for memory indirect addressing, or the operand address, falls within the memory-mapped I/O area.

The address calculation result is sent to the external bus interface unit (57). The values of the general-purpose registers and the program counter needed for address calculation are input from the data operation unit.

For memory indirect addressing, the memory address to be referenced is output from the address output circuit (58) to outside the CPU through the external bus interface unit (57), and the indirect address value input from the data input/output circuit (59) is fetched by passing it straight through the instruction decoding unit (52).

(2.5) PC Calculation Unit

The PC calculation unit (53) is under hard-wired control by the PC-related information output from the instruction decoding unit (52) and calculates the PC value of each instruction. The data processing device of this patent has a variable-length instruction set, so the length of an instruction is not known until the instruction has been decoded. The PC calculation unit (53) generates the PC value of the next instruction by adding the instruction length output from the instruction decoding unit (52) to the PC value of the instruction being decoded. When the instruction decoding unit (52) decodes a branch instruction and directs a branch at the decoding stage, the unit instead adds the branch displacement to the PC value of the branch instruction to obtain the PC value of the branch target instruction. In the data processing device of the present invention, branching at the instruction decoding stage is called a pre-branch. The pre-branch method is described in detail in Japanese Patent Application Nos. 61-204500 and 61-200557.
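The PC update rule described above reduces to a single addition whose second operand is either the instruction length or the branch displacement. The following is a minimal sketch of that rule; the function name, the keyword argument, and the 32-bit wraparound are assumptions made for illustration.

```python
from typing import Optional

MASK32 = 0xFFFFFFFF  # assumed 32-bit program counter


def next_pc(pc: int, instr_length: int, pre_branch_disp: Optional[int] = None) -> int:
    """Return the PC of the next instruction to decode.

    Without a pre-branch the decoded instruction length is added.
    With a pre-branch the branch displacement is added instead.
    """
    step = instr_length if pre_branch_disp is None else pre_branch_disp
    return (pc + step) & MASK32
```

For example, `next_pc(0x100, 4)` models sequential decoding of a 4-byte instruction, while `next_pc(0x100, 4, pre_branch_disp=-0x20)` models a backward pre-branch.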

The result from the PC calculation unit (53) is output together with the instruction decoding result as the PC value of each instruction. On a pre-branch, it is also output to the instruction fetch unit as the address of the next instruction to decode.

It is also used as the address for branch prediction of the instruction to be decoded next by the instruction decoding unit (52). The branch prediction method is described in detail in Japanese Patent Application No. 62-8394.

(2.6) Data Operation Unit

The data operation unit (56) is controlled by microprograms and, according to the output of the micro ROM unit (55), executes the operations needed to realize the function of each instruction using registers and arithmetic units. When the operand is an address or an immediate value, the address or immediate value calculated by the operand address calculation unit (54) is obtained by passing it through the external bus interface unit (57). When the operand is data in memory external to the CPU, the bus interface unit (57) outputs the address calculated by the address calculation unit (54) from the address output circuit (58), and the operand fetched from external memory is obtained through the data input/output circuit (59).

The arithmetic units include an ALU, a barrel shifter, a priority encoder, counters, and shift registers. The registers and the main arithmetic units are connected by three buses, and one microinstruction that specifies one register-to-register operation is processed in two clock cycles.

When memory external to the CPU must be accessed during a data operation, the microprogram directs the address to be output from the address output circuit (58) to outside the CPU through the external bus interface unit (57), and the target data is fetched through the data input/output circuit (59).

When data is stored in memory external to the CPU, the address is output from the address output circuit (58) through the external bus interface unit (57) and, at the same time, the data is output from the data input/output circuit (59) to outside the CPU. To make operand stores efficient, the data operation unit (56) has a 4-byte store buffer.

When the data operation unit (56) obtains a new instruction address as a result of jump instruction processing, exception processing, or the like, it outputs that address to the instruction fetch unit (51) and the PC calculation unit (53).

(2.7) External Bus Interface Unit

The external bus interface unit (57) controls communication on the external bus of the data processing device of this patent. All memory accesses are clock-synchronous and can be performed in a minimum of two clock cycles.

Memory access requests arise independently from the instruction fetch unit (51), the address calculation unit (54), and the data operation unit (56). The external bus interface unit (57) arbitrates among these requests. In addition, when data is accessed at a memory address that straddles an aligned boundary of 32 bits (one word), which is the width of the data bus connecting memory and the CPU, this block automatically detects that the access crosses a word boundary and splits it into two memory accesses.
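The boundary-crossing rule can be illustrated with a short helper that decides how an access of up to 4 bytes is split over a 32-bit bus. It is a sketch only; the function name and the tuple layout are hypothetical, and real hardware would drive byte-enable signals rather than return a list.

```python
WORD = 4  # bus width in bytes (32 bits)


def split_access(addr: int, size: int):
    """Split an access of 1 to 4 bytes into aligned word accesses.

    Returns a list of (word_address, byte_offset, byte_count) tuples:
    one entry if the data fits in a word, two if it crosses a boundary.
    """
    if not 1 <= size <= WORD:
        raise ValueError("size must be 1 to 4 bytes")
    offset = addr % WORD
    word_addr = addr - offset
    first = min(size, WORD - offset)
    parts = [(word_addr, offset, first)]
    if first < size:
        # The remaining bytes start at offset 0 of the next aligned word.
        parts.append((word_addr + WORD, 0, size - first))
    return parts
```

A 4-byte access at address 0x1002, for instance, becomes two accesses of 2 bytes each, one to the word at 0x1000 and one to the word at 0x1004.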

When an operand to be prefetched overlaps an operand to be stored, this block also performs conflict prevention and bypasses the store operand to the fetch operand.

(3) Pipeline Mechanism

Pipeline processing in the data processing device of the present invention is configured as shown in FIG. 3. It is based on a five-stage structure: an instruction fetch stage (IF stage (31)) that prefetches instructions; a decode stage (D stage (32)) that decodes instructions; an operand address calculation stage (A stage (33)) that calculates operand addresses; an operand fetch stage (F stage (34)) that performs micro ROM access (specifically called the R stage (36)) and operand prefetch (specifically called the OF stage (37)); and an execution stage (E stage (35)) that executes instructions. Because the E stage (35) has a one-level store buffer and some high-function instructions pipeline their own execution, the effective pipeline depth is actually five stages or more.

Each stage operates independently of the others, so that in theory the five stages operate completely independently. Each stage can perform one operation in a minimum of two clocks. Ideally, therefore, pipeline processing advances one step after another every two clocks.
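As a rough worked example of the ideal case, the clock count for a run of pipeline processing units can be estimated from the stage count and the two-clock stage time. The function below is an assumption-laden sketch: it ignores stalls, memory wait states, and multi-step microprograms, and its name and parameters are hypothetical.

```python
def ideal_clocks(n_units: int, stages: int = 5, clocks_per_stage: int = 2) -> int:
    """Clocks to pass n_units through an ideal, stall-free pipeline.

    The first unit needs stages * clocks_per_stage clocks to drain;
    each later unit completes clocks_per_stage clocks after the previous one.
    """
    if n_units <= 0:
        return 0
    return (stages + n_units - 1) * clocks_per_stage
```

Under these assumptions ten units take 28 clocks, compared with 100 clocks if each unit had to finish all five stages before the next one started.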

The data processing device of the present invention has instructions, such as memory-to-memory operations and memory indirect addressing, that cannot be processed in a single pass of the basic pipeline, but the device is designed so that pipeline processing stays as balanced as possible for these as well. An instruction with multiple memory operands is broken down at the decoding stage, according to the number of memory operands, into multiple pipeline processing units (step codes), and these are pipelined. The method of breaking instructions into pipeline processing units is described in detail in Japanese Patent Application No. 61-236456.

The information passed from the IF stage (31) to the D stage (32) is the instruction code itself. Two kinds of information are passed from the D stage (32) to the A stage (33): information about the operation specified by the instruction (called the D code (41)) and information about operand address calculation (called the A code (42)). Two kinds of information are passed from the A stage (33) to the F stage (34): the R code (43), which contains the entry address of the microprogram routine, parameters for the microprogram, and so on, and the F code (44), which contains the operand address, access method information, and so on. Two kinds of information are passed from the F stage (34) to the E stage (35): the E code (45), which contains operation control information, literals, and so on, and the S code (46), which contains operands, operand addresses, and so on.

An EIT detected in a stage other than the E stage (35) does not start EIT processing until its code reaches the E stage (35). Only the instruction being processed in the E stage (35) is in the execution phase; instructions being processed in the IF stage (31) through the F stage (34) have not yet reached it. An EIT detected outside the E stage (35) is therefore only recorded in the step code and passed on to the next stage.
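The deferred handling of EITs can be sketched as a flag that travels with the step code and is acted on only in the E stage. The class, field, and function names below are hypothetical and the model collapses the pipeline into a simple loop; it is meant to show the rule, not the hardware.

```python
from dataclasses import dataclass
from typing import Optional

STAGES = ["IF", "D", "A", "F", "E"]


@dataclass
class StepCode:
    name: str
    eit: Optional[str] = None  # EIT recorded by an earlier stage, if any


def process(stage: str, code: StepCode, detected: Optional[str] = None) -> bool:
    """Process one step code in one stage.

    An EIT detected here is only recorded in the step code.
    Returns True only when EIT processing actually starts (E stage).
    """
    if detected is not None and code.eit is None:
        code.eit = detected
    return stage == "E" and code.eit is not None
```

Passing a step code through all five stages with an EIT detected in the IF stage yields `False` for the first four stages and `True` only at the E stage.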

(3.1) Pipeline Processing Unit

(3.1.1) Classification of Instruction Code Fields

The pipeline processing unit of the data processing device of the present invention is determined by exploiting the characteristics of the instruction set format. As described in section (1), the instructions of the device are variable-length instructions in 2-byte units, and an instruction is basically formed by repeating (a 2-byte instruction basic part + a 0 to 4-byte addressing extension part) one to three times.

In most cases the instruction basic part has an opcode part and an addressing mode specification part. When index addressing or memory indirect addressing is needed, any number of (a 2-byte multi-stage indirect mode specification part + a 0 to 4-byte addressing extension part) are attached in place of the addressing extension part. Depending on the instruction, a 2- or 4-byte instruction-specific extension part is also attached at the end.

The instruction basic part contains the opcode of the instruction, the basic addressing mode, literals, and so on. The addressing extension part is one of a displacement, an absolute address, an immediate value, or the displacement of a branch instruction. The instruction-specific extension part contains, for example, a register map or the immediate value of an I-format instruction. FIG. 32 shows the characteristics of the basic instruction format of the data processing device of the present invention.

(3.1.2) Decomposition of Instructions into Step Codes

The data processing device of the present invention performs pipeline processing that exploits the characteristics of the instruction format described above. The D stage (32) processes, as one decoding unit, either (a 2-byte instruction basic part + a 0 to 4-byte addressing extension part), (a multi-stage indirect mode specification part + an addressing extension part), or an instruction-specific extension part. The result of each decoding operation is called a step code, and from the A stage (33) onward the step code is the unit of pipeline processing. The number of step codes is specific to each instruction; when no multi-stage indirect mode is specified, one instruction is divided into a minimum of one and a maximum of three step codes. Each multi-stage indirect mode specification adds a step code. However, as described later, this applies only to the decoding stage.

(3.1.3) Management of the Program Counter

All of the step codes present in the pipeline of the data processing device of the present invention may belong to different instructions, so the program counter value is managed per step code. Every step code carries the program counter value of the instruction from which it was derived. The program counter value that accompanies a step code through the pipeline stages is called the step program counter (SPC). The SPC is handed from one pipeline stage to the next.

(3.2) Processing in Each Pipeline Stage

For convenience, the input and output step codes of each pipeline stage are named as shown in FIG. 3. The step codes form two series: one carries out opcode-related processing and becomes the micro ROM entry address and the parameters for the E stage (35); the other becomes the operands for the microinstructions of the E stage (35).

(3.2.1) Instruction Fetch Stage

The instruction fetch stage (IF stage (31)) fetches instructions from memory or the branch buffer, enters them into the instruction queue, and outputs instruction codes to the D stage (32). Input to the instruction queue is in aligned 4-byte units. Fetching from memory takes a minimum of two clocks per aligned 4 bytes. On a branch buffer hit, an aligned 4 bytes can be fetched in one clock. The output of the instruction queue is variable in 2-byte units, and up to 6 bytes can be output in two clocks. Immediately after a branch, the 2-byte instruction basic part can also bypass the instruction queue and be transferred directly to the instruction decoder.
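The input and output granularity of the instruction queue can be modelled with a byte buffer that accepts aligned 4-byte words and releases 0 to 6 bytes in 2-byte steps. This is a behavioural sketch under stated assumptions: the class and method names are hypothetical, the queue capacity is unbounded, and the post-branch bypass path is not modelled.

```python
from typing import Optional


class InstructionQueue:
    """Toy model: 4-byte aligned input, 2-byte granular output."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def push_word(self, addr: int, word: bytes) -> None:
        """Enter one aligned 4-byte unit fetched from memory or the branch buffer."""
        if addr % 4 != 0 or len(word) != 4:
            raise ValueError("input must be an aligned 4-byte unit")
        self._buf += word

    def pop(self, nbytes: int) -> Optional[bytes]:
        """Hand 0, 2, 4 or 6 bytes to the decoder, or None if it must wait."""
        if nbytes % 2 != 0 or not 0 <= nbytes <= 6:
            raise ValueError("output must be 0 to 6 bytes in 2-byte units")
        if nbytes > len(self._buf):
            return None
        out = bytes(self._buf[:nbytes])
        del self._buf[:nbytes]
        return out
```

Returning `None` stands in for the decoder stalling until the IF stage has supplied enough instruction code.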

The IF stage (31) also handles control such as registering and clearing instructions in the branch buffer, manages the prefetch target instruction address, and controls the instruction queue.

The EITs detected in the IF stage (31) include bus access exceptions when fetching an instruction from memory and address translation exceptions caused by, for example, memory protection violations.

(3.2.2) Instruction Decode Stage

The instruction decode stage (D stage (32)) decodes the instruction code input from the IF stage (31). Decoding is performed once every two clocks using the FHW decoder, NFHW decoder, and addressing mode decoder of the instruction decoding unit (52), and one decoding operation consumes 0 to 6 bytes of instruction code (no instruction code is consumed, for example, when outputting a step code that contains the return address of a RET instruction). Each decoding operation outputs two codes to the A stage (33): the A code (42), which is address calculation information consisting of a control code of about 35 bits and up to 32 bits of address modification information, and the D code (41), which is the intermediate decoding result of the opcode consisting of a control code of about 50 bits and 8 bits of literal information.

The D stage (32) also controls the PC calculation unit (53) for each instruction, performs branch prediction, performs the pre-branch for pre-branch instructions, and controls the output of instruction codes from the instruction queue.

The EITs detected in the D stage (32) are the reserved instruction exception and the odd address jump trap on a pre-branch. EITs passed on from the IF stage (31) are encoded into the step code and transferred to the A stage (33).

(3.2.3) Operand Address Calculation Stage

Processing in the operand address calculation stage (A stage (33)) is broadly divided into two parts. One is the second-stage decoding of the opcode using decoder 2 of the instruction decoding unit (52); the other is the calculation of the operand address in the operand address calculation unit (54).

The second-stage decoding of the opcode takes the D code (41) as input, makes write reservations for registers and memory, and outputs the R code (43), which contains the entry address of the microprogram, parameters for the microprogram, and so on. The write reservation of registers and memory prevents an incorrect address calculation that would result if the contents of a register or memory location referenced in address calculation were rewritten by a preceding instruction still in the pipeline. To avoid deadlock, write reservations are made per instruction rather than per step code. Write reservation of registers and memory is described in detail in Japanese Patent Application No. 62-144394.

Operand address calculation takes the A code (42) as input, calculates the address in the operand address calculation unit (54) by combining additions and memory indirect references according to the A code (42), and outputs the result as the F code (44). A conflict check is made when registers or memory are read for the address calculation; if a conflict is indicated because a preceding instruction has not finished writing to the register or memory, the stage waits until the preceding instruction completes the write in the E stage (35). The stage also checks whether the operand address or the address of a memory indirect reference falls within the memory-mapped I/O area.
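The write reservation and conflict check described above behave much like a simple scoreboard. The sketch below models registers only and uses hypothetical names throughout; the actual mechanism is the subject of the cited patent application and is not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


class WriteReservations:
    """Toy scoreboard: reservations are made and released per instruction."""

    def __init__(self) -> None:
        self._by_instr: Dict[int, Tuple[str, ...]] = {}
        self._pending: Counter = Counter()

    def reserve(self, instr_id: int, regs: Iterable[str]) -> None:
        """Record the registers an instruction will write (made in the A stage)."""
        regs = tuple(regs)
        self._by_instr[instr_id] = regs
        self._pending.update(regs)

    def release(self, instr_id: int) -> None:
        """Clear the reservation once the E stage has finished the write."""
        self._pending.subtract(self._by_instr.pop(instr_id))

    def must_wait(self, regs_read: Iterable[str]) -> bool:
        """True if address calculation would read a register with a pending write."""
        return any(self._pending[r] > 0 for r in regs_read)
```

Keying the reservation by instruction rather than by step code mirrors the deadlock-avoidance rule stated in the text.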

The EITs detected in the A stage (33) are the reserved instruction exception, the privileged instruction exception, the bus access exception, the address translation exception, and the debug trap caused by an operand breakpoint hit during memory indirect addressing. If the D code (41) or the A code (42) itself indicates that an EIT has occurred, the A stage (33) does not perform address calculation for that code and passes the EIT on in the R code (43) and the F code (44).

(3.2.4) Micro ROM Access Stage

Processing in the operand fetch stage (F stage (34)) is also broadly divided into two parts. One is micro ROM access, specifically called the R stage (36). The other is operand prefetch, specifically called the OF stage (37). The R stage (36) and the OF stage (37) do not necessarily operate at the same time; they operate independently, depending on factors such as whether the memory access right can be obtained.

Micro ROM access, which is the processing of the R stage (36), consists of the micro ROM access and microinstruction decoding that generate, from the R code (43), the E code (45), which is the execution control code used in the following E stage (35). When the processing for one R code is broken down into two or more microprogram steps, the micro ROM is in use by the E stage (35) and the next R code (43) waits for micro ROM access. The micro ROM access for an R code (43) takes place when the E stage (35) executes the last microinstruction of the preceding code. In the data processing device of the present invention, most basic instructions are executed in one microprogram step, so in practice micro ROM accesses for successive R codes (43) usually follow one another without waiting.

No new EIT is detected in the R stage (36). When the R code (43) indicates an EIT of the instruction re-execution type, the microprogram for that EIT is executed, so the R stage (36) fetches the microinstruction according to that R code (43). When the R code (43) indicates an odd address jump trap, the R stage (36) passes it on in the E code (45). This applies to a pre-branch: if no branch occurs for that E code (45), the E stage (35) treats the pre-branch as valid and generates the odd address jump trap.
6) conveys it to the E code (45). This is for a pre-branch. In the E stage (35), if no branch occurs in the E code (45), the pre-branch is validated and an odd address jump trap is generated.

(3.2.5) Operand fetch stage

The operand fetch stage (OF stage (37)) performs the operand prefetch process, which is one of the two processes of the F stage (34) described above.

Operand prefetch takes the F code (44) as input and outputs the fetched operand and its address as the S code (46). A single F code (44) may cross a word boundary, but it specifies an operand fetch of 4 bytes or less. The F code (44) also specifies whether the operand is to be accessed at all. When the operand address itself calculated in the A stage (33), or an immediate value, is to be transferred to the E stage (35), no operand prefetch is performed and the contents of the F code (44) are transferred as the S code (46). When the operand to be prefetched coincides with an operand that the E stage (35) is about to write, the operand is not prefetched from memory but is obtained through a bypass. For the I/O area, operand prefetch is delayed and the operand fetch is performed only after all preceding instructions have completed.
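The decision rules in this paragraph can be summarised in a short sketch. It is only an illustration under stated assumptions: the field names (`access`, `address`, `size`), the `pending_store` tuple, the `is_io` predicate and the returned labels are hypothetical, and "coincides" is taken to mean an identical address and size, which the text does not specify.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class FCode(NamedTuple):
    access: bool   # whether the operand is to be accessed at all
    address: int   # operand address (or immediate value when access is False)
    size: int      # operand size in bytes; the text limits this to 4


def prefetch_action(f: FCode,
                    pending_store: Optional[Tuple[int, int]],
                    is_io: Callable[[int], bool],
                    earlier_instructions_done: bool) -> str:
    """Return the path the OF stage takes for one F code (hypothetical labels)."""
    if f.size > 4:
        raise ValueError("one F code specifies at most 4 bytes")
    if not f.access:
        return "pass_through"       # F code contents become the S code unchanged
    if pending_store == (f.address, f.size):
        return "bypass"             # use the value the E stage is about to write
    if is_io(f.address) and not earlier_instructions_done:
        return "wait"               # I/O area: hold until preceding instructions finish
    return "fetch_from_memory"
```

The order in which the bypass and I/O checks are applied is also an assumption; the description lists the cases without ranking them.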

The EITs detected in the OF stage (37) are the bus access exception, the address translation exception, and the debug trap caused by a breakpoint hit on operand prefetch. When the F code (44) indicates an EIT other than a debug trap, that EIT is transferred to the S code (46) and no operand prefetch is performed. When the F code (44) indicates a debug trap, the F code (44) is processed exactly as if it indicated no EIT, and the debug trap is passed on to the S code (46).

(3.2.6) Execution stage

The execution stage (E stage (35)) operates with the E code (45) and the S code (46) as inputs. The E stage (35) is the stage that actually executes instructions; all processing performed in the F stage (34) and earlier stages is preprocessing for the E stage (35). When a jump instruction is executed or EIT processing is started in the E stage (35), all processing from the IF stage (31) through the F stage (34) is invalidated. The E stage (35) is controlled by microprograms and executes an instruction by running the sequence of microinstructions that begins at the microprogram entry address indicated in the R code (43).

Reading of the micro ROM and execution of microinstructions are pipelined. Consequently, a branch within a microprogram leaves one empty microstep. The E stage (35) can also use the store buffer in the data operation unit (56) to pipeline an operand store of 4 bytes or less with the execution of the next microinstruction.

The E stage (35) releases the write reservations for registers and memory that were made in the A stage (33) once the operand has been written.

When a conditional branch instruction causes a branch in the E stage (35), the branch prediction for that instruction was wrong, so the branch history is rewritten.

The EITs detected in the E stage (35) are the bus access exception, address translation exception, debug trap, odd address jump trap, reserved function exception, illegal operand exception, reserved stack format exception, zero divide trap, unconditional trap, conditional trap, delayed context trap, external interrupt, delayed interrupt, reset interrupt, and system failure.

All EITs detected in the E stage (35) undergo EIT processing. EITs that were detected between the IF stage (31) and the F stage (34), before the E stage, and are reflected in the R code (43) or the S code (46) do not necessarily undergo EIT processing. An EIT detected between the IF stage (31) and the F stage (34) is cancelled if it never reaches the E stage (35), for example because a preceding instruction executed a jump in the E stage (35). In that case the instruction that raised the EIT is regarded as never having been executed.

External interrupts and delayed interrupts are accepted directly by the E stage (35) at instruction boundaries, and the necessary processing is carried out by microprograms. The processing of all other EITs is likewise performed by microprograms.

(3.3) State control of each pipeline stage

Each pipeline stage has an input latch and an output latch and, as a rule, operates independently of the other stages. A stage starts its next operation once it has finished the preceding operation, transferred the result from its output latch to the input latch of the next stage, and received in its own input latch all of the input signals needed for the next operation.

In other words, a stage starts its next operation when all of the input signals for that operation, output from the immediately preceding stage, are valid and its own output latch has been emptied by transferring the current result to the input latch of the following stage.

All of the input signals must be present at the clock timing immediately before a stage begins operating. If they are not, the stage enters a wait state (input wait). To transfer from an output latch to the input latch of the next stage, that input latch must be empty; if it is not, the pipeline stage likewise enters a wait state (output wait). The processing of a stage is itself delayed when a required memory access right cannot be acquired, when wait states are inserted into a memory access in progress, or when some other pipeline conflict occurs.
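The input-wait and output-wait rules can be illustrated with a minimal software model of one stage. This is a sketch, not the patented circuit: the `Stage` class, the `step` function and the state labels are hypothetical names, and the "processing" is reduced to copying the input to the output latch.

```python
from typing import Optional


class Stage:
    def __init__(self, name: str) -> None:
        self.name = name
        self.input_latch: Optional[object] = None    # work handed over by the previous stage
        self.output_latch: Optional[object] = None   # finished result not yet handed on


def step(stage: Stage, next_stage: Optional[Stage]) -> str:
    """Advance one stage by one clock; returns 'run', 'input_wait' or 'output_wait'."""
    # 1. Try to hand the previous result to the next stage's input latch.
    if stage.output_latch is not None:
        if next_stage is None:
            stage.output_latch = None                 # last stage: the result simply retires
        elif next_stage.input_latch is None:
            next_stage.input_latch = stage.output_latch
            stage.output_latch = None
        else:
            return "output_wait"                      # next input latch is still occupied
    # 2. Start new work only if the inputs were already latched before this clock.
    if stage.input_latch is None:
        return "input_wait"
    stage.output_latch = stage.input_latch            # stand-in for the real processing
    stage.input_latch = None
    return "run"
```

Delays caused by memory access rights or bus wait states are not modelled here; they would appear as extra clocks spent inside the "processing" step.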

(4) Processing of branch instructions

Because the data processing device of the present invention uses multi-stage pipeline processing as described above, the overhead of executing a branch instruction is large. Dynamic branch prediction is performed to reduce this overhead. Dynamic branch prediction aims to fetch the branch target instruction as early as possible by branching at the decode stage rather than at the execution stage.

In data processing devices generally, not only in that of the present invention, branch instructions are executed frequently, so dynamic branch prediction yields a large performance improvement.

(4.1) Types of branch instruction

In the data processing device of the present invention, an instruction that is subject to dynamic branch prediction is called a pre-branch instruction. Pre-branch instructions also include instructions that always branch regardless of any dynamic prediction, such as the unconditional branch instruction.

The branch instructions of the data processing device of the present invention can be classified into four types in total, according to whether the branch condition is static or dynamic and whether the branch destination is static or dynamic. In this device, the instructions that fall into the following two types are treated as pre-branch instructions.

The first type of branch instruction has both a static branch condition and a static branch destination. This type comprises the unconditional branch instruction (BRA) and the subroutine call instruction (BSR). The second type has a dynamic branch condition and a static branch destination. This type comprises the conditional branch instruction (Bcc) and the loop control instruction (ACB).
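The classification can be written down as a small lookup table. The sketch below is illustrative only; `BranchKind`, `BRANCH_KINDS` and `is_pre_branch` are hypothetical names, and only the four mnemonics named in the text are listed.

```python
from typing import NamedTuple


class BranchKind(NamedTuple):
    condition_static: bool   # True: the branch is always taken
    target_static: bool      # True: the destination is known from the instruction code


# The four mnemonics named in the description.
BRANCH_KINDS = {
    "BRA": BranchKind(condition_static=True, target_static=True),
    "BSR": BranchKind(condition_static=True, target_static=True),
    "Bcc": BranchKind(condition_static=False, target_static=True),
    "ACB": BranchKind(condition_static=False, target_static=True),
}


def is_pre_branch(kind: BranchKind) -> bool:
    """Both pre-branch types share a static destination; the condition may be either."""
    return kind.target_static
```

Both pre-branch types have a destination that the decode stage can compute from the instruction code alone, which is what makes branching in the D stage possible.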

(4.2) Functional configuration of the branch instruction processing circuit

FIG. 1 shows the configuration of the branch instruction processing circuit of the data processing device of the present invention. FIG. 1 consists of partial detailed views of the circuits contained in the instruction fetch unit (51), the instruction decode unit (52), the PC calculation unit (53), the operand address calculation unit (54), the data operation unit (56), and the external bus interface unit (57), together with the address output circuit (58) and the data input/output circuit (59).

The instruction decoder (111), the input side of the PC adder (132), and the input side of the address adder (124) are connected by the DISP bus (100), which transfers displacement values and the displacement values of branch instructions. The instruction decoder (111) and the input side of the address adder (124) are also connected by the correction value bus (102), which transfers values such as the instruction code length used to generate a step code and the pre-decrement value in stack push mode. The instruction decoder (111) and the input side of the PC adder (132) are also connected by the instruction length bus (101), which transfers the instruction code length used to generate a step code. The register file (144) and the input side of the address adder (124) are connected by the A bus (103), which transfers address values stored in the register file (144).

The instruction decoder (111) receives instruction codes from the instruction queue (112) and a branch prediction bit from the branch prediction table (113). The output section of the instruction decoder (111) contains the branch condition generation circuit (114), which selects, according to the branch prediction result, whether the branch condition field of a conditional branch instruction is output to the E stage (35) unchanged or with the condition inverted.
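A short truth-table sketch shows why inverting the condition is useful. It assumes that the condition is inverted when the branch was predicted taken (the text states only that the choice depends on the prediction result) and the function name is hypothetical. Under that assumption, the E stage "branches" exactly when the prediction turns out to be wrong.

```python
def e_stage_branches(condition_holds: bool, predicted_taken: bool) -> bool:
    """Evaluate the condition as the E stage would see it.

    Assumption: when the D stage has already pre-branched (predicted taken),
    the inverted condition is sent to the E stage; otherwise the condition
    is sent unchanged.
    """
    effective = (not condition_holds) if predicted_taken else condition_holds
    return effective


# In all four cases an E-stage branch coincides with a misprediction.
for holds in (False, True):
    for predicted in (False, True):
        mispredicted = holds != predicted
        assert e_stage_branches(holds, predicted) == mispredicted
```

With this arrangement the E stage needs no separate "prediction was wrong" logic: a taken branch in the E stage is itself the misprediction signal.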

The PC adder (132) receives two inputs. One is the output of the addend selection circuit (131), which selects either the value on the instruction length bus (101) or the value on the DISP bus (100). The other is either the DPC (135), which holds the PC value of the instruction decoded in the D stage (32), or the TPC (134), which holds the working PC value at each step code boundary. The output of the PC adder (132) is sent through the PC adder output latch (133) to the CA bus (104) and the PO bus (105). The PO bus (105) is connected to the latch TPC (134), the latch DPC (135), the latch APC (136), which holds the PC value of the instruction being processed in the A stage, and also the branch prediction table (113). The TPC (134) also has an input path from the CA bus (104) so that a new instruction address can be loaded when a branch or jump occurs in the E stage (35).

The outputs of the correction value bus (102) and the DISP bus (100) are input to the displacement selection circuit (122), and one of them is passed to the address adder (124). The outputs of the DISP bus (100) and the A bus (103) are input to the base address selection circuit (123), and one of them is passed to the address adder (124). The address adder (124) performs a three-input addition of the output of the displacement selection circuit (122), the output of the base address selection circuit (123), and the output of the index value generation circuit (121), which shifts the value input from the A bus (103) to produce 1, 2, 4, or 8 times that value. The output of the address adder (124) is sent to the AO bus (106) through the address adder output latch (125). The AO bus (106) is connected to two latches: the latch IA (126), which holds the address value that is output to the outside of the CPU from the address output circuit (58) via the AA bus (107) during memory indirect addressing, and the latch FA (127), which holds the operand address that is output to the outside of the CPU from the address output circuit (58) via the AA bus (107) during operand prefetch in the F stage.
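The three-input addition can be sketched as follows. The 32-bit address width (`MASK32`) and the function names are assumptions made for illustration; the description here does not state the adder width.

```python
MASK32 = 0xFFFFFFFF  # assumed address width, not stated in this passage


def index_value(a_bus_value: int, scale: int) -> int:
    """Index value generation: scale the A bus value by 1, 2, 4 or 8 using a shift."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    shift = scale.bit_length() - 1      # 1 -> 0, 2 -> 1, 4 -> 2, 8 -> 3
    return (a_bus_value << shift) & MASK32


def address_adder(displacement: int, base: int, index: int) -> int:
    """Three-input addition of the selected displacement, base and index values."""
    return (displacement + base + index) & MASK32


# Example: base 0x1000, displacement 8, index register 3 scaled by 4.
effective_address = address_adder(8, 0x1000, index_value(3, 4))
```

In the example, the scaled index is 12, so the effective address is 0x1000 + 8 + 12.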

The FA (127) also has an output path to the latch SA (141), which holds the operand address calculated by the address adder (124) for use in the E stage (35). The SA (141) has an output path to the S bus (109), the general-purpose data bus of the data operation unit (56). The CA bus (104), which transfers instruction addresses, is connected to the PC adder output latch (133); the TPC (134); the counter QINPC (115), which manages the address of the instruction code prefetched by the instruction fetch unit (51); the latch CAA (142), which holds the instruction fetch address when it is output to the outside of the CPU from the address output circuit (58) via the AA bus (107); and the latch EB (143), which receives a new instruction address from the S bus (109) when a branch or jump occurs in the E stage (35). The APC (136) has output paths to the A bus (103) and to the latch EPC (137), which holds the PC value of the instruction being processed in the F stage (34). The EPC (137) has an output path to the latch CPC (138), which holds the PC value of the instruction being processed in the E stage (35). The CPC (138) has output paths to the S bus (109) and to the latch OPC (139), which holds the least significant byte of the PC value for rewriting the branch history. The register file (144) consists of general-purpose registers, working registers, and the like; it has output paths to the S bus (109) and the A bus (103) and an input path from the D bus (110). The data arithmetic unit (145), which is the arithmetic mechanism of the data operation unit (56), has an input path from the S bus (109) and an output path to the D bus (110).

(4.3) Branch prediction method

In the data processing device of the present invention, the three instructions BRA (unconditional branch), BSR (subroutine branch), and ACB (loop control) are always predicted to branch, regardless of the branch prediction bit output by the branch prediction table. For BRA and BSR this prediction is always correct.

ACB adds a specified value to the loop control variable and tests whether the result satisfies the loop end condition; it branches if the condition is not satisfied and does not branch if it is. For the great majority of software, therefore, this prediction is also correct for ACB with fairly high probability. Moreover, software written with this device's characteristic handling of ACB in mind can be more efficient than software written without it.

For the conditional branch instruction Bcc, whether or not to branch is decided according to past history. The history is recorded using the low-order 8 bits of the address of the instruction executed immediately before the Bcc instruction. The prediction follows only the single most recent branch outcome and is represented by 1 bit.
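The prediction policy for the four pre-branch instructions can be sketched as a single function. The function name and the encoding of the history bit (1 meaning "branch") are illustrative assumptions.

```python
def predict_taken(mnemonic: str, history_bit: int) -> bool:
    """Return True if the D stage should pre-branch for this instruction."""
    if mnemonic in ("BRA", "BSR", "ACB"):
        return True                  # always predicted taken; the history bit is ignored
    if mnemonic == "Bcc":
        return history_bit == 1      # follow the single most recent outcome
    return False                     # not a pre-branch instruction
```

For a loop closed by ACB, this policy is wrong only on the final iteration, when the loop end condition is finally satisfied.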

(4.4) Configuration of the branch prediction table

FIG. 4 shows the details of the branch prediction table (113). The 7-bit input from the PO bus (105) and the 7-bit input from the OPC (139) pass through the selector (151) to the decoder (152). The decoder (152) decodes the 7 bits into 128 bits and outputs one bit of the 128-bit branch history latch (153) to the branch prediction output latch (154) as the branch prediction value. When the clear signal (157) is asserted, all 128 bits of the branch history latch (153) are set to zero at once, indicating "do not branch". The contents of the branch prediction output latch (154) are inverted by the prediction inversion circuit (155) and passed to the branch prediction update latch (156).

In the data processing device of the present invention, branch prediction is performed by looking up the branch prediction table (113) with the low-order 8 bits of the address of the instruction that was decoded in the D stage (32) immediately before the instruction about to be decoded. Predictions are registered in a direct-mapped manner and follow only the single most recent outcome. Because the least significant (rightmost) bit of an instruction address is always zero in this device, the branch prediction table consists of 128 bits.
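The mapping from a PC value to one of the 128 table entries can be made concrete with a one-line sketch. The function name is hypothetical; the bit selection follows the description (low-order 8 bits, with the always-zero least significant bit dropped).

```python
def table_index(prev_pc: int) -> int:
    """Select 7 index bits: bits 7..1 of the previous instruction's PC value."""
    return (prev_pc >> 1) & 0x7F


# Direct mapping means addresses that share bits 7..1 share one entry.
assert table_index(0x1234) == table_index(0x5634)
```

Because only 7 bits are used, unrelated branches whose preceding instructions lie 256 bytes apart share an entry and can overwrite each other's history.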

The branch prediction bit is actually used only when a Bcc instruction is decoded, but it is input to the instruction decoder together with the instruction code of every instruction, whether or not it is used. For this reason the branch prediction table (113) is looked up with the low-order byte (the least significant bit is not needed) of the PC value of the preceding instruction, which is output from the PC adder (132) while that preceding instruction is being decoded. The branch prediction bit therefore reaches the instruction decoder (111) by the start of the next D stage operation.

The clear signal (157) can set every entry of the branch history in the branch prediction table (113) to the initial value "do not branch". The branch prediction is updated when a Bcc instruction branches in the E stage (35). A Bcc instruction that causes a branch in the E stage (35) means that the branch prediction made in the D stage (32) was wrong. The E stage (35) then updates the prediction by inverting the incorrect branch history. The E stage (35) transfers the contents of the OPC (139) to the decoder (152) and, using the decoded result, reads the corresponding bit of the branch history latch (153) into the branch prediction output latch (154). Next, the contents of the branch prediction update latch (156), which holds the inverted contents of the branch prediction output latch (154), are written back to the bit of the branch history latch (153) selected by the same OPC (139) value.
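The read, invert and write-back sequence can be modelled in a few lines. This is a behavioural sketch, not a description of the circuit; the class and method names are hypothetical, and the table is represented as a Python list of 128 bits.

```python
class BranchPredictionTable:
    """Behavioural model of a 128-entry, 1-bit, direct-mapped branch history."""

    def __init__(self) -> None:
        self.history = [0] * 128           # 0 means "do not branch"

    def clear(self) -> None:
        """Model of the clear signal: every entry returns to 'do not branch'."""
        self.history = [0] * 128

    @staticmethod
    def _index(pc_low_byte: int) -> int:
        return (pc_low_byte >> 1) & 0x7F   # drop the always-zero least significant bit

    def lookup(self, prev_pc: int) -> int:
        """D stage: read the prediction bit using the previous instruction's PC."""
        return self.history[self._index(prev_pc)]

    def update_on_e_stage_branch(self, opc: int) -> None:
        """E stage: a Bcc branched, so the stored prediction was wrong; invert it."""
        idx = self._index(opc)
        read_value = self.history[idx]     # read into the prediction output latch
        self.history[idx] = read_value ^ 1 # write back the inverted value
```

Since an update happens only on a misprediction, inverting the stored bit always leaves the entry equal to the most recent actual outcome.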

Because branch prediction is based on the PC value of the instruction decoded immediately before the target Bcc instruction, the branch prediction table (113) is likewise updated using the PC value of the instruction executed in the E stage (35) immediately before the Bcc instruction. For this purpose the E stage (35) has the OPC (139), which stores the low-order byte (the least significant bit is not needed) of the PC value of the instruction executed immediately before the one currently executing, and the branch prediction table (113) is updated with this value. The branch history is updated only when a Bcc instruction causes a branch in the E stage (35), so the lookup of the branch prediction table (113) by the D stage (32) is never obstructed by an update from the E stage (35). Immediately after a branch occurs in the E stage (35), the D stage (32) is waiting for instruction codes from the IF stage (31), and the branch history is rewritten during this wait.

(4.5) Operation of the PC calculation unit

When an instruction code is decoded in the D stage (32), the PC calculation unit calculates the start address of the instruction code being decoded from the length of the previously decoded instruction code and the start address of that previously decoded instruction code. The PC calculation unit holds in the DPC (135) the PC value of the instruction, which is the address of the instruction boundary, and manages in the TPC (134) the address of the step code boundary. The DPC (135) is rewritten only when the address of an instruction boundary has been calculated. The TPC (134) is rewritten at every step code boundary, that is, at every instruction decode operation. A step code being processed in the pipeline needs the PC value of the instruction from which it was generated, so the value of the DPC (135) is transferred in turn to the APC (136), the EPC (137), and the CPC (138).

As described in section (3.1.2), instructions are decoded in units of step codes, and one decode operation consumes 0 to 6 bytes of instruction code. The length of the instruction code consumed, which becomes known at each decode operation, is output from the instruction decoder (111) to the instruction length bus (101).

When no pre-branch is performed, the D stage (32) goes on to decode the next instruction and, at the same time, the PC calculation unit (53) calculates the PC value of that next instruction: it adds the value of the TPC (134) to the length of the instruction code consumed by decoding, transferred over the instruction length bus (101), and writes the sum back to the TPC (134). In other words, the start address of a step code is calculated when that step code is generated by decoding. Except on a pre-branch, the instruction codes to be decoded are output one after another from the instruction queue (112), so the start address of a code need not be known when its decoding begins. When the step code generated in the D stage (32) is the last step code of an instruction A, the output of the PC adder (132) calculated during the decoding of the next instruction B is the start address of instruction B and hence its PC value. This value is therefore written from the PO bus (105) to both the TPC (134) and the DPC (135). In addition, if the A stage (33) is waiting for an input code at that time and the APC (136) is needed immediately, the PC value of instruction B is also written from the PO bus (105) to the APC (136).
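The bookkeeping of the TPC and DPC when no pre-branch occurs can be sketched as follows. The class is a behavioural illustration with hypothetical names; it tracks only the two latches and the length left over from the previous decode operation.

```python
class PCCalculator:
    """Tracks the step code boundary (TPC) and the instruction boundary (DPC)."""

    def __init__(self, start_pc: int) -> None:
        self.tpc = start_pc
        self.dpc = start_pc
        self._prev_length = 0           # nothing has been consumed yet
        self._prev_was_last = True      # the next step code starts a new instruction

    def decode(self, length: int, is_last_step: bool) -> int:
        """One decode operation; returns the start address of this step code."""
        if not 0 <= length <= 6:
            raise ValueError("one decode operation consumes 0 to 6 bytes")
        start = self.tpc + self._prev_length   # PC adder: TPC + previous length
        self.tpc = start                       # TPC is rewritten on every decode
        if self._prev_was_last:
            self.dpc = start                   # DPC only at instruction boundaries
        self._prev_length = length
        self._prev_was_last = is_last_step
        return start
```

For example, starting at 0x100 with a 2-byte single-step instruction followed by an instruction split into a 4-byte and a 2-byte step code, the DPC moves to 0x102 and stays there while the TPC advances to 0x106.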

When a pre-branch is performed, the D stage (32) outputs the last step code of the pre-branch instruction, stops the processing of the instruction decoder (111), and adds the value of the DPC (135) to the branch displacement transferred over the DISP bus (100) in order to calculate the PC value of the branch target instruction. It also issues an initialization instruction to the IF stage (31), writes the sum, which is the PC value of the branch target instruction, to the TPC (134) and the DPC (135), and outputs it to the CA bus (104) so that it is also written to the QINPC (115) and the CAA (142).

When the branch destination instruction address is calculated for a pre-branch, an odd address jump trap is also detected, and the result is indicated as a parameter in the D code (41). The E stage (35) activates the odd address jump trap when the pre-branch is found to be correct. When the pre-branch was wrong and a branch occurs again in the E stage (35), the odd address jump trap detected at the pre-branch is ignored. For this reason, an odd address jump trap detected in the D stage (32) is handled separately from other EITs. In addition, the E stage (35) needs the odd instruction address value in order to start the odd address jump trap. Therefore, when the D stage (32) detects an odd address jump trap, it generates a special step code (the OAJT step code) whose PC value is that odd address value. The A stage (33) and the F stage (34) simply pass the OAJT step code on to the next stage. When the E stage (35) determines that the pre-branch is correct and that pre-branch detected an odd address jump trap, it starts the odd address jump trap using the PC value of the OAJT step code that is transferred next through CPC (138).
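The trap handling described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the circuit itself: the function names `d_stage_prebranch` and `e_stage_trap_decision` and the dictionary fields are hypothetical, and the OAJT step code is reduced to a single stored PC value.

```python
def d_stage_prebranch(pc, displacement):
    """Model of the D stage pre-branch: compute the target and flag an odd address.

    The returned dict stands in for the D code parameter and the OAJT step code
    (hypothetical field names).
    """
    target = pc + displacement
    odd = (target & 1) == 1
    return {
        "target": target,
        "odd_detected": odd,                 # parameter carried in the D code
        "oajt_pc": target if odd else None,  # PC value of the OAJT step code
    }


def e_stage_trap_decision(prebranch, branched_again_in_e):
    """Return the address used to start the trap, or None if no trap starts."""
    if prebranch["odd_detected"] and not branched_again_in_e:
        return prebranch["oajt_pc"]  # pre-branch was correct: start the trap
    return None                      # pre-branch was wrong or target even: no trap
```

In this model a trap is started only when both conditions hold: the pre-branch target was odd and the E stage did not branch again.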

When a branch occurs in the E stage (35), the branch destination address is transferred from EB (143) to TPC (134) over the CA bus (104). The PC calculation unit (53) adds zero to this value and writes the result from the PO bus (105) to TPC (134) and DPC (135). This completes the initialization of the PC calculation unit (53). The initialization overlaps with the first unit decoding process after the branch occurs in the E stage (35). QINPC (115) and CAA (142) are set to the same value when TPC (134) takes in the value from the CA bus (104).

(4.7) Operation of the operand address calculation unit for pre-branch instructions

When the D stage (32) has not performed pre-branch processing for a pre-branch instruction, the operand address calculation unit (54) calculates the branch destination address of the pre-branch instruction. The branch destination address is calculated by adding, in the address adder (124), the value of APC (136) transferred over the A bus (103) and the branch displacement value transferred over the DISP bus (100). The calculated branch destination address is passed to the E stage (35). When the A stage (33) calculates a branch destination address using the operand address calculation unit (54), no odd address jump trap detection is performed. The odd address jump trap information is conveyed by the fact that the branch destination address transferred to the E stage (35) is odd.

When the D stage (32) has performed pre-branch processing, for the Bcc and ACB instructions the A stage (33) calculates the PC value of the next instruction, located at the address following the pre-branch instruction. The result is passed to the E stage (35) and used as the branch destination address for the second branch if the pre-branch was wrong. For instructions such as Bcc that are decoded into one step code in the D stage (32), the instruction length of the Bcc instruction transferred over the correction value bus (102) is added to the value of APC (136) transferred over the A bus (103), and the sum is written from the AO bus (106) to FA (127). For the ACB instruction, whose format splits into two or more step codes, the value of TPC (134), which is the start address of the last step code and is transferred over the DISP bus (100), is added to the length of the instruction code used in decoding the last step code, transferred over the correction value bus (102), and the sum is written from the AO bus (106) to FA (127).
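The two ways of forming the address of the instruction that follows a pre-branch instruction can be illustrated with a short Python sketch. The function names, the byte-addressed model, and the example ACB encoding (three step codes of 2, 2 and 4 bytes at address 0x1000) are assumptions made for illustration only.

```python
def next_pc_single_step(apc, instruction_length):
    """One step code (e.g. Bcc): APC plus the instruction length."""
    return apc + instruction_length


def next_pc_multi_step(tpc_last_step, last_step_length):
    """Several step codes (e.g. ACB): start of the last step code plus its length."""
    return tpc_last_step + last_step_length


# Hypothetical ACB at 0x1000 decoded as three step codes of 2, 2 and 4 bytes.
start = 0x1000
step_lengths = [2, 2, 4]
tpc_last = start + sum(step_lengths[:-1])  # start address of the last step code
fall_through = next_pc_multi_step(tpc_last, step_lengths[-1])
```

Under these assumptions both methods yield the address immediately after the whole instruction; the multi-step form only needs the length of the last step code rather than the total instruction length.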

For the BSR instruction the pre-branch is always correct, but the address of the instruction following the BSR instruction is needed as the return address, so the operand address calculation unit (54) performs an address calculation. The format of the BSR instruction is shown in FIG. 33, where #ds is the field that specifies the branch displacement of BSR as a 32-bit binary number. BSR is decoded into one step code in the D stage (32), and, as with Bcc, the value of APC (136) transferred over the A bus (103) is added to the instruction length of BSR transferred over the correction value bus (102). This return address calculation method for the BSR instruction is also used for the TRAPA (unconditional trap) and TRAP/cccc (conditional trap) instructions.

The TRAPA and TRAP/cccc instructions are also decoded into one step code in the D stage (32). Like Bcc, they have no addressing mode specification field, so the operand address calculation unit (54) does not calculate an operand address for them. The formats of the TRAPA and TRAP/cccc instructions are shown in FIG. 34, where (301) is the format of the TRAPA instruction and (302) is the format of the TRAP/cccc instruction. In FIG. 34, #d4 is the vector value specification field of the TRAPA instruction, and cccc (303) is the trap condition specification field. For TRAPA and TRAP/cccc, instead of calculating an operand address, the unit adds APC (136), which is the PC value of these instructions, to the instruction length of these instructions transferred over the correction value bus (102).

(4.8) Details of the processing method for each branch instruction

The instructions for which the data processing device of the present invention performs pre-branching are summarized here.

(4.8.1) BRA instruction

The BRA instruction is an unconditional branch instruction and always branches when executed.

Because the BRA instruction always branches, the D stage (32) decides that it branches regardless of the branch prediction bit and performs pre-branch processing. In the A stage (33) and the F stage (34), the BRA instruction is passed through unchanged, and only the flag indicating whether an EIT was detected and the PC value are transferred to the E stage (35). The E stage (35) performs no branch processing for BRA.

(4.8.2) BSR instruction

The BSR instruction is a subroutine branch instruction. When executed, it pushes the PC value of the instruction at the address following BSR onto the stack and always branches. The instruction format is shown in FIG. 33.

Because the BSR instruction always branches, the D stage (32) decides that it branches regardless of the branch prediction bit and performs pre-branch processing. The A stage (33) adds APC (136) and the instruction length of BSR to calculate the return address from the subroutine. The calculated return address is passed to the E stage (35) as the operand of BSR. For the BSR instruction, the E stage (35) pushes the return address onto the stack and performs no branch processing.
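As a rough illustration of how the work on a BSR instruction is divided among the stages, the following Python sketch models the three steps in sequence. The function name `execute_bsr`, the use of a Python list as the stack, and the single `pc` argument (standing in for both the D stage copy in DPC and the A stage copy in APC) are assumptions, not part of the described hardware.

```python
def execute_bsr(pc, displacement, instruction_length, stack):
    """Model the handling of one BSR instruction; returns the pre-branch target."""
    # D stage: pre-branch unconditionally to PC + displacement.
    target = pc + displacement
    # A stage: return address = PC of BSR + instruction length of BSR.
    return_address = pc + instruction_length
    # E stage: push the return address; no branch is performed here,
    # because the D stage has already redirected instruction fetch.
    stack.append(return_address)
    return target


stack = []
target = execute_bsr(pc=0x2000, displacement=0x100, instruction_length=6, stack=stack)
```

The point of the sketch is that the only E stage work left for BSR is the stack push; the change of instruction flow has already happened in the D stage.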

(4.8.3) Bcc instruction

The Bcc instruction is a conditional branch instruction; its format is shown in FIG. 35. The branch condition cccc (304) is a 4-bit field. The branch condition is designed so that it is reversed depending on whether the least significant bit of cccc (304) in FIG. 35 is "0" or "1". #ds is the field that specifies the branch displacement as a 32-bit binary number.

Because the branch probability of a Bcc instruction depends considerably on its past execution history, the D stage (32) decides whether to branch according to the value of the branch prediction bit output from the branch prediction table (113). The dependence of the branch probability of Bcc instructions on execution history is also discussed in detail in the above-mentioned J. K. F. Lee and A. J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design", IEEE Computer, Vol. 17, No. 1, January 1984.

When the branch prediction bit indicates "branch", the D stage (32) performs pre-branch processing. When a pre-branch has been performed, the branch condition generation circuit (114) inverts the least significant bit of the branch condition cccc (304) of FIG. 35 before passing it to the E stage (35). The E stage (35) therefore simply executes the Bcc instruction according to the branch condition it receives, regardless of whether pre-branch processing was performed in the D stage (32). If the Bcc instruction causes a branch in the E stage (35), the branch prediction in the D stage (32) was wrong, so the branch prediction table (113) is accessed and the branch prediction history at the location indicated by OPC (139) is inverted. Because the branch history is updated only when a Bcc instruction causes a branch in the E stage (35), lookups of the branch prediction table (113) by the D stage (32) are never obstructed by updates from the E stage (35). Immediately after a branch occurs in the E stage (35), the D stage (32) is waiting for instruction code from the IF stage (31), and the branch history is rewritten during this wait.
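The interaction between the prediction bit, the inverted condition, and the history update can be modeled in a few lines of Python. This is a sketch under explicit assumptions: the encoding in which a least significant bit of 1 negates the base condition is assumed (the text only says the bit reverses the condition), the table is a plain list of one-bit entries, and `index` stands in for the location indicated by OPC (139). The names `condition_holds` and `run_bcc` are hypothetical.

```python
def condition_holds(cccc, base_true):
    """Assumed encoding: LSB 0 selects the base condition, LSB 1 its negation."""
    return base_true != bool(cccc & 1)


def run_bcc(history, index, cccc, base_true):
    """Model one Bcc through the D and E stages.

    history: mutable list of one-bit predictions (True means "branch")
    Returns (pre_branched, branched_in_e).
    """
    pre_branched = history[index]                # D stage reads the prediction bit
    passed = cccc ^ 1 if pre_branched else cccc  # LSB inverted after a pre-branch
    branched_in_e = condition_holds(passed, base_true)
    if branched_in_e:                            # E stage branch means misprediction
        history[index] = not history[index]      # invert the history bit
    return pre_branched, branched_in_e
```

In this model the E stage branches exactly when the prediction disagrees with the real outcome, so after each execution the history bit equals the most recent outcome, and a correctly predicted Bcc never touches the table.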

If the Bcc instruction detected an odd address jump trap at the pre-branch and does not cause a branch in the E stage (35), the odd address jump trap is activated. Even if the Bcc instruction detected an odd address jump trap at the pre-branch, that detection is ignored when a branch occurs again in the E stage (35). This function ensures that a Bcc instruction whose branch is not taken never raises an odd address jump trap.

(4.8.4) ACB instruction

The ACB instruction is used as a loop primitive. ACB increments the loop control variable, compares it, and performs a conditional jump. The format of ACB is shown in FIG. 36, where EaR is a field that specifies, in a general addressing mode, the value to be added to the loop control variable; EaRX is a field that specifies, in a general addressing mode, the value against which the loop control variable is compared; RgMX is a field that specifies the number of the general-purpose register holding the loop control variable; and #ds8 is a field that specifies the branch displacement as an 8-bit binary number. ACB is decomposed into three or more step codes in the D stage (32) as it flows through the pipeline.

Because the ACB instruction has a high probability of branching, the data processing device of the present invention decides that it branches regardless of the branch prediction bit and performs pre-branch processing.

Because this instruction has three or more step codes (exactly three when no multi-stage indirect addressing mode is included), pre-branch processing is performed when the D stage (32) outputs the last step code. The D stage (32) performs the pre-branch by adding the contents of DPC (135), which is the PC value of ACB, and the branch displacement output from the instruction decoder (111) over the DISP bus (100). In the A stage (33), in case the pre-branch turns out to be wrong, the PC value of the instruction at the address following the ACB instruction is calculated by adding the start address of the instruction code used to decode the last step code, transferred from TPC (134) over the DISP bus (100), and the length of the instruction code used to decode the last step code, transferred over the correction value bus (102).

Because a pre-branch is always performed for this instruction in the D stage (32), the E stage (35) always evaluates the branch condition in reverse. If the pre-branch was wrong, a branch occurs in the E stage (35). However, because this instruction does not pre-branch according to the branch prediction table (113), the branch history is not rewritten even when the pre-branch was wrong.
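To see why always pre-branching suits a loop primitive, the following Python sketch counts how often the E stage has to branch over the lifetime of a loop. The loop semantics used here (add `step`, jump back while the variable is below `limit`, with a positive `step`) are an assumption for illustration; the text only states that ACB increments, compares, and conditionally jumps. The name `run_acb_loop` is hypothetical.

```python
def run_acb_loop(start, step, limit):
    """Count E stage branches for an ACB-controlled loop (step must be positive).

    Returns (iterations, e_stage_branches).
    """
    var = start
    iterations = 0
    e_stage_branches = 0
    while True:
        iterations += 1
        var += step
        taken = var < limit        # real outcome of the loop-back branch
        # The D stage has always pre-branched, so the E stage checks the
        # reversed condition and branches only when the loop is exited.
        e_branches = not taken
        if e_branches:
            e_stage_branches += 1
        if not taken:
            break
    return iterations, e_stage_branches
```

Under these assumptions a loop of any length costs a single E stage branch, at loop exit, and no branch history entry is consulted or rewritten along the way.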

In addition, when an odd address jump trap is detected during the pre-branch in the D stage (32) for this instruction, the detection is passed to the E stage (35) as a parameter, as with the Bcc instruction. Also as with the Bcc instruction, the odd address jump trap passed to the E stage (35) is not activated when a branch is taken in the E stage (35), and is activated when no branch is taken there. This function ensures that an ACB instruction whose branch is not taken never raises an odd address jump trap.

(5) Other embodiments of the present invention

In the above embodiment, two buses, the correction value bus (102) and the instruction length bus (101), are used to transfer the length of the instruction code used in instruction decoding from the decoder (111) to the PC calculation unit (53) and the operand address calculation unit (54). However, the instruction length bus (101) may be eliminated by, for example, providing an input path from the correction value bus (102) to the PC calculation unit (53).

Also, in the above embodiment, the value of TPC (134) is transferred to the operand address calculation unit (54) over the DISP bus (100) in the pre-branch processing of the ACB instruction, but this transfer of the TPC (134) value may instead be performed over the A bus (103).

[Effect of the invention]

As described above, the data processing device of the present invention performs branch processing in the D stage (32) both for the BRA, BSR, and Bcc instructions, which are processed as one step code, and for the ACB instruction, which becomes multiple step codes, so disturbance of pipeline processing can be reduced for many branch instructions.

FIG. 7 shows how instructions flow through the pipeline when a pre-branch instruction is executed in the data processing device of the present invention, which performs pre-branching. In FIG. 7, instruction 3 and instruction 12 are branch instructions that are subject to pre-branch processing in the data processing device of the present invention.

When instruction 3 is decoded in the D stage (32) and it is decided to pre-branch, the D stage (32) next calculates the PC value of the branch destination instruction in the PC calculation unit (53). The branch destination instruction is then fetched by the IF stage (31), and the pipeline switches to processing instruction 11 early. Processing of instruction 4 is canceled. Instructions 1 and 2, which are ahead in the pipeline, continue to be processed while the D stage (32) and the IF stage (31) perform the pre-branch processing. As a result, instruction 11 is processed in the E stage (35) two instruction-processing times after instruction 3 is processed in the E stage (35). Compared with the conventional data processing device without pre-branch processing shown in FIG. 6, where the dead time was four instruction-processing times, the data processing device of the present invention halves the dead time.
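The dead-time figures quoted above (four instruction times without pre-branching, two with it) can be reproduced with a simplified timing model. The Python sketch below assumes one cycle per stage in a five-stage pipeline and one extra D stage cycle for calculating the branch destination PC; both assumptions and the function name `dead_time` are introduced here for illustration and are not taken from the figures.

```python
STAGES = ["IF", "D", "A", "F", "E"]


def dead_time(resolve_stage, target_calc_cycles):
    """Lost instruction slots when a branch is resolved in `resolve_stage`.

    Dead time is measured as the delay between the cycle in which the
    sequential successor entered IF and the cycle in which the branch
    destination instruction enters IF.
    """
    branch_enters_if = 0
    resolved_at = branch_enters_if + STAGES.index(resolve_stage)
    target_enters_if = resolved_at + target_calc_cycles + 1
    successor_enters_if = branch_enters_if + 1
    return target_enters_if - successor_enters_if


conventional = dead_time("E", target_calc_cycles=0)  # branch taken in the E stage
pre_branch = dead_time("D", target_calc_cycles=1)    # branch taken in the D stage
```

With these assumptions the model gives 4 and 2 respectively, matching the halving of dead time described in the text.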

Pre-branching is thus a highly effective technique for speeding up a data processing device, and it is important to pre-branch for as many branch instructions as possible. In the present invention, adding only a small amount of hardware to the PC calculation unit (53) and the operand address calculation unit (54) makes it possible to pre-branch both for the BRA and Bcc instructions, which are processed as one step code, and for the ACB instruction, which becomes multiple step codes, yielding a data processing device with greatly increased processing speed.

In addition, when a Bcc instruction branches in the E stage (35), the branch history can be updated before the D stage (32) next needs to access the branch prediction table (113). This prevents the processing speed of the data processing device from dropping because pipeline processing stalls on contention between the D stage (32) and the E stage (35) for access to the branch prediction table (113).

[Brief description of drawings]

FIG. 1 is a diagram of the branch instruction processing circuit of the data processing device of the present invention. FIG. 2 is an overall block diagram of the data processing device of the present invention. FIG. 3 is an overview of the pipeline stages of the data processing device of the present invention. FIG. 4 is a detailed diagram of the branch prediction table of the data processing device of the present invention. FIG. 5 is an overview of the pipeline stages of a conventional data processing device. FIG. 6 shows branch instruction processing in a conventional data processing device. FIG. 7 shows branch instruction processing in the data processing device of the present invention. FIG. 8 shows how instructions are arranged in memory in the data processing device of the present invention. FIGS. 9 to 17 show the instruction formats of the data processing device of the present invention. FIGS. 18 to 31 illustrate the addressing modes of the data processing device of the present invention. FIG. 32 shows the features of the instruction format of the data processing device of the present invention. FIG. 33 shows the format of the BSR instruction. FIG. 34 shows the formats of the TRAPA and TRAP/cccc instructions. FIG. 35 shows the format of the Bcc instruction. FIG. 36 shows the format of the ACB instruction.

(52) denotes the instruction decoding unit, (53) the PC calculation unit, (54) the operand address calculation unit, (56) the data operation unit, (100) the DISP bus, (102) the correction value bus, and (103) the A bus.

Claims

[Claims]

1. A data processing device comprising: a decoding mechanism that receives an instruction code and decodes it; a first calculation mechanism that receives information on program counter updating from the decoding mechanism, calculates the program counter value of the instruction code following that instruction code, and holds the start address of the instruction code; and a second calculation mechanism that receives information on operand address calculation from the decoding mechanism and calculates an operand address, wherein, when a conditional branch instruction is decoded, the first calculation mechanism, in order to fetch the branch destination instruction, takes the start address of the conditional branch instruction and, from the decoding mechanism, the branch displacement of the conditional branch instruction and adds them to generate a branch destination address, and the second calculation mechanism, in order to fetch the instruction code following the conditional branch instruction if the branch does not occur, receives the start address of the conditional branch instruction calculated by the first calculation mechanism and, from the decoding mechanism, the instruction length of the conditional branch instruction and adds them to generate the start address of the instruction code following the conditional branch instruction.

2. A data processing device comprising: a decoding mechanism that decodes a conditional branch instruction by dividing it into two or more unit decoding processes; a first calculation mechanism that has a first latch holding the address of the boundary of the instruction code on which the decoding mechanism performs a unit decoding process, a second latch holding the program counter value of the instruction being decoded by the decoding mechanism, and an adder that selectively takes the contents of either the first latch or the second latch as its first input, and that receives program counter update information from the decoding mechanism and calculates a program counter value; and a second calculation mechanism that receives operand address calculation information from the decoding mechanism and calculates an operand address, wherein, when the decoding mechanism decodes the last instruction code of the conditional branch instruction, the first calculation mechanism, in order to fetch the branch destination instruction, receives the contents of the second latch and, from the decoding mechanism, the branch displacement of the conditional branch instruction and adds them to generate a branch destination address, and the second calculation mechanism, in order to fetch the instruction following the conditional branch instruction if no branch occurs, receives the contents of the first latch and, from the decoding mechanism, the instruction code length of the last instruction code of the conditional branch instruction and adds them to generate the start address of the instruction code following the conditional branch instruction.

3. A data processing device that performs pipelined operation, comprising: a branch history table that holds the branch history of conditional branch instructions; a first pipeline stage that has a function of decoding instructions and a function of either performing or not performing first branch processing for a conditional branch instruction according to the output of the branch history table; and a second pipeline stage that has a function of either performing or not performing second branch processing according to the branch condition of the conditional branch instruction, wherein, when the second pipeline stage has performed the second branch processing, the second pipeline stage accesses the branch history table with priority over the first pipeline stage to update the branch history, and, when the second branch processing has not been performed, the second pipeline stage does not access the branch history table.