JPH0820952B2

JPH0820952B2 - Data processing device with pipeline processing mechanism

Info

Publication number: JPH0820952B2
Application number: JP63049093A
Authority: JP
Inventors: 雅仁松尾; 豊彦吉田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1988-03-01
Filing date: 1988-03-01
Publication date: 1996-03-04
Anticipated expiration: 2011-03-04
Also published as: JPH01222332A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a data processing device that achieves high processing capability through an advanced pipeline processing mechanism, and in particular to a data processing device that can perform preceding branch processing to the return address at an early stage of pipeline processing, even for subroutine return instructions.

[Conventional technology]

FIG. 7 shows the typical pipeline stages of a conventional data processing device. In the figure, (1) is the instruction fetch stage, (2) is the instruction decode stage, (3) is the address calculation stage, (4) is the operand fetch stage, (5) is the execution stage, and (8) is the operand write stage.

Next, the operation will be described. The data processing device shown in FIG. 7 consists of six pipeline stages: an instruction fetch stage (1), which fetches instruction data while the bus is idle; an instruction decode stage (2), which analyzes the instruction data; an address calculation stage (3), which calculates the addresses of operands and the like; an operand fetch stage (4), which fetches operand data; an execution stage (5), which processes data; and an operand write stage (8), which writes operand data. Each stage can process a different instruction at the same time. However, when a conflict arises over an operand or a memory access, the lower-priority stage suspends processing until the conflict is resolved.

As described above, a pipelined data processing device divides processing into a plurality of stages along the flow of data processing and operates all stages simultaneously. This shortens the average processing time required per instruction and improves overall performance.

However, in such a pipelined data processing device, when an instruction that disturbs the instruction flow, such as a branch instruction, is executed in the execution stage (5), all processing already performed in the earlier stages is canceled, and the next instruction to be executed must start again from instruction fetch. Thus, when an instruction that disturbs the processing flow is executed, the overhead of pipeline processing grows and the execution speed of the data processing device does not improve.

To improve the performance of data processing devices, various techniques have been devised to reduce the overhead of executing unconditional branch instructions, conditional branch instructions, and the like. For example, a branch target buffer, which stores the address of a branch instruction paired with the address of its branch destination, is used to predict the instruction flow at the instruction fetch stage (see J.K.F. Lee and A.J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design," IEEE Computer, Vol. 17, No. 1, January 1984, pp. 6-22). In this way, the processing flow is predicted at an early stage of pipeline processing and the instruction predicted to be executed next is fed into the pipeline (hereinafter called preceding branch processing), which reduces the overhead of executing a branch instruction. For a return instruction from a subroutine, however, the return address depends on the address of the corresponding subroutine call instruction, so it has been difficult to predict the processing flow.
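The difficulty with return instructions can be illustrated with a minimal sketch of a branch target buffer. This is only an illustration under simplifying assumptions: the class name, the dictionary-based storage, and the example addresses are hypothetical and are not taken from the cited paper or from this patent.

```python
class BranchTargetBuffer:
    """Toy branch target buffer: branch address -> last observed target."""

    def __init__(self):
        self._targets = {}

    def record(self, branch_pc, target_pc):
        # Remember where the branch at branch_pc went last time.
        self._targets[branch_pc] = target_pc

    def predict_next_fetch(self, pc, fallthrough_pc):
        # At fetch time, use the stored target if this address is a known branch.
        return self._targets.get(pc, fallthrough_pc)


def predict_second_return():
    """One return instruction (hypothetical address 0x900) reached from two call sites."""
    btb = BranchTargetBuffer()
    btb.record(0x900, 0x104)  # first call came from 0x100, so the return went to 0x104
    # The second call comes from 0x200, so the true return address is 0x204,
    # but the buffer can only offer the target it saw last time.
    return btb.predict_next_fetch(0x900, 0x902)
```

Because the buffer is indexed only by the address of the return instruction, `predict_second_return()` yields the stale address 0x104 rather than 0x204, which is the dependence on the call site that the text describes.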

[Problems to be Solved by the Invention]

As described above, in a conventional data processing device the return address of a return instruction from a subroutine depends on the address of the corresponding subroutine call instruction, so there has been no effective means of predicting the processing flow.

The present invention has been made to solve the above problem, and its object is to obtain a data processing device that can perform preceding branch processing to the return address at an early stage of pipeline processing, even for subroutine return instructions.

[Means for solving the problem]

The data processing device according to the present invention is provided with a stack memory dedicated to the program counter (PC) (hereinafter called the PC stack), which stores only the return addresses of subroutine call instructions.

[Operation]

In the data processing device according to the present invention, the return address from a subroutine is pushed onto the PC stack when a subroutine call instruction is executed in the execution stage, and preceding branch processing to the address popped from the PC stack is performed when a subroutine return instruction is decoded in the instruction decode stage.

[Embodiment of the invention]

(1) Pipeline mechanism

The pipeline processing of the data processing device of the present invention is configured as shown in FIG. 1. The pipeline is basically built from five stages: an instruction fetch stage (IF stage (1)), which prefetches instructions; a decode stage (D stage (2)), which performs the first stage of instruction decoding; an operand address calculation stage (A stage (3)), which performs the second stage of instruction decoding and calculates operand addresses; an operand fetch stage (F stage (4)), which accesses the micro ROM (this part is called the R stage (6)) and prefetches operands (this part is called the OF stage (7)); and an execution stage (E stage (5)), which executes instructions. Because the E stage (5) has a one-stage store buffer, and because some high-function instructions pipeline their own execution, the effective pipeline depth is in practice five stages or more.

Each stage operates independently of the others, and in theory all five stages operate completely independently. Each stage can complete one unit of processing in a minimum of two clocks. Ideally, therefore, pipeline processing advances one step every two clocks.
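The ideal timing stated above (five stages, at least two clocks per stage, one result every two clocks) can be worked through with a small sketch. The function name and the assumption of a completely stall-free pipeline are illustrative only.

```python
def ideal_pipeline_clocks(num_instructions, stages=5, clocks_per_stage=2):
    """Clocks needed to complete num_instructions on an ideal, stall-free pipeline."""
    if num_instructions <= 0:
        return 0
    # The first instruction passes through every stage; each later instruction
    # completes clocks_per_stage clocks after the one before it.
    return (stages + num_instructions - 1) * clocks_per_stage
```

Under these assumptions a single instruction takes 10 clocks from fetch to completion, while 100 back-to-back instructions take 208 clocks, which approaches the ideal rate of one instruction every two clocks.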

The data processing device of the present invention has instructions, such as memory-to-memory operations and memory indirect addressing, that cannot be processed in a single pass of the basic pipeline, but the device is designed so that pipeline processing stays as balanced as possible for these cases as well. An instruction with multiple memory operands is decomposed at the decode stage into a plurality of pipeline processing units (step codes), according to the number of memory operands, and is then pipelined. The method of decomposing instructions into pipeline processing units is described in detail in Japanese Patent Application No. 61-236456.

The information passed from the IF stage (1) to the D stage (2) is the instruction code (11) itself. Two kinds of information are passed from the D stage (2) to the A stage (3): information about the operation specified by the instruction (called the D code (12)) and information about operand address calculation (called the A code (13)). Two kinds of information are passed from the A stage (3) to the F stage (4): the R code (14), which contains the entry address of a microprogram routine, parameters for the microprogram, and the like, and the F code (15), which contains the operand address, access method information, and the like. Two kinds of information are passed from the F stage (4) to the E stage (5): the E code (16), which contains operation control information, literals, and the like, and the S code (17), which contains operands, operand addresses, and the like.

(1.1) Processing in each pipeline stage

(1.1.1) Instruction fetch stage

The instruction fetch stage (IF stage (1)) fetches instructions from external memory, places them in the instruction queue, and outputs the instruction code (11) to the D stage (2).

Input to the instruction queue is performed in aligned 4-byte units. Fetching instructions from memory takes a minimum of two clocks per aligned 4 bytes. When the branch buffer hits, an aligned 4 bytes can be fetched in one clock.
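The fetch rates given above can be turned into a rough lower bound on fetch time. The following is a minimal sketch that assumes the fetch starts on an aligned 4-byte boundary and that every unit sees the same case (all from memory, or all branch buffer hits); the function name is hypothetical.

```python
def min_fetch_clocks(num_bytes, branch_buffer_hit=False):
    """Minimum clocks to bring num_bytes of instruction code into the queue."""
    aligned_units = -(-num_bytes // 4)  # ceiling division: number of 4-byte units
    clocks_per_unit = 1 if branch_buffer_hit else 2
    return aligned_units * clocks_per_unit
```

For example, 16 bytes need at least 8 clocks from memory but only 4 clocks when the branch buffer hits.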

The output unit of the instruction queue is variable in 2-byte steps, and up to 6 bytes can be output in two clocks. Immediately after a branch, the instruction queue can also be bypassed and the 2-byte basic part of the instruction transferred directly to the instruction decoder.

The IF stage (1) also manages the address of the instruction to be prefetched. The address of the next instruction to fetch is calculated by a dedicated counter as the address of the instruction to be placed in the instruction queue. When a branch or jump occurs, the new instruction address is transferred from the PC operation unit or the data operation unit.

(1.1.2) Instruction decode stage

The instruction decode stage (D stage (2)) decodes the instruction code (11) input from the IF stage (1). Instruction codes are handled in 16-bit (halfword) units. Decoding is performed once every two clocks, and each decoding operation consumes 0 to 3 halfwords of instruction code. In the D stage (2), the instruction code is decomposed into step codes, which are the units of pipeline processing. That is, one instruction is decomposed into one or more step codes, which are then processed in the later pipeline stages. As a step code, the D stage (2) outputs to the A stage (3) the A code (13), which is address calculation information, and the D code (12), which is the intermediate decoding result of the operation code.

The D stage (2) also controls the PC operation unit, performs branch prediction, performs preceding branch processing (pre-branch) for pre-branch instructions, and controls the output of instruction codes from the instruction queue. Pre-branch processing means predicting the branch of an unconditional branch instruction, conditional branch instruction, or the like ahead of the branch processing in the E stage (5), calculating the branch destination address in the PC operation unit, having the IF stage (1) fetch the instruction at the branch destination, and feeding that instruction into the pipeline. A pre-branch instruction is an instruction for which pre-branch processing is performed.

(1.1.3) Operand address calculation stage

The processing of the operand address calculation stage (A stage (3)) is broadly divided into two parts. One is the later-stage decoding of the operation code, and the other is the calculation of operand addresses.

The later-stage decoding of the operation code takes the D code (12) as input, makes write reservations for registers and memory, and outputs the R code (14), which contains the entry address of the microprogram, parameters for the microprogram, and the like. The write reservations prevent an incorrect address calculation that would occur if the contents of a register or memory location referenced in the address calculation were rewritten by an instruction further ahead in the pipeline. To avoid deadlock, write reservations are made per instruction rather than per step code. Write reservation of registers and memory is described in detail in Japanese Patent Application No. 62-144394.

Operand address calculation takes the A code (13) as input, performs the address calculation in the operand address calculation unit by combining additions and memory indirect references as directed by the A code (13), and outputs the result as the F code (15). During this process, a conflict check is made whenever a register or memory location is read for the address calculation. If a conflict is indicated because a preceding instruction has not yet finished writing to that register or memory location, the stage waits until the preceding instruction completes the write in the E stage (5).
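The reservation and conflict check described in this section can be sketched as a small bookkeeping structure. This is a simplified model, not the mechanism of the referenced application: the class and method names are hypothetical, and resources are identified by plain strings.

```python
from collections import Counter


class WriteReservations:
    """Tracks registers or memory locations that in-flight instructions will write."""

    def __init__(self):
        self._pending = Counter()

    def reserve(self, resource):
        # A stage: an instruction announces that it will write this resource.
        self._pending[resource] += 1

    def release(self, resource):
        # E stage: the write has completed, so one reservation is dropped.
        if self._pending[resource] > 0:
            self._pending[resource] -= 1

    def has_conflict(self, resources_read):
        # A stage: address calculation must wait while any source is still reserved.
        return any(self._pending[r] > 0 for r in resources_read)
```

In this model, an address calculation that reads a register reserved by an earlier instruction reports a conflict until that instruction's write is released, after which it can proceed.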

In addition, to prevent conflicts over the stack pointer (SP) caused by pop operations from the stack, push operations onto the stack, and the like, the A stage (3) has an A stage stack pointer (ASP) that runs ahead of the SP of the execution stage (5), and the ASP updates that accompany pop and push operations are performed in this stage. Therefore, even immediately after an ordinary pop or push operation, processing can continue by referring to the ASP, without the step code being delayed by an SP conflict. The method of managing the SP is described in detail in Japanese Patent Application No. 62-145852.
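The relationship between the ASP and the SP can be sketched as two copies of the stack pointer, one updated early and one updated when the instruction actually executes. This is an illustrative model only; the class name, the in-order queue of pending adjustments, and the 4-byte adjustments in the usage note are assumptions.

```python
from collections import deque


class StackPointerPair:
    """ASP is updated in the A stage; SP catches up when the E stage executes."""

    def __init__(self, initial_sp):
        self.sp = initial_sp
        self.asp = initial_sp
        self._in_flight = deque()  # adjustments decoded but not yet executed

    def a_stage_adjust(self, delta):
        # Push (negative delta) or pop (positive delta) seen at the A stage.
        self.asp += delta
        self._in_flight.append(delta)
        return self.asp

    def e_stage_commit(self):
        # The oldest in-flight push or pop is executed; SP takes the same adjustment.
        self.sp += self._in_flight.popleft()
        return self.sp
```

With two consecutive pushes of 4 bytes, the second push can already compute its address from the ASP while the SP still holds the value from before the first push.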

(1.1.4) Micro ROM access stage

The processing of the operand fetch stage (F stage (4)) is also broadly divided into two parts. One is the micro ROM access, which is called the R stage (6). The other is the operand prefetch, which is called the OF stage (7). The R stage (6) and the OF stage (7) do not necessarily operate at the same time; they operate independently, depending on factors such as whether the memory access right can be acquired.

In the R stage (6), micro ROM access and microinstruction decoding are performed on the R code (14) to produce the E code (16), the execution control code used in the following E stage (5). When the processing for one R code is broken down into two or more microprogram steps, the micro ROM is used by the E stage (5), and the next R code (14) waits for micro ROM access. The micro ROM access for an R code (14) takes place when the last microinstruction of the preceding processing is executed in the E stage (5). In the data processing device of the present invention, most basic instructions complete in one microprogram step, so in practice micro ROM accesses for R codes (14) often occur one after another.

(1.1.5) Operand fetch stage

The operand fetch stage (OF stage (7)) performs the operand prefetch, one of the two kinds of processing carried out in the F stage (4).

Operand prefetch takes the F code (15) as input and outputs the fetched operand and its address as the S code (17). One F code (15) specifies an operand fetch of 4 bytes or less, which may cross a word boundary. The F code (15) also specifies whether the operand is to be accessed at all. When the operand address itself calculated in the A stage (3), or an immediate value, is to be transferred to the E stage (5), no operand prefetch is performed and the contents of the F code (15) are transferred as the S code (17). When the operand to be prefetched and an operand that the E stage (5) is about to write satisfy a containment relation, no memory access is made for the operand prefetch, and the value that the E stage (5) is about to write is bypassed instead.
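The bypass condition can be sketched as an address-range check. The text does not spell out the direction of the containment relation, so this sketch assumes that the operand to be prefetched lies entirely inside the operand being written; the function names and the byte-string representation of memory are also assumptions.

```python
def is_contained(read_addr, read_size, write_addr, write_size):
    """True if the read range lies entirely inside the pending write range."""
    return write_addr <= read_addr and read_addr + read_size <= write_addr + write_size


def prefetch_operand(read_addr, read_size, pending_store, memory):
    """Return (operand_bytes, bypassed).

    pending_store is None or a (write_addr, data_bytes) pair for the store that
    the E stage is about to perform; memory is a bytes object indexed by address.
    """
    if pending_store is not None:
        write_addr, data = pending_store
        if is_contained(read_addr, read_size, write_addr, len(data)):
            # Take the value from the pending store instead of accessing memory.
            offset = read_addr - write_addr
            return data[offset:offset + read_size], True
    # Partial overlap is not covered by the text; this sketch just reads memory.
    return bytes(memory[read_addr:read_addr + read_size]), False
```

A 2-byte read at address 4 against a pending 4-byte store at address 4 is served from the store data, while a read with no pending store goes to memory.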

(1.1.6) Execution stage

The execution stage (E stage (5)) takes the E code (16) and the S code (17) as input and performs data processing with the various arithmetic units, as well as data reads, writes, and similar operations. The arithmetic units include an ALU, a barrel shifter, a priority encoder, a counter, and a shift register. The E stage (5) is controlled by a microprogram and executes an instruction by running the series of microprogram steps that starts at the microprogram entry address indicated by the R code (14). The registers and the main arithmetic units are connected by three buses, and one microinstruction specifying one register-to-register operation is processed in two clock cycles.

The E stage (5) is the stage that executes instructions, and all processing performed in the F stage (4) and earlier stages is preprocessing for the E stage (5). When a branch occurs in the E stage (5), all processing in the IF stage (1) through the F stage (4) is invalidated, and the branch destination address is output to the instruction fetch unit and the PC calculation unit.

Using the store buffer in the data operation unit (56), the E stage (5) can also pipeline an operand store of 4 bytes or less with the execution of the next microinstruction.

After writing the operand, the E stage (5) releases the write reservation for the register or memory that was made in the A stage (3).

When a conditional branch instruction causes a branch in the E stage (5), this indicates that the branch prediction for that instruction was wrong, and the branch history is rewritten.

(1.2) Management of the program counter

The step codes present in the pipeline of the data processing device of the present invention may all belong to different instructions, so the program counter value is managed per step code. Every step code carries the program counter value of the instruction from which it was derived. The program counter value that accompanies a step code through the pipeline stages is called the step program counter (SPC). The SPC is handed on from one pipeline stage to the next.

(2) Preceding branch processing for subroutine return instructions

To suppress the pipeline disturbance caused by executing a subroutine return instruction in the execution stage, the data processing device of the present invention performs preceding branch processing for subroutine return instructions in the instruction decode stage (D stage (2)). The detailed operation is described below.

FIG. 2 is a block diagram of the data processing device of the present invention, showing only the parts needed to explain the processing of subroutine call instructions and subroutine return instructions. The reference numerals in the figure are as follows:

(21) instruction queue
(22) instruction decoding unit
(23) data input/output circuit, which exchanges data with the outside
(24) address output circuit, which outputs external addresses
(25) counter (QINPC), which outputs the address at which instruction fetch is performed
(26) latch (IL), which stores the instruction length processed by the instruction decoding unit (22) each time a step code is generated
(27) latch (PD), which stores the displacement relative to the PC for pre-branch
(30) PC adder, which performs additions in the PC operation unit (54)
(28), (29), (31) input/output latches of the PC adder (30) (PA, PB, P0)
(32) register (TPC), which stores a temporary PC for each step code processed
(33) D stage PC (DPC), which stores the PC of the instruction currently being decoded
(34) A stage PC (APC), which stores the PC corresponding to the step code whose address is being calculated
(38) address adder, which performs three-input addition for address calculation
(35), (36), (37), (39) input/output latches of the address adder (38) (AI, AD, AB, A0)
(40) A stage stack pointer (ASP), which is incremented and decremented in the A stage (3) to manage the SP
(41) F code address register (FA), which stores the address carried in the F code (15)
(42) S code address register (SA), which stores the address carried in the S code (17)
(43) CA address register (CAA), which temporarily stores the address at which instruction fetch is performed
(44) address register (AA), managed by the E stage (5)
(45) E stage branch address register (EB), which stores the branch destination address in the E stage (5)
(46) PC stack, which stores only the return addresses of subroutine calls
(47) register file, which contains the stack pointer, frame pointer, working registers, and the like
(56) DM latch, which takes a value from the S2 bus (102) and outputs a value to the D0 bus (103)
(50) ALU for data operations
(48), (49), (51) input/output latches of the ALU (50) (DA, DB, D0)
(52) S code data register (SD), which stores the data carried in the S code (17)
(53) data register (DD), which stores data for memory accesses performed in the E stage (5)
(101) to (110) internal buses for transferring data and addresses internally (S1 bus, S2 bus, D0 bus, A bus, A0 bus, DISP bus, P0 bus, CA bus, AA bus, D0 bus)
(54) PC operation unit
(55) address calculation unit

FIG. 3 is a block diagram of the parts of the data processing device of the present invention that are particularly relevant to the preceding branch processing of subroutine return instructions. In the figure, (61) is the D stage control unit, (62) is the IF stage control unit, (63) is the E stage control unit, (65) is the BSR counter, a 3-bit counter that counts the number of subroutine call instructions currently in the pipeline, (66) is the 3-bit PC stack pointer (DP) managed by the D stage (2), (67) is the 3-bit PC stack pointer (EP) managed by the E stage (5), (68) and (69) are decoders that decode DP (66) and EP (67) respectively, and (201) to (212) are control signals for the various parts. For simplicity, the clock signals that control timing are omitted from this figure.

In this embodiment, the PC stack (46) has eight entries. DP (66) and EP (67) are 3 bits wide, and a carry out of the most significant bit on increment and a borrow into the most significant bit on decrement are ignored. In other words, the PC stack (46) is treated as a ring-shaped stack memory in which the entry pointed to by '000' is adjacent to the entry pointed to by '111'.
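The wraparound behavior of the 3-bit pointers can be sketched with simple masking. The function names are hypothetical; only the 3-bit width and the discarded carry and borrow come from the text.

```python
POINTER_MASK = 0b111  # 3-bit pointer, so 8 PC stack entries


def increment_pointer(pointer):
    # The carry out of the most significant bit is discarded.
    return (pointer + 1) & POINTER_MASK


def decrement_pointer(pointer):
    # The borrow into the most significant bit is discarded.
    return (pointer - 1) & POINTER_MASK
```

Decrementing '000' therefore gives '111' and incrementing '111' gives '000', which is what makes the eight entries behave as a ring.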

(2.1) Outline of PC stack operation

This section outlines how subroutine call instructions and subroutine return instructions are executed in the data processing device of the present invention.

The data processing device of the present invention has two subroutine call instructions: the branch subroutine (BSR) instruction and the jump subroutine (JSR) instruction. It has two subroutine return instructions: the return subroutine (RTS) instruction and, as a high-function instruction for high-level languages, the EXITD instruction, which performs the subroutine return and releases parameters in a single operation.

When a subroutine call instruction is executed, the return address from the subroutine is pushed onto the PC stack (46) in the E stage (5). When a subroutine return instruction is decoded, preceding branch processing (pre-return) to the address at the top of the PC stack (46) is performed in the D stage (2). Because the branch is taken in the decode stage (2), which is an early stage of the pipeline, the pipeline disturbance caused by executing a subroutine return instruction is greatly reduced. In the execution stage (5), the address used for the pre-return is compared with the true return address read from memory, and if they do not match, a branch to the true return address is performed. A somewhat more detailed description follows, including the updating of the pointers DP (66) and EP (67).

リセットされた状態では、RESET信号（208）により、
BSRカウンタ（65）、EP（67）はゼロクリアされ、DP（6
6）にはゼロになっているEP（67）の値がコピーされ
る。In the reset state, the RESET signal (208) clears the BSR counter (65) and EP (67) to zero, and the value of EP (67), now zero, is copied to DP (66).

まず、命令キュー（21）から取り込まれた命令コード
が命令デコード部（22）でデコードされる。デコードの
結果、取り込まれた命令がサブルーチンコール命令であ
った場合にはDPDEC信号（202）によりDPのデクリメント
を行うと共に、BSRカウンタ（65）をカウントアップす
る。アドレス計算ステージ（３）では、アドレス加算器
（38）により戻り先アドレスが計算されてA0バス（10
5）を介してFAレジスタ（41）に転送される。Ｆステー
ジ（４）では、FAレジスタ（41）の値がSAレジスタ（4
2）に転送される。サブルーチンコール命令がＥステー
ジ（５）で実行されるとEPDEC信号（206）によりEP（6
7）の値がプリデクリメントされる。そしてPCWRITE信号
（210）により更新されたEP（67）が指すPCスタック（4
6）に、SIバス（101）を介してSAレジスタ（42）に格納
されている戻り番地の値が書き込まれる。また、BSRCDE
C信号（205）によりBSRカウンタ（65）をデクリメント
する。BSR命令では、Ｄステージ（２）でサブルーチン
の先頭番地への分岐処理を行う。BSR命令では、Ｅステ
ージ（５）での分岐処理を行う必要はない。First, the instruction code fetched from the instruction queue (21) is decoded by the instruction decoding unit (22). If decoding shows that the fetched instruction is a subroutine call instruction, DP (66) is decremented by the DPDEC signal (202) and the BSR counter (65) is incremented. In the address calculation stage (3), the return address is calculated by the address adder (38) and transferred to the FA register (41) via the A0 bus (105). In the F stage (4), the value of the FA register (41) is transferred to the SA register (42). When the subroutine call instruction is executed in the E stage (5), the value of EP (67) is pre-decremented by the EPDEC signal (206). Then, by the PCWRITE signal (210), the return address held in the SA register (42) is written via the S1 bus (101) into the PC stack (46) entry pointed to by the updated EP (67). The BSR counter (65) is also decremented by the BSRCDEC signal (205). For the BSR instruction, the branch to the first address of the subroutine is performed in the D stage (2), so no branch processing is needed in the E stage (5).
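
The bookkeeping for a subroutine call can be sketched as a small Python model. This is only an illustration under stated assumptions: the class `CallPath` and its method names are hypothetical, the signal names appear only in comments, and the address calculation and register transfers (FA, SA, S1 bus) are collapsed into a single `return_addr` argument.

```python
class CallPath:
    """Toy model of the PC stack state touched by a subroutine call."""

    def __init__(self) -> None:
        # State after RESET: BSR counter and EP cleared, DP copied from EP.
        self.entries = [0] * 8
        self.ep = 0
        self.dp = self.ep
        self.bsr_counter = 0

    def decode_call(self) -> None:
        """D stage: DPDEC decrements DP and the BSR counter counts up."""
        self.dp = (self.dp - 1) & 7
        self.bsr_counter += 1

    def execute_call(self, return_addr: int) -> None:
        """E stage: EPDEC, then PCWRITE, then BSRCDEC."""
        self.ep = (self.ep - 1) & 7           # pre-decrement EP
        self.entries[self.ep] = return_addr   # write the return address
        self.bsr_counter -= 1                 # call is no longer in flight


stack = CallPath()
stack.decode_call()
in_flight = stack.bsr_counter      # nonzero while the call is between D and E
stack.execute_call(0x1004)
```

Between `decode_call` and `execute_call` the counter is nonzero, which is the condition a later return instruction would check before referencing the stack.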

次に、サブルーチンリターン命令の処理について説明
する。命令キュー（21）から取り込まれた命令がサブル
ーチンリターン命令であったときにはBSRカウンタ（6
5）の値がゼロであるかどうかを示すBSRCZ信号（201）
のチェックを行う。もし、BSRカウンタ（65）がゼロで
なかったり、BSRカウンタ（65）の値がゼロになるまで
Ｄステージ（２）は処理を一時停止する。BSRカウンタ
（65）がゼロでないということは、まだ対応するサブル
ーチンコール命令がＥステージ（５）で実行されずにパ
イプライ中にあることを示しており、PCスタック（46）
に対応する戻り番地が登録されていないことを示してい
る。BSRCZ信号（201）により、BSRカウンタ（65）の値
がゼロである、あるいは、ゼロになったことが示される
と、Ｄステージ制御部（61）はPRERET信号（209）によ
り、IFステージ制御部（62）及びPCスタック（46）にプ
リリターン処理を行うことを知らせる。PCスタック（4
6）はDP（66）が指し示しているエントリの内容をCAバ
ス（108）に出力する。IFステージ制御部（62）は、命
令キュー（21）に取り込まれている命令データをすべて
無効化し、CAバスに出力された値で戻り先アドレスの命
令のフェッチを行い、取り込まれた命令データを命令デ
コード部（22）に送る。PCスタック（46）の内容がCAバ
ス（108）に出力された後に、DPINC信号（203）によりD
P（66）がポストインクリメントされる。RTS命令では、
Ｆステージ（４）で、メモリから正しい戻り先アドレス
がフェッチされ、SDレジスタ（52）に取り込まれてい
る。また、EXITD命令では、Ｅステージ（５）での命令
実行中に、メモリから正しい戻り先アドレスをDDレジス
タ（53）に取り込む。PCREAD信号（211）によりEP（6
7）が指し示すPCスタック（46）の内容がS1バスに出力
され、ALU（50）の入力ラッチであるDAラッチ（48）に
取り込まれる。DAラッチ（48）に取り込まれた値は現在
Ｅステージ（５）で処理中のサブルーチンリターン命令
がプリリターンを行ったときの戻り先アドレスである。
また、SDレジスタ（52）あるいはDDレジスタ（53）に取
り込まれている真の戻り先アドレスがS2バス（102）を
介してDBラッチ（49）に取り込まれる。ALU（50）で
は、DAラッチ（48）の内容とDBラッチ（49）の内容の比
較を行い比較結果であるゼロフラグ（ZFLAG信号（21
2））をＥステージ制御部（63）に送る。Ｅステージ制
御部（63）では、もし比較結果が一致であったなら、プ
リリターンが正しかったことを示しているので、サブル
ーチンリターン命令の実行を終了する。もし比較結果が
不一致であった場合には、プリリターンを行った戻り先
アドレスが誤っていたことを示している。このとき、真
の戻り先アドレスの値をS1バス（101）を介してEBレジ
スタ（45）に転送した後、EBレジスタ（45）の値をCAバ
ス（108）に出力する。IFステージ（１）はCAバス（10
8）に出力された値により命令フェッチを行う。Next, the processing of a subroutine return instruction is described. When the instruction fetched from the instruction queue (21) is a subroutine return instruction, the BSRCZ signal (201), which indicates whether the value of the BSR counter (65) is zero, is checked. If the BSR counter (65) is not zero, the D stage (2) suspends processing until the value of the BSR counter (65) becomes zero. A nonzero BSR counter (65) means that a corresponding subroutine call instruction is still in the pipeline, not yet executed in the E stage (5), and therefore that the corresponding return address has not yet been registered in the PC stack (46). When the BSRCZ signal (201) indicates that the BSR counter (65) is zero or has become zero, the D stage control unit (61) uses the PRERET signal (209) to notify the IF stage control unit (62) and the PC stack (46) that pre-return processing is to be performed. The PC stack (46) outputs the content of the entry pointed to by DP (66) to the CA bus (108). The IF stage control unit (62) invalidates all instruction data held in the instruction queue (21), fetches the instruction at the return address using the value output to the CA bus (108), and sends the fetched instruction data to the instruction decoding unit (22). After the contents of the PC stack (46) have been output to the CA bus (108), DP (66) is post-incremented by the DPINC signal (203). For the RTS instruction, the correct return address is fetched from memory in the F stage (4) and taken into the SD register (52). For the EXITD instruction, the correct return address is taken from memory into the DD register (53) during execution of the instruction in the E stage (5). By the PCREAD signal (211), the content of the PC stack (46) entry pointed to by EP (67) is output to the S1 bus (101) and taken into the DA latch (48), an input latch of the ALU (50). The value taken into the DA latch (48) is the return address to which the subroutine return instruction now being processed in the E stage (5) pre-returned. The true return address held in the SD register (52) or the DD register (53) is taken into the DB latch (49) via the S2 bus (102). The ALU (50) compares the contents of the DA latch (48) and the DB latch (49) and sends the comparison result, the zero flag (ZFLAG signal (212)), to the E stage control unit (63). If the comparison result is a match, the pre-return was correct, so the E stage control unit (63) ends execution of the subroutine return instruction. If the comparison result is a mismatch, the address to which the pre-return was made was wrong. In that case, the true return address is transferred to the EB register (45) via the S1 bus (101), and the value of the EB register (45) is then output to the CA bus (108). The IF stage (1) fetches instructions using the value output to the CA bus (108).
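
The return-side behaviour can be sketched in Python as follows. This is a simplified model, not the patented circuit: the class `PreReturnModel` and its methods are hypothetical names, `call` stands for a call that has already passed both the D stage and the E stage, and the EP post-increment (EPINC) and the E-stage branch recovery (EBRA) are taken from the detailed description of the RTS instruction later in the document.

```python
MASK = 7  # 3-bit pointers over an 8-entry PC stack


class PreReturnModel:
    def __init__(self) -> None:
        self.stack = [0] * 8
        self.dp = 0
        self.ep = 0
        self.bsr_counter = 0

    def call(self, return_addr: int) -> None:
        """A call that has completed both its D-stage and E-stage updates."""
        self.dp = (self.dp - 1) & MASK
        self.ep = (self.ep - 1) & MASK
        self.stack[self.ep] = return_addr

    def decode_return(self):
        """D stage: return the pre-return address, or None while stalled."""
        if self.bsr_counter != 0:        # a call is still in the pipeline
            return None
        predicted = self.stack[self.dp]  # entry pointed to by DP
        self.dp = (self.dp + 1) & MASK   # DPINC: post-increment
        return predicted

    def execute_return(self, true_addr: int):
        """E stage: None if the pre-return was right, else the refetch address."""
        predicted = self.stack[self.ep]  # PCREAD: entry pointed to by EP
        self.ep = (self.ep + 1) & MASK   # EPINC: post-increment
        if predicted == true_addr:       # comparison reports a match
            return None
        self.bsr_counter = 0             # EBRA: recover after the E-stage branch
        self.dp = self.ep
        return true_addr


m = PreReturnModel()
m.call(0x100)
hit_prediction = m.decode_return()
hit_result = m.execute_return(0x100)    # memory agrees with the PC stack

m.call(0x200)
miss_prediction = m.decode_return()
miss_result = m.execute_return(0x300)   # memory copy was rewritten by software
```

In the second case the model returns the true address so that fetching restarts there, which mirrors the transfer through the EB register to the CA bus.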

サブルーチンリターン命令実行時に、Ｅステージ
（５）では、Ｄステージ（２）でプリリターンを行った
戻り先アドレスが正しかったかどうかのチェックを行っ
ている。これは、PCスタック（46）が８エントリで構成
されているため、サブルーチンコールが９レベル以上の
入れ子になった場合には８レベルより上のレベルのサブ
ルーチンコールに関する戻り先アドレスのデータがオー
バーライトされて壊されてしまう。また、プログラムに
よって外部メモリ上の戻り先アドレスの値が書き換えら
れた場合にも、PCスタック（46）に登録されている戻り
先アドレスとは異なるアドレスにリターンする。このよ
うな場合に備え、Ｅステージ（５）ではプリリターンが
正しく実行されたかどうかのチェックを行っているので
ある。しかし、プログラムによって外部メモリ上の戻り
先アドレスの値を書き換える様なことはまずないし、サ
ブルーチンレベルが一番深くなったところから８レベル
のサブルーチンコールに関してはいつも正しい値がPCス
タック（46）に格納されているので、プリリターンが正
しく行われる確率は非常に高い。When a subroutine return instruction is executed, the E stage (5) checks whether the return address to which the D stage (2) pre-returned was correct. This check is needed because the PC stack (46) has only 8 entries: if subroutine calls are nested 9 or more levels deep, the return addresses for the calls beyond the most recent 8 levels are overwritten and lost. In addition, if the program rewrites the return address value in external memory, the return goes to an address different from the one registered in the PC stack (46). The E stage (5) checks whether the pre-return was correct in order to handle such cases. However, programs rarely rewrite a return address in external memory, and the PC stack (46) always holds correct values for the 8 levels of subroutine calls counted back from the deepest nesting level, so the probability that a pre-return is correct is very high.
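
The overwrite at the ninth nesting level can be shown with a few lines of Python. The labels `ret1` to `ret9` and the variable names are illustrative assumptions; only the 8-entry ring and the pre-decrement of EP come from the description.

```python
stack = [None] * 8   # 8-entry PC stack
ep = 0

# Nine nested calls: each one pre-decrements EP and writes its return address.
for level in range(1, 10):
    ep = (ep - 1) & 7
    stack[ep] = f"ret{level}"

lost = "ret1" not in stack   # the outermost return address was overwritten
newest = stack[ep]           # the innermost return address sits at the top
```

The eight innermost return addresses survive, so only the return from the outermost call would fail the E-stage comparison in this scenario.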

先に述べたBSRカウンタ（65）は、より正確なプリリ
ターンを行い、Ｅステージ（５）での比較を確実に行う
ために備えられている。この機能がないと、サブルーチ
ンコール命令が処理中であり、Ｄステージ（２）での処
理は終了したが、Ｅステージ（５）でまだ戻り先アドレ
スの値がPCスタック（46）に書き込まれていないうち
に、Ｄステージ（２）でサブルーチンリターン命令が実
行された場合、対応するサブルーチンリターン命令の戻
り先アドレスが登録されていないため、誤った戻り先ア
ドレスにプリリターン処理を行ってしまう。ところが、
サブルーチンリターン命令がＥステージ（５）で処理さ
れる段階では、先行していたサブルーチンコール命令が
すでに処理されており、PCスタック（46）には正しい戻
り先アドレスが登録されているため、Ｅステージ（５）
での比較結果は一致を示し、プリリターンが正しかった
として処理されてしまう。すなわち、このような場合誤
動作を行ってしまうわけである。BSRカウンタの機能を
備えることにより、参照すべき戻り先アドレスの値が先
行するサブルーチンコール命令により登録された後に、
プリリターンが行われる。また、サブルーチンコール命
令の実行に際し、Ｄステージ（２）でPCスタック（46）
が参照されてからＥステージ（５）処理されるまでPCス
タック（46）が書き換えられることがないので、Ｄステ
ージ（２）でプリリターンを行った戻り先アドレスの値
がＥステージ（５）において正しく参照される。ただ
し、プリブランチを行わないJSR命令では、Ｅステージ
（５）において分岐先アドレスの分岐処理が行われるた
め、もし、RTS命令がJSR命令で登録される前のPCスタッ
クを参照してプリリターンしても、そのRTS命令自体が
実行される前にパイプラインはキャンセルされるので、
このようなことは起こらない。The BSR counter (65) described above is provided to make the pre-return more accurate and to make the comparison in the E stage (5) reliable. Without it, suppose a subroutine call instruction is in flight: its D stage (2) processing has finished, but its return address has not yet been written to the PC stack (46) in the E stage (5). If a subroutine return instruction is processed in the D stage (2) at that point, the return address for that return instruction is not yet registered, so the pre-return goes to a wrong address. By the time the subroutine return instruction is processed in the E stage (5), however, the preceding subroutine call instruction has already been processed and the correct return address is registered in the PC stack (46), so the E-stage comparison reports a match and the pre-return is treated as correct. In other words, the device would malfunction in this case. With the BSR counter, the pre-return is performed only after the return address to be referenced has been registered by the preceding subroutine call instruction. Furthermore, when a subroutine call instruction is executed, the PC stack (46) is not rewritten between the time it is referenced in the D stage (2) and the time of processing in the E stage (5), so the return address used for the pre-return in the D stage (2) is referenced correctly in the E stage (5). For the JSR instruction, which does not pre-branch, the branch to the target address is performed in the E stage (5); therefore, even if an RTS instruction pre-returns by referring to the PC stack before the JSR instruction has registered its return address, the pipeline is canceled before that RTS instruction itself is executed, and the problem does not occur.
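
The hazard that the BSR counter guards against can be sketched in Python. The function `decode_return`, its `use_counter` switch, and the stale value `0xDEAD` are assumptions made for illustration; the switch exists only to show what would happen without the counter.

```python
def decode_return(stack, dp, bsr_counter, use_counter=True):
    """D-stage handling of a return: predicted address, or None while stalled."""
    if use_counter and bsr_counter != 0:
        return None          # wait until the in-flight call reaches the E stage
    return stack[dp]


stack = [0xDEAD] * 8         # stale contents left by earlier calls
dp, ep, bsr_counter = 0, 0, 0

# A call passes the D stage: DP moves and the counter marks it as in flight.
dp = (dp - 1) & 7
bsr_counter += 1

without_counter = decode_return(stack, dp, bsr_counter, use_counter=False)
with_counter = decode_return(stack, dp, bsr_counter)

# The call reaches the E stage: the return address is written, counter drops.
ep = (ep - 1) & 7
stack[ep] = 0x2000
bsr_counter -= 1
after_write = decode_return(stack, dp, bsr_counter)
```

Without the counter the return would be predicted from the stale entry, and the later E-stage comparison would read the freshly written entry and wrongly report a match; with the counter the decode simply waits.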

以上で述べたように、サブルーチンコール時の戻り先
アドレスのみを記憶するPCスタック（46）を設けること
により、サブルーチンリターン命令に対して命令のデコ
ード段階で戻り先アドレスへのプリリターンを行い、サ
ブルーチンリターン命令実行時のパイプラインの乱れを
なくす。As described above, providing the PC stack (46), which stores only the return addresses of subroutine calls, makes it possible to pre-return to the return address at the instruction decode stage for a subroutine return instruction, eliminating the pipeline disturbance that would otherwise occur when a subroutine return instruction is executed.

Ｅステージ（５）においてブランチが起こった場合に
は、EBRA信号（204）によりBSRカウンタ（65）の値がゼ
ロクリアされ、EP（67）の内容がDP（66）にコピーされ
る。Ｅステージ（５）においてブランチが起こった場合
には、IFステージ（１）〜Ｆステージでの処理がすべて
無効化されるため、Ｄステージ（２）でデコードされた
が、Ｅステージ（５）では実行されなかった処理途中サ
ブルーチンコール命令、サブルーチンリターン命令に対
して行われたBSRカウンタ（65）、DP（66）の更新を無
効化し、PCスタック（46）のそのレベルまでの戻り先ア
ドレスの値をＤステージ（２）で正しく参照できるよう
になっている。When a branch occurs in the E stage (5), the EBRA signal (204) clears the BSR counter (65) to zero and copies the content of EP (67) to DP (66). Because a branch in the E stage (5) invalidates all processing in the IF stage (1) through the F stage, this undoes the updates to the BSR counter (65) and DP (66) that were made for in-flight subroutine call and subroutine return instructions that had been decoded in the D stage (2) but not yet executed in the E stage (5), so that the return addresses in the PC stack (46) up to that level can again be referenced correctly in the D stage (2).
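
The recovery on an E-stage branch can be sketched as follows. The `Pointers` dataclass and the function names are hypothetical; the starting pointer value 5 is an arbitrary example.

```python
from dataclasses import dataclass


@dataclass
class Pointers:
    dp: int = 0
    ep: int = 0
    bsr_counter: int = 0


def decode_call(p: Pointers) -> None:
    """D-stage update for a call that may later be squashed."""
    p.dp = (p.dp - 1) & 7
    p.bsr_counter += 1


def e_stage_branch(p: Pointers) -> None:
    """EBRA: discard D-stage updates made by instructions that never executed."""
    p.bsr_counter = 0
    p.dp = p.ep


p = Pointers(dp=5, ep=5)
decode_call(p)
decode_call(p)                        # two calls decoded down the wrong path
speculative = (p.dp, p.bsr_counter)
e_stage_branch(p)                     # a branch in the E stage squashes them
```

Copying EP into DP works as a rollback in this model because EP is only changed by instructions that actually reach the E stage.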

（2.2）サブルーチンコール命令、サブルーチンリター
ン命令の詳細動作以上では、サブルーチンコール命令とサブルーチンリ
ターン命令の大まかな動作について述べてきたが、ここ
では各命令の詳細な動作について説明する。(2.2) Detailed operation of the subroutine call and subroutine return instructions. The general operation of the subroutine call and subroutine return instructions has been described above; this section describes the operation of each instruction in detail.

本発明のデータ処理装置では、サブルーチンコール命
令としてブランチサブルーチン（BSR）命令とジャンプ
サブルーチン（JSR）命令がある。また、サブルーチン
リターン命令としては、リターンサブルーチン（RTS）
命令と高機能命令として高級言語用サブルーチンリター
ンとパラメータ解放を一度に行うEXITD命令がある。以
下、各命令について詳細な説明を行う。各命令のビット
割り付けを第４図に示してある。‘−’はオペレーショ
コードを示す。In the data processing device of the present invention, the subroutine call instructions are the branch subroutine (BSR) instruction and the jump subroutine (JSR) instruction. The subroutine return instructions are the return subroutine (RTS) instruction and, as a high-function instruction, the EXITD instruction, which performs a high-level-language subroutine return and parameter release in a single operation. Each instruction is described in detail below. The bit allocation of each instruction is shown in FIG. 4. '-' indicates an operation code.

（2.2.1）BSR命令 BSR命令はPC相対のアドレッシングのみをサポートす
るサブルーチンコール命令であり、戻り先アドレスがス
タックに退避される。第４図（Ａ），（Ｂ）に示すよう
にBSR命令に関しては一般形（Ｇフォーマット）と短縮
形（Ｄフォーマット）の２つの命令フォーマットがあ
る。Ｄステージ（２）では、どちらのフォーマットでも
同様の処理が行われる。この命令は、１つのステップコ
ードとして処理される。(2.2.1) BSR instruction. The BSR instruction is a subroutine call instruction that supports only PC-relative addressing, and the return address is saved on the stack. As shown in FIGS. 4(A) and 4(B), the BSR instruction has two instruction formats, a general form (G format) and a short form (D format). In the D stage (2), the same processing is performed for both formats. This instruction is processed as one step code.

BSR命令実行のフローチャートを第５図に示す。BSR命
令が命令デコード部（22）で処理されると、BSR命令の
ステップコードを示すＤコード（12）と戻り先アドレス
を計算するためのＡコード（13）が生成される。Ｇフォ
ーマットの命令であれば、変位のサイズを示すフィール
ド（82B）に従って変位（82D）の値も同時に取り込む。
また、DPDEC信号（202）によりDP（66）のデクリメン
ト、及びBSRカウンタ（65）のインクリメント処理を行
う。この命令は、プリブランチを行う命令であり、PC演
算部（54）において飛び先アドレスの計算が行われ、演
算結果がCAバス（108）に出力されてプリブランチ処理
が行われる。Ａステージ（３）では、Ａコード（13）の
指示に従ってアドレス計算部（55）において戻り先のア
ドレスが計算され、A0バス（105）を介してFAレジスタ
（41）に転送される。Ｆステージ（14）ではFAレジスタ
（41）の値がSAレジスタ（42）に転送される。Ｅステー
ジ（５）では、まず、EPDEC信号（206）によりEP（67）
のプリデクリメントを行う。次に、PCWRITE信号（210）
によって、戻り先アドレスが格納されているSAレジスタ
（42）の値がS1がバス（101）を介してPCスタック（4
6）中のEP（67）の指すエントリに書き込まれる。ま
た、同時にS1バス（101）の値がALU（50）、D0バス（10
3）を介してDDレジスタ（53）に書き込まれ、戻り先ア
ドレスの格納されたDDレジスタ（53）の値をスタックポ
インタによってソフトウェアで管理されているメモリ上
のスタックにプッシュする。PCスタック（46）に戻り先
アドレスが登録されたらBSRCDEC信号（205）によりBRS
カウンタ（65）がデクリメントされる。この命令では、
Ｄステージ（２）においてすでに分岐処理が行われてい
るので、Ｅステージでは分岐処理は行わない。FIG. 5 shows a flowchart of BSR instruction execution. When the BSR instruction is processed by the instruction decoding unit (22), a D code (12) indicating the step code of the BSR instruction and an A code (13) for calculating the return address are generated. For a G-format instruction, the displacement (82D) is also fetched at the same time, according to the field (82B) that indicates the displacement size. In addition, DP (66) is decremented by the DPDEC signal (202) and the BSR counter (65) is incremented. This instruction pre-branches: the jump target address is calculated in the PC operation unit (54), and the result is output to the CA bus (108) to perform pre-branch processing. In the A stage (3), the return address is calculated in the address calculation unit (55) as directed by the A code (13) and transferred to the FA register (41) via the A0 bus (105). In the F stage (4), the value of the FA register (41) is transferred to the SA register (42). In the E stage (5), EP (67) is first pre-decremented by the EPDEC signal (206). Next, by the PCWRITE signal (210), the value of the SA register (42), which holds the return address, is written via the S1 bus (101) into the PC stack (46) entry pointed to by EP (67). At the same time, the value on the S1 bus (101) is written to the DD register (53) via the ALU (50) and the D0 bus (103), and the value of the DD register (53), now holding the return address, is pushed onto the in-memory stack that software manages through the stack pointer. Once the return address has been registered in the PC stack (46), the BSR counter (65) is decremented by the BSRCDEC signal (205). Because this instruction has already branched in the D stage (2), no branch processing is performed in the E stage.

（2.2.2）JSR命令 JSR命令のビット割り付けが第４図（Ｃ）に示されて
いる。JSR命令はNEWPC（83C）の実効アドレスにサブル
ーチンジャンプする命令であり、戻り先アドレスがスタ
ックに退避される。飛び先番地に関しては複数段のアド
レッシング拡張指定が可能であるが簡単のため拡張指定
がない場合について説明する。この命令はＤステージ
（２）で２つのステップコードに分解されて処理され
る。第１のステップコードは飛び先のアドレスに関する
処理を行い、第２のステップコードでは戻り先アドレス
に関する処理を行う。(2.2.2) JSR instruction. The bit allocation of the JSR instruction is shown in FIG. 4(C). The JSR instruction performs a subroutine jump to the effective address specified by NEWPC (83C), and the return address is saved on the stack. Multi-stage addressing extension can be specified for the jump target address, but for simplicity the case without extension is described. This instruction is decomposed into two step codes in the D stage (2). The first step code handles the jump target address, and the second step code handles the return address.

まず、第１のステップコードに関する処理について説
明する。JSR命令が命令デコード部（22）でデコードさ
れると、JSR命令の第１ステップコードを示すＤコード
（12）と飛び先番地の実効アドレスを計算するためのＡ
コード（13）が生成される。もし飛び先番地のアドレス
計算に絶対アドレスや変位等の拡張部（83C）を必要と
する場合にはそのデータも命令キュー（21）から同時取
り込む。Ａステージ（３）では、Ａコード（13）の指示
に従ってアドレス計算部（55）において飛び先のアドレ
スが計算され、A0バス（105）を介してFAレジスタ（4
1）に転送される。Ｆステージ（４）ではFAレジスタ（4
1）の値がSAレジスタ（42）に転送される。Ｅステージ
（５）では飛び先アドレスが格納されているSAレジスタ
（42）の値がS1バス（101）を介してEBレジスタ（45）
に転送される。First, the processing of the first step code is described. When the JSR instruction is decoded by the instruction decoding unit (22), a D code (12) indicating the first step code of the JSR instruction and an A code (13) for calculating the effective address of the jump target are generated. If the address calculation for the jump target requires an extension part (83C) such as an absolute address or a displacement, that data is also fetched from the instruction queue (21) at the same time. In the A stage (3), the jump target address is calculated in the address calculation unit (55) as directed by the A code (13) and transferred to the FA register (41) via the A0 bus (105). In the F stage (4), the value of the FA register (41) is transferred to the SA register (42). In the E stage (5), the value of the SA register (42), which holds the jump target address, is transferred to the EB register (45) via the S1 bus (101).

次に、第２のステップコードに関する処理について説
明する。Ｄステージ（２）ではJSR命令の第２ステップ
コードを示すＤコード（12）と戻り先アドレスの実効ア
ドレスを計算するためのＡコード（13）が生成される。
このステップコードの処理では命令キュー（21）から命
令データは取り込まれない。また、DPDEC信号（202）に
よりDP（66）のデクリメント、及び、BSRカウンタ（6
5）のインクリメント処理を行う。Ａステージ（３）で
は、Ａコード（13）の指示に従ってアドレス計算部（5
5）において戻り先のアドレスが計算され、A0バス（10
5）を介してFAレジスタ（41）に転送される。Ｆステー
ジ（４）ではFAレジスタ（41）の値がSAレジスタ（42）
に転送される。Ｅステージ（５）では、まず、EPDEC信
号（206）によりEP（67）のプリデクリメントを行う。
次に、PCWRITE信号（210）によって、戻り先アドレスが
格納されているSAレジスタ（42）の値がS1バス（101）
を介してPCスタック（46）中のEP（67）の指すエントリ
に書き込まれる。また、同時にS1バス（101）の値がALU
（50）、D0バス（103）を介してDDレジスタ（53）に書
き込まれ、戻り先アドレスの格納されたDDレジスタ（5
3）の値をスタックポインタによってソフトウェアで管
理されているメモリ上のスタックにブッシュする。第１
ステップコードですでにEBレジスタ（45）に書き込まれ
ている飛び先番地の値をCAバス（108）に出力して分岐
処理を行う。このとき、EBRA信号（204）により、BSRカ
ウンタ（65）はクリアされ、DP（66）にはEP（67）の値
がコピーされる。Next, the processing of the second step code is described. In the D stage (2), a D code (12) indicating the second step code of the JSR instruction and an A code (13) for calculating the return address are generated. No instruction data is fetched from the instruction queue (21) in processing this step code. DP (66) is decremented by the DPDEC signal (202) and the BSR counter (65) is incremented. In the A stage (3), the return address is calculated in the address calculation unit (55) as directed by the A code (13) and transferred to the FA register (41) via the A0 bus (105). In the F stage (4), the value of the FA register (41) is transferred to the SA register (42). In the E stage (5), EP (67) is first pre-decremented by the EPDEC signal (206). Next, by the PCWRITE signal (210), the value of the SA register (42), which holds the return address, is written via the S1 bus (101) into the PC stack (46) entry pointed to by EP (67). At the same time, the value on the S1 bus (101) is written to the DD register (53) via the ALU (50) and the D0 bus (103), and the value of the DD register (53), now holding the return address, is pushed onto the in-memory stack that software manages through the stack pointer. The jump target address already written to the EB register (45) by the first step code is then output to the CA bus (108) to perform the branch. At this time, the EBRA signal (204) clears the BSR counter (65) and copies the value of EP (67) to DP (66).

以上述べたように、JSR命令でもPCスタック（46）に
関する処理はBSR命令と同じである。As described above, the processing for the PC stack (46) is the same for the JSR instruction as for the BSR instruction.

（2.2.3）RTS命令 RTS命令はサブルーチンからのリターンを行う命令で
あり、スタックから復帰されたリターンアドレスにジャ
ンプする。この命令は、１つのステップコードとして処
理される。(2.2.3) RTS instruction. The RTS instruction returns from a subroutine: it jumps to the return address restored from the stack. This instruction is processed as one step code.

RTS命令実行のフローチャートを第６図に示す。RTS命
令が命令デコード部（22）で処理されると、RTS命令の
ステップコードを示すＤコード（12）とスタックトップ
のアドレスを計算するためのＡコード（13）が生成され
る。この命令はプリリターンを行う命令である。BSRCZ
信号（201）によりパイプライン中にサブルーチンコー
ル命令が存在することが示されている場合にはBSRカウ
ンタ（65）の内容がゼロになるまで処理を一時停止す
る。BSRカウンタ（65）がゼロである場合にはプリリタ
ーン処理を行う。PRERET信号（209）により、PCスタッ
ク（46）中のDP（66）が指し示すエントリの内容をCAバ
ス（108）に出力し、先行分岐処理（プリリターン）を
行う。また、PCスタック（46）参照後、DPINC信号（20
3）によりDP（66）のポストインクリメント処理を行
う。Ａステージ（３）では、Ａコード（13）の指示に従
ってアドレス計算部（55）においてスタックトップのア
ドレスが計算され、A0バス（105）を介してFAレジスタ
（41）に書き込まれる。スタックトップのアドレスとは
ASP（40）の値そのものである。Ｆステージ（４）でFA
レジスタ（41）の値でオペランドがフェッチされ、SDレ
ジスタ（52）に取り込まれる。SDレジスタ（52）に取り
込まれた値は、スタック上に退避されていた真の戻り先
アドレスである。Ｅステージ（５）では、PCREAD信号
（211）によって、プリリターン時に参照されたリター
アドレスが格納されているPCスタック（46）中のEP（6
7）の指すエントリの内容がS1バス（101）に出力され、
DAラッチ（48）に取り込まれる。そして、真の戻り先ア
ドレスが格納されているSDレジスタ（52）の内容がS2バ
ス（102）を介してDBラッチ（49）に取り込まれる。ALU
（50）ではプリリターンが行われたアドレスと真の戻り
先アドレスとの比較が行われ、比較結果がZFLAG信号（2
12）としてＥステージ制御部（63）に送られる。また、
同時に、SDレジスタ（52）の内容がS2バス（102）、DM
ラッチ（56）、D0バス（103）を介して、レジスタファ
イル（47）中のワーキングレジスタに退避される。PCス
タック（46）参照後、EPINC信号（207）によりEP（67）
のポストインクリメントを行う。比較結果が一致してい
たら、正しいアドレスにプリリターンが行われたことを
示しており、Ｅステージ（５）は１マイクロサイクルNO
Pを実行して命令の実行を終了する。比較結果が不一致
であった場合にはプリリターンを行ったリターンアドレ
スが誤ってたことを示しており、ワーキングレジスタに
退避されている真の戻り先アドレスの値をS1バス（10
1）を介してEBレジスタ（45）に転送し、EBレジスタ（4
5）の値がCAバス（108）に出力されて分岐処理が行われ
る。このとき、EBRA信号（204）により、BSRカウンタ
（65）はクリアされ、DP（66）にはEP（67）の値がコピ
ーされる。FIG. 6 shows a flowchart of RTS instruction execution. When the RTS instruction is processed by the instruction decoding unit (22), a D code (12) indicating the step code of the RTS instruction and an A code (13) for calculating the stack-top address are generated. This instruction pre-returns. If the BSRCZ signal (201) indicates that a subroutine call instruction is present in the pipeline, processing is suspended until the content of the BSR counter (65) becomes zero. When the BSR counter (65) is zero, pre-return processing is performed: by the PRERET signal (209), the content of the PC stack (46) entry pointed to by DP (66) is output to the CA bus (108) and preceding branch processing (pre-return) is performed. After the PC stack (46) has been referenced, DP (66) is post-incremented by the DPINC signal (203). In the A stage (3), the stack-top address is calculated in the address calculation unit (55) as directed by the A code (13) and written to the FA register (41) via the A0 bus (105). The stack-top address is simply the value of ASP (40). In the F stage (4), the operand is fetched using the value of the FA register (41) and taken into the SD register (52). The value taken into the SD register (52) is the true return address that was saved on the stack. In the E stage (5), by the PCREAD signal (211), the content of the PC stack (46) entry pointed to by EP (67), which holds the return address referenced at pre-return time, is output to the S1 bus (101) and taken into the DA latch (48). The content of the SD register (52), which holds the true return address, is taken into the DB latch (49) via the S2 bus (102). The ALU (50) compares the pre-return address with the true return address, and the comparison result is sent to the E stage control unit (63) as the ZFLAG signal (212). At the same time, the content of the SD register (52) is saved to a working register in the register file (47) via the S2 bus (102), the DM latch (56), and the D0 bus (103). After the PC stack (46) has been referenced, EP (67) is post-incremented by the EPINC signal (207). If the comparison result is a match, the pre-return went to the correct address, and the E stage (5) executes a NOP for one microcycle and ends execution of the instruction. If the comparison result is a mismatch, the return address used for the pre-return was wrong; the true return address saved in the working register is transferred to the EB register (45) via the S1 bus (101), and the value of the EB register (45) is output to the CA bus (108) to perform the branch. At this time, the EBRA signal (204) clears the BSR counter (65) and copies the value of EP (67) to DP (66).

（2.2.4）EXITD命令 EXITD命令は高級言語用のパラメータ解放、退避して
いたレジスタの復帰、サブルーチンからのリターン、及
び、スタック上のサブルーチンパラメータの解放を行う
高機能命令である。第４図（Ｅ），（Ｆ）に示すように
EXITD命令に関してはＧフォーマットとＥフォーマット
の２つの命令フォーマットがある。Ｇフォーマットでは
３つのステップコードとして処理され、Ｅフォーマット
ではこの命令は、１つのステップコードとして処理され
る。簡単のため第４図（Ｅ）に示すＥフォーマットの命
令についてのみ説明を行う。(2.2.4) EXITD instruction. The EXITD instruction is a high-function instruction that performs parameter release for high-level languages, restoration of saved registers, return from the subroutine, and release of the subroutine parameters on the stack. As shown in FIGS. 4(E) and 4(F), the EXITD instruction has two instruction formats, the G format and the E format. In the G format it is processed as three step codes, and in the E format it is processed as one step code. For simplicity, only the E-format instruction shown in FIG. 4(E) is described.

EXITD命令が命令デコード部（22）で処理されると、E
XITD命令がステップコードを示すＤコード（12）と即値
の転送を行うためのＡコード（13）が生成される。スタ
ックポインタの補正値（85B）の値は、リテラルとして
Ｄコード（12）で送られる。この命令では、２バイトの
復帰するレジスタのビットマップデータ（85C）の値も
同時に取り込み、Ａコード（13）の即値として転送され
ていく。この命令はプリリターンを行う命令である。BR
SCZ信号によりパイプライン中にサブルーチンコール命
令が存在することが示されている場合にはBSRカウンタ
（65）の内容がゼロになるまで処理を一時停止する。BS
Rカウンタ（65）がゼロである場合にはプリリターン処
理を行う。PRERET信号（209）によりPCスタック（46）
中のDP（66）が指し示すエントリの内容をCAバス（10
8）に出力し、先行分岐処理（プリリターン）を行う。
また、PCスタック（46）参照後、DPINC信号（203）によ
りDP（66）のポストインクリメント処理を行う。Ａステ
ージ（３）では、Ａコード（13）の指示に従ってアドレ
ス計算部（55）において即値の値が転送され、A0バス
（105）を介してFAレジスタ（41）に書き込まれる。Ｆ
ステージではFAレジスタ（41）の値がSAレジスタ（42）
に転送される。Ｅステージ（５）では、退避されていた
レジスタのスタックからの復帰、スタックフレームの解
放、フレームポインタのスタックからの復帰等の処理を
行った後に、スタックから戻り先アドレスの値をポップ
しDDレジスタ（53）に取り込む。また、サブルーチンパ
ラメータを解放するためにスタックポインタの補正を行
う。PCREAD信号（211）によって、プリリターン時に参
照されたリターンアドレスが格納されているPCスタック
（46）中のEP（67）の指すエントリの内容がS1バス（10
1）に出力され、DAラッチ（48）に取り込まれる。そし
て、真の戻り先アドレスが格納されているDDレジスタ
（53）の内容がS2バス（102）を介してDBラッチ（49）
に取り込まれる。ALU（50）ではプリリターンが行われ
たアドレスと真の戻り先アドレスとの比較が行われ、比
較結果がZFLAG信号（212）としてＥステージ制御部（6
3）に送られる。また、同時に、DDレジスタ（53）の内
容が、S2バス（102）、DMラッチ（56）、DDバス（103）
を介して、レジスタファイル（47）中のワーキングレジ
スタに退避される。PCスタック（46）参照後、EPINC信
号（207）によりEP（67）のポストインクリメントを行
う。比較結果が一致していたら、正しいアドレスにプリ
リターンが行われたことを示しており、Ｅステージは１
マイクロサイクルNOPを実行して命令の実行を終了す
る。比較結果が不一致であった場合にはプリリターンを
行ったリターンアドレスが誤っていたことを示してお
り、ワーキングレジスタに退避されている真の戻り先ア
ドレスの値をS1バス（101）を介してEBレジスタ（45）
に転送し、EBレジスタ（45）の値がCAバス（108）に出
力されて分岐処理が行われる。このとき、EBRA信号（20
4）により、BSRカウンタ（65）はクリアされ、DP（66）
にはEP（67）の値がコピーされる。When the EXITD instruction is processed by the instruction decoding unit (22), a D code (12) indicating the step code of the EXITD instruction and an A code (13) for transferring an immediate value are generated. The stack pointer correction value (85B) is sent as a literal in the D code (12). The 2-byte bitmap data (85C) of the registers to be restored is also fetched at the same time and passed on as the immediate value of the A code (13). This instruction pre-returns. If the BSRCZ signal (201) indicates that a subroutine call instruction is present in the pipeline, processing is suspended until the content of the BSR counter (65) becomes zero. When the BSR counter (65) is zero, pre-return processing is performed: by the PRERET signal (209), the content of the PC stack (46) entry pointed to by DP (66) is output to the CA bus (108) and preceding branch processing (pre-return) is performed. After the PC stack (46) has been referenced, DP (66) is post-incremented by the DPINC signal (203). In the A stage (3), the immediate value is transferred through the address calculation unit (55) as directed by the A code (13) and written to the FA register (41) via the A0 bus (105). In the F stage, the value of the FA register (41) is transferred to the SA register (42). In the E stage (5), after the saved registers have been restored from the stack, the stack frame released, and the frame pointer restored from the stack, the return address is popped from the stack and taken into the DD register (53). The stack pointer is also corrected to release the subroutine parameters. By the PCREAD signal (211), the content of the PC stack (46) entry pointed to by EP (67), which holds the return address referenced at pre-return time, is output to the S1 bus (101) and taken into the DA latch (48). The content of the DD register (53), which holds the true return address, is taken into the DB latch (49) via the S2 bus (102). The ALU (50) compares the pre-return address with the true return address, and the comparison result is sent to the E stage control unit (63) as the ZFLAG signal (212). At the same time, the content of the DD register (53) is saved to a working register in the register file (47) via the S2 bus (102), the DM latch (56), and the D0 bus (103). After the PC stack (46) has been referenced, EP (67) is post-incremented by the EPINC signal (207). If the comparison result is a match, the pre-return went to the correct address, and the E stage executes a NOP for one microcycle and ends execution of the instruction. If the comparison result is a mismatch, the return address used for the pre-return was wrong; the true return address saved in the working register is transferred to the EB register (45) via the S1 bus (101), and the value of the EB register (45) is output to the CA bus (108) to perform the branch. At this time, the EBRA signal (204) clears the BSR counter (65) and copies the value of EP (67) to DP (66).

以上述べたように、EXITD命令でもPCスタック（46）
に関する処理はRTS命令と同じである。As described above, the processing for the PC stack (46) is the same for the EXITD instruction as for the RTS instruction.

(2.3) Description of other embodiments

In this embodiment, the PC stack (46) consists of 8 entries. When subroutine calls are nested 9 or more levels deep, a new return address therefore overwrites an entry that still holds a valid return address, and the earliest value is lost. Consequently, except in special cases such as recursive calls, nesting of 9 or more levels leads to an incorrect pre-return. A pre-return to a wrong address also occurs if the program rewrites the return address held in external memory. This is why the E stage must check whether the pre-return was correct. The number of PC stack entries can be chosen by weighing performance, that is, the subroutine call depth up to which pre-return should be correct, against the added hardware.
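
The effect of the 8-entry limit can be illustrated with a small simulation. The sketch below is only a model under stated assumptions: the class `PCStack`, the helper `count_wrong_pre_returns`, and the example addresses are hypothetical, and a single wrapping pointer stands in for the DP/EP pair of the embodiment.

```python
class PCStack:
    """Fixed-size circular stack of return addresses (illustrative model)."""

    def __init__(self, entries=8):
        self.entries = entries
        self.slots = [None] * entries
        self.pointer = 0  # wraps modulo the number of entries

    def push(self, return_address):
        # Pre-decrement the pointer, then write; old contents are overwritten.
        self.pointer = (self.pointer - 1) % self.entries
        self.slots[self.pointer] = return_address

    def pop(self):
        # Read the entry, then post-increment the pointer.
        value = self.slots[self.pointer]
        self.pointer = (self.pointer + 1) % self.entries
        return value


def count_wrong_pre_returns(depth, entries=8):
    """Nest `depth` calls with distinct return addresses, then unwind."""
    stack = PCStack(entries)
    true_addresses = [0x1000 + 4 * level for level in range(depth)]
    for address in true_addresses:
        stack.push(address)
    # Returns happen in reverse order of the calls.
    return sum(stack.pop() != address for address in reversed(true_addresses))
```

With 8 entries, `count_wrong_pre_returns(8)` reports no wrong pre-returns, while a depth of 9 produces one: the return from the outermost call reads the entry that the ninth call overwrote.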

In this embodiment, to guarantee correct operation even when the return address is rewritten inside the subroutine, the true return address fetched from outside the CPU when the RTS instruction is executed is compared with the address fetched from the PC stack (46) and used for the pre-return. If it suffices to run only software that never rewrites subroutine return addresses (real application programs almost never do), there is no need to fetch the return address from memory outside the CPU. It is enough to check whether the return address in the PC stack (46) has been overwritten, for example by providing a flag that indicates whether each PC stack value is valid. In other words, if the consistency of the stack in memory outside the CPU that holds the return addresses is guaranteed, whether a pre-return is correct can be judged by the management mechanism of the PC stack (46) alone, and the return address needs to be fetched from memory outside the CPU, and returned to, only when the correct return address cannot be obtained from the PC stack.
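
One possible reading of the valid-flag variant is sketched below. The text only says that a flag indicating whether a PC stack value is valid could be provided; the specific policy used here (set the flag on a call, clear it once the entry has been used for a return) is an assumption, as are the names `FlaggedPCStack`, `resolve_return`, and the `fetch_from_memory` callback.

```python
class FlaggedPCStack:
    """PC stack with one valid flag per entry (hypothetical policy)."""

    def __init__(self, entries=8):
        self.entries = entries
        self.addresses = [0] * entries
        self.valid = [False] * entries
        self.pointer = 0

    def push(self, return_address):
        self.pointer = (self.pointer - 1) % self.entries
        self.addresses[self.pointer] = return_address
        self.valid[self.pointer] = True

    def pop(self):
        """Return the stored address, or None when the entry is not valid."""
        index = self.pointer
        self.pointer = (self.pointer + 1) % self.entries
        if not self.valid[index]:
            return None
        self.valid[index] = False  # assumption: an entry is consumed once
        return self.addresses[index]


def resolve_return(stack, fetch_from_memory):
    """Use the PC stack when possible; otherwise fetch from external memory."""
    address = stack.pop()
    if address is None:
        address = fetch_from_memory()  # hypothetical callback
    return address
```

Under this policy, a return whose entry was lost to overwriting finds the flag already cleared and falls back to the external stack, so external memory is read only in that case.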

In this embodiment, the BSR counter (65) is provided to make pre-return reliable. If pre-branch processing is not performed for subroutine call instructions, however, this function is unnecessary, because a branch to the target address is always performed after a subroutine call instruction is executed and the pipeline is cancelled. Also, the pointer DP (66) is decremented when the BSR instruction is decoded in the D stage (2), but the value of the decremented pointer EP (67) may instead be copied to DP (66) when the BSR instruction is executed in the E stage (5).

Also, in this embodiment, to check in the E stage (5) whether the pre-return was performed correctly, the return address used for the pre-return is read from the PC stack (46) and compared with the true return address fetched from memory outside the CPU. Alternatively, the return address used for the pre-return may be saved in the D stage (2), and the saved value may be referenced in the E stage (5).

Also, in this embodiment a counter is used as the means for detecting whether a stage after the D stage (2) is processing a subroutine call instruction. Instead, a flag for subroutine call instructions may be provided in each step code or in each pipeline stage, and pre-return processing may be performed only when none of the flags is set. Even if the BSR counter, or the flags that replace it, is removed to reduce hardware, operation remains correct because whether the pre-return was correct is checked when the subroutine return instruction is executed. The resulting performance loss depends on how often a subroutine call instruction and its corresponding subroutine return instruction are in the pipeline at the same time.
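
The role of the BSR counter (65) as a gate on pre-return can be sketched as follows. This is a simplified model, not the hardware: the class and method names are hypothetical, and the points at which the counter is incremented (decode in the D stage) and decremented (execution in the E stage) are assumptions consistent with a counter of the subroutine call instructions in flight after the decode stage.

```python
class PreReturnGate:
    """Counts BSR instructions between D-stage decode and E-stage execution."""

    def __init__(self):
        self.bsr_count = 0

    def decode_bsr(self):
        self.bsr_count += 1  # a BSR has entered the pipeline

    def execute_bsr(self):
        self.bsr_count -= 1  # its return address is now in the PC stack

    def cancel_pipeline(self):
        self.bsr_count = 0  # E-stage branch: in-flight instructions are discarded

    def may_pre_return(self):
        # Pre-return only when no decoded BSR is still waiting to execute.
        return self.bsr_count == 0
```

Replacing the counter with per-stage flags, as the text suggests, would amount to `may_pre_return` returning true only when no flag is set.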

Also, in this embodiment the PC stack (46) has two pointers: the pointer DP (66), managed by the D stage (2), and the pointer EP (67), managed by the E stage (5). This allows the correct return address to be referenced even when several subroutine return instructions are being processed in the pipeline. EP (67) changes in response to the subroutine call instructions and subroutine return instructions executed in the E stage (5). Because DP (66) changes at the instruction decode stage, the return address of the corresponding subroutine call instruction can be referenced even when two or more subroutine return instructions have been taken into the pipeline. When a branch is performed in the E stage (5), the pipeline is cancelled, so the value of EP (67) is copied to DP (66). Because whether the pre-return was correct is checked when a subroutine return instruction is executed, operation remains correct even if, to reduce hardware, all pointer management of the PC stack (46) is performed with EP (67) alone. The performance loss in that case depends on how often two or more subroutine return instructions are in the pipeline at the same time. With a single pointer, a flag for subroutine return instructions can be provided: the flag is set while a subroutine return instruction is being executed in the A stage (3) or a later stage, and pre-return processing waits while the flag is set. The PC stack (46) is then referenced only after the pointer has been updated correctly, so a correct pre-return is possible.
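
The two-pointer arrangement can be modelled in a few lines. The sketch below is an assumption-laden illustration: the class `TwoPointerPCStack` and its method names are hypothetical, and the pre-decrement of EP when a call is executed and the post-increment of DP when a return is decoded are assumed by analogy with the pointer updates described in the text.

```python
class TwoPointerPCStack:
    """PC stack with a decode-stage pointer (DP) and an execute-stage pointer (EP)."""

    def __init__(self, entries=8):
        self.entries = entries
        self.slots = [None] * entries
        self.DP = 0
        self.EP = 0

    def decode_call(self):
        # D stage: DP is decremented when a subroutine call is decoded.
        self.DP = (self.DP - 1) % self.entries

    def execute_call(self, return_address):
        # E stage: the return address is written to the entry EP points to.
        self.EP = (self.EP - 1) % self.entries
        self.slots[self.EP] = return_address

    def decode_return(self):
        # D stage: read the pre-return address through DP.
        address = self.slots[self.DP]
        self.DP = (self.DP + 1) % self.entries
        return address

    def execute_return(self):
        # E stage: read the same entry through EP for the check.
        address = self.slots[self.EP]
        self.EP = (self.EP + 1) % self.entries
        return address

    def cancel_pipeline(self):
        # E-stage branch: decode-stage state is discarded, so DP follows EP.
        self.DP = self.EP
```

If two return instructions are decoded before either is executed, each `decode_return` call sees a different entry because DP has already advanced, which a single pointer updated only in the E stage could not provide.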

Also, the PC stack (46) of the present invention is accessed both at pre-return and when judging whether the pre-return was performed correctly, so it is efficient for these accesses to be independent of memory accesses outside the CPU. Therefore, in a data processing device such as a microprocessor whose CPU is implemented on a single integrated circuit chip, placing the PC stack (46) in the same integrated circuit as the CPU allows the PC stack (46) to be accessed independently of memory accesses outside the CPU.

This invention can be carried out in the modes described in the following items (1) to (7).

(1) A data processing device that has a first stage and a second stage and processes instructions by pipeline processing in which, for the execution of an instruction, processing in the first stage precedes processing in the second stage, comprising: a first storage device that stores instructions and data; a second storage device, different from the first storage device, that stores address values of return destination instructions from subroutines; first writing means for writing a value serving as a return address from a subroutine to the first storage device; second writing means for writing a value serving as a return address from a subroutine to the second storage device; first reading means, controlled by the first stage, for reading a first value from the second storage device; second reading means for reading from the first storage device, when a subroutine return instruction is processed, a second value serving as the return address from the subroutine; judging means for judging, when a subroutine return instruction is processed, whether the first value is the return address from the subroutine; and instruction fetch means for fetching instructions from the first storage device, wherein the instruction fetch means has a function of fetching a first instruction from the address of the first storage device indicated by the first value and a function of fetching a second instruction from the address of the first storage device indicated by the second value, and wherein, when a subroutine return instruction is processed, the first instruction is executed if the judging means judges that the first value is the return address from the subroutine, and the second instruction is executed if the judging means judges that the first value is not the return address from the subroutine.

(2) The data processing device according to item 1, wherein the judging means is comparing means for comparing the first value with the second value, the first value being judged to be the return address from the subroutine when the comparison by the comparing means shows a match, and judged not to be the return address from the subroutine when the comparison shows a mismatch.

(3) The data processing device according to item 1, wherein the second storage device is a stack storage device made up of a plurality of entries with cyclic numbers and a first pointer register that manages the cyclic numbers.

(4) The data processing device according to item 2, wherein the second storage device consists of 2ⁿ entries, the device further comprising: a first n-bit counter that can be at least one of incremented and decremented and that manages the entry numbers; a second n-bit counter that can be both incremented and decremented and that manages the entry numbers; third reading means for reading from the entry of the second storage device indicated by the value of the second n-bit counter; and third writing means for writing the value of the second n-bit counter to the first n-bit counter, wherein the second writing means writes the return address from the subroutine to the entry of the second storage device indicated by the value of the second n-bit counter, the first reading means reads the first value from the entry of the second storage device indicated by the value of the first n-bit counter, and the comparing means compares the first value obtained by reading with the third reading means with the second value.

(5) The data processing device according to item 1 or item 4, comprising subroutine call instruction processing detection means for detecting whether writing of the return destination instruction address to the second storage device has been completed for all subroutine call instructions whose processing in the first stage has finished.

(6) A data processing device comprising: a first storage device that stores instructions and data; a second storage device, different from the first storage device, that stores address values of return destination instructions from subroutines and consists of 2ⁿ entries; a first n-bit counter that can be at least one of incremented and decremented and that manages the entry numbers; a second n-bit counter that can be both incremented and decremented and that manages the entry numbers; first reading means for reading a value from the entry of the second storage device indicated by the value of the first n-bit counter; second reading means for reading a value from the entry of the second storage device indicated by the value of the second n-bit counter; first writing means for writing the address of a return destination instruction from a subroutine to the entry of the second storage device indicated by the value of the second n-bit counter; and second writing means for writing the value of the second n-bit counter to the first n-bit counter.

(7) A data processing device comprising, in the same single integrated circuit as the CPU: a storage device that stores address values of return destination instructions from subroutines and consists of 2ⁿ entries; a first n-bit counter that can be at least one of incremented and decremented and that manages the entry numbers; a second n-bit counter that can be both incremented and decremented and that manages the entry numbers; first reading means for reading a value from the entry of the storage device indicated by the value of the first n-bit counter; second reading means for reading a value from the entry of the storage device indicated by the value of the second n-bit counter; first writing means for writing the address of a return destination instruction from a subroutine to the entry of the storage device indicated by the value of the second n-bit counter; and second writing means for writing the value of the second n-bit counter to the first n-bit counter.

[Effect of the invention]

As described above, according to this invention, providing a PC stack that stores only the return addresses of subroutine call instructions allows the branch processing of a subroutine return instruction to be performed ahead of the processing in the instruction execution stage. This reduces the pipeline overhead caused by executing subroutine return instructions, so that a high-performance data processing device is obtained.

[Brief description of drawings]

FIG. 1 shows the pipeline processing configuration of the data processing device of the present invention. FIG. 2 is a block diagram of the data processing device of the present invention. FIG. 3 is a block diagram of the portion of the data processing device of the present invention that particularly relates to the preceding branch processing of a subroutine return instruction. FIG. 4 shows the bit allocation of the subroutine call instruction and the subroutine return instruction in the data processing device of the present invention. FIG. 5 is a flowchart of BSR instruction execution. FIG. 6 is a flowchart of RTS instruction execution. FIG. 7 shows typical pipeline stages of a conventional data processing device.

(46) is the PC stack that stores only the return addresses of subroutine call instructions; (65) is the BSR counter that counts the number of subroutine call instructions being processed in the stages after the instruction decode stage; (66) is the pointer DP of the PC stack (46) managed by the instruction decode stage; and (67) is the pointer EP of the PC stack (46) managed by the instruction execution stage. In the drawings, the same reference numerals denote the same or corresponding parts.

Claims

[Claims]

1. A data processing device having a first stage and a second stage, for processing instructions by pipeline processing in which processing in the first stage precedes processing in the second stage, comprising: an instruction storage device that stores instructions; a first storage device; a second storage device, different from the first storage device, that stores the address value of a return destination instruction from a subroutine; first writing means for writing a value serving as the return address from the subroutine to the first storage device when a subroutine call instruction is processed; second writing means for writing a value serving as the return address from the subroutine to the second storage device when the subroutine call instruction is processed; first reading means, controlled by the first stage, for reading a first value from the second storage device when a subroutine return instruction is processed; second reading means for reading from the first storage device, when the subroutine return instruction is processed, a second value that is the true return address from the subroutine; judging means for judging, when the subroutine return instruction is processed, whether the first value is the true return address from the subroutine; and instruction fetch means for fetching instructions from the instruction storage device, wherein the instruction fetch means has a function of fetching from the instruction storage device, when the subroutine return instruction is processed and before the second reading means completes reading the second value, a first instruction designated by the address indicated by the first value, and a function of fetching from the instruction storage device a second instruction designated by the address indicated by the second value, and wherein the first instruction is executed when the judging means judges that the first value is the true return address from the subroutine, and, when the judging means judges that the first value is not the true return address from the subroutine, the preceding processing of the first instruction in the pipeline is invalidated and the second instruction is executed.

2. A data processing device having a first stage and a second stage, for processing instructions by pipeline processing in which processing in the first stage is performed prior to processing in the second stage, comprising: a first storage device that stores instructions and data; a second storage device, different from the first storage device, that stores the address value of a return destination instruction from a subroutine and consists of 2ⁿ entries; a first n-bit counter that can be at least one of incremented and decremented and that manages the entry numbers; a second n-bit counter that can be both incremented and decremented and that manages the entry numbers; first reading means for reading, in the first stage when a subroutine return instruction is processed, a value from the entry of the second storage device indicated by the value of the first n-bit counter; second reading means for reading, in the second stage when the subroutine return instruction is processed, a value from the entry of the second storage device indicated by the value of the second n-bit counter; first writing means for writing the address of the return destination instruction from the subroutine to the entry of the second storage device indicated by the value of the second n-bit counter; and second writing means for writing the value of the second n-bit counter to the first n-bit counter when the preceding processing performed in the first stage ahead of the second stage is invalidated.

3. A data processing device having a first stage and a second stage, for processing instructions by pipeline processing in which processing in the first stage is performed prior to processing in the second stage, comprising, in the same single integrated circuit as the CPU: a storage device that stores the address value of a return destination instruction from a subroutine and consists of 2ⁿ entries; a first n-bit counter that can be at least one of incremented and decremented and that manages the entry numbers; a second n-bit counter that can be both incremented and decremented and that manages the entry numbers; first reading means for reading, in the first stage when a subroutine return instruction is processed, a value from the entry of the storage device indicated by the value of the first n-bit counter; second reading means for reading, in the second stage when the subroutine return instruction is processed, a value from the entry of the storage device indicated by the value of the second n-bit counter; first writing means for writing, when a subroutine call instruction is processed, the address of the return destination instruction from the subroutine to the entry of the storage device indicated by the value of the second n-bit counter; and second writing means for writing the value of the second n-bit counter to the first n-bit counter when the preceding processing performed in the first stage ahead of the second stage is invalidated.