JPH0769814B2

JPH0769814B2 - Data processing device with pipeline processing mechanism

Info

Publication number: JPH0769814B2
Application number: JP63086704A
Authority: JP
Inventors: 雅仁松尾
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1988-04-07
Filing date: 1988-04-07
Publication date: 1995-07-31
Anticipated expiration: 2010-07-31
Also published as: JPH01258032A

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to a data processing device that achieves high processing capability through an advanced pipeline processing mechanism, and more particularly to a data processing device that can perform advance branch processing to the return address of a subroutine return instruction at an early stage of pipeline processing.

[Prior Art]

FIG. 8 shows the typical pipeline stages of a conventional data processing device. In the figure, (1) is an instruction fetch stage, (2) is an instruction decode stage, (3) is an address calculation stage, (4) is an operand fetch stage, (5) is an execution stage, and (8) is an operand write stage.

Next, the operation will be described. The data processing device shown in FIG. 8 consists of six pipeline stages: the instruction fetch stage (1), which fetches instruction data while the bus is idle; the instruction decode stage (2), which analyzes the instruction data; the address calculation stage (3), which calculates operand addresses and the like; the operand fetch stage (4), which fetches operand data; the execution stage (5), which processes the data; and the operand write stage (8), which writes operand data. Each stage can process a different instruction at the same time. However, when a conflict occurs over an operand or a memory access, the lower-priority stage suspends its processing until the conflict is resolved.

As described above, a pipelined data processing device divides processing into a plurality of stages following the flow of data processing and operates all of the stages simultaneously. This shortens the average processing time per instruction and improves overall performance.
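As a rough illustration of this throughput gain, the sketch below counts stage-times for an idealized pipeline in which every stage takes one stage-time and no conflicts occur (both simplifying assumptions; real stages stall as described above). The function names are illustrative only.

```python
def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Stage-times needed by an ideal pipeline (no stalls, one stage-time per stage)."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages stage-times; each later one
    # completes one stage-time after its predecessor.
    return num_stages + num_instructions - 1


def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Stage-times needed if each instruction passes through all stages alone."""
    return num_stages * num_instructions


# Six stages, as in the conventional pipeline of FIG. 8.
for n in (1, 10, 100):
    p = pipelined_cycles(n, 6)
    s = sequential_cycles(n, 6)
    print(n, p, s, round(s / p, 2))
# prints:
# 1 6 6 1.0
# 10 15 60 4.0
# 100 105 600 5.71
```

The speedup approaches the number of stages only for long runs without disruption, which is why the flow-disturbing instructions discussed next matter.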

However, in such a pipelined data processing device, when an instruction that disturbs the instruction flow, such as a branch instruction, is executed in the execution stage (5), all processing performed in the preceding stages is canceled, and the next instruction to be executed must be processed starting from its instruction fetch. When an instruction that disturbs the processing flow is executed in this way, the pipeline overhead grows and the execution speed of the data processing device does not improve. To improve performance, various techniques have been devised to reduce the overhead of executing instructions such as unconditional and conditional branch instructions. For example, a branch target buffer, which stores the address of a branch instruction paired with the address of its branch target, is used to predict the instruction flow at the instruction fetch stage (see J. K. F. Lee and A. J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design," IEEE Computer, Vol. 17, No. 1, January 1984, pp. 6-22). In this way, the overhead of executing branch instructions is reduced by predicting the processing flow at an early stage of pipeline processing and feeding the instructions predicted to execute next into the pipeline (hereinafter referred to as advance branch processing). For a return instruction from a subroutine, however, the processing flow has been difficult to predict, because the return address depends on the address of the corresponding subroutine call instruction.
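The following sketch illustrates why a branch target buffer helps ordinary branches but not returns. It assumes a minimal BTB that simply remembers the last target observed for each branch address; the class, the function, and the addresses are hypothetical and are not taken from the cited paper.

```python
from typing import Dict, List, Optional, Tuple


class BranchTargetBuffer:
    """Toy BTB: maps a branch instruction's address to the target it last went to."""

    def __init__(self) -> None:
        self.entries: Dict[int, int] = {}

    def predict(self, branch_pc: int) -> Optional[int]:
        return self.entries.get(branch_pc)

    def update(self, branch_pc: int, actual_target: int) -> None:
        self.entries[branch_pc] = actual_target


def count_mispredictions(trace: List[Tuple[int, int]]) -> int:
    """trace: (branch address, actual target) pairs in execution order."""
    btb = BranchTargetBuffer()
    misses = 0
    for branch_pc, target in trace:
        if btb.predict(branch_pc) != target:
            misses += 1
        btb.update(branch_pc, target)
    return misses


# An unconditional jump that always goes to the same place.
jump_trace = [(0x100, 0x400)] * 4
# One return instruction at 0x480, reached alternately from two call sites.
return_trace = [(0x480, 0x104), (0x480, 0x204)] * 3
print(count_mispredictions(jump_trace), count_mispredictions(return_trace))  # 1 6
```

The jump misses only on its first execution, whereas the shared return instruction misses every time because its target depends on which call preceded it.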

[Problems to be Solved by the Invention]

As described above, in the conventional data processing device, the return address of a subroutine return instruction depends on the address of the corresponding subroutine call instruction, so there has been no effective means of predicting the processing flow.

The present invention has been made to solve the above problem, and its object is to provide a data processing device that can perform advance branch processing to the return address of a subroutine return instruction at an early stage of pipeline processing.

[Means for Solving the Problems]

The data processing device according to the present invention includes a stack memory dedicated to the program counter (PC) that stores only the return addresses of subroutine call instructions (hereinafter referred to as the PC stack).

[Action]

In the data processing device according to the present invention, the return address from the subroutine is pushed onto the PC stack when a subroutine call instruction is executed in the execution stage, and when a subroutine return instruction is decoded in the instruction decode stage, advance branch processing is performed to the address popped from the PC stack.

[Embodiment]

(1) Pipeline Mechanism

The pipeline processing of the data processing device of the present invention is configured as shown in FIG. 1. The basic pipeline consists of five stages: the instruction fetch stage (IF stage (1)), which prefetches instructions; the decode stage (D stage (2)), which performs the first-level instruction decoding; the operand address calculation stage (A stage (3)), which performs the second-level instruction decoding and operand address calculation; the operand fetch stage (F stage (4)), which accesses the micro ROM (referred to specifically as the R stage (6)) and prefetches operands (referred to specifically as the OF stage (7)); and the execution stage (E stage (5)), which executes instructions. The E stage (5) has a single-stage store buffer, and some high-function instructions pipeline their own execution, so the actual pipelining effect exceeds five stages.

Each stage operates independently of the others, so in principle the five stages operate fully independently. Each stage can complete one unit of processing in a minimum of two clocks. Ideally, therefore, pipeline processing advances every two clocks.

The data processing device of the present invention has instructions, such as memory-to-memory operations and memory indirect addressing, that cannot be processed in a single pass through the basic pipeline; the device is designed to keep the pipeline as balanced as possible for these instructions as well. An instruction with a plurality of memory operands is decomposed at the decode stage into a plurality of pipeline processing units (step codes) according to the number of memory operands, and these are then pipelined. The method of decomposing instructions into pipeline processing units is described in detail in Japanese Patent Application No. 61-236456.
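The decomposition rule itself is given in the cited application, not here. Purely as a hypothetical illustration, the sketch below assumes one step code per memory operand (and a single step code for an instruction with none); `StepCode`, `decompose`, and the `@(R1)` operand notation are made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StepCode:
    instruction_pc: int             # every step code keeps its instruction's PC
    index: int                      # position of this step within the instruction
    memory_operand: Optional[str]   # memory operand handled by this step, if any


def decompose(pc: int, operands: List[Tuple[bool, str]]) -> List[StepCode]:
    """operands: (is_memory, text) pairs.

    Hypothetical rule: one step code per memory operand.
    """
    memory_operands = [text for is_memory, text in operands if is_memory]
    if not memory_operands:
        # Register/immediate-only instruction: a single step code.
        return [StepCode(pc, 0, None)]
    return [StepCode(pc, i, text) for i, text in enumerate(memory_operands)]


# A memory-to-memory operation yields two step codes with the same PC.
for step in decompose(0x200, [(True, "@(R1)"), (True, "@(R2)")]):
    print(step)
```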

The information passed from the IF stage (1) to the D stage (2) is the instruction code (11) itself. The information passed from the D stage (2) to the A stage (3) consists of two parts: one concerning the operation specified by the instruction (called the D code (12)) and one concerning operand address calculation (called the A code (13)). The information passed from the A stage (3) to the F stage (4) also consists of two parts: the R code (14), which includes the entry address of a microprogram routine and parameters for the microprogram, and the F code (15), which includes the operand address and access method information. The information passed from the F stage (4) to the E stage (5) consists of the E code (16), which includes operation control information and literals, and the S code (17), which includes operands and operand addresses.

(1.1) Processing in Each Pipeline Stage

(1.1.1) Instruction Fetch Stage

The instruction fetch stage (IF stage (1)) fetches instructions from external memory, places them in the instruction queue, and outputs the instruction code (11) to the D stage (2).

The instruction queue is filled in aligned 4-byte units. Fetching instructions from memory takes a minimum of two clocks per aligned 4 bytes.

The output unit of the instruction queue is variable in 2-byte steps, and up to 6 bytes can be output in two clocks. Immediately after a branch, the 2-byte instruction base part can also bypass the instruction queue and be transferred directly to the instruction decoder.
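A minimal model of the queue's two interfaces, using only the sizes given above (aligned 4-byte input; output in 2-byte steps of at most 6 bytes per two-clock cycle). Timing, the post-branch bypass path, and the byte order within a word are not modeled, and the class and method names are hypothetical.

```python
from collections import deque


class InstructionQueue:
    """Toy model of the IF-stage instruction queue (sizes taken from the text)."""

    MAX_OUTPUT_BYTES = 6  # at most 6 bytes per two-clock cycle

    def __init__(self) -> None:
        self.halfwords = deque()  # queued 2-byte units, oldest first

    def fill(self, address: int, word: bytes) -> None:
        # Input side: one aligned 4-byte word at a time.
        if address % 4 != 0 or len(word) != 4:
            raise ValueError("queue input must be an aligned 4-byte word")
        self.halfwords.append(word[0:2])
        self.halfwords.append(word[2:4])

    def output(self, requested_bytes: int) -> bytes:
        # Output side: a variable number of bytes in 2-byte steps, up to 6.
        if requested_bytes % 2 or not 0 <= requested_bytes <= self.MAX_OUTPUT_BYTES:
            raise ValueError("output must be 0, 2, 4 or 6 bytes")
        count = min(requested_bytes // 2, len(self.halfwords))
        return b"".join(self.halfwords.popleft() for _ in range(count))


q = InstructionQueue()
q.fill(0x100, bytes([1, 2, 3, 4]))
q.fill(0x104, bytes([5, 6, 7, 8]))
print(q.output(6).hex(), q.output(6).hex())  # 010203040506 0708
```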

The IF stage (1) also manages the prefetch instruction address. The address of the next instruction to be fetched is calculated by a dedicated counter as the address of the instruction to be entered into the instruction queue. When a branch or jump occurs, the new instruction address is transferred from the PC calculation unit or the data operation unit.

(1.1.2) Instruction Decode Stage

The instruction decode stage (D stage (2)) decodes the instruction code (11) received from the IF stage (1). Instruction codes are in units of 16 bits (halfwords). Decoding is performed once every two clocks, and one decode operation consumes zero to three halfwords of instruction code. In the D stage (2), the instruction code is decomposed into step codes, the units of pipeline processing; that is, one instruction is decomposed into one or more step codes, which are processed by the subsequent pipeline stages. As a step code, the D stage (2) outputs to the A stage (3) the A code (13), which is address calculation information, and the D code (12), which is the intermediate decoding result of the operation code.

The D stage (2) also controls the PC calculation unit, performs branch prediction, performs advance branch processing for pre-branch instructions (pre-branching), and controls the output of instruction code from the instruction queue. Pre-branch processing means that, before the branch is processed in the E stage (5), the branch of an unconditional or conditional branch instruction is predicted, the PC calculation unit calculates the branch target address, and the IF stage (1) is made to fetch the target instruction so that it flows into the pipeline. A pre-branch instruction is an instruction for which pre-branch processing is performed.
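The target computation in pre-branch processing can be sketched as follows. The sketch assumes a PC-relative displacement (consistent with the PD latch holding a displacement relative to the PC) measured from the start of the branch instruction, and 32-bit address arithmetic. The helper and its parameters are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def pre_branch_target(pc: int, displacement: int, instruction_length: int,
                      unconditional: bool, predicted_taken: bool) -> int:
    """Address the D stage would hand to the IF stage (hypothetical helper).

    pc + displacement models the PC adder; otherwise fetching continues
    sequentially after the branch instruction.
    """
    if unconditional or predicted_taken:
        return (pc + displacement) & MASK32
    return (pc + instruction_length) & MASK32


print(hex(pre_branch_target(0x1000, -0x10, 4, False, True)))  # 0xff0
```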

(1.1.3) Operand Address Calculation Stage

The processing of the operand address calculation stage (A stage (3)) is broadly divided into two parts: second-level decoding of the operation code, and calculation of operand addresses.

The second-level opcode decoding takes the D code (12) as input, makes write reservations for registers and memory, and outputs the R code (14), which includes the microprogram entry address and parameters for the microprogram. Register and memory write reservations prevent the incorrect address calculation that would result if the contents of a register or memory location referenced in address calculation were rewritten by a preceding instruction in the pipeline. To avoid deadlock, write reservations are made per instruction rather than per step code. Register and memory write reservations are described in detail in Japanese Patent Application No. 62-144394.

The operand address calculation takes the A code (13) as input, calculates the address in the operand address calculation unit by combining additions and memory indirect references according to the A code (13), and outputs the result as the F code (15). At this time, a conflict check is performed when registers or memory are read for the address calculation. If a conflict is indicated because a preceding instruction has not finished writing to the register or memory, the A stage waits until the preceding instruction finishes its write in the E stage (5).
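The reservation and conflict check behave much like a scoreboard. The sketch below models registers only (memory reservations are omitted) and keeps a per-register count so that several in-flight instructions can reserve the same register; the class and method names are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable


class WriteReservations:
    """Scoreboard-style model of A-stage write reservations (registers only)."""

    def __init__(self) -> None:
        # register -> number of in-flight instructions that will write it
        self.pending = Counter()

    def has_conflict(self, regs_read: Iterable[str]) -> bool:
        # A stage: address calculation must wait if any source register is reserved.
        return any(self.pending[r] > 0 for r in regs_read)

    def reserve(self, regs_written: Iterable[str]) -> None:
        # Made once per instruction (not per step code), as the text describes.
        for r in regs_written:
            self.pending[r] += 1

    def release(self, regs_written: Iterable[str]) -> None:
        # E stage: cleared after the operand has been written.
        for r in regs_written:
            self.pending[r] -= 1


wr = WriteReservations()
wr.reserve(["R1"])                    # preceding instruction will write R1
print(wr.has_conflict(["R1", "R2"]))  # True: an address calculation using R1 waits
wr.release(["R1"])                    # E stage finished the write
print(wr.has_conflict(["R1", "R2"]))  # False
```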

In addition, to prevent stack pointer (SP) conflicts caused by pop operations from and push operations onto the stack, the A stage (3) has an A-stage stack pointer (ASP) that runs ahead of the SP of the execution stage (5); the ASP is updated at this stage for pop and push operations. Therefore, even immediately after an ordinary pop or push operation, processing can continue by referring to the ASP, without delaying step-code processing because of an SP conflict. The SP management method is described in detail in Japanese Patent Application No. 62-145852.
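A sketch of why the ASP removes the stall: consecutive pushes and pops obtain their addresses from a pointer updated at the A stage, without waiting for the E-stage SP. The downward-growing stack and the 4-byte default size are assumptions, and recovery of the ASP after a pipeline flush is not described here and is not modeled.

```python
class AStageStackPointer:
    """Model of an ASP that is updated at the A stage, ahead of the E-stage SP.

    Assumptions (not from the text): the stack grows toward lower addresses and
    push/pop move the pointer by the operand size in bytes.
    """

    def __init__(self, sp: int) -> None:
        self.asp = sp

    def push(self, size: int = 4) -> int:
        self.asp -= size   # pre-decrement, then store at the new top
        return self.asp    # address the push will write

    def pop(self, size: int = 4) -> int:
        address = self.asp  # read from the current top
        self.asp += size
        return address


asp = AStageStackPointer(0x8000)
print(hex(asp.push()), hex(asp.push()), hex(asp.pop()))  # 0x7ffc 0x7ff8 0x7ff8
```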

(1.1.4) Micro ROM Access Stage

The processing of the operand fetch stage (F stage (4)) is also broadly divided into two parts. One is micro ROM access, referred to specifically as the R stage (6); the other is operand prefetch, referred to specifically as the OF stage (7). The R stage (6) and the OF stage (7) do not necessarily operate at the same time; they operate independently, depending on factors such as whether the memory access right can be acquired.

In the R stage (6), micro ROM access and microinstruction decoding are performed on the R code (14) to produce the E code (16), the execution control code used in the next E stage (5). When the processing for one R code is decomposed into two or more microprogram steps, the micro ROM is used by the E stage (5), and the next R code (14) waits for micro ROM access. Micro ROM access for an R code (14) takes place when the last microinstruction of the preceding processing is executed in the E stage (5). In the data processing device of the present invention, most basic instructions execute in one microprogram step, so in practice micro ROM accesses for successive R codes (14) often follow one after another.

(1.1.5) Operand Fetch Stage

The operand fetch stage (OF stage (7)) performs the operand prefetch, the second of the two F-stage (4) processes described above.

The operand prefetch takes the F code (15) as input and outputs the fetched operand and its address as the S code (17). One F code (15) specifies an operand fetch of 4 bytes or less, which may cross a word boundary. The F code (15) also specifies whether the operand is to be accessed: when the operand address calculated in the A stage (3) itself, or an immediate value, is to be transferred to the E stage (5), no operand prefetch is performed and the contents of the F code (15) are transferred as the S code (17). When the operand to be prefetched and an operand that the E stage (5) is about to write satisfy an inclusion relationship, no memory access is made for the prefetch; instead, the value the E stage (5) is about to write is bypassed to it.
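The bypass decision can be sketched as below. It reads the inclusion relationship as "the prefetched bytes lie entirely within the pending write"; the text does not say what happens on a partial overlap, so this sketch simply reports that the prefetch must wait (an assumption). The function name and argument layout are hypothetical.

```python
from typing import Optional, Tuple


def prefetch_operand(address: int, size: int,
                     pending_store: Optional[Tuple[int, bytes]],
                     memory: bytes) -> Optional[Tuple[bytes, bool]]:
    """Return (operand bytes, bypassed?) or None if the prefetch must wait."""
    if pending_store is not None:
        store_address, store_data = pending_store
        store_end = store_address + len(store_data)
        if store_address <= address and address + size <= store_end:
            # Inclusion: take the bytes from the value the E stage will write.
            offset = address - store_address
            return store_data[offset:offset + size], True
        if address < store_end and store_address < address + size:
            # Partial overlap (assumption): wait for the store to complete.
            return None
    # No relation to the pending write: read memory normally.
    return bytes(memory[address:address + size]), False


memory = bytes(range(16))
store = (4, b"\xaa\xbb\xcc\xdd")  # E stage is about to write 4 bytes at address 4
print(prefetch_operand(5, 2, store, memory))  # (b'\xbb\xcc', True)
```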

(1.1.6) Execution Stage

The execution stage (E stage (5)) takes the E code (16) and the S code (17) as input and performs data processing, data reads and writes, and similar operations using various arithmetic units, including an ALU, a barrel shifter, a priority encoder, counters, and shift registers. The E stage (5) is controlled by a microprogram and executes an instruction by executing a sequence of microinstructions starting at the microprogram entry address indicated by the R code (14). The registers and the main arithmetic units are connected by three buses, and one microinstruction specifying a register-to-register operation is processed in two clock cycles.

The E stage (5) is the stage that executes instructions; all processing in the stages up to and including the F stage (4) is preprocessing for the E stage (5). When a branch occurs in the E stage (5), all processing from the IF stage (1) to the F stage (4) is invalidated, and the branch target address is output to the instruction fetch unit and the PC calculation unit.

In the E stage (5), the store buffer in the data operation unit (56) can be used to pipeline an operand store of up to 4 bytes with the execution of the next microinstruction.

In the E stage (5), the register and memory write reservations made in the A stage (3) are released after the operand has been written.

When a conditional branch instruction causes a branch in the E stage (5), this indicates that the branch prediction for that instruction was wrong, and the branch history is rewritten.
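The text says only that the branch history is rewritten. Assuming the simplest scheme, a 1-bit history per table entry, the rewrite is a bit flip: whichever direction the D stage predicted, a branch in the E stage means the other direction was correct. The table size, the indexing, and all names are assumptions.

```python
class BranchHistory:
    """1-bit branch history table indexed by branch address (size and indexing assumed)."""

    def __init__(self, entries: int = 256) -> None:
        self.bits = [False] * entries  # False = predict not taken
        self.mask = entries - 1        # entries must be a power of two

    def _index(self, pc: int) -> int:
        return (pc >> 1) & self.mask   # instructions are halfword aligned

    def predict_taken(self, pc: int) -> bool:
        return self.bits[self._index(pc)]

    def on_e_stage_branch(self, pc: int) -> None:
        # An E-stage branch means the D-stage prediction was wrong,
        # so the history entry is flipped.
        i = self._index(pc)
        self.bits[i] = not self.bits[i]


h = BranchHistory()
h.on_e_stage_branch(0x100)    # predicted not taken, but the branch was taken
print(h.predict_taken(0x100))  # True
```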

(1.2) Program Counter Management

The step codes in the pipeline of the data processing device of the present invention may all belong to different instructions, so the program counter value is managed per step code. Every step code carries the program counter value of the instruction from which it was generated. This program counter value, which accompanies a step code through each pipeline stage, is called the step program counter (SPC). The SPC is passed from one pipeline stage to the next.
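A small sketch of SPC propagation: each step code carries the PC of its source instruction as it moves from stage to stage, so the E stage always knows which instruction it is executing, even when several step codes come from the same instruction. The four modeled stages and all names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepCode:
    spc: int    # step program counter: PC of the instruction this step came from
    label: str


def clock(pipeline: List[Optional[StepCode]],
          incoming: Optional[StepCode]) -> Optional[StepCode]:
    """Advance each step code (with its SPC) one stage; return the one leaving the last stage."""
    leaving = pipeline[-1]
    pipeline[1:] = pipeline[:-1]  # every step code moves on, carrying its own SPC
    pipeline[0] = incoming
    return leaving


# Modeled stages: D, A, F, E (IF omitted because it holds raw instruction bytes).
pipe: List[Optional[StepCode]] = [None] * 4
for step in [StepCode(0x100, "ADD step 0"), StepCode(0x100, "ADD step 1"),
             StepCode(0x106, "MOV"), None]:
    clock(pipe, step)
print(hex(pipe[3].spc), pipe[3].label)  # 0x100 ADD step 0
```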

(2) Advance Branch Processing of Subroutine Return Instructions

To suppress the pipeline disruption caused by executing a subroutine return instruction in the execution stage, the data processing device of the present invention performs advance branch processing for subroutine return instructions in the instruction decode stage (D stage (2)). The detailed operation is described below.

FIG. 2 is a block diagram of the data processing device of the present invention, showing only the parts needed to explain the processing of subroutine call and subroutine return instructions. In the figure:

(21) instruction queue
(22) instruction decoding unit
(23) data input/output circuit that exchanges data with the outside
(24) address output circuit that outputs addresses to the outside
(25) counter (QINPC) that outputs the address for instruction fetch
(26) latch (IL) that stores the instruction length processed by the instruction decoding unit (22) each time a step code is generated
(27) latch (PD) that stores the displacement relative to the PC for pre-branching
(30) PC adder that performs additions in the PC calculation unit (54)
(28), (29), (31) input/output latches (PA, PB, PO) of the PC adder (30)
(32) register (TPC) that holds a temporary PC for each step code processed
(33) D-stage PC (DPC) that holds the PC of the instruction currently being decoded
(34) A-stage PC (APC) that holds the PC corresponding to the step code undergoing address calculation
(38) address adder that performs three-input addition for address calculation
(35), (36), (37), (39) input/output latches (AI, AD, AB, AO) of the address adder (38)
(40) A-stage stack pointer (ASP), which is incremented and decremented in the A stage (3) to manage the SP
(41) F-code address register (FA) that holds an address as part of the F code (15)
(42) S-code address register (SA) that holds an address as part of the S code (17)
(43) CA address register (CAA) that temporarily holds the instruction fetch address
(44) address register (AA) managed by the E stage (5)
(45) E-stage branch address register (EB) that holds the branch target address in the E stage (5)
(46) PC stack that stores only the return addresses of subroutine calls
(47) register file including the stack pointer, frame pointer, working registers, and so on
(50) ALU for data operations
(48), (49), (51) input/output latches (DA, DB, DO) of the ALU (50)
(52) S-code data register (SD) that holds data as part of the S code (17)
(53) data register (DD) that holds data for memory accesses performed in the E stage (5)
(101) to (110) internal buses for transferring data and addresses (S1 bus, S2 bus, DO bus, A bus, AO bus, DISP bus, PO bus, CA bus, AA bus, DD bus)
(54) PC calculation unit
(55) address calculation unit

FIG. 3 is a block diagram of the parts of the data processing device of the present invention that relate specifically to advance branch processing of subroutine return instructions. In the figure:

(61) D-stage control unit
(62) IF-stage control unit
(63) E-stage control unit
(65) BSR counter, a 3-bit counter that counts the subroutine call instructions in the pipeline
(66) 3-bit PC stack pointer (DP) managed by the D stage (2)
(67) 3-bit PC stack pointer (EP) managed by the E stage (5)
(68), (69) decoders that decode DP (66) and EP (67), respectively
(70) AND gate
(71) valid bit control signal latch
(201) to (214) control signals for the various units

For simplicity, the clock signals that control timing are omitted from this figure.

FIG. 4 shows the configuration of the PC stack (46). In the figure, (46A) is the return address field, which stores a return address, and (46B) is a valid bit indicating whether the return address stored in each entry is valid.

In this embodiment, the PC stack (46) has eight entries. Because instruction codes are in 16-bit units, a PC is never an odd address, so the return address field is 31 bits wide. When a return address is read from the PC stack (46), its least significant bit is output as '0'. DP (66) and EP (67) are 3 bits wide; a carry out of the most significant bit on increment and a borrow past the most significant bit on decrement are ignored. In other words, the PC stack (46) is handled as a ring-shaped stack memory in which the entries pointed to by '000' and '111' are adjacent.
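The pointer wrap-around and the 31-bit storage described above can be modeled as follows. This sketches the entry format and pointer arithmetic only; the class and method names are hypothetical, and the control signals are not modeled.

```python
from typing import Tuple


class PCStack:
    """8-entry ring-shaped PC stack: 31-bit return field plus a valid bit per entry."""

    ENTRIES = 8

    def __init__(self) -> None:
        self.return_field = [0] * self.ENTRIES  # address bits 31..1 (bit 0 is always 0)
        self.valid = [False] * self.ENTRIES

    @staticmethod
    def wrap(pointer: int) -> int:
        # 3-bit pointer: a carry out of, or borrow past, bit 2 is discarded.
        return pointer & 0b111

    def write(self, pointer: int, return_address: int) -> None:
        if return_address & 1:
            raise ValueError("PC values are always even")
        self.return_field[pointer] = (return_address >> 1) & 0x7FFF_FFFF
        self.valid[pointer] = True

    def read(self, pointer: int) -> Tuple[int, bool]:
        # The least significant bit is supplied as '0' on output.
        return self.return_field[pointer] << 1, self.valid[pointer]


stack = PCStack()
ep = PCStack.wrap(0 - 1)  # pre-decrement from '000' wraps to '111'
stack.write(ep, 0x12345678)
print(ep, hex(stack.read(ep)[0]), stack.read(ep)[1])  # 7 0x12345678 True
```

Starting from a cleared pointer, a pre-decrementing push therefore lands in entry 7, and decrementing further continues around the ring.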

(2.1) Outline of PC Stack Operation

In the data processing device of the present invention, the subroutine call instructions are the branch-to-subroutine (BSR) instruction and the jump-to-subroutine (JSR) instruction. The subroutine return instructions are the return-from-subroutine (RTS) instruction and the EXITD instruction, a high-function instruction that performs a high-level-language subroutine return and parameter release at once.

When a subroutine call instruction is executed, the return address from the subroutine is pushed onto the PC stack (46) in the E stage (5). When a subroutine return instruction is decoded, the D stage (2) performs advance branch processing (pre-return) to the address at the top of the PC stack (46). The E stage (5) checks whether the pre-return performed in the D stage (2) was correct; if the pre-return address was wrong, it branches to the true return address.
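The push, pre-return, and check sequence can be summarized in a sketch that ignores pipeline timing and stack depth (an unbounded list stands in for the 8-entry PC stack). The "true" return address in each event plays the role of the address the E stage obtains when it executes the return instruction; all names are hypothetical.

```python
from typing import List, Optional, Tuple


def simulate_returns(events: List[Tuple[str, int]]) -> List[int]:
    """Return the addresses the E stage had to branch to after a wrong pre-return.

    events: ("call", return_address) when a call executes in the E stage, or
            ("return", true_return_address) for a subroutine return instruction.
    """
    pc_stack: List[int] = []  # unbounded stand-in for the 8-entry PC stack
    corrective_branches: List[int] = []
    for kind, address in events:
        if kind == "call":
            pc_stack.append(address)  # E stage pushes the return address
        elif kind == "return":
            # D stage: pre-return to the top of the PC stack, if there is one.
            predicted: Optional[int] = pc_stack.pop() if pc_stack else None
            # E stage: compare with the true return address.
            if predicted != address:
                corrective_branches.append(address)  # branch to the true address
        else:
            raise ValueError(f"unknown event {kind!r}")
    return corrective_branches


nested = [("call", 0x104), ("call", 0x20A), ("return", 0x20A), ("return", 0x104)]
patched = [("call", 0x104), ("return", 0x300)]  # routine replaced its return address
print(simulate_returns(nested), [hex(a) for a in simulate_returns(patched)])  # [] ['0x300']
```

A correctly nested call sequence needs no corrective branch, while a routine that replaces its own return address on the memory stack is caught by the E-stage check.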

The operation, including the updating of the pointers DP (66) and EP (67), is described in detail below. Here the value of the valid bit control signal latch (71) is assumed to be '1'.

In the reset state, the PC stack (46) initialization signal (INIT signal (208)) clears the BSR counter (65) and EP (67) to zero, and the value of EP (67), now zero, is copied to DP (66). All valid bits (46B) in the PC stack (46) are also cleared to '0'.

First, the instruction code (11) fetched from the instruction queue (21) is decoded by the instruction decoding unit (22). If the decoded instruction is a subroutine call instruction, the DPDEC signal (202) decrements DP (66) and the BSR counter (65) counts up. In the address calculation stage (3), the address adder (38) calculates the return address, which is transferred to the FA register (41) via the AO bus (105). In the F stage (4), the value of the FA register (41) is transferred to the SA register (42). When the subroutine call instruction is executed in the E stage (5), the EPDEC signal (206) pre-decrements EP (67). The PCWRITE signal (210) then writes the return address held in the SA register (42), via the S1 bus (101), into the return address field (46A) of the PC stack (46) entry pointed to by the updated EP (67), and sets that entry's valid bit (46B) to '1'. The BSRCDEC signal (205) also decrements the BSR counter (65). For the BSR instruction, the branch to the start address of the subroutine is performed in the D stage (2), so no branch processing is needed in the E stage (5).
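
The call-side bookkeeping above can be modelled behaviourally in Python. This sketch abstracts away the FA and SA registers and the buses, and the class and method names are hypothetical; only the pointer, counter, and valid-bit updates follow the text.

```python
class PCStack:
    """Hypothetical behavioural model of the PC stack (46) for subroutine calls."""

    def __init__(self, entries: int = 8) -> None:
        self.n = entries
        self.addr = [0] * entries    # return address fields (46A)
        self.valid = [0] * entries   # valid bits (46B)
        self.dp = 0                  # pointer DP (66), updated in the D stage
        self.ep = 0                  # pointer EP (67), updated in the E stage
        self.bsr_count = 0           # BSR counter (65)

    def decode_call(self) -> None:
        """D stage: DPDEC (202) decrements DP; the BSR counter counts up."""
        self.dp = (self.dp - 1) % self.n
        self.bsr_count += 1

    def execute_call(self, return_addr: int) -> None:
        """E stage: EPDEC (206) pre-decrements EP, PCWRITE (210) writes the
        return address and sets the valid bit, BSRCDEC (205) decrements the
        BSR counter."""
        self.ep = (self.ep - 1) % self.n
        self.addr[self.ep] = return_addr
        self.valid[self.ep] = 1
        self.bsr_count -= 1


s = PCStack()
s.decode_call()           # call decoded, not yet executed
print(s.dp, s.bsr_count)  # 7 1
s.execute_call(0x1000)
print(s.ep, hex(s.addr[s.ep]), s.valid[s.ep], s.bsr_count)  # 7 0x1000 1 0
```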

Next, the processing of a subroutine return instruction is described. When the instruction fetched from the instruction queue (21) is a subroutine return instruction, the BSRCZ signal (201), which indicates whether the value of the BSR counter (65) is zero, is checked. If the BSR counter (65) is not zero, the D stage (2) suspends processing until the counter reaches zero. A nonzero BSR counter (65) means that a corresponding subroutine call instruction is still in the pipeline and has not yet been executed in the E stage (5), so its return address has not yet been registered in the PC stack (46). When the BSRCZ signal (201) indicates that the value of the BSR counter (65) is, or has become, zero, the D stage control unit (61) uses the PRERET signal (209) to notify the IF stage control unit (62) and the PC stack (46) that pre-return processing will be performed. The PC stack (46) outputs the contents of the return address field (46A) of the entry pointed to by DP (66) onto the CA bus (108). The IF stage control unit (62) invalidates all instruction data held in the instruction queue (21), fetches the instruction at the return address using the value on the CA bus, and sends the fetched instruction data to the instruction decoding unit (22). After the contents of the PC stack (46) have been output onto the CA bus (108), the DPINC signal (203) post-increments DP (66). The VREAD signal (211) reads the valid bit (46B) of the PC stack (46) entry pointed to by EP (67) and sends it to the E stage control unit (63) as the VALID signal (214); the valid bit (46B) of the entry that was read is then cleared to '0'. If the VALID signal (214) is '1', the pre-return was correct, and the E stage control unit (63) ends execution of the subroutine return instruction. If the VALID signal (214) is '0', the pre-return address was wrong. In that case, the true return address is read from memory into the DD register (53) and transferred via the S1 bus (101) to the EB register (45), whose value is then output onto the CA bus (108). The IF stage (1) fetches instructions from the address output onto the CA bus (108).
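
The return side can be added to the same kind of behavioural model. The sketch below is hypothetical and repeats the call methods so that it stands alone. `decode_return` returns `None` to represent the D-stage stall while the BSR counter is nonzero. The EP post-increment (EPINC (207)) is taken from the detailed RTS description later in this section.

```python
from typing import Optional


class PCStack:
    """Hypothetical behavioural model of the PC stack (46): calls and returns."""

    def __init__(self, entries: int = 8) -> None:
        self.n = entries
        self.addr = [0] * entries    # return address fields (46A)
        self.valid = [0] * entries   # valid bits (46B)
        self.dp = self.ep = 0        # pointers DP (66) and EP (67)
        self.bsr_count = 0           # BSR counter (65)

    def decode_call(self) -> None:
        self.dp = (self.dp - 1) % self.n
        self.bsr_count += 1

    def execute_call(self, return_addr: int) -> None:
        self.ep = (self.ep - 1) % self.n
        self.addr[self.ep] = return_addr
        self.valid[self.ep] = 1
        self.bsr_count -= 1

    def decode_return(self) -> Optional[int]:
        """D stage: stall (None) while BSRCZ (201) reports a nonzero counter;
        otherwise return the pre-return address read via DP, then
        post-increment DP (DPINC (203))."""
        if self.bsr_count != 0:
            return None
        target = self.addr[self.dp]
        self.dp = (self.dp + 1) % self.n
        return target

    def execute_return(self) -> int:
        """E stage: VREAD (211) reads the valid bit at EP as VALID (214), the
        bit is cleared, and EP is post-incremented (EPINC (207))."""
        valid = self.valid[self.ep]
        self.valid[self.ep] = 0
        self.ep = (self.ep + 1) % self.n
        return valid


s = PCStack()
s.decode_call()
print(s.decode_return())       # None: the call has not reached the E stage yet
s.execute_call(0x2000)
print(hex(s.decode_return()))  # 0x2000: pre-return target
print(s.execute_return())      # 1: the pre-return was correct
```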

The valid bit (46B) in the PC stack (46) is set to '1' when the return address of a subroutine call is registered, and is cleared to '0' after it has been read at the subroutine return. Thus every entry in the PC stack whose valid bit (46B) is '1' holds a correct return address.

When a subroutine return instruction is executed, the E stage (5) checks whether the return address to which the D stage (2) pre-returned was correct. This check is needed because the PC stack (46) has only 8 entries: if subroutine calls are nested nine or more levels deep, the return addresses of the outermost calls are overwritten by those of deeper calls and are lost. The E stage (5) therefore verifies every pre-return. Once the PC stack (46) has been read (by subroutine returns) eight or more levels up from the deepest point, all valid bits (46B) in the PC stack (46) are '0', indicating that no valid return address is stored. However, the PC stack (46) always holds correct values for the eight subroutine calls nearest the deepest nesting level, so the probability that a pre-return is correct is very high.
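
The overflow behaviour can be checked with a small simulation. It assumes each call completes its E stage before the next instruction is decoded, so DP and EP move in step. The return addresses are hypothetical placeholders. For each return, the function reports whether the pre-return target was right and what VALID the E stage saw.

```python
def simulate_nesting(depth: int, entries: int = 8):
    """Nest `depth` calls (each fully executed), then return from all of them.

    Returns one (pre_return_correct, valid_signal) pair per return,
    innermost return first.
    """
    addr = [0] * entries
    valid = [0] * entries
    dp = ep = 0
    calls = [0x1000 + 0x10 * k for k in range(depth)]

    for ret in calls:                       # outermost call first
        dp = (dp - 1) % entries             # D stage: DPDEC
        ep = (ep - 1) % entries             # E stage: EPDEC
        addr[ep], valid[ep] = ret, 1        # PCWRITE

    results = []
    for true_ret in reversed(calls):        # innermost return first
        guess = addr[dp]                    # D stage: pre-return via DP
        dp = (dp + 1) % entries
        ok = valid[ep]                      # E stage: VREAD via EP
        valid[ep] = 0
        ep = (ep + 1) % entries
        results.append((guess == true_ret, ok))
    return results


print(simulate_nesting(8)[-1])  # (True, 1): eight levels fit
print(simulate_nesting(9)[-1])  # (False, 0): outermost address was overwritten
```

With nine levels, the first eight returns are correct. The ninth finds its entry already cleared, so VALID is 0 and the E stage falls back to the memory copy.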

The BSR counter (65) described above is provided so that an accurate pre-return is performed even immediately after a BSR instruction, which pre-branches, and so that the comparison in the E stage (5) is reliable. Suppose this function were absent, and a BSR instruction had finished its D stage (2) processing but had not yet written its return address into the PC stack (46) in the E stage (5). If a subroutine return instruction were processed in the D stage (2) at that point, the return address it needs would not yet be registered, and the pre-return would go to a wrong address. Yet by the time the subroutine return instruction reaches the E stage (5), the preceding BSR instruction has already been executed and the correct return address has been registered in the PC stack (46). When the E stage (5) refers to the valid bit (46B), the VALID signal (214) therefore shows '1' (valid), and the wrong pre-return is treated as correct. In such a case the device would malfunction. With the BSR counter, the pre-return is performed only after the return address it must refer to has been registered by the preceding BSR instruction. Furthermore, no BSR instruction rewrites the PC stack (46) between the time it is referred to in the D stage (2) and the time the return instruction is processed in the E stage (5). The E stage (5) therefore correctly refers to the valid bit (46B) of the entry from which the return address was read in the D stage (2).
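
The hazard that the BSR counter prevents can be replayed as a two-ordering timeline. In this hypothetical sketch, a BSR is decoded and the RTS of a very short subroutine reaches the D stage before the BSR has executed. `STALE` and `RET` are made-up addresses. The function returns the pre-return target and the VALID value that the RTS sees in the E stage.

```python
STALE = 0xDEAD_BEE0   # hypothetical leftover contents of the PC stack entry
RET = 0x0000_4000     # hypothetical return address of the BSR


def bsr_then_rts(use_counter: bool):
    """Returns (pre_return_target, VALID seen by the RTS in the E stage)."""
    n = 8
    addr, valid = [STALE] * n, [0] * n
    dp = ep = bsr_count = 0

    def bsr_e_stage():
        nonlocal ep, bsr_count
        ep = (ep - 1) % n
        addr[ep], valid[ep] = RET, 1
        bsr_count -= 1

    dp = (dp - 1) % n                  # BSR in the D stage (pre-branch)
    bsr_count += 1
    if use_counter:
        while bsr_count != 0:          # BSRCZ: the RTS waits in the D stage
            bsr_e_stage()
    guess = addr[dp]                   # RTS pre-return in the D stage
    dp = (dp + 1) % n
    if bsr_count != 0:
        bsr_e_stage()                  # without the counter the BSR executes only now
    ok = valid[ep]                     # RTS check in the E stage
    valid[ep] = 0
    ep = (ep + 1) % n
    return guess, ok


g, ok = bsr_then_rts(False)
print(hex(g), ok)  # 0xdeadbee0 1: wrong target, yet VALID says it is fine
g, ok = bsr_then_rts(True)
print(hex(g), ok)  # 0x4000 1
```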

For the JSR instruction, which does not pre-branch, the branch to the target address is performed in the E stage (5). Therefore, even if an RTS instruction pre-returns by referring to the PC stack (46) before the JSR instruction has registered its return address, the pipeline is cancelled before that RTS instruction itself is executed, so this problem does not occur. The same applies when pre-branch processing is not performed for the BSR instruction.

As described above, the PC stack (46) stores only the return addresses of subroutine calls. It allows a subroutine return instruction to pre-return to its return address at the instruction decoding stage, which eliminates the pipeline disruption that executing subroutine return instructions would otherwise cause.

When a branch occurs in the E stage (5), the EBRA signal (204) clears the BSR counter (65) to zero and copies the contents of EP (67) to DP (66). A branch in the E stage (5) invalidates all processing in the IF stage (1) through the F stage (4). The flush therefore cancels the updates to the BSR counter (65) and DP (66) made for subroutine call and subroutine return instructions that were decoded in the D stage (2) but not executed in the E stage (5). The D stage (2) can then correctly refer to the return addresses in the PC stack (46) up to that level.
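
The EBRA flush can be illustrated with a minimal model. In this hypothetical sketch, one call really executes, a second call is decoded on a path that is then cancelled, and an older instruction branches in the E stage. Without the flush, the D stage would stall forever on the phantom call and its DP would point at the wrong entry.

```python
from typing import Optional


class PCStack:
    """Hypothetical model: speculative D-stage updates and the EBRA (204) flush."""

    def __init__(self, entries: int = 8) -> None:
        self.n = entries
        self.addr = [0] * entries
        self.valid = [0] * entries
        self.dp = self.ep = 0
        self.bsr_count = 0

    def decode_call(self) -> None:
        self.dp = (self.dp - 1) % self.n
        self.bsr_count += 1

    def execute_call(self, return_addr: int) -> None:
        self.ep = (self.ep - 1) % self.n
        self.addr[self.ep], self.valid[self.ep] = return_addr, 1
        self.bsr_count -= 1

    def decode_return(self) -> Optional[int]:
        if self.bsr_count != 0:
            return None                        # D stage stalls
        target = self.addr[self.dp]
        self.dp = (self.dp + 1) % self.n
        return target

    def e_stage_branch(self) -> None:
        """EBRA: work in IF..F is discarded, so undo its counter/pointer effects."""
        self.bsr_count = 0
        self.dp = self.ep


s = PCStack()
s.decode_call()
s.execute_call(0x3000)    # a call that really executed
s.decode_call()           # a call on a path that will be cancelled
s.e_stage_branch()        # an older instruction branches in the E stage
print(s.dp == s.ep, hex(s.decode_return()))  # True 0x3000
```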

If a program rewrites a subroutine's return address in external memory, the return address stored in the PC stack (46) differs from the one in external memory, and correct operation is not guaranteed. Therefore, when pre-return processing is used, programs are prohibited from rewriting return addresses in external memory.

The data processing device of the present invention has a means by which a program can forcibly invalidate pre-return processing. The program does this by rewriting the contents of the valid bit control signal latch (VCNT latch (71)) in a control register. When the VCNT latch (71) is set to '1', the VALID signal (214), which indicates whether the pre-return address was correct as described above, reflects the value of the valid bit (46B) in the PC stack (46) and is sent to the E stage control unit (63). When the VCNT latch (71) is set to '0', the VCNT signal (213) is '0'. The VALID signal (214) sent from the AND gate (70) to the E stage control unit (63) is then '0' whatever the value of the valid bit (46B) in the PC stack (46). Every pre-return performed in the D stage (2) is therefore treated as invalid: the E stage (5) reads the return address from external memory and returns to that address. Because all pre-return processing is disabled, correct operation is guaranteed even if a program rewrites a subroutine's return address in external memory.
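
The effect of the VCNT latch reduces to a two-input AND. The following sketch uses hypothetical names. It shows the gate and the address at which a return finally continues, where `memory_return` stands for the return address read from external memory.

```python
def valid_signal(valid_bit: int, vcnt: int) -> int:
    """AND gate (70): VALID (214) = valid bit (46B) AND VCNT signal (213)."""
    return valid_bit & vcnt


def e_stage_return(pre_target: int, valid_bit: int, vcnt: int,
                   memory_return: int) -> int:
    """Address the return finally continues at: the pre-return target if VALID
    is 1, otherwise the return address read from external memory."""
    return pre_target if valid_signal(valid_bit, vcnt) else memory_return


for vcnt in (1, 0):
    print(vcnt, [valid_signal(v, vcnt) for v in (0, 1)])
# 1 [0, 1]
# 0 [0, 0]
```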

To re-enable pre-return processing after the valid bit control signal latch (VCNT latch (71)) has been set to '0', the program first initializes the PC stack (46). It does this by setting the PC stack (46) initialization signal (INIT signal (208)) in the control register to '1'. This clears the BSR counter (65) and EP (67) to zero, copies the value of EP (67), now zero, to DP (66), and clears all valid bits (46B) in the PC stack (46) to '0'. The program then sets the VCNT latch (71) to '1', which enables pre-return processing again.
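
The re-enable sequence can be sketched as two control-register writes in a fixed order. The class below is a hypothetical model of the program-visible controls only. The order matters presumably because entries recorded while VCNT was 0 may no longer match the return addresses in memory, so they are invalidated before pre-return is trusted again.

```python
class PCStackControl:
    """Hypothetical model of the VCNT latch (71) and INIT signal (208)."""

    def __init__(self, entries: int = 8) -> None:
        self.n = entries
        self.valid = [0] * entries   # valid bits (46B)
        self.dp = self.ep = 0        # DP (66), EP (67)
        self.bsr_count = 0           # BSR counter (65)
        self.vcnt = 1                # VCNT latch (71)

    def write_init(self) -> None:
        """INIT = 1: BSR counter and EP cleared, DP <- EP, valid bits cleared."""
        self.bsr_count = 0
        self.ep = 0
        self.dp = self.ep
        self.valid = [0] * self.n

    def reenable_pre_return(self) -> None:
        """Order from the text: initialize first, then set VCNT back to 1."""
        self.write_init()
        self.vcnt = 1


c = PCStackControl()
c.vcnt = 0                          # pre-return disabled
c.valid[5], c.ep, c.dp = 1, 5, 5    # state left over while VCNT was 0
c.reenable_pre_return()
print(c.vcnt, c.dp, c.ep, sum(c.valid))  # 1 0 0 0
```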

(2.2) Detailed operation of the subroutine call and subroutine return instructions
So far, the operation of the subroutine call and subroutine return instructions has been described in outline. The detailed operation of each instruction is described here.

In the data processing device of the present invention, the subroutine call instructions are the branch subroutine (BSR) instruction and the jump subroutine (JSR) instruction. The subroutine return instructions are the return subroutine (RTS) instruction and the EXITD instruction, a high-function instruction that performs a high-level-language subroutine return and releases parameters in a single operation. The bit allocation of each instruction is shown in FIG. 5, where '−' denotes the operation code.

The BSR and JSR instructions handle the PC stack (46) in the same way, as do the RTS and EXITD instructions. The BSR and RTS instructions are therefore described in detail below.

(2.2.1) BSR instruction
The BSR instruction is a subroutine call instruction that supports only PC-relative addressing; it saves the return address on the stack. As shown in FIGS. 5(A) and 5(B), the BSR instruction has two formats: a general format (G format) and a short format (D format). The D stage (2) processes both formats in the same way. The instruction is processed as a single step code.

FIG. 6 shows a flowchart of BSR instruction execution. When the instruction decoding unit (22) processes a BSR instruction, it generates a D code (12) indicating the step code of the BSR instruction and an A code (13) for calculating the return address. For a G-format instruction, the displacement (82D) is taken in at the same time, according to the field (82B) that indicates the displacement size. The DPDEC signal (202) also decrements DP (66), and the BSR counter (65) is incremented. Because this instruction pre-branches, the PC calculation unit (54) calculates the branch target address, and the result is output onto the CA bus for pre-branch processing. In the A stage (3), the address calculation unit (55) calculates the return address as instructed by the A code (13) and transfers it to the FA register (41) via the AO bus (105). In the F stage (4), the value of the FA register (41) is transferred to the SA register (42). In the E stage (5), the EPDEC signal (206) first pre-decrements EP (67). Next, the PCWRITE signal (210) writes the value of the SA register (42), which holds the return address, into the return address field (46A) of the PC stack (46) entry pointed to by EP (67). The write goes via the S1 bus (101), and it sets that entry's valid bit (46B) to '1'. At the same time, the value on the S1 bus (101) is written into the DD register (53) via the ALU (50) and the DO bus (103). The return address in the DD register (53) is then pushed onto the stack in memory that software manages through the stack pointer. Once the return address has been registered in the PC stack (46), the BSRCDEC signal (205) decrements the BSR counter (65). Because this instruction has already branched in the D stage (2), no branch is performed in the E stage.
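
The BSR flow of FIG. 6 can be condensed into one hypothetical method that performs the D-, A-, and E-stage effects in order. Two details are assumptions, not statements from the text: that the branch target is `pc + disp` and that the return address is `pc + length`. The in-memory software stack is modelled as a Python list.

```python
class Machine:
    """Hypothetical model of the BSR flow of FIG. 6 (registers abstracted away)."""

    def __init__(self, entries: int = 8) -> None:
        self.n = entries
        self.addr = [0] * entries      # return address fields (46A)
        self.valid = [0] * entries     # valid bits (46B)
        self.dp = self.ep = 0          # DP (66), EP (67)
        self.bsr_count = 0             # BSR counter (65)
        self.mem_stack = []            # software stack in memory (assumed list)
        self.ca_bus = None             # last address driven onto the CA bus (108)

    def bsr(self, pc: int, disp: int, length: int) -> int:
        # D stage: DPDEC, BSR counter up, pre-branch (assumed target = pc + disp)
        self.dp = (self.dp - 1) % self.n
        self.bsr_count += 1
        self.ca_bus = pc + disp
        # A stage: return address (assumed = address of the next instruction)
        ret = pc + length
        # E stage: EPDEC, PCWRITE (entry + valid bit), push to memory, BSRCDEC
        self.ep = (self.ep - 1) % self.n
        self.addr[self.ep], self.valid[self.ep] = ret, 1
        self.mem_stack.append(ret)
        self.bsr_count -= 1
        # No branch in the E stage: the D stage already branched.
        return ret


m = Machine()
print(hex(m.bsr(pc=0x1000, disp=0x200, length=4)), hex(m.ca_bus))  # 0x1004 0x1200
```

The return address is written to both the PC stack and the memory stack. The memory copy is what the E stage falls back on when a later pre-return turns out to be invalid.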

(2.2.2) RTS instruction
The RTS instruction returns from a subroutine by jumping to the return address restored from the stack. This instruction is processed as a single step code.

FIG. 7 shows a flowchart of RTS instruction execution. When the instruction decoding unit (22) processes an RTS instruction, it generates a D code (12) indicating the step code of the RTS instruction and an A code (13) for calculating the address of the stack top. This instruction performs a pre-return. If the BSRCZ signal (201) indicates that a subroutine call instruction is in the pipeline, processing is suspended until the contents of the BSR counter (65) become zero. When the BSR counter (65) is zero, pre-return processing is performed. The PRERET signal (209) causes the contents of the return address field (46A) of the PC stack (46) entry pointed to by DP (66) to be output onto the CA bus (108), which performs the preceding branch (pre-return). After the PC stack (46) has been referred to, the DPINC signal (203) post-increments DP (66). In the A stage (3), the address calculation unit (55) calculates the address of the stack top as instructed by the A code (13) and writes it to the FA register (41) via the AO bus (105); the stack-top address is simply the value of ASP (40). In the F stage (4), the value of the FA register (41) is transferred to the SA register (42). In the E stage (5), the VREAD signal (211) acts on the PC stack (46) entry pointed to by EP (67), which holds the return address referred to at the pre-return. The valid bit (46B) of that entry is sent to the E stage control unit (63) as the VALID signal (214) and is then cleared to '0'. At the same time, the value of the SA register (42), which holds the stack-top address, is transferred to the AA register (44) via the S1 bus (101). After the PC stack (46) has been referred to, the EPINC signal (207) post-increments EP (67). If the VALID signal (214) is '1', the pre-return went to the correct address, and the E stage (5) executes a one-microcycle NOP to end execution of the instruction. If the VALID signal (214) is '0', the pre-return address was wrong. The return address is then fetched using the value of the AA register (44) as the address and loaded into the DD register (53). The value of the DD register (53) is transferred via the S1 bus (101) to the EB register (45), and the value of the EB register (45) is output onto the CA bus (108) to perform the branch. At this time, the EBRA signal (204) clears the BSR counter (65) and copies the value of EP (67) to DP (66).
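
The RTS flow of FIG. 7 can be condensed in the same hypothetical style, with the register transfers abstracted away. The sketch assumes the BSR counter is already zero, and it assumes that popping the in-memory stack models the stack-pointer update. The popped value is used only when VALID is 0, in which case the EBRA flush is also applied.

```python
class Machine:
    """Hypothetical model of the RTS flow of FIG. 7 (registers abstracted away)."""

    def __init__(self, entries: int = 8) -> None:
        self.n = entries
        self.addr = [0] * entries
        self.valid = [0] * entries
        self.dp = self.ep = 0
        self.bsr_count = 0
        self.mem_stack = []   # software stack in memory; top = last element (assumed)
        self.ca_bus = None

    def rts(self) -> int:
        if self.bsr_count != 0:
            raise RuntimeError("D stage would stall until the BSR counter is zero")
        # D stage: PRERET drives the DP entry onto the CA bus, then DPINC
        self.ca_bus = self.addr[self.dp]
        self.dp = (self.dp + 1) % self.n
        # E stage: VREAD at EP -> VALID, clear the bit, EPINC
        ok = self.valid[self.ep]
        self.valid[self.ep] = 0
        self.ep = (self.ep + 1) % self.n
        true_ret = self.mem_stack.pop()   # memory copy (pop = assumed SP update)
        if not ok:
            self.ca_bus = true_ret        # DD -> EB -> CA bus: branch to true address
            self.bsr_count = 0            # EBRA: clear the BSR counter
            self.dp = self.ep             # EBRA: copy EP to DP
        return self.ca_bus


m = Machine()
m.mem_stack.append(0x5000)    # only memory holds the address (e.g. after INIT)
print(hex(m.rts()))           # 0x5000: VALID was 0, so the E stage branched
```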

(2.3) Description of other embodiments
In this embodiment, the PC stack (46) is composed of 8 entries. When subroutine calls are nested nine or more levels deep, a new return address is written over an entry that still holds a valid return address, and the earlier value is lost. Consequently, except in special cases such as recursive calls, nesting of nine or more levels leads to incorrect pre-returns, which is why the E stage must check whether each pre-return was correct. The number of PC stack entries can be chosen as a trade-off between performance and the amount of additional hardware. Performance here means the nesting depth up to which subroutine calls receive correct pre-returns.

In this embodiment, the BSR counter (65) is provided to ensure correct pre-returns. If subroutine call instructions do not pre-branch, however, a branch to the target address always follows execution of a subroutine call instruction and cancels the pipeline, so this function is unnecessary. Also, although DP (66) is decremented when the BSR instruction is decoded in the D stage (2), the value of the decremented pointer EP (67) may instead be copied to DP (66) when the BSR instruction is executed in the E stage (5).

In this embodiment, the E stage (5) checks whether the pre-return was correct by reading, from the PC stack (46), the valid bit (46B) of the entry referred to for the pre-return. Alternatively, the valid bit (46B) may be read together with the return address during the pre-return in the D stage (2) and passed along to the E stage (5). In that case, the transferred valid bit may be checked in the E stage (5) as in this embodiment. It may also be used to change the processing of the microinstructions in the F stage (4), for example by changing the microinstruction entry address. Even in this case, however, the pointer update processing and the clearing of the valid bit (46B) are still required.

Also, in this embodiment, pre-return processing is always performed when the D stage (2) processes a subroutine return instruction. Alternatively, the valid bit (46B) may be read together with the return address in the D stage (2), and pre-return processing may be performed only when that valid bit is '1' (valid).

Also, in this embodiment, the correct return address is fetched from external memory only when the check in the E stage (5) shows that the pre-return was wrong. Alternatively, the return address may be read regardless of the result of the check. For an RTS instruction, for example, the return address may be prefetched in the F stage (4).

Also, this embodiment uses a counter to detect whether any stage after the D stage (2) is processing a subroutine call instruction. Alternatively, a subroutine-call flag may be provided for each step code or each pipeline stage, and pre-return processing may be performed only when none of the flags is set.
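
The flag-based alternative replaces a count with a check that no flag is set. In this hypothetical sketch, each stage after the D stage carries one flag. Pre-return is allowed only when none is set. The number of set flags plays roughly the role the BSR counter value plays in the main embodiment.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StageSlot:
    """Step code currently held by one pipeline stage after the D stage."""
    name: str
    is_call: bool = False   # hypothetical subroutine-call flag


def may_pre_return(later_stages: Iterable[StageSlot]) -> bool:
    """Pre-return only when no later stage is processing a subroutine call."""
    return not any(s.is_call for s in later_stages)


pipeline = [StageSlot("A"), StageSlot("F", is_call=True), StageSlot("E")]
print(may_pre_return(pipeline))          # False: a call is in the F stage
print(sum(s.is_call for s in pipeline))  # 1: the count a BSR counter would hold
```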

In this embodiment, the PC stack (46) has two pointers: DP (66), managed by the D stage (2), and EP (67), managed by the E stage (5). This allows the correct return address to be referred to even when several subroutine return instructions are in the pipeline at once. EP (67) changes as subroutine call and subroutine return instructions are executed in the E stage (5). DP (66), by contrast, changes at the instruction decoding stage, so even if two or more subroutine return instructions enter the pipeline, each can refer to the return address of its corresponding subroutine call instruction. When a branch is performed in the E stage (5), the pipeline is cancelled, so the value of EP (67) is copied to DP (66). Alternatively, all pointer management of the PC stack (46) may be done with EP (67) alone. In that variant, a flag for subroutine return instructions is set while such an instruction is being executed in the A stage (3) or a later stage, and pre-return processing waits while the flag is set. The PC stack (46) is then referred to only after the pointer has been correctly updated, so a correct pre-return is achieved.
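
The reason for the separate decode-time pointer can be shown with two RTS instructions in flight. In this hypothetical scenario, three nested calls have completed and two RTS instructions are decoded back to back before either reaches the E stage. With DP, each RTS sees its own entry. Reading EP alone, without the waiting flag of the alternative embodiment, both would see the same entry.

```python
def two_returns_in_flight(use_dp: bool):
    """Returns the two pre-return targets. Addresses are hypothetical."""
    n = 8
    addr = [0] * n
    ep = 0
    for ret in (0xA00, 0xB00, 0xC00):   # outermost call first
        ep = (ep - 1) % n
        addr[ep] = ret
    dp = ep                              # no calls pending: DP equals EP

    targets = []
    for _ in range(2):
        # With DP the D stage uses its own pointer; without it, it can only
        # see EP, which does not move until the first RTS executes.
        ptr = dp if use_dp else ep
        targets.append(addr[ptr])
        if use_dp:
            dp = (dp + 1) % n            # DPINC at decode time
    return targets


print([hex(a) for a in two_returns_in_flight(True)])   # ['0xc00', '0xb00']
print([hex(a) for a in two_returns_in_flight(False)])  # ['0xc00', '0xc00']
```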

The PC stack (46) of the present invention is accessed both at pre-return time and when determining whether the pre-return was correct, so it is efficient to access it independently of memory accesses outside the CPU. Thus, in a data processing device such as a microprocessor whose CPU is implemented on a single integrated circuit chip, the PC stack (46) can be placed on the same integrated circuit as the CPU. It can then be accessed independently of memory accesses outside the CPU.

The present invention can be carried out in the embodiments described in items (1) to (4) below.

(1) A data processing device that processes instructions by pipeline processing having a first stage and a second stage, in which, for the execution of an instruction, processing in the first stage precedes processing in the second stage, the device comprising:
a first storage device that stores instructions and data;
a second storage device, different from the first storage device, including an address storage unit that stores one or more address values of return destination instructions from subroutines, and a valid bit storage unit that stores, paired with each return address, a valid bit indicating whether the value of that return address stored in the address storage unit is valid or invalid;
first writing means for writing a value serving as a return address from a subroutine into the first storage device;
second writing means for writing a value serving as a return address from a subroutine into the return address storage unit of the second storage device;
first reading means, controlled by the first stage, for reading a first value from the second storage device;
second reading means for reading from the first storage device, when a subroutine return instruction is processed, a second value serving as the return address from the subroutine;
valid bit writing means for writing a value indicating validity into the valid bit storage unit of the second storage device when a subroutine call instruction is processed;
valid bit clearing means for writing a value indicating invalidity into the valid bit storage unit of the second storage device when a subroutine return instruction is processed;
valid bit reading means for reading the valid bit stored in the valid bit storage unit of the second storage device when a subroutine return instruction is processed; and
instruction fetch means for fetching instructions from the first storage device,
wherein the instruction fetch means has a function of fetching a first instruction from the address in the first storage device indicated by the first value, and a function of fetching a second instruction from the address in the first storage device indicated by the second value, and
wherein, when a subroutine return instruction is processed, the first instruction is executed if the value of the valid bit read by the valid bit reading means indicates valid, and the second instruction is executed if the value of the valid bit read by the valid bit reading means indicates invalid.
First writing means for writing in the storage device, and a value which is a return address from the subroutine is stored in the second writing means.
Writing into the return address storage unit of the storage device of
Writing means, first reading means for reading the first value from the second storage device under the control of the first stage, and a return address from the subroutine when processing the subroutine return instruction. Second reading means for reading the value of 2 from the first storage device, and valid bit writing means for writing a value indicating validity in the valid bit storage part of the second storage device during processing of a subroutine call instruction. Valid bit clearing means for writing a value indicating invalidity to the valid bit storage section of the second storage device during processing of the subroutine return instruction; and valid bit storage section of the second storage device during processing of the subroutine return instruction. Valid bit reading means for reading the stored valid bit; and a command from the first storage device. An instruction fetching means for executing the instruction fetching means for fetching the first instruction from the address indicated by the first value of the first storage device, and the second fetching means of the first storage device. Is provided with a function of fetching the second instruction from the address indicated by the value of, when the valid bit value read by the valid bit reading means is valid during the processing of the subroutine return instruction, the first instruction is executed. A data processing apparatus, which executes the second instruction, when the value of the valid bit read by the valid bit reading means is valid.
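
The valid-bit decision of embodiment (1) can be illustrated with a pipeline-free Python model. This is a hypothetical sketch under several assumptions: a single pointer stands in for the DP/EP pair, pipeline timing is ignored, and "executing the first or second instruction" is reduced to reporting which return address is used. A Python list plays the role of the return addresses held in the first storage device (the memory stack); the names `PreReturnModel`, `call` and `ret` are illustrative only.

```python
class PreReturnModel:
    """Hypothetical, pipeline-free model of the valid-bit check in embodiment (1)."""

    def __init__(self, depth: int = 4) -> None:
        self.memory_stack: list[int] = []  # return addresses in the first storage device
        self.address = [0] * depth         # address storage unit (second storage device)
        self.valid = [False] * depth       # valid-bit storage unit (second storage device)
        self.top = 0                       # single pointer; the DP/EP split is omitted

    def call(self, return_address: int) -> None:
        self.memory_stack.append(return_address)     # first writing means
        self.top = (self.top + 1) % len(self.address)
        self.address[self.top] = return_address      # second writing means
        self.valid[self.top] = True                  # valid-bit writing means

    def ret(self) -> tuple[int, str]:
        first_value = self.address[self.top]         # first reading means (read early)
        second_value = self.memory_stack.pop()       # second reading means
        was_valid = self.valid[self.top]             # valid-bit reading means
        self.valid[self.top] = False                 # valid-bit clearing means
        self.top = (self.top - 1) % len(self.address)
        if was_valid:
            return first_value, "first"    # continue with the instruction fetched via the first value
        return second_value, "second"      # continue with the instruction fetched via the second value
```

With four entries and five nested calls, the fifth call overwrites the entry used by the first call. The first four returns use the PC stack, and the last one finds that entry already cleared and falls back to the address read from memory, so every return still reaches the correct address in this model.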

(2) The data processing device according to item (1), wherein:
the second storage device consists of 2ⁿ cyclically numbered entries, each entry pairing one entry of the address storage unit with one entry of the valid-bit storage unit;
the device further comprises a first n-bit counter, capable of at least one of incrementing and decrementing, that manages the entry number; a second n-bit counter, capable of both incrementing and decrementing, that manages the entry number; and third writing means for writing the value of the second n-bit counter into the first n-bit counter;
the second writing means writes the return address from the subroutine into the entry of the second storage device whose number is indicated by the value of the second n-bit counter;
the first reading means reads the first value from the address storage unit of the entry of the second storage device whose number is indicated by the value of the first n-bit counter;
the valid-bit writing means writes a value indicating valid into the valid-bit storage unit of the entry of the second storage device whose number is indicated by the value of the second n-bit counter; and
the valid-bit clearing means writes a value indicating invalid into the valid-bit storage unit of the entry of the second storage device whose number is indicated by the value of the second n-bit counter.

(3) The data processing device according to item (1) or (2), further comprising subroutine call instruction processing detection means for detecting whether the second writing means has finished writing, into the second storage device, the return-destination instruction addresses for all subroutine call instructions that have completed processing in the first stage.
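
The detection means of embodiment (3) can be modelled as a simple counter, in the spirit of the BSR counter (65). This Python sketch is hypothetical: it assumes the count is incremented when a call leaves the first stage and decremented when that call has written its return address into the PC stack, that a pipeline cancellation discards all pending calls, and that a decoded return simply waits while any call is pending (skipping the pre-return would be another possible policy). All names are illustrative.

```python
class BSRCounter:
    """Hypothetical model of a counter used as the detection means of embodiment (3)."""

    def __init__(self) -> None:
        # Calls past the first stage whose return address is not yet in the PC stack.
        self.pending = 0

    def call_passed_first_stage(self) -> None:
        self.pending += 1

    def return_address_written(self) -> None:
        if self.pending == 0:
            raise RuntimeError("no subroutine call is pending")
        self.pending -= 1

    def cancel_pipeline(self) -> None:
        # Assumption: cancelled calls never write the PC stack.
        self.pending = 0

    def all_calls_written(self) -> bool:
        return self.pending == 0


def pre_return_action(counter: BSRCounter) -> str:
    """Hypothetical policy for a decoded subroutine return instruction."""
    return "pre-return" if counter.all_calls_written() else "wait"
```

The check matters because a return decoded right after a call would otherwise read the PC stack before that call has stored its return address, and so would pre-return through a stale entry.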

(4) A data processing device comprising:
a first storage device that stores instructions and data;
a second storage device, distinct from the first storage device, consisting of 2ⁿ entries, each entry pairing an address storage unit that stores part or all of the address value of a return-destination instruction from a subroutine with a valid-bit storage unit that stores a valid bit indicating whether that address value is valid or invalid;
a first n-bit counter, capable of at least one of incrementing and decrementing, that manages the entry number;
a second n-bit counter, capable of both incrementing and decrementing, that manages the entry number;
first reading means for reading the address value of the return-destination instruction from the entry of the second storage device indicated by the value of the first n-bit counter;
first writing means for writing part or all of the address of the return-destination instruction from a subroutine into the address storage unit of the entry of the second storage device indicated by the value of the second n-bit counter;
valid-bit writing means for writing a value indicating valid or invalid into the valid-bit storage unit of the entry of the second storage device indicated by the value of the second n-bit counter;
valid-bit reading means for reading the value of the valid bit stored in the valid-bit storage unit of the second storage device;
second writing means for writing the value of the second n-bit counter into the first n-bit counter; and
valid-bit clearing means for writing a value indicating invalid into the valid-bit storage units of all entries of the second storage device.
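
What distinguishes the valid-bit clearing means of embodiment (4) is that it invalidates every entry at once, instead of only the entry selected by the second counter. The Python sketch below is a hypothetical model of just the valid-bit storage; the text here does not say when the clear-all operation is triggered, so the sketch only exposes it as a method. In terms of embodiment (1), any return processed after a clear-all would find its valid bit invalid and use the address read from the first storage device.

```python
class ValidBitArray:
    """Hypothetical 2**n-entry valid-bit storage with a clear-all operation."""

    def __init__(self, n: int) -> None:
        self.bits = [False] * (1 << n)

    def write(self, entry: int, valid: bool) -> None:
        # Valid-bit writing means: one entry, chosen by the second n-bit counter.
        self.bits[entry] = valid

    def read(self, entry: int) -> bool:
        # Valid-bit reading means.
        return self.bits[entry]

    def clear_all(self) -> None:
        # Valid-bit clearing means of embodiment (4): invalidate every entry.
        for i in range(len(self.bits)):
            self.bits[i] = False
```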

[Effect of the invention]

As described above, according to the present invention, providing a PC stack that stores only the return addresses of subroutine call instructions allows the branch processing of a subroutine return instruction to be performed ahead of its processing in the instruction execution stage. This reduces the pipeline overhead caused by executing subroutine return instructions, yielding a high-performance data processing device.

[Brief description of drawings]

FIG. 1 is a diagram showing the pipeline processing configuration of the data processing device of the present invention.
FIG. 2 is a block diagram of the data processing device of the present invention.
FIG. 3 is a block diagram of the portion of the data processing device of the present invention that is particularly related to the preceding branch processing of subroutine return instructions.
FIG. 4 is a diagram showing the configuration of the PC stack (46) of the present invention.
FIG. 5 is a diagram showing the bit allocation of the subroutine call instruction and the subroutine return instruction in the data processing device of the present invention.
FIG. 6 is a flowchart of BSR instruction execution.
FIG. 7 is a flowchart of RTS instruction execution.
FIG. 8 is a diagram showing typical pipeline stages of a conventional data processing device.
(46) is the PC stack that stores only the return addresses of subroutine call instructions; (46A) is the return address field of the PC stack (46) in which the return address is registered at subroutine call time; (46B) is the valid bit indicating whether the return address stored in each entry of the PC stack (46) is valid or invalid; (65) is the BSR counter that counts the subroutine call instructions being processed in the stages following the instruction decode stage; (66) is the pointer DP of the PC stack (46), managed by the instruction decode stage; and (67) is the pointer EP of the PC stack (46), managed by the instruction execution stage. In the drawings, the same reference numerals indicate the same or corresponding parts.

Claims

[Claims]

1. A data processing device that processes instructions by pipeline processing having a first stage and a second stage, in which, for the execution of an instruction, processing in the first stage precedes processing in the second stage, the device comprising:
a first storage device that stores instructions and data;
a second storage device, distinct from the first storage device, including an address storage unit that stores one or more address values of return-destination instructions from subroutines, and a valid-bit storage unit that stores, paired with each return-destination address, a valid bit indicating whether the value of that return-destination address stored in the address storage unit is valid or invalid;
first writing means for writing a value serving as a return address from a subroutine into the first storage device;
second writing means for writing a value serving as a return address from a subroutine into the address storage unit of the second storage device;
first reading means, controlled by the first stage, for reading a first value from the second storage device;
second reading means for reading from the first storage device, when a subroutine return instruction is processed, a second value serving as the return address from the subroutine;
valid-bit writing means for writing a value indicating valid into the valid-bit storage unit of the second storage device when a subroutine call instruction is processed;
valid-bit clearing means for writing a value indicating invalid into the valid-bit storage unit of the second storage device when a subroutine return instruction is processed;
valid-bit reading means for reading the valid bit stored in the valid-bit storage unit of the second storage device when a subroutine return instruction is processed; and
instruction fetch means for fetching instructions from the first storage device, the instruction fetch means having a function of fetching a first instruction from the address in the first storage device indicated by the first value and a function of fetching a second instruction from the address in the first storage device indicated by the second value;
wherein, when a subroutine return instruction is processed, the first instruction is executed if the valid bit read by the valid-bit reading means indicates valid, and the second instruction is executed if that valid bit indicates invalid.

2. A data processing device comprising:
a first storage device that stores instructions and data;
a second storage device, distinct from the first storage device, consisting of 2ⁿ entries, each entry pairing an address storage unit that stores part or all of the address value of a return-destination instruction from a subroutine with a valid-bit storage unit that stores a valid bit indicating whether the return-destination address value stored in the address storage unit is valid or invalid;
a first n-bit counter, capable of incrementing or decrementing, that manages the entry number;
a second n-bit counter, capable of both incrementing and decrementing, that manages the entry number;
first reading means for reading the address value of the return-destination instruction from the entry of the second storage device indicated by the value of the first n-bit counter;
first writing means for writing part or all of the address of the return-destination instruction from a subroutine into the address storage unit of the entry of the second storage device indicated by the value of the second n-bit counter;
valid-bit writing means for writing a value indicating valid or invalid into the valid-bit storage unit of the entry of the second storage device indicated by the value of the second n-bit counter;
valid-bit reading means for reading the value of the valid bit stored in the valid-bit storage unit of the second storage device;
second writing means for writing the value of the second n-bit counter into the first n-bit counter; and
valid-bit clearing means for writing a value indicating invalid into the valid-bit storage units of all entries of the second storage device.