JP3146058B2

JP3146058B2 - Parallel processing type processor system and control method of parallel processing type processor system

Info

Publication number: JP3146058B2
Application number: JP08249092A
Authority: JP
Inventors: 健相川; 健二皆川; 光男斎藤; 譲治武田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1991-04-05
Filing date: 1992-04-03
Publication date: 2001-03-12
Anticipated expiration: 2016-03-12
Also published as: JPH05181676A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a parallel processing type processor system that executes instructions in parallel at the instruction level. More specifically, it relates to the functions that control trap processing and stall processing in such a system.

[0002]

2. Description of the Related Art In recent years, to meet the growing demand for high-speed personal computers, CPUs known as superscalar processors or VLIW processors, which can perform parallel processing at the machine-instruction level, have been developed and implemented as VLSI chips. Such parallel-processing CPUs use RISC instructions as their basic instruction set and improve performance by fetching and executing multiple instructions simultaneously. In particular, a superscalar processor has an architecture that can implement a conventional RISC, which processes instructions sequentially, while maintaining compatibility at the user-program level, and computer users therefore have high expectations for it.

[0003] A processor system that can perform parallel processing at the instruction level and that employs such a conventional trap control method has the configuration shown in FIG. 19.

[0004] The configuration of FIG. 19 implements a processor system capable of parallel processing at the instruction level. It has five pipeline stages: the F stage (fetch), the D stage (decode), the E stage (execute), the M stage (memory access), and the W stage (register write-back). Each instruction is one word (32 bits) long.

[0005] As shown in FIG. 19, this processor system comprises: an instruction memory 1 for storing instructions; an instruction issuing unit 2 that, in the F stage, simultaneously fetches four instructions on a four-word boundary from the instruction memory 1, in the D stage examines the data dependencies and control dependencies among the four fetched instructions, and in the E stage supplies the executable instructions via instruction supply lines 20, 21, 22 and 23; arithmetic logic units (ALU0 and ALU1) 3 and 4 that perform arithmetic and logic operations and memory address calculations in the E stage according to the instructions supplied on instruction supply lines 20 and 21; a floating-point adder (FADD) 5 that performs floating-point addition and subtraction in the E stage according to the instruction supplied on instruction supply line 22; a floating-point multiplier (FMUL) 6 that performs floating-point multiplication and division in the E stage according to the instruction supplied on instruction supply line 23; memory access units (MA0 and MA1) 7 and 8 that access a two-port data memory 25 in the M stage according to the outputs of ALU0 3 and ALU1 4; floating-point exception check units (EC1 and EC2) 9 and 10 that check for exceptions in the floating-point calculations in the M stage according to the outputs of FADD 5 and FMUL 6, respectively; and a multiport register file 11 having twelve ports, namely four write ports that receive the outputs of MA0 7, MA1 8, EC1 9 and EC2 10 in the W stage, and eight read ports that supply operand data to ALU0 3, ALU1 4, FADD 5 and FMUL 6 via operand data supply lines 12 to 19 in the E stage.

[0006] In the configuration of FIG. 19, MA0 7 and MA1 8 raise arithmetic exception traps such as page faults and overflows, and EC1 9 and EC2 10 raise floating-point exception traps.

[0007] To handle such exception traps, this processor system further comprises: a trap cause register 30 for storing the cause of a trap; a trap address register 32 for storing the address of the instruction that caused the trap; and a trap control unit 33 that receives trap causes from MA0 7, MA1 8, EC1 9 and EC2 10 via trap request signal lines 43 to 46, asserts trap signals on trap signal lines 34 to 38 accordingly, and generates the inputs to the trap cause register 30 and the trap address register 32 via signal lines 40 and 42 as appropriate.

[0008] The trap signal on trap signal line 34 is sent to the instruction issuing unit 2, ALU0 3 and ALU1 4, while the trap signals on trap signal lines 35 to 38 are sent to MA0 7, MA1 8, EC1 9 and EC2 10, respectively. In response to a trap signal from the trap control unit 33, an execution invalidation flag is set in each unit, which aborts processing of the instruction in the subsequent pipeline stages. Meanwhile, the instruction issuing unit 2 begins fetching the instructions of a predetermined trap handling routine, which uses the trap cause and trap address stored in the trap cause register 30 and the trap address register 32.

[0009] More specifically, the trap control unit 33 has the configuration shown in FIG. 20. The trap control unit 33 comprises: an M-stage program counter (MPC) 51 that stores the common part of the word addresses of the instructions currently executed in the M stage, obtained by removing the lower two bits of the address; M-stage subprogram counters (submpc1, submpc2, submpc3 and submpc4) 53, 54, 55 and 56 that store the individual part of the word address, given by the lower two bits of the address, of the instructions currently executed in the M stage by MA0 7, MA1 8, EC1 9 and EC2 10; and a trap data generation unit 57. The trap data generation unit 57 outputs, as output 47, the smallest entry among the M-stage subprogram counters 53 to 56, which is combined with the entry of MPC 51 to form the trap address 42 supplied to the trap address register 32. It also outputs, as the trap cause 40 supplied to the trap cause register 30, the trap cause sent on whichever of the trap request signal lines 43 to 46 corresponds to the M-stage subprogram counter holding that smallest entry.

[0010] Here, the trap signal on trap signal line 34 is asserted when a trap cause is received on any one of the trap request signal lines 43 to 46. Each of the trap signals 35 to 38 is asserted when a trap request is received on one of the trap request signal lines 43 to 46 and the corresponding M-stage subprogram counter 53 to 56 holds an entry greater than or equal to the entry of the M-stage subprogram counter 53 to 56 belonging to the unit that issued the trap request.
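The comparison rule in paragraph [0010] can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name, the list ordering (MA0, MA1, EC1, EC2) and the handling of several simultaneous requests (the smallest requesting entry is used as the threshold) are assumptions made for illustration. The sample values are borrowed from the example program, in which the units hold instructions n+1, n+2, n and n+3.

```python
from typing import List


def conventional_trap_signals(submpc: List[int], requests: List[bool]) -> List[bool]:
    """Per-unit trap signals (lines 35 to 38) under the conventional scheme.

    submpc[i]   -- lower two address bits of the instruction in unit i
    requests[i] -- True if unit i raised a trap request
    """
    requesting = [pc for pc, req in zip(submpc, requests) if req]
    if not requesting:
        return [False] * len(submpc)
    # Assumption: with several requests, the earliest trapping instruction
    # sets the threshold.
    threshold = min(requesting)
    # Each unit's signal depends on a comparison against the threshold, so
    # the comparisons sit on the path that decides the trap signals.
    return [pc >= threshold for pc in submpc]


# Units in the order MA0, MA1, EC1, EC2; MA1 (instruction n+2) page-faults.
signals = conventional_trap_signals([1, 2, 0, 3], [False, True, False, False])
```

With these inputs only the units holding instructions n+2 and n+3 receive a trap signal, which matches the behaviour described for FIG. 22.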

[0011] FIG. 21 shows an example of a program executed by the processor system of FIG. 19. FIG. 22 shows how pipeline processing proceeds in the processor system of FIG. 19, using the conventional trap control method described above, when the program of FIG. 21 is executed and a page fault occurs on the "load" instruction; the hatched portions indicate aborted instructions. As FIG. 22 shows, under conventional trap control, when execution of the (n+2)-th "load" instruction raises a trap during program execution, only the instructions whose instruction number is greater than or equal to n+2 are aborted.

[0012]

PROBLEMS TO BE SOLVED BY THE INVENTION However, in such a conventional trap control method, when MA1 8 detects a page fault in cycle C+3 and signals a trap request on trap request line 44, the trap signals on trap signal lines 35 to 38 cannot be determined until the entries of the M-stage subprogram counters 53 to 56 have been compared with one another to establish which is larger. Performing this comparison lengthens the cycle time considerably, which lowers the clock frequency.

[0013] Furthermore, a RISC calls for a configuration with a simple data path and a simple control circuit. The data path of a superscalar processor, which has multiple RISC data paths, is not particularly complex, but its control circuit becomes very complex because it requires instruction supply control and the like. In particular, the hardware for handling so-called exceptions, cases in which processing cannot continue without the support of software such as the OS, becomes very complex. Because designing such hardware takes a great deal of time, it often becomes the critical path in realizing a superscalar processor.

[0014] The present invention has been made to solve these problems of the prior art. Its object is to provide a parallel processing type processor system, such as a superscalar processor, that incorporates trap and stall control functions which operate without increasing the cycle time and thus prevent a drop in the system clock frequency.

[0015] [Configuration of the Invention]

[0016]

MEANS FOR SOLVING THE PROBLEMS To solve the above problems, the parallel processing type processor system of the present invention comprises: N arithmetic units (N being an integer) that execute a plurality of instructions simultaneously; instruction supply means for supplying the instructions to be executed by the N arithmetic units; and trap control means for controlling the N arithmetic units such that, when M instructions (N ≥ M, M being an integer) are supplied simultaneously from the instruction supply means to the N arithmetic units and a processing exception occurs during execution of at least one of these M instructions, processing of all M instructions supplied simultaneously to the N arithmetic units is aborted.

[0017] The control method of the parallel processing type processor system of the present invention comprises the steps of: supplying M instructions (N ≥ M) simultaneously to N arithmetic units; and, when a processing exception occurs during execution of at least one of these M instructions, controlling the N arithmetic units so as to abort processing of all M instructions supplied simultaneously to the N arithmetic units.

[0018]

OPERATION The present invention provides means for aborting execution of all M simultaneously supplied instructions when a processing exception occurs during execution of one or more of the M instructions supplied simultaneously to the N arithmetic units by the instruction supply means. Thus, if even one exception occurs while the M simultaneously supplied instructions are being processed, all M instructions are aborted and control is dispatched to a handling routine in the OS, which simplifies the exception-handling hardware.
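The abort-everything policy described above amounts to a single OR over the exception flags of the group. The following Python sketch is illustrative only; the function name and the boolean-list representation are assumptions, not part of the patent.

```python
from typing import List


def group_abort_mask(exceptions: List[bool]) -> List[bool]:
    """Abort mask for M instructions that were supplied together.

    exceptions[i] is True if instruction i raised a processing exception.
    If any instruction in the group faults, every instruction is aborted.
    """
    any_exception = any(exceptions)  # one OR over the whole group
    return [any_exception] * len(exceptions)


# Four instructions issued together; only the third one faults.
mask = group_abort_mask([False, False, True, False])
```

No ordering information is needed to build the mask, which is why this policy avoids the address comparisons of the conventional scheme.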

[0019]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS A first embodiment of the parallel processing type processor system according to the present invention will now be described in detail with reference to FIGS. 1 to 3.

[0020] The configuration of FIG. 1 implements a processor system capable of parallel processing at the instruction level. It has five pipeline stages: the F stage (fetch), the D stage (decode), the E stage (execute), the M stage (memory access), and the W stage (register write-back). Each instruction is one word (32 bits) long.

[0021] In this first embodiment, as shown in FIG. 1, the processor system comprises: an instruction memory 101 for storing instructions; an instruction issuing unit 102 that, in the F stage, simultaneously fetches four instructions on a four-word boundary from the instruction memory 101, in the D stage examines the data dependencies and control dependencies among the four fetched instructions, and in the E stage supplies the executable instructions via instruction supply lines 120, 121, 122 and 123; arithmetic logic units (ALU0 and ALU1) 103 and 104 that perform arithmetic and logic operations and memory address calculations in the E stage according to the instructions supplied on instruction supply lines 120 and 121; a floating-point adder (FADD) 105 that performs floating-point addition and subtraction in the E stage according to the instruction supplied on instruction supply line 122; a floating-point multiplier (FMUL) 106 that performs floating-point multiplication and division in the E stage according to the instruction supplied on instruction supply line 123; memory access units (MA0 and MA1) 107 and 108 that access a two-port data memory 125 in the M stage according to the outputs of ALU0 103 and ALU1 104; floating-point exception check units (EC1 and EC2) 109 and 110 that check for exceptions in the floating-point calculations in the M stage according to the outputs of FADD 105 and FMUL 106, respectively; and a multiport register file 111 having twelve ports, namely four write ports that receive the outputs of MA0 107, MA1 108, EC1 109 and EC2 110 in the W stage, and eight read ports that supply operand data to ALU0 103, ALU1 104, FADD 105 and FMUL 106 via operand data supply lines 112 to 119 in the E stage.

[0022] In the configuration of FIG. 1, MA0 107 and MA1 108 raise arithmetic exception traps such as page faults and overflows, and EC1 109 and EC2 110 raise floating-point exception traps.

[0023] To handle such exception traps, this processor system further comprises: a trap cause register 130 for storing the cause of a trap; an abort address register 131 for storing the address of the instruction with the smallest address among the instructions whose execution is interrupted by the trap; a trap address register 132 for storing the address of the instruction that caused the trap; and a trap control unit 133 that receives trap causes from MA0 107, MA1 108, EC1 109 and EC2 110 via trap request signal lines 143 to 146, asserts a trap signal on trap signal line 134 accordingly, and generates the inputs to the trap cause register 130, the abort address register 131 and the trap address register 132 via signal lines 140, 141 and 142 as appropriate.

[0024] The trap signal on trap signal line 134 is sent to each of the instruction issuing unit 102, ALU0 103, ALU1 104, FADD 105, FMUL 106, MA0 107, MA1 108, EC1 109 and EC2 110. In response to the trap signal from the trap control unit 133, an execution invalidation flag is set in each unit, which aborts processing of the instruction in the subsequent pipeline stages. Meanwhile, the instruction issuing unit 102 begins fetching the instructions of a predetermined trap handling routine, which uses the trap cause, abort address and trap address stored in the trap cause register 130, the abort address register 131 and the trap address register 132, respectively.

[0025] More specifically, the trap control unit 133 has the configuration shown in FIG. 2. The trap control unit 133 comprises: an M-stage program counter 1 (MPC) 151 that stores the common part of the word addresses of the instructions currently executed in the M stage, obtained by removing the lower two bits of the address; an M-stage program counter 2 (mpc) 152 that stores the individual part of the word address, given by the lower two bits of the address, of the instruction with the smallest address among the instructions currently executed in the M stage, this part being combined with the entry of MPC 151 to generate the abort address supplied to the abort address register 131; M-stage subprogram counters (submpc1, submpc2, submpc3 and submpc4) 153, 154, 155 and 156 that store the individual part of the word address, given by the lower two bits of the address, of the instructions currently executed in the M stage by MA0 107, MA1 108, EC1 109 and EC2 110; a trap data generation unit 157; and a trap signal generation unit 158. The trap data generation unit 157 outputs, as output 147, the smallest entry among the M-stage subprogram counters 153 to 156, which is combined with the entry of MPC 151 to form the trap address 142 supplied to the trap address register 132. It also outputs, as the trap cause 140 supplied to the trap cause register 130, the trap cause sent on whichever of the trap request signal lines 143 to 146 corresponds to the M-stage subprogram counter holding that smallest entry. The trap signal generation unit 158 generates, on trap signal line 134, a trap signal that is asserted when a trap cause is received on any one of the trap request signal lines 143 to 146.

[0026] FIG. 3 shows how pipeline processing proceeds in the processor system of FIG. 1 when the program of FIG. 21 is executed and a page fault occurs on the "load" instruction; the hatched portions indicate aborted instructions. As FIG. 3 shows, in the first embodiment of FIG. 1, when execution of the (n+2)-th "load" instruction raises a trap during program execution, all of the instructions fetched simultaneously with the (n+2)-th instruction that causes the trap, namely instructions n through n+3, are aborted. Compared with the conventional case shown in FIG. 22, fewer instructions are aborted in the conventional case, so the conventional case might appear to be more efficient. As already noted, however, the conventional case needs time to determine the appropriate trap signals, so its cycle time is longer than that of the first embodiment, and in practice the first embodiment is more efficient.

[0027] In more detail, in the pipeline processing shown in FIG. 3, the n-th, (n+1)-th, (n+2)-th and (n+3)-th instructions all enter the M stage in cycle C+3. The M-stage processing of the n-th instruction, "fadd", is performed by EC1 109; that of the (n+1)-th instruction, "add", by MA0 107; that of the (n+2)-th instruction, "load", by MA1 108; and that of the (n+3)-th instruction, "fmul", by EC2 110. When MA1 108 detects a page fault in this cycle C+3, it notifies the trap control unit 133 of the page fault trap via trap request signal line 144. In this cycle C+3, MPC 151 holds the word address, with the lower two bits removed, of the n-th to (n+3)-th instructions currently executing in the M stage, and mpc 152 holds the lower two bits of the word address of the n-th instruction, which has the smallest address among the instructions currently executing in the M stage; in this case the value is 0. Meanwhile, submpc1 153, submpc2 154, submpc3 155 and submpc4 156 hold the lower two bits of the word addresses of the (n+1)-th, (n+2)-th, n-th and (n+3)-th instructions currently executing in MA0 107, MA1 108, EC1 109 and EC2 110, respectively.

[0028] The trap signal generation unit 158 asserts the trap signal on trap signal line 134 when there is a trap request on any of the trap request signal lines 143 to 146. The asserted trap signal on trap signal line 134 is supplied to the instruction issuing unit 102, ALU0 103, ALU1 104, MA0 107, MA1 108, FADD 105, FMUL 106, EC1 109 and EC2 110, so processing in these units is aborted and, at the same time, the instruction issuing unit 102 begins fetching the instructions of the predetermined trap handling routine.

[0029] Here, when the trap data generation unit 157 detects a trap request on trap request signal line 144, the entry of submpc2 154, which corresponds to trap request signal line 144, is output on signal line 147 and combined with the entry of MPC 151 to generate the trap address 142, in this case the address of the (n+2)-th instruction. This address is stored in the trap address register 132 via trap address signal line 142.

[0030] Meanwhile, combining the entry of MPC 151 with the entry of mpc 152 yields the abort address 141. The abort address in this case, the address of the n-th instruction, is stored in the abort address register 131 via abort address signal line 141.
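How the common part and the two-bit individual part combine into a trap address and an abort address can be sketched as follows. This is a minimal illustration, assuming that "combining" means placing the two individual bits below the common part; the function name and the value chosen for the MPC entry are hypothetical.

```python
def combine(common_part: int, low_bits: int) -> int:
    """Join the shared upper part of a word address with its lower two bits."""
    if not 0 <= low_bits <= 3:
        raise ValueError("individual part must fit in two bits")
    return (common_part << 2) | low_bits


MPC = 0x40      # hypothetical common part shared by instructions n to n+3
SUBMPC2 = 2     # MA1 holds instruction n+2, the faulting "load"
MPC_LOW = 0     # mpc 152: the lowest-addressed instruction in the M stage is n

trap_address = combine(MPC, SUBMPC2)   # word address of instruction n+2
abort_address = combine(MPC, MPC_LOW)  # word address of instruction n
```

The two results differ by two words, which reflects the point that the faulting instruction and the first aborted instruction are not the same.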

[0031] Furthermore, among the M-stage subprogram counters associated with the trap request signal lines that are currently issuing a trap request, the one of submpc1 153, submpc2 154, submpc3 155 and submpc4 156 that holds the smallest address is identified, and the signal on the corresponding one of the trap request signal lines 143 to 146 is output to trap cause signal line 140 and stored in the trap cause register 130. In this case the trap request is issued only on trap request signal line 144, so the signal on trap request signal line 144 is output to trap cause signal line 140 and stored in the trap cause register 130.

[0032] The description above covers the case in which the n-th to (n+3)-th instructions are all fetched simultaneously, but the entry of mpc may be nonzero because of data dependencies among the four instructions. Also, when two or more trap requests are detected on the trap request signal lines 143 to 146, the entry of whichever of submpc1 153, submpc2 154, submpc3 155 and submpc4 156 holds the smallest address among the instructions currently executed in the M stage is output on signal line 147.
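The selection among several simultaneous trap requests can be sketched in Python as below. The sketch follows the reading of paragraph [0031], in which only counters whose unit is requesting a trap take part in the selection; that reading, the function name and the cause strings are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def select_trap(submpc: List[int],
                causes: List[Optional[str]]) -> Optional[Tuple[int, str]]:
    """Pick the trap to record when several units request one.

    submpc[i] -- lower two address bits of the instruction in unit i
    causes[i] -- trap cause reported by unit i, or None if no request
    Returns (individual address part, cause) of the earliest faulting
    instruction, or None when no unit requests a trap.
    """
    candidates = [(pc, cause) for pc, cause in zip(submpc, causes)
                  if cause is not None]
    if not candidates:
        return None
    return min(candidates, key=lambda item: item[0])


# Units in the order MA0, MA1, EC1, EC2; MA1 and EC2 both request a trap.
chosen = select_trap([1, 2, 0, 3], [None, "page_fault", None, "fp_overflow"])
```

Here the request from MA1 is recorded because its instruction (n+2) precedes the one in EC2 (n+3).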

[0033] In this embodiment, the instruction that causes the trap and the instructions that are aborted differ, so both the abort address register 131 and the trap address register 132 are required.

[0034] Also, in this embodiment the trap signal is asserted as soon as there is at least one trap request on any of the trap request signal lines 143 to 146, so the value on the trap signal line can be determined very quickly. By contrast, the trap cause 140 stored in the trap cause register 130 and the trap address 142 stored in the trap address register 132 are determined by comparing the entries of the M-stage subprogram counters 153 to 156, and so cannot be determined until much later. Generally speaking, however, the value on the trap signal line must be determined quickly in order to abort memory accesses, whereas the trap data stored in the registers may be determined much later. This embodiment therefore does not require lengthening the cycle time of the processor system.

【００３５】従って、この第１実施例によると、システ
ムにおけるクロック周波数の低下を防げるように、サイ
クル時間を増加させることなく機能できるトラップ制御
機能を組み込んだスーパースカラプロセッサ等の並列処
理型プロセッサシステムを提供することが可能となる。Therefore, according to the first embodiment, it is possible to provide a parallel processing type processor system, such as a super scalar processor, incorporating a trap control function that operates without increasing the cycle time, so that a decrease in the system clock frequency is prevented.

【００３６】次に、図４から図６を参照して、本発明に
係る並列処理型プロセッサシステムの第２の実施例につ
いて詳細に説明する。Next, a second embodiment of the parallel processing type processor system according to the present invention will be described in detail with reference to FIGS. 4 to 6.

【００３７】この第２実施例は、上述の第１実施例の応
用であり、以下の理由から、第１実施例のトラップ制御
機能に加え、更に、ストール制御機能を組み込んだもの
である。The second embodiment is an application of the first embodiment described above, and incorporates a stall control function in addition to the trap control function of the first embodiment for the following reasons.

【００３８】即ち、第１実施例のシステムを変形して、
図４に示すように、システムが更に、バス線を介して命
令キャッシュメモリ１０１Ａ及び２ポートデータキャッ
シュメモリ１２５Ａに接続されるメインメモリ１６１と
Ｉ／Ｏ装置１６２とを含むようにした場合、トラップ制
御機能だけではトラブルが起きる場合がある。トラブル
の原因は、ここでは、メモリアドレスにマップされる複
数のＩ／Ｏレジスタを通常含んでいるＩ／Ｏ装置１６２
である。このようなＩ／Ｏレジスタへのアクセスは、Ｉ
／Ｏレジスタのアドレスを特定するメモリアクセス命令
と同一の命令によってなされ、キャッシュは通常この目
的には使用されない。Ｉ／Ｏレジスタは、コマンド及び
パラメータをＩ／Ｏ装置１６２に設定するためとＩ／Ｏ
レジスタのステータスをプロセッサ側に示すためとに用
いられ、例えばＩ／Ｏ装置１６２のステータスを読み込
むためのアクセスに応じて、プロセッサ側でＩ／Ｏ装置
１６２のステータスを読み込んだときにステータスレジ
スタを消去するような形で、Ｉ／Ｏレジスタの内部状態
を変えてしまうタイプのＩ／Ｏレジスタがある。That is, when the system of the first embodiment is modified so that, as shown in FIG. 4, it further includes a main memory 161 and an I/O device 162 connected via a bus line to the instruction cache memory 101A and the two-port data cache memory 125A, trouble may occur if only the trap control function is provided. The cause of the trouble here is the I/O device 162, which usually includes a plurality of I/O registers mapped to memory addresses. Such I/O registers are accessed by the same instructions as memory access instructions, with the address of the I/O register specified, and the cache is normally not used for this purpose. The I/O registers are used to set commands and parameters in the I/O device 162 and to indicate status to the processor. Some types of I/O register change their internal state in response to an access; for example, a status register may be cleared when the processor reads the status of the I/O device 162.

【００３９】そのようなＩ／Ｏ装置１６２を使用する場
合、上述の第１実施例のトラップ制御機能のみを有する
システムでは、以下のような問題に直面することにな
る。即ち、このシステムにおいては、２つのメモリアク
セス命令が同時実行可能なような２ポートデータキャッ
シュメモリ１２５Ａについて、２つのアクセスユニット
１０７及び１０８が設けられている。従って、メモリア
クセスユニット１０７及び１０８において、２つのＩ／
Ｏアクセス命令がＭステージに同時に到達する場合があ
る。しかし、Ｉ／Ｏアクセスの目的のためのラインはバ
ス線１６０１つのみしかないので、２つのＩ／Ｏアクセ
ス命令を同時に処理することは不可能である。この結
果、２つのＩ／Ｏアクセス命令のうちの最初の１つの処
理については２番目のＩ／Ｏアクセス命令の処理が始ま
る前にＩ／Ｏ装置１６２の側で完了していなければなら
ない。When such an I/O device 162 is used, a system having only the trap control function of the first embodiment described above faces the following problem. In this system, two memory access units 107 and 108 are provided for the two-port data cache memory 125A so that two memory access instructions can be executed simultaneously. Therefore, two I/O access instructions may reach the M stage at the same time in the memory access units 107 and 108. However, since there is only one bus line 160 available for I/O access, the two I/O access instructions cannot be processed simultaneously. As a result, the processing of the first of the two I/O access instructions must be completed on the I/O device 162 side before the processing of the second I/O access instruction starts.

【００４０】このような状況において、２番目のＩ／Ｏ
アクセス命令の処理でバスエラーなどの例外が起きる可
能性がある。そのような場合、もしシステムが上記第１
実施例のトラップ制御機能しか用いていないと、両方の
Ｉ／Ｏアクセス命令がアボートされることになる。従っ
て、適切なトラップ処理ルーチンを行った後に元のプロ
グラムを再度実行するとき、Ｉ／Ｏ装置１６２の側にお
いては最初のＩ／Ｏアクセス命令は終わったと見なされ
ているにも関わらず、これがもう一度実行されることに
なる。この場合、最初のＩ／Ｏアクセス命令が偶然に、
ステータスレジスタを読む命令であると、ステータスレ
ジスタはトラップ発生の前に既に一度読まれているの
で、このステータスレジスタの内容は既に消去されてお
り、最初のＩ／Ｏアクセス命令を再度実行するときには
もはや正しいものではなくなってしまっているためトラ
ブルが生じる。In such a situation, an exception such as a bus error may occur in the processing of the second I/O access instruction. In that case, if the system uses only the trap control function of the first embodiment, both I/O access instructions are aborted. Consequently, when the original program is executed again after the appropriate trap handling routine, the first I/O access instruction is executed once more even though the I/O device 162 regards it as already finished. If the first I/O access instruction happens to be an instruction that reads the status register, the status register has already been read once before the trap occurred and its contents have already been cleared, so the value is no longer correct when the first I/O access instruction is executed again, and trouble arises.
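
The failure mode described above can be reproduced with a toy model. The sketch below is only an illustration: `ReadClearStatusRegister` and `run_with_full_abort` are hypothetical names, and the register is assumed to clear itself to 0 on every read.

```python
class ReadClearStatusRegister:
    """Toy model of an I/O status register that is cleared when read."""

    def __init__(self, value: int) -> None:
        self.value = value

    def read(self) -> int:
        observed = self.value
        self.value = 0  # side effect: reading clears the status
        return observed


def run_with_full_abort(reg: ReadClearStatusRegister) -> tuple:
    """Model the case where both I/O accesses are aborted and replayed.

    The first access completes on the device side, the second access
    raises a bus error, and after the trap handler the program restarts
    from the first access, so the register is read a second time.
    """
    first = reg.read()   # completed on the device side before the trap
    replay = reg.read()  # re-executed after the trap handler returns
    return first, replay
```

With an initial status of 5, the replayed read sees 0 rather than 5, which is the stale-status problem that the stall control of the second embodiment is meant to avoid.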

【００４１】上記のトラブルは基本的に、Ｄステージで
は検出できない同一演算リソースの使用に関する要求の
競合がその後のパイプラインステージで起こり得ること
による。それ故、もしＤステージでこれらの２つのメモ
リアクセス命令が実際に２つのＩ／Ｏアクセス命令であ
ることを検出できれば、これらの２つのメモリアクセス
命令をプロセッサに同時に供給しないことによってトラ
ブルは避けることが出来るが、その反面、Ｉ／Ｏアクセ
ス命令のためだけに特別な命令を使用することを要求す
ることになる。The above trouble basically arises because conflicting requests for the same operation resource, which cannot be detected in the D stage, can occur in a later pipeline stage. Therefore, if it could be detected in the D stage that the two memory access instructions are in fact two I/O access instructions, the trouble could be avoided by not supplying these two memory access instructions to the processor at the same time. This, however, would require special instructions to be used solely for I/O access.

【００４２】図４に示される第２実施例の構成において
は、この問題を、以下のように、そのような特別なＩ／
Ｏアクセス命令を使用せずに解決するようにしている。In the configuration of the second embodiment shown in FIG. 4, this problem is solved as follows without using such special I/O access instructions.

【００４３】まず第１に、この第２実施例においては、
命令がALU0１０３及びALU1１０４へ同時に供給されるこ
とになるとき、最初に実行されるべき、より小さいアド
レスを有する命令がALU0１０３へ供給されるように、命
令発行ユニット１０２ＡがALU0１０３及びALU1１０４へ
の命令供給を制御する。従って、メモリアクセス命令が
MA0 １０７及びMA1 １０８へ同時に届いたときには、MA
0 １０７が常に先に実行されるべきメモリアクセス命令
を有していることになる。First, in the second embodiment, when instructions are to be supplied to ALU0 103 and ALU1 104 at the same time, the instruction issuing unit 102A controls the instruction supply to ALU0 103 and ALU1 104 so that the instruction with the smaller address, which should be executed first, is supplied to ALU0 103. Therefore, when memory access instructions reach MA0 107 and MA1 108 at the same time, MA0 107 always holds the memory access instruction that should be executed first.
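
The issue rule above amounts to ordering a simultaneously issued pair by address. A minimal sketch, assuming a hypothetical `issue_pair` helper and instructions represented as (address, mnemonic) tuples:

```python
from typing import Dict, Tuple

Instr = Tuple[int, str]  # (address, mnemonic); representation is illustrative


def issue_pair(a: Instr, b: Instr) -> Dict[str, Instr]:
    """Route the lower-address instruction to ALU0 and the other to ALU1."""
    first, second = sorted((a, b), key=lambda ins: ins[0])
    return {"ALU0": first, "ALU1": second}
```

Because MA0 follows ALU0 in the same pipeline, this ordering is what lets later stages assume that the access in MA0 is always the older one.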

【００４４】第２に、図４の構成は、ストール要求信号
がMA0 １０７、MA1 １０８、EC1 １０９及びEC2 １１０
からストール要求信号線１７０、７１、７２及び１７３
を介して各々供給されるストール制御ユニット１６３を
更に含んでおり、このストール制御ユニット１６３はst
all1信号１８０、stall2信号１８１及びstallv1 信号１
８２を命令発行ユニット１０２Ａ、ALU0１０３、ALU1１
０４、FADD１０５、FMUL１０６、MA0 １０７、MA1 １０
８、EC1 １０９、EC2 １１０及びトラップ制御ユニット
１３３へ出力して、以下に記載するような適当なストー
ル制御を行う。Second, the configuration of FIG. 4 further includes a stall control unit 163 to which stall request signals are supplied from MA0 107, MA1 108, EC1 109 and EC2 110 via stall request signal lines 170, 171, 172 and 173, respectively. The stall control unit 163 outputs a stall1 signal 180, a stall2 signal 181 and a stallv1 signal 182 to the instruction issuing unit 102A, ALU0 103, ALU1 104, FADD 105, FMUL 106, MA0 107, MA1 108, EC1 109, EC2 110 and the trap control unit 133 to perform appropriate stall control as described below.

【００４５】stall1信号１８０は、MA0 １０７、MA1 １
０８、EC1 １０９及びEC2 １１０のいずれかからのスト
ール要求があるときアサートされ、stall2信号１８１
は、MA0 １０７、MA1 １０８、EC1 １０９及びEC2 １１
０のいずれかからのストール要求があるときアサートさ
れ、一方stallv1 信号１８２は、Ｍステージの処理がMA
1 １０８で現在実行されている命令のアドレスの下位２
ビットを示す。The stall1 signal 180 is asserted when there is a stall request from any of MA0 107, MA1 108, EC1 109 and EC2 110; the stall2 signal 181 is asserted when there is a stall request from any of MA0 107, MA1 108, EC1 109 and EC2 110; and the stallv1 signal 182 indicates the lower two bits of the address of the instruction whose M-stage processing is currently being executed in MA1 108.

【００４６】これらのstall1、stall2及びstallv1 信号
１８０〜１８２の値に従って、このシステムにおけるパ
イプライン処理は以下のように制御される。In accordance with the values of these stall1, stall2 and stallv1 signals 180-182, the pipeline processing in this system is controlled as follows.

【００４７】（１）ALU0１０３及びMA0 １０７のパイプ
ライン＜１０３、１０７＞： (a) Ｍステージ及びＷステージ：stall2信号１８１がニ
ゲートであれば、stall1信号１８０に関係なくパイプラ
イン処理が行われる。(1) Pipeline <103, 107> of ALU0 103 and MA0 107: (a) M stage and W stage: If the stall2 signal 181 is negated, pipeline processing is performed regardless of the stall1 signal 180.

【００４８】(b) Ｅステージ：stall1信号１８０がニゲ
ートであれば、パイプライン処理が行われる。(B) E stage: If the stall1 signal 180 is negated, pipeline processing is performed.

【００４９】（２）ALU1１０４及びMA1 １０８のパイプ
ライン＜１０４、１０８＞：stall1信号１８０がニゲー
トであれば、パイプライン処理が行われる。(2) Pipeline <104, 108> of ALU1 104 and MA1 108: If the stall1 signal 180 is negated, pipeline processing is performed.

【００５０】（３）FADD１０５及びEC1 １０９のパイプ
ライン＜１０５、１０９＞及びFMUL１０６及びEC2 １１
０のパイプライン＜１０６、１１０＞： (a) Ｍステージ及びＷステージ：stall1信号１８０がニ
ゲートであれば、stall1信号１８０に関係なくパイプラ
イン処理が行われる。(3) Pipeline <105, 109> of FADD 105 and EC1 109, and pipeline <106, 110> of FMUL 106 and EC2 110: (a) M stage and W stage: If the stall1 signal 180 is negated, pipeline processing is performed.

【００５１】パイプライン処理は又、stall1信号１８０
がアサートでありstall2信号がニゲートであって、stal
lv1 信号１８２が、EC1 １０９及びEC2 １１０の各々に
おいてＭステージの処理が現在行われている命令の下位
２ビットを格納するsubmpc3１５５及びsubmpc4 １５６
の各々より大きい値を示している場合に行われる。Pipeline processing is also performed when the stall1 signal 180 is asserted, the stall2 signal 181 is negated, and the stallv1 signal 182 indicates a value greater than that of submpc3 155 or submpc4 156, respectively, which store the lower two bits of the address of the instruction whose M-stage processing is currently being performed in EC1 109 and EC2 110, respectively.

【００５２】(b) Ｅステージ：stall1信号１８０がニゲ
ートであれば、パイプライン処理が行われる。(B) E stage: If the stall1 signal 180 is negated, pipeline processing is performed.
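
The per-pipeline conditions (1) to (3) above can be collected into a single predicate. The sketch below is an illustration under stated assumptions: the function name `may_advance` and the pipeline labels `"int0"`, `"int1"` and `"fp"` are hypothetical, the stall1 and stall2 values are taken as given inputs (how the stall control unit derives them is not modeled), and `submpc` stands for the submpc3 or submpc4 entry of the floating-point pipeline being tested.

```python
def may_advance(
    pipeline: str,
    stage: str,
    stall1: bool,
    stall2: bool,
    stallv1: int = 0,
    submpc: int = 0,
) -> bool:
    """Return True if the given stage of the given pipeline may proceed.

    pipeline: "int0" for <103, 107>, "int1" for <104, 108>,
              "fp" for <105, 109> or <106, 110>.
    stage:    "E", "M" or "W".
    """
    if stage not in ("E", "M", "W"):
        raise ValueError(f"unknown stage: {stage}")
    if pipeline == "int0":
        if stage in ("M", "W"):
            return not stall2          # stall1 is ignored here
        return not stall1              # E stage
    if pipeline == "int1":
        return not stall1              # every stage follows stall1
    if pipeline == "fp":
        if stage in ("M", "W"):
            # Proceeds normally, or when only stall1 is asserted and the
            # instruction precedes the one held in MA1 (stallv1 > submpc).
            return (not stall1) or (not stall2 and stallv1 > submpc)
        return not stall1              # E stage
    raise ValueError(f"unknown pipeline: {pipeline}")
```

For example, with stall1 asserted and stall2 negated, the M stage of `"int0"` proceeds, every stage of `"int1"` is held, and a floating-point M stage proceeds only if its submpc entry is smaller than stallv1.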

【００５３】又、この第２実施例においては、mpc １５
２に格納される値はこれらstall1、stall2及びstallv1
信号１８０〜１８２の値に従って以下の様に決定され
る。即ち、stall1信号１８０がアサートされないとき、
mpc １５２には、Ｅステージで実行された命令中の最小
アドレスがロードされ、stall1信号１８０及びstall2信
号１８１の両方がアサートされるときは、mpc １５２は
前の値を維持するが、stall1信号１８０がアサートされ
るがstall2信号１８１信号はアサートされないときに
は、mpc １５２にはstallv1 信号１８２によって示され
る値がロードされる。In the second embodiment, the value stored in mpc 152 is determined according to the values of the stall1, stall2 and stallv1 signals 180 to 182 as follows. When the stall1 signal 180 is not asserted, mpc 152 is loaded with the lowest address among the instructions executed in the E stage. When both the stall1 signal 180 and the stall2 signal 181 are asserted, mpc 152 keeps its previous value. When the stall1 signal 180 is asserted but the stall2 signal 181 is not, mpc 152 is loaded with the value indicated by the stallv1 signal 182.
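
The three cases for updating mpc 152 can be written as a small selection function. This is a sketch only; the name `next_mpc` and its parameter names are hypothetical, and all values are treated as plain integers.

```python
def next_mpc(
    prev_mpc: int,
    e_stage_min_addr: int,
    stallv1: int,
    stall1: bool,
    stall2: bool,
) -> int:
    """Select the next value of mpc from the stall signals."""
    if not stall1:
        # No stall: take the lowest address among the E-stage instructions.
        return e_stage_min_addr
    if stall2:
        # Both stall1 and stall2 asserted: hold the previous value.
        return prev_mpc
    # stall1 asserted, stall2 negated: take the value indicated by stallv1.
    return stallv1
```

The third case is the partial stall: the instructions ahead of the one held in MA1 are allowed to complete, and mpc moves up to the held instruction.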

【００５４】従って、この第２実施例では、MA1 １０８
だけからストール要求があれば、パイプライン＜１０
３、１０７＞の処理が完了し、現在処理されている命令
のアドレスがMA1 １０８での命令のアドレスより小さい
ときのみ、パイプライン＜１０５、１０９＞と＜１０
６、１１０＞の各々の処理が完了する。Therefore, in the second embodiment, if there is a stall request only from MA1 108, the processing of the pipeline <103, 107> is completed, and the processing of each of the pipelines <105, 109> and <106, 110> is completed only when the address of the instruction currently being processed there is smaller than the address of the instruction in MA1 108.

【００５５】この結果、第１及び第２Ｉ／Ｏアクセス命
令が同時にMA0 １０７及びMA1 １０８に各々届いたと
き、MA1 １０８はストール要求をストール制御ユニット
１６３に出力するが、MA0 １０７は、１クロックサイク
ル以内でＩ／Ｏアクセス処理を完了させるのが不可能で
ない限り、ストール要求を出力しない。そして、EC1 １
０９及びEC2 １１０もストール要求を出力しない時、st
all1信号１８０のみがアサートされ、stall2信号１８２
はアサートされない。そのような場合、パイプライン＜
３０３、１０７＞は処理が進められ、mpc １５２には、
stallv1 信号１８２の値がロードされて，システムの状
態が２番目のＩ／Ｏアクセス命令より前の命令が実行完
了している状態となる。As a result, when the first and second I/O access instructions reach MA0 107 and MA1 108, respectively, at the same time, MA1 108 outputs a stall request to the stall control unit 163, whereas MA0 107 does not output a stall request unless it is impossible to complete its I/O access processing within one clock cycle. If EC1 109 and EC2 110 also output no stall request, only the stall1 signal 180 is asserted and the stall2 signal 181 is not asserted. In that case, the processing of the pipeline <103, 107> proceeds and mpc 152 is loaded with the value of the stallv1 signal 182, so that the system is in the state in which the instructions preceding the second I/O access instruction have completed execution.

【００５６】それ故、２番目のＩ／Ｏアクセス命令につ
いて例外が発生したときでも、最初のＩ／Ｏアクセス命
令は再度実行されない。他方、パイプライン＜１０４、
１０８＞の処理と、２番目のＩ／Ｏアクセス命令より大
きいアドレスを有する命令についてのパイプラインの処
理は、stall1信号１８０がニゲートになる迄ストールさ
れる。Therefore, even when an exception occurs for the second I/O access instruction, the first I/O access instruction is not executed again. Meanwhile, the processing of the pipeline <104, 108> and the pipeline processing of instructions having addresses greater than that of the second I/O access instruction are stalled until the stall1 signal 180 is negated.

【００５７】１クロックサイクル以内にＩ／Ｏアクセス
処理を完了させるのが不可能であるためにMA0 １０７が
ストール要求を出力するときには、最初のＩ／Ｏアクセ
ス命令が完了してstall2信号１８１がニゲートになるま
で、stall1信号及びstall2信号１８２がアサートされる
ので、全てのパイプラインはストールされる。When MA0 107 outputs a stall request because it is impossible to complete the I/O access processing within one clock cycle, the stall1 signal 180 and the stall2 signal 181 are both asserted until the first I/O access instruction is completed and the stall2 signal 181 is negated, so all pipelines are stalled.

【００５８】図６は、図４のプロセッサシステムにおけ
るパイプライン処理において、図５のプログラムを実行
した時にn+2 番目の「ロード」命令でバスエラーが起こ
った場合の進行状況を示すものである。図６に示される
ように、この図４の第２実施例においては、ストール要
求がサイクルC+3 においてn+2 番目の命令の実行によっ
て発生するとき、EC1 １０９で行われるn 番目の「fad
d」処理についてのＭステージ処理及びMA0 １０７で行
われるn+1 番目の「add 」処理についてのＭステージ処
理は、次のサイクルC+4 で終了するが、MA1 １０８での
n+2 番目の「load」操作についてのＭステージ処理及び
EC2 １１０でのn+3 番目の「fmul」操作についてのＭス
テージ処理及びは次のサイクルC+4 でストールされ、そ
の後のサイクルC+5 迄終了しない。一方、n+2 番目の
命令より大きいアドレスをもった後続の命令も全てサイ
クルC+4 でストールされる。FIG. 6 shows the progress of pipeline processing in the processor system of FIG. 4 when a bus error occurs at the (n+2)-th "load" instruction during execution of the program of FIG. 5. As shown in FIG. 6, in the second embodiment of FIG. 4, when a stall request is generated by the execution of the (n+2)-th instruction in cycle C+3, the M-stage processing of the n-th "fadd" operation in EC1 109 and the M-stage processing of the (n+1)-th "add" operation in MA0 107 finish in the next cycle C+4, whereas the M-stage processing of the (n+2)-th "load" operation in MA1 108 and the M-stage processing of the (n+3)-th "fmul" operation in EC2 110 are stalled in cycle C+4 and do not finish until the following cycle C+5. All subsequent instructions having addresses greater than that of the (n+2)-th instruction are also stalled in cycle C+4.

【００５９】ここで、命令の実行中に例外が起こる可能
性を否定できないために命令の処理がストールされた後
で、命令の実行中に実際に例外が発生したときには、命
令の処理はアボートされる。さもなくば、命令実行中に
実際には例外が起こらなかったので命令の処理が再開さ
れる。Here, after the processing of an instruction has been stalled because the possibility of an exception during its execution cannot be ruled out, the processing of the instruction is aborted if an exception actually occurs during execution. Otherwise, since no exception actually occurred during execution, the processing of the instruction is resumed.

【００６０】従って、この第２実施例によると、システ
ムにおけるクロック周波数の低下を防げるように、サイ
クル時間を増加させることなく機能できるトラップとス
トールの制御機能を組み込んだスーパースカラプロセッ
サ等の並列処理型プロセッサシステムを提供することが
可能となる。Therefore, according to the second embodiment, it is possible to provide a parallel processing type processor system, such as a super scalar processor, incorporating trap and stall control functions that operate without increasing the cycle time, so that a decrease in the system clock frequency is prevented.

【００６１】次に図７を参照して、本発明に係る並列処
理型プロセッサシステムの第３の実施例を詳細に説明す
る。Next, a third embodiment of the parallel processing type processor system according to the present invention will be described in detail with reference to FIG.

【００６２】この第３実施例は、より一般的な設定にお
ける上記第２実施例のストール制御機能の一般化を行っ
たものである。The third embodiment is a generalization of the stall control function of the second embodiment in a more general setting.

【００６３】この第３実施例においては、図７に示され
るように、プロセッサシステムは、命令を格納する命令
キャッシュメモリ（I-cache ）２０１と；Ｆステージで
I-cache ２０１から４ワードバウンダリの４つの命令を
同時にフェッチし、Ｄステージで４つのフェッチされた
命令間のデータ依存関係及び制御依存関係を考慮し、Ｅ
ステージで命令供給線２２０、２２１、２２２及び２２
３を介して実行可能な命令を供給する命令発行ユニット
２０２と；命令供給線２２０及び２２１から供給される
命令に従って、Ｅステージで算術論理演算及びメモリア
ドレス計算を実行する論理演算ユニット（ALU0及びALU
1）２０３及び２０４と；ALU0２０３及びALU1２０４か
らのコマンドに従ってＥステージで整数の乗除算を行う
ための整数乗除算器２０５と；ALU0２０３及びALU1２０
４からのアクセスされるデータを格納するためのデータ
キャッシュメモリ（D-cache ）２０６と；命令供給線２
２２から供給される命令に従ってＥステージで浮動小数
点の加減算を行うための浮動小数点加算器（FADD）２０
７と；命令供給線２２３から供給される命令に従ってＥ
ステージで浮動小数点の乗算を行うための浮動小数点乗
算器（FMUL）２０８と；FMUL２０８からのコマンドに従
ってＥステージで浮動小数点の除算を行うための浮動小
数点除算器（FDIV）２０９と；命令発行ユニット２０２
で発行される命令を特定するための命令アドレス生成ユ
ニット２１０と；後に説明するトラップ制御及びストー
ル制御を行うための制御ユニット２１１と；ALU0２０
３、ALU1２０４の出力を格納するための整数レジスタフ
ァイル２１２と；FADD２０７及びFMUL２０８の出力を格
納するための浮動小数点レジスタファイル２１３とを備
えている。In the third embodiment, as shown in FIG. 7, the processor system comprises: an instruction cache memory (I-cache) 201 for storing instructions; an instruction issuing unit 202 that simultaneously fetches four instructions on a 4-word boundary from the I-cache 201 in the F stage, considers the data dependencies and control dependencies among the four fetched instructions in the D stage, and supplies executable instructions via instruction supply lines 220, 221, 222 and 223 in the E stage; arithmetic logic units (ALU0 and ALU1) 203 and 204 that perform arithmetic and logic operations and memory address calculation in the E stage according to the instructions supplied from the instruction supply lines 220 and 221; an integer multiplier/divider 205 that performs integer multiplication and division in the E stage according to commands from ALU0 203 and ALU1 204; a data cache memory (D-cache) 206 for storing data accessed from ALU0 203 and ALU1 204; a floating-point adder (FADD) 207 that performs floating-point addition and subtraction in the E stage according to the instruction supplied from the instruction supply line 222; a floating-point multiplier (FMUL) 208 that performs floating-point multiplication in the E stage according to the instruction supplied from the instruction supply line 223; a floating-point divider (FDIV) 209 that performs floating-point division in the E stage according to commands from the FMUL 208; an instruction address generation unit 210 for specifying the instructions to be issued by the instruction issuing unit 202; a control unit 211 for performing the trap control and stall control described later; an integer register file 212 for storing the outputs of ALU0 203 and ALU1 204; and a floating-point register file 213 for storing the outputs of the FADD 207 and the FMUL 208.

【００６４】上述の図４の第２実施例と同様に、図７の
プロセッサシステムは、更に、I-cache ２０１及びD-ca
che ２０６にキャッシュされるデータを格納するための
メインメモリ２１４と；Ｉ／Ｏレジスタを含むＩ／Ｏ装
置２１５と；メインメモリ２１４及びＩ／Ｏ装置２１５
をI-cache ２０１及びD-cache に接続させるバス線２１
６とを有している。As in the second embodiment of FIG. 4 described above, the processor system of FIG. 7 further includes: a main memory 214 for storing the data cached in the I-cache 201 and the D-cache 206; an I/O device 215 including I/O registers; and a bus line 216 connecting the main memory 214 and the I/O device 215 to the I-cache 201 and the D-cache 206.

【００６５】加えて、図７のプロセッサシステムには、
更に、I-cache ２０１とバス線２１６との間に設けられ
るプレデコーダ２１７と；命令発行ユニット２０２に接
続されているレジスタスコアボード回路２１８とが組み
込まれている。これらについては以下に詳細に説明す
る。In addition, the processor system of FIG. 7 further incorporates a predecoder 217 provided between the I-cache 201 and the bus line 216, and a register scoreboard circuit 218 connected to the instruction issuing unit 202. These are described in detail below.

【００６６】まず、この第３実施例におけるトラップと
ストールの制御処理について説明する。First, the trap and stall control processing in the third embodiment will be described.

【００６７】一般に、並列処理型プロセッサシステム
は、マシン語命令の実行の時に同じリソースについての
競合する要求が発生するのを回避する機構、及び、マシ
ン語命令間の実行順序の正当性を保持する機構を有する
必要がある。ここで、実行順序の正当性とは、データ依
存関係及び制御依存関係のコンシステンシを意味する。In general, a parallel processing type processor system has a mechanism for avoiding the occurrence of conflicting requests for the same resource during execution of a machine language instruction, and retains the correctness of the execution order between machine language instructions. You need to have a mechanism. Here, the validity of the execution order means the consistency of the data dependency and the control dependency.

【００６８】データ依存関係のコンシステンシを維持す
るためには、実行順序をＤ→Ｓ関係、Ｓ→Ｄ関係及びＤ
→Ｄ関係の内の１つに維持する必要がある。ここで、Ｄ
→Ｓ関係は、先に実行すべき命令の結果を格納するリソ
ースが、後で実行すべき命令に用いられるソースデータ
が読みだされるリソースと同一である関係を示す。Ｓ→
Ｄ関係は、先に実行すべき命令の結果を格納するリソー
スが、後で実行すべき命令の結果を格納するためのリソ
ースと同一である関係を示す。Ｄ→Ｄ関係は、先に実行
すべき命令に用いられるソースデータが読みだされるリ
ソースが、後で実行すべき命令に用いられるソースデー
タが読みだされるリソースと同一であることを示す。こ
の図７の第３実施例においては、データ格納リソース
が、レジスタ及びメモリの２つの態様で存在し、両者間
のデータ依存関係のコンシステンシを保つ必要がある。In order to maintain the consistency of data dependencies, the execution order must be preserved with respect to the D→S relation, the S→D relation and the D→D relation. Here, the D→S relation is the relation in which the resource storing the result of the instruction to be executed first is the same as the resource from which the source data used by the instruction to be executed later is read. The S→D relation is the relation in which the resource storing the result of the instruction to be executed first is the same as the resource for storing the result of the instruction to be executed later. The D→D relation is the relation in which the resource from which the source data used by the instruction to be executed first is read is the same as the resource from which the source data used by the instruction to be executed later is read. In the third embodiment of FIG. 7, data storage resources exist in two forms, registers and memory, and the consistency of data dependencies must be maintained for both.

【００６９】制御依存関係とは、先の分岐命令とその後
の命令との間の関係である。従来のＶＬＩＷ型プロセッ
サにおける分岐命令については、この制御依存関係のコ
ンシステンシがコンパイラによって保たれている。しか
し、この第３実施例の並列処理型プロセッサシステムで
は、ユーザープログラムのオブジェクトコンパチビリテ
ィをもたせる必要があるので、制御依存関係のコンシス
テンシの保持はハードウェアにおいて実現させる必要が
ある。The control dependency is a relationship between a previous branch instruction and a subsequent instruction. For a branch instruction in a conventional VLIW processor, the consistency of the control dependency is maintained by a compiler. However, in the parallel processing type processor system of the third embodiment, since it is necessary to provide object compatibility of the user program, it is necessary to realize the consistency of the control dependency relationship by hardware.

【００７０】この第３実施例においては、プロセッサシ
ステムは、４ワードバウンダリの４つの命令を同時にフ
ェッチし、その４つの命令間でインオーダーな順序で同
時に発行されできる命令を各演算ユニットに発行して、
インオーダーな順序で命令の実行が完了する。In the third embodiment, the processor system simultaneously fetches four instructions on a 4-word boundary, issues to the operation units, in order, those of the four instructions that can be issued simultaneously, and completes instruction execution in order.

【００７１】この第３実施例においては、同じリソース
を使用する競合する要求が発生するのを回避しデータ依
存関係及び制御依存関係のコンシステンシを保持するた
めのハードウェアは、２つの機構を含んでいる。第１の
部分は、命令発行ユニット２０２によって実現される命
令発行機構であり、第２の部分は、制御ユニット２１１
によって実現されるストール機構である。In the third embodiment, the hardware for avoiding conflicting requests for the same resource and for maintaining the consistency of data dependencies and control dependencies comprises two mechanisms. The first is an instruction issuing mechanism implemented by the instruction issuing unit 202, and the second is a stall mechanism implemented by the control unit 211.

【００７２】命令発行機構を実現するために、命令発行
ユニット２０２には、同じリソースを使用する競合する
要求の発生を検出するためのプレデコーダ２１７が備え
られており、これはキャッシュへのリフィル時（又はキ
ャッシュスルー命令フェッチの時は命令フェッチの時）
に同時にフェッチされた４つの命令の使用リソースと、
同時にフェッチされた命令のうちで最小のアドレスを有
する命令について競合があるかどうかとのマークを付け
る。命令発行ユニット２０２は又、Ｄ→Ｓ関係及びＤ→
Ｄ関係を検出するためのレジスタスコアボード回路２１
８も備えており、プレデコーダ２１７によってマークさ
れる競合を回避しつつ、レジスタスコアボード回路２１
８によって検出されるデータ依存関係及び制御依存関係
においてＤ→Ｓ及びＤ→Ｄ関係を保つことにより、Ｄス
テージでの命令発行処理をインオーダーな順序で行うこ
とができる。Ｓ→Ｄ関係については、ソースレジスタの
読み出し後に、命令発行をインオーダーな順序で行いイ
ンオーダーな順序で実行が完了することにより保たれ
る。To implement the instruction issuing mechanism, the instruction issuing unit 202 is provided with the predecoder 217 for detecting conflicting requests for the same resource. At the time of a cache refill (or at instruction fetch time in the case of a cache-through instruction fetch), the predecoder 217 marks the resources used by the four simultaneously fetched instructions and whether each conflicts with the instruction having the lowest address among them. The instruction issuing unit 202 is also provided with the register scoreboard circuit 218 for detecting the D→S and D→D relations. By avoiding the conflicts marked by the predecoder 217 while preserving the D→S and D→D relations in the data dependencies and control dependencies detected by the register scoreboard circuit 218, instruction issue in the D stage can be performed in order. The S→D relation is preserved because, after the source registers are read, instructions are issued in order and complete execution in order.

【００７３】更に、この第３実施例においては、２つの
メモリアクセス命令を同時に実行することが可能である
ために、メモリリソースについてのデータ依存関係と同
様、メモリリソースの競合関係のコンシステンシも保つ
必要がある。命令発行ユニット２０２はＤステージより
前で処理を行うので、命令発行ユニット２０２はこれら
の関係がないと仮定して処理を行っており、これらの関
係が存在するかどうかを正確に確認することができな
い。これらの関係の存在はＭステージになるまで正確に
は決められず、関係の存在の可能性を決めることは可能
であっても、命令供給処理をそのような可能性に基づい
て過剰に制御してしまうと、パーフォーマンスが著しく
制限されてしまうことになる。Furthermore, in the third embodiment, since two memory access instructions can be executed at the same time, the consistency of memory resource conflicts must be maintained as well as that of data dependencies on memory resources. Since the instruction issuing unit 202 operates before the D stage, it operates on the assumption that these relations do not exist and cannot confirm exactly whether they do. Their existence cannot be determined exactly until the M stage, and although the possibility of their existence can be determined, controlling instruction supply too conservatively on the basis of that possibility would severely limit performance.

【００７４】ストール機構においては、実行完了はイン
オーダーな順序に保たれ、命令発行機構で既に考慮され
たもの以外のケース及び命令発行処理が速すぎて命令フ
ェッチ処理が追いつけなくなった場合についての制御を
行う。In the stall mechanism, execution completion is kept in order, and control is performed for cases other than those already handled by the instruction issuing mechanism and for the case where instruction issue is so fast that instruction fetch cannot keep up.

【００７５】インオーダーな順序での実行完了において
は、Ｄステージから同時発行された命令は同時に実行完
了するという基本原理に基づいた命令の実行完了が達成
される。但し、ALU1２０４によってＭステージで実行さ
れるていメモリアクセス命令Ｘによるメモリアクセス処
理が１サイクル時間内に終了できない時には、上記第２
実施例と同様に、命令Ｘによるストール要求に関わら
ず、このメモリアクセス命令Ｘより小さいアドレスを有
する命令を実行完了し、実行完了した命令の数だけmpc
が更新される。この機能を実行完了ステージにおけるグ
ループ機能と呼ぶ。この第３実施例においては、ALU0２
０３及びALU1２０４で命令を同時に実行するとき、命令
発行ユニット２０２は、ALU0２０３が常により小さいア
ドレスを有する命令を実行するような制御を行うので、
インオーダーな実行完了が保証される。この命令発行ユ
ニット２０２による制御によって、Ｄステージでは検出
することができないメモリリソースの使用に関する競合
する要求の発生によるデッドロックをＭステージにおい
て防ぐことができる。In-order execution completion is achieved on the basic principle that instructions issued simultaneously from the D stage complete execution simultaneously. However, when the memory access processing of a memory access instruction X executed in the M stage by ALU1 204 cannot finish within one cycle, then, as in the second embodiment, the instructions having addresses smaller than that of the memory access instruction X complete execution regardless of the stall request from instruction X, and mpc is updated by the number of instructions that have completed. This function is called the group function in the execution completion stage. In the third embodiment, when ALU0 203 and ALU1 204 execute instructions simultaneously, the instruction issuing unit 202 performs control so that ALU0 203 always executes the instruction with the smaller address, so in-order execution completion is guaranteed. This control by the instruction issuing unit 202 prevents, in the M stage, deadlock caused by conflicting requests for memory resources that cannot be detected in the D stage.
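
The group function in the execution completion stage can be sketched as a partition of a simultaneously issued group around the stalling instruction. The helper names `complete_group` and `advance_mpc` are hypothetical, and addresses are assumed to be word indices so that consecutive instructions differ by 1.

```python
from typing import List, Optional, Tuple


def complete_group(
    group_addrs: List[int], stalled_addr: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Split a simultaneously issued group into (completed, held).

    With no stall request the whole group completes together. If the
    instruction at stalled_addr requests a stall, only instructions with
    smaller addresses complete; the rest are held.
    """
    ordered = sorted(group_addrs)
    if stalled_addr is None:
        return ordered, []
    completed = [a for a in ordered if a < stalled_addr]
    held = [a for a in ordered if a >= stalled_addr]
    return completed, held


def advance_mpc(mpc: int, completed: List[int]) -> int:
    """Advance mpc by the number of instructions that completed."""
    return mpc + len(completed)
```

For a group at addresses 8 to 11 with the instruction at 10 stalling, instructions 8 and 9 complete and mpc moves from 8 to 10, so a later trap on instruction 10 does not cause 8 or 9 to be replayed.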

【００７６】この様な、ストール要求を発している命令
より小さいアドレスを有する命令のみについてパイプラ
イン処理を完了するための制御は、他の状況、例えば、
ALU0２０３でのＭステージ処理が１サイクルで終了でき
ないような場合や、FADD２０７又はFMUL２０８のE2ステ
ージで実行される命令についてトラップの可能性が否定
できないためにストール要求がアサートされている場合
等にも同様に適用することが可能である。This control, which completes pipeline processing only for instructions having addresses smaller than that of the instruction issuing the stall request, can be applied in the same way to other situations, for example when M-stage processing in ALU0 203 cannot finish in one cycle, or when a stall request is asserted because the possibility of a trap cannot be ruled out for an instruction executed in the E2 stage of the FADD 207 or the FMUL 208.

【００７７】命令発行ユニット２０２は、メモリ（キャ
ッシュメモリを含む）に関する限り、同じリソースの使
用に関する競合する要求の発生あるいはデータ依存関係
を考慮しない。この第３実施例のプロセッサシステムは
RISCと同じように、いわゆる「load, store 」アーキテ
クチュアを採用するので、同じリソースの使用に関する
競合する要求の発生の防止及びデータ依存関係の保持を
確保するためには、「load」及び「store 」命令のみを
考慮すれば充分である。D-cache ２０６にはALU0２０３
及びALU1２０４用の専用ポートがあるので、ALU0２０３
及びALU1２０４の１つだけがD-cache ２０６にアクセス
している限りは、リソースの競合は起こらない。故に、
「load」及び「store 」命令は外部メモリに同時にアク
セスする際、データ依存関係はこの２つの命令が同じア
ドレスにアクセスしようとしている場合にのみ存在する
ことになる。As far as memory (including cache memory) is concerned, the instruction issuing unit 202 does not consider conflicting requests for the same resource or data dependencies. Like a RISC, the processor system of the third embodiment adopts a so-called "load, store" architecture, so it suffices to consider only "load" and "store" instructions in order to prevent conflicting requests for the same resource and to preserve data dependencies. Since the D-cache 206 has dedicated ports for ALU0 203 and ALU1 204, no resource conflict occurs as long as only one of ALU0 203 and ALU1 204 is accessing the D-cache 206. Hence, when "load" and "store" instructions access external memory at the same time, a data dependency exists only if the two instructions attempt to access the same address.

【００７８】メモリリソース競合は実行完了ステージに
おける命令グループ機能によって解決される。メモリに
ついてのデータ依存関係には、「リードアフターライ
ト」、「ライトアフターリード」及び「ライトアフター
ライト」があり、そのコンシステンシはキャッシュメモ
リの側で保たれる。The memory resource conflict is solved by the instruction group function in the execution completion stage. Data dependencies for the memory include “read after write”, “write after read”, and “write after write”, and the consistency is maintained on the cache memory side.
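
The three memory dependency types named above can be illustrated with a small classifier. This is a sketch using the conventional definitions of the terms, not the cache-side logic of the embodiment; the function name `memory_hazard` and the (operation, address) tuple representation are hypothetical.

```python
from typing import Optional, Tuple

MemOp = Tuple[str, int]  # ("load" or "store", address)


def memory_hazard(earlier: MemOp, later: MemOp) -> Optional[str]:
    """Classify the dependency of a later memory access on an earlier one.

    Returns None when the addresses differ or both accesses are loads,
    since no ordering constraint exists in those cases.
    """
    (op1, addr1), (op2, addr2) = earlier, later
    if addr1 != addr2:
        return None
    if op1 == "store" and op2 == "load":
        return "read-after-write"
    if op1 == "load" and op2 == "store":
        return "write-after-read"
    if op1 == "store" and op2 == "store":
        return "write-after-write"
    return None  # load followed by load
```

A dependency is reported only when both accesses target the same address, which matches the observation that simultaneous "load" and "store" instructions depend on each other only in that case.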

【００７９】次に図８から図１８を参照して、図７の第
３実施例におけるストール制御を詳細に説明する。Next, the stall control in the third embodiment of FIG. 7 will be described in detail with reference to FIGS. 8 to 18.

【００８０】この第３実施例においては、ストールが生
じる場合は以下のように要約することができる。In the third embodiment, the case where a stall occurs can be summarized as follows.

【００８１】M0busy M1busy Imis（I-cache miss） FRbusy（FPUFA レジスタ書き込み競合） FAexch（FADD例外チェック） FMexch（FMUL例外チェック） FDexch（FMUL例外チェック） Fstall（強制ストール）これらの場合の各々に対応するストール要求信号がアサ
ートされる条件は以下のようなものである。
M0busy
M1busy
Imis (I-cache miss)
FRbusy (FPUFA register write conflict)
FAexch (FADD exception check)
FMexch (FMUL exception check)
FDexch (FMUL exception check)
Fstall (forced stall)
The conditions under which the stall request signal corresponding to each of these cases is asserted are as follows.

【００８２】M0busy、M1busy：これらのタイプのストー
ル要求信号は、メモリアクセス処理（キャッシュ及びＩ
／Ｏに関するアクセス処理を含む）がＭステージでt 番
目のサイクルに行われ、このメモリアクセス処理がこの
t 番目のサイクル中に完了できない時にアサートされ
る。M0busy, M1busy: These types of stall request signal are asserted when memory access processing (including cache and I/O access processing) is performed in the M stage in the t-th cycle and cannot be completed within that t-th cycle.

【００８３】Imis: This type of stall request signal is asserted when there is a new instruction fetch request but the instruction fetch is unsuccessful. FIGS. 8 to 12 show examples in which a stall is caused by this Imis stall request. Here, the Imis stall request signal is actually asserted when the instruction has not been fetched into the instruction register one clock cycle after fpc is loaded with a new value and "fpcen" is asserted.

【００８４】FIG. 8 shows a case where a stall due to an instruction cache miss occurs at instruction fetch.

【００８５】FIG. 9 shows a case where, after a stall due to an instruction cache miss occurs at instruction fetch, another stall request arises in another part of the system, and the stall due to the instruction cache miss is resolved before the other stall.

【００８６】FIG. 10 shows a case where a stall due to an instruction cache miss occurs simultaneously with another stall in another part of the system, and the other stall is resolved before the stall due to the instruction cache miss.

【００８７】FIG. 11 shows a case where a stall due to an instruction cache miss at instruction fetch occurs at the jump destination of a jump instruction.

【００８８】FIG. 12 shows a case where a jump occurs during the cache refill operation that accompanies a stall due to an instruction cache miss.

【００８９】FRbusy: This type of stall request signal is asserted when FDIV 209 is in the E2 stage of the pipeline and FMUL 208 is also in the E2 stage. By stalling pipeline processing other than FDIV for one cycle, it avoids contention between FMUL 208 and FDIV 209 for writes to the floating-point register file 213.

【００９０】FAexch: When the possibility of a trap cannot be ruled out in the E1 stage of FADD 207, the pipeline processing in FADD 207 changes from the normal F1 type to the F2 type. In this case, the execution completion stage in FADD 207 becomes the M stage. The FAexch stall request signal is therefore asserted to stall the processing of the other units, delaying their execution completion stages during the E2 and E3 stages of FADD 207 so that all units reach the execution completion stage at the same time as the M stage of FADD 207. A situation in which a stall is caused by this FAexch stall request signal is shown in FIG. 13, where the processing of the "Iadd" and "fmul" instructions fetched together with the "fadd" instruction, and the processing of the "fadd", "Iadd", and "fmul" instructions fetched in the next cycle, are stalled during the E2 and E3 stages of the processing of the "fadd" instruction.

【００９１】FMexch: When the possibility of a trap cannot be ruled out in the E1 stage of FMUL 208, the pipeline processing in FMUL 208 changes from the normal F1 type to the F2 type. In this case, the execution completion stage in FMUL 208 becomes the M stage. The FMexch stall request signal is therefore asserted to stall the processing of the other units, delaying their execution completion stages by two cycles so that all units reach the execution completion stage at the same time as the M stage of FMUL 208.

【００９２】FDexch: When the possibility of a trap cannot be ruled out in the E1 stage of FDIV 209, the pipeline processing in FDIV 209 changes from the normal D1 type to the D2 type. In this case, the execution completion stage in FDIV 209 becomes the M stage. The FDexch stall request signal is therefore asserted to stall the processing of the other units, delaying their execution completion stages until FDIV 209 has passed the E3 stage so that all units reach the execution completion stage at the same time as the M stage of FDIV 209. A case in which a stall is caused by this FDexch stall request signal is shown in FIG. 14, where the processing of the "iadd" and "fadd" instructions fetched together with the "fdiv" instruction is stalled until the "fdiv" instruction has passed the E3 stage.

【００９３】Fstall: This type of stall request signal is asserted externally in order to lock all pipeline processing before the execution completion stage, except for processing such as cache miss recovery.
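The assertion conditions listed above can be gathered into a single predicate table. The Python sketch below is only an illustration under stated assumptions: `PipelineState` and all of its field names are hypothetical stand-ins for the hardware conditions described in the text, and the function simply reports which stall request signals would be asserted for one cycle's snapshot.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PipelineState:
    """Snapshot of one clock cycle; every field name is a hypothetical stand-in."""
    m0_access_pending: bool = False   # ALU0 memory access in M stage cannot finish this cycle
    m1_access_pending: bool = False   # same condition for ALU1
    fetch_requested: bool = False     # fpc was loaded and fpcen asserted one cycle earlier
    instruction_fetched: bool = False # instruction has reached the instruction register
    fdiv_in_e2: bool = False
    fmul_in_e2: bool = False
    fadd_trap_possible_in_e1: bool = False
    fmul_trap_possible_in_e1: bool = False
    fdiv_trap_possible_in_e1: bool = False
    forced_stall: bool = False        # asserted from outside the processor


def stall_requests(s: PipelineState) -> List[str]:
    """Return the names of the stall request signals asserted in this cycle."""
    checks = [
        ("M0busy", s.m0_access_pending),
        ("M1busy", s.m1_access_pending),
        ("Imis", s.fetch_requested and not s.instruction_fetched),
        ("FRbusy", s.fdiv_in_e2 and s.fmul_in_e2),
        ("FAexch", s.fadd_trap_possible_in_e1),
        ("FMexch", s.fmul_trap_possible_in_e1),
        ("FDexch", s.fdiv_trap_possible_in_e1),
        ("Fstall", s.forced_stall),
    ]
    return [name for name, asserted in checks if asserted]
```

For instance, a snapshot with both FDIV and FMUL in the E2 stage reports only FRbusy, whereas FDIV alone in E2 reports nothing.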

【００９４】FIGS. 15 to 17 show examples of timing charts of stall control in this third embodiment.

【００９５】In the timing chart of FIG. 15, a cache miss is detected in the M stage of ALU0 203 in the N-th cycle and the M0busy signal is asserted, while cache miss recovery processing starts. Meanwhile, since the possibility of a trap cannot be ruled out for FMUL 208, the FMexch signal is asserted, and the stall1 and stall2 signals are asserted.

【００９６】In the (N+2)-th cycle, cache miss recovery processing continues for ALU0 203, while the FMexch signal is negated because FMUL 208 did not raise an exception. However, since the stall1 and stall2 signals are still asserted, execution in FMUL 208 cannot complete.

【００９７】In the (N+4)-th cycle, the cache miss recovery processing for ALU0 203 can complete within this cycle, so the M0busy signal is negated. As a result, the stall1 and stall2 signals are negated, and execution in all the other units can complete.

【００９８】Finally, in the (N+5)-th cycle, execution of all instructions is complete.

【００９９】In the timing chart of FIG. 16, the addresses of the instructions in the processing units are assumed to satisfy ALU0 203 < ALU1 204 and FMUL 208 < ALU1 204 < FADD 207. In this case, a cache miss is detected in the M stage of ALU1 204 in the N-th cycle and the M1busy signal is asserted, while cache miss recovery processing starts. Meanwhile, since the possibility of a trap cannot be ruled out for FMUL 208, the FMexch signal is asserted, and the stall1 and stall2 signals are asserted.

【０１００】In the (N+2)-th cycle, cache miss recovery processing continues for ALU1 204, while the FMexch signal is negated because FMUL 208 did not raise an exception. At this point the stall2 signal is negated, so execution in ALU0 203 and FMUL 208 can complete, but since the stall1 signal is still asserted, execution in FADD 207 cannot complete.

【０１０１】In the (N+4)-th cycle, the cache miss recovery processing for ALU1 204 can complete within this cycle, so the M1busy signal is negated. As a result, the stall1 signal is negated, and execution in FADD 207 can complete.

【０１０２】Finally, in the (N+5)-th cycle, execution of all instructions is complete.

【０１０３】In the timing chart of FIG. 17, a cache miss is detected in the M stage of ALU1 204 in the N-th cycle and the M1busy signal is asserted, while cache miss recovery processing starts. Meanwhile, since the possibility of a trap cannot be ruled out for FMUL 208, the FMexch signal is asserted, and the stall1 and stall2 signals are asserted.

【０１０４】In the (N+2)-th cycle, the cache miss recovery processing for ALU1 204 completes and the M1busy signal is negated. In the same (N+2)-th cycle or the preceding (N+1)-th cycle, the FMexch signal is negated because FMUL 208 did not raise an exception. At this point the stall1 and stall2 signals are also negated, so execution of all instructions can complete.
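The three timing charts can be replayed with a small model. The text here does not state the logic equations for stall1 and stall2, so the Python sketch below rests on an assumption inferred only from the narratives of FIGS. 15 to 17: stall1 is taken as the OR of M0busy, M1busy, and FMexch, while stall2 is taken as the OR of M0busy and FMexch alone. The function names and the cycle offsets (0 meaning the N-th cycle) are likewise introduced just for this illustration.

```python
from typing import List, Tuple


def stall_signals(m0busy: bool, m1busy: bool, fmexch: bool) -> Tuple[bool, bool]:
    """Return (stall1, stall2) under the assumed OR-based reading of the charts."""
    stall1 = m0busy or m1busy or fmexch   # assumption: any request holds stall1
    stall2 = m0busy or fmexch             # assumption: M1busy does not hold stall2
    return stall1, stall2


def replay(busy_name: str, busy_until: int, fmexch_until: int,
           cycles: int = 6) -> List[Tuple[bool, bool]]:
    """Replay cycles N..N+cycles-1; each request stays asserted until its negation cycle."""
    trace = []
    for k in range(cycles):
        m0 = busy_name == "M0busy" and k < busy_until
        m1 = busy_name == "M1busy" and k < busy_until
        trace.append(stall_signals(m0, m1, k < fmexch_until))
    return trace


# FIG. 15: M0busy negated in cycle N+4, FMexch negated in cycle N+2.
fig15 = replay("M0busy", busy_until=4, fmexch_until=2)
# FIG. 16: M1busy negated in cycle N+4, FMexch negated in cycle N+2.
fig16 = replay("M1busy", busy_until=4, fmexch_until=2)
# FIG. 17: M1busy and FMexch both negated by cycle N+2.
fig17 = replay("M1busy", busy_until=2, fmexch_until=2)
```

Under this assumed model, the FIG. 16 replay releases stall2 at offset 2 while stall1 stays asserted until offset 4, which matches the described behavior in which ALU0 203 and FMUL 208 complete before FADD 207.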

【０１０５】The processing of each pipeline at each processing stage under stall control using the stall1, stall2, and FRbusy signals in this third embodiment is summarized in the table shown in FIG. 18.

【０１０６】In this third embodiment, after the processing of an instruction has been stalled because the possibility of an exception cannot be ruled out in its execution, the processing is aborted if an exception actually occurs during execution, or resumed if no exception actually occurs. In FIG. 18, BU denotes the branch unit provided in each processing unit of the processor system.

【０１０７】Therefore, according to this third embodiment, it is possible to provide a parallel processing type processor system, such as a superscalar processor, incorporating trap and stall control functions that operate without increasing the cycle time, so that a drop in the system clock frequency is avoided.

【０１０８】[0108]

[Effects of the Invention] As described above, the parallel processing computer of the present invention can perform trap and stall control at high speed without increasing the cycle time. In a parallel processing type processor system such as a superscalar processor, trap and stall control can thus be performed more efficiently without lowering the system clock frequency.

[Brief description of the drawings]

FIG. 1 is a block diagram of a first embodiment of a parallel processing type processor system according to the present invention.

FIG. 2 is a block diagram of the trap control unit in the parallel processing type processor system of FIG. 1.

FIG. 3 is a diagram showing the progress of pipeline processing in the parallel processing type processor system of FIG. 1 when a trap request occurs during execution of the program of FIG. 21.

FIG. 4 is a block diagram of a second embodiment of the parallel processing type processor system according to the present invention.

FIG. 5 is a diagram showing an example of a program executed by the parallel processing type processor system of FIG. 4.

FIG. 6 is a diagram showing the progress of pipeline processing in the parallel processing type processor system of FIG. 4 when a trap request occurs during execution of the program of FIG. 5.

FIG. 7 is a block diagram of a third embodiment of the parallel processing type processor system according to the present invention.

FIG. 8 is a diagram showing an example of a stall due to an instruction cache miss in the parallel processing type processor system of FIG. 7.

FIG. 9 is a timing chart showing another example of a stall due to an instruction cache miss in the parallel processing type processor system of FIG. 7.

FIG. 10 is a timing chart showing another example of a stall due to an instruction cache miss in the parallel processing type processor system of FIG. 7.

FIG. 11 is a timing chart showing another example of a stall due to an instruction cache miss in the parallel processing type processor system of FIG. 7.

FIG. 12 is a timing chart showing another example of a stall due to an instruction cache miss in the parallel processing type processor system of FIG. 7.

FIG. 13 is a timing chart showing an example of a stall due to FAexch in the parallel processing type processor system of FIG. 7.

FIG. 14 is a timing chart showing an example of a stall due to FDexch in the parallel processing type processor system of FIG. 7.

FIG. 15 is a timing chart showing an example of stall control processing in the parallel processing type processor system of FIG. 7.

FIG. 16 is a timing chart showing another example of stall control processing in the parallel processing type processor system of FIG. 7.

FIG. 17 is a timing chart showing another example of stall control processing in the parallel processing type processor system of FIG. 7.

FIG. 18 is a table summarizing how each pipeline is processed under stall control processing in the parallel processing type processor system of FIG. 7.

FIG. 19 is a block diagram of a parallel processing type processor system using a conventional trap control method.

FIG. 20 is a block diagram of the trap control unit in the conventional parallel processing type processor system of FIG. 19.

FIG. 21 is a diagram showing an example of a program executed by a parallel processing type processor system.

FIG. 22 is a diagram showing the progress of pipeline processing in the parallel processing type processor system of FIG. 19 when a trap request occurs during execution of the program of FIG. 21.

[Explanation of symbols]

101 Instruction memory
101A Instruction cache memory
102 Instruction issue unit
102A Instruction issue unit
103 Arithmetic logic unit
104 Arithmetic logic unit
105 Floating-point adder
106 Floating-point multiplier
107 Memory access unit
108 Memory access unit
109 Floating-point exception check unit
110 Floating-point exception check unit
111 Multi-port register file
125 2-port data memory
125A 2-port data cache memory
130 Trap cause register
131 Abort address register
132 Trap address register
133 Trap control unit
151 M-stage program counter 1
152 M-stage program counter 2
153 M-stage subprogram counter
154 M-stage subprogram counter
155 M-stage subprogram counter
156 M-stage subprogram counter
157 Trap data generation unit
158 Trap signal generation unit
160 Bus line
161 Main memory
162 I/O device
163 Stall control unit
201 Instruction cache memory
202 Instruction issue unit
203 Arithmetic logic unit
204 Arithmetic logic unit
205 Integer multiplier/divider
206 2-port data cache memory
207 Floating-point adder
208 Floating-point multiplier
209 Floating-point divider
210 Instruction address generation unit
211 Control unit
212 Integer register file
213 Floating-point register file
214 Main memory
215 I/O device
216 Bus line
217 Predecoder
218 Register scoreboard circuit

Continuation of the front page

(72) Inventor: Joji Takeda, c/o Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa

(56) References: JP-A-4-247523 (JP, A); JP-A-4-218841 (JP, A); JP-A-5-53806 (JP, A); JP-A-5-20070 (JP, A); JP-A-4-353929 (JP, A); JP-A-4-308930 (JP, A); JP-A-4-308929 (JP, A); Tetsuya Hara and four others, "Configuration and Processing of an Improved Superscalar Processor Based on the SIMP (Single Instruction Stream/Multiple Pipelines) Scheme", IEICE Technical Report, Vol. 90, No. 144, The Institute of Electronics, Information and Communication Engineers, July 20, 1990, pp. 103-108

(58) Field searched (Int. Cl.⁷, DB name): G06F 9/38

Claims

(57) [Claims]

1. An instruction level parallel processing type processor system, comprising: N (N is a positive integer) instruction execution means for simultaneously executing a plurality of sequences of instructions; instruction supply means for simultaneously supplying, to the N instruction execution means, M (N ≧ M, M is a positive integer) instructions to be executed simultaneously by the N instruction execution means; and trap control means for controlling the N instruction execution means, when a processing exception occurs in the execution of at least one of the M instructions in one clock cycle, by sending an abort signal to all of the N instruction execution means within the one clock cycle so that the simultaneous processing of the M instructions supplied to the N instruction execution means is aborted within the one clock cycle.

2. The instruction level parallel processing type processor system according to claim 1, further comprising: abort address storage means for storing the address of the instruction having the smallest address among the M instructions whose processing has been aborted by said trap control means; and trap address storage means for storing the address of the instruction, among said M instructions, that caused said processing exception.

3. The instruction level parallel processing type processor system according to claim 1, wherein the N instruction execution means include K (M ≧ K, where K is an integer) ordered arithmetic means having equivalent functions, and the instruction supply means supplies J (K ≧ J > 1, where J is an integer) instructions to the K arithmetic means such that instructions earlier in order among the J instructions are supplied to arithmetic means earlier in order among the K arithmetic means.

4. The instruction level parallel processing type processor system according to claim 3, further comprising stall means for stalling part of the processing of the instructions, when the possibility of a processing exception occurring in the execution of the I-th instruction among the J instructions cannot be ruled out, so as to stall the processing of the instructions later in order than the I-th instruction among the J instructions.

5. The instruction level parallel processing type processor system according to claim 1, further comprising stall control means which, when the possibility of a processing exception occurring in the execution of the M instructions cannot be ruled out, stalls further processing subsequent to the execution of the M instructions, aborts the further processing subsequent to the execution of the M instructions when a processing exception actually occurs in the execution of the M instructions, and restarts the further processing subsequent to the execution of the M instructions when no processing exception actually occurs.

6. The instruction level parallel processing type processor system according to claim 1, further comprising: means for handling the processing exception; and means for simultaneously restarting, after the processing exception has been handled, the M instructions aborted by the trap control means.

7. A method for controlling an instruction level parallel processing type processor system, comprising the steps of: simultaneously supplying, to N (N is a positive integer) instruction execution means of the system, M (N ≧ M, M is a positive integer) instructions of a plurality of sequences to be executed simultaneously by the N instruction execution means; and controlling the N instruction execution means, when a processing exception occurs in the execution of at least one of the M instructions in one clock cycle, by sending an abort signal to all of the N instruction execution means during the one clock cycle so that the simultaneous processing of all of the M instructions supplied to the N instruction execution means is aborted during the one clock cycle.