JPH07120283B2

JPH07120283B2 - Microprocessor

Info

Publication number: JPH07120283B2
Application number: JP1083243A
Authority: JP
Inventors: 秀哉岸上; 操宮田; 光正岡本
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1988-04-01
Filing date: 1989-03-31
Publication date: 1995-12-20
Anticipated expiration: 2010-12-20
Also published as: JPH0242534A

Description

[Object of the Invention] (Industrial Field of Application) The present invention relates to a microprocessor that executes instructions in a pipelined manner, and in particular to a microprocessor that can greatly improve performance by suppressing pipeline disturbances.

(Prior Art) In recent years, microprocessors have executed instructions in a pipelined manner to improve performance. A typical stage configuration in such a pipeline is, for example, "instruction fetch → instruction decode → effective address calculation → address translation → operand read → instruction execution → operand write" (reference: "The Whole Picture of 32-bit Microprocessors: Companies, Strategy, Technology, and Market Trends", Nikkei McGraw-Hill, pp. 137-139).

In such a pipeline, a high-function instruction with a memory operand (Im) requires processing in two stages: effective address calculation, and address translation, which converts the effective address into a physical address. In contrast, a basic instruction without a memory operand (I_R) requires no processing in these two stages.

Therefore, when the instruction sequence is, for example, Im → I_R → Im → I_R → Im → I_R, the pipeline flows as shown in FIG. 12. Here, each stage is assumed to complete its processing in one cycle, and the operand write of instruction Im is assumed to target a register and to complete in the execution stage. In FIG. 12, an X mark indicates that the stage is idle.

As is clear from FIG. 12, the effective address calculation stage (OAG) is idle in the fourth and sixth cycles, and the address translation stage (MMU) is idle in the fifth and seventh cycles.

Consequently, the utilization of the effective address calculation and address translation stages is 50%.
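The 50% figure can be reproduced with a small model. The following is a minimal sketch, assuming one instruction enters the pipeline per cycle, every stage takes one cycle, and only instructions with a memory operand do useful work in OAG and MMU. The stage names, the `Ir` spelling of I_R, and the function names are illustrative and do not come from the patent.

```python
# Simplified model of the conventional pipeline discussed above (an
# illustrative assumption, not the patent's circuit): one instruction enters
# per cycle and every stage takes exactly one cycle.
STAGES = ["IF", "DEC", "OAG", "MMU", "OF", "EX"]
MEMORY_ONLY_STAGES = {"OAG", "MMU"}  # useful only for memory-operand instructions


def stage_activity(sequence):
    """Return {stage: [(cycle, busy), ...]} for a list of 'Im' / 'Ir' instructions."""
    activity = {stage: [] for stage in STAGES}
    for i, kind in enumerate(sequence):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i occupies stage s in this cycle
            busy = kind == "Im" or stage not in MEMORY_ONLY_STAGES
            activity[stage].append((cycle, busy))
    return activity


def utilization(slots):
    """Fraction of a stage's occupied slots in which it does useful work."""
    return sum(busy for _, busy in slots) / len(slots)


sequence = ["Im", "Ir", "Im", "Ir", "Im", "Ir"]
activity = stage_activity(sequence)
oag_utilization = utilization(activity["OAG"])
mmu_utilization = utilization(activity["MMU"])
oag_idle = [cycle for cycle, busy in activity["OAG"] if not busy]
mmu_idle = [cycle for cycle, busy in activity["MMU"] if not busy]
print(oag_utilization, mmu_utilization, oag_idle, mmu_idle)
```

In this model the first idle OAG cycles are the fourth and sixth and the first idle MMU cycles are the fifth and seventh, in line with the description of FIG. 12.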

On the other hand, a CISC (Complex Instruction Set Computer) microprocessor, which has a complex, high-function instruction set, includes complex high-function instructions (Ic) that require several cycles in the execution stage.

In such a microprocessor, when the instruction sequence is, for example, Ic → I_R → I_R → I_R → Im, the pipeline flows as shown in FIG. 13. In FIG. 13, instruction Ic is assumed to take four cycles in its execution stage, and the X marks have the same meaning as in FIG. 12.

In this case, because instruction Ic takes four cycles to execute, a so-called "pipeline disturbance" occurs, as is clear from FIG. 13. As a result, in the example of FIG. 13, the instructions do not all finish within the ideal pipeline flow indicated by hatching in FIG. 13; processing takes three cycles longer (the 12th to 14th cycles).
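The three extra cycles can be checked with a simple in-order model. This is a sketch under stated assumptions: seven single-entry stages (including operand write), instructions that never overtake one another, and a latency of one cycle everywhere except the execution stage. The function and variable names are hypothetical.

```python
# In-order pipeline model: an instruction enters a stage once it has left the
# previous stage and the instruction ahead of it has left this stage.
STAGES = ["IF", "DEC", "OAG", "MMU", "OF", "EX", "OW"]


def completion_cycle(ex_latencies):
    """Cycle in which the last instruction leaves the last stage.

    ex_latencies[i] is the number of cycles instruction i spends in EX;
    every other stage takes one cycle.
    """
    prev_finish = [0] * len(STAGES)  # finish cycles of the preceding instruction
    for ex_latency in ex_latencies:
        finish = []
        done = 0  # cycle in which this instruction left the preceding stage
        for s, stage in enumerate(STAGES):
            latency = ex_latency if stage == "EX" else 1
            done = max(done, prev_finish[s]) + latency
            finish.append(done)
        prev_finish = finish
    return prev_finish[-1]


# Ic -> I_R -> I_R -> I_R -> Im, with Ic spending four cycles in EX.
disturbed = completion_cycle([4, 1, 1, 1, 1])
ideal = completion_cycle([1, 1, 1, 1, 1])
print(ideal, disturbed, disturbed - ideal)
```

Under these assumptions the ideal flow ends in cycle 11 and the disturbed flow in cycle 14, which corresponds to the extra 12th to 14th cycles mentioned above.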

Furthermore, because the high-function instruction Ic takes four cycles to execute, idle states arise in the effective address calculation (OAG), address translation (MMU), and operand read (OF) stages.

(Problems to Be Solved by the Invention) In a pipelined microprocessor, no disturbance occurs when high-function instructions with a memory operand (Im) and basic instructions without a memory operand (I_R) are executed alternately. However, as shown in FIG. 12, the utilization of the effective address calculation and address translation stages falls.

When a complex high-function instruction (Ic) that requires several cycles in the execution stage is executed, the pipeline flow is disturbed, and performance falls as a result.

In this case too, the utilization of certain stages falls.

The present invention has been made in view of these problems. Its object is to provide a microprocessor that prevents the drop in stage utilization, suppresses pipeline disturbances, and thereby greatly improves performance.

[Configuration of the Invention] (Means for Solving the Problems) To achieve the above object, a microprocessor according to the present invention comprises: first execution processing means for executing, under microprogram control, instructions of a first type, which among the decoded instructions are those executed through the same processing steps; second execution processing means for executing, under hardwired control, instructions of a second type, whose processing steps differ from those of the first type; control means for issuing decoded instructions in program sequence order, selecting which of the first and second execution processing means is to execute each issued instruction, and operating the first and second execution processing means independently and in parallel; information holding means comprising a first information holding area, into which the execution result of a first-type or second-type instruction is written immediately when the first or second execution processing means finishes executing the instruction, and a second information holding area, which holds the execution results in program sequence order; and switching means for transferring, after execution of the first-type instruction immediately preceding a second-type instruction has finished, the execution results of the first-type instruction and of the second-type instruction stored in the first information holding area into the second information holding area in program sequence order.

(Operation) In the microprocessor configured as above, high-function instructions and basic instructions are executed independently of each other, so that they can be executed in parallel or simultaneously.

Execution of an instruction issued by the control means starts in accordance with updates of the first information holding means. In addition, when the program sequence returns to the main routine after a subroutine outside the main routine has been executed, the instruction at which execution restarts is determined from the contents of the second information holding means, so that instructions can be re-executed.

The contents of the second information holding means are updated in correct program sequence order, based on information about the issued instructions and information about each instruction's execution/completion state.

(Embodiments) Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing the overall internal configuration of a microprocessor embodying the present invention.

This microprocessor comprises: an instruction fetch unit (IFU) 1 that fetches instruction data from main memory; a decode unit (DCU) 2 that decodes the instruction data from the instruction fetch unit 1; an instruction issue unit (IIU) 3 that issues the instruction information sent from the decode unit 2 according to its type, that is, a basic instruction without a memory operand, a basic instruction with a memory operand, or a high-function instruction with complex processing; an instruction execution unit (EXU) 4 that executes instructions under hardwired control or microprogram control according to that type; a memory management unit (MMU) 5 that generates the addresses of memory operands; a cache control unit (CCU) 6 that manages operand data; and an input/output unit (I/O) 7 that controls data input and output between the microprocessor and the outside.

The instruction fetch unit (IFU) 1 consists of an instruction cache memory (Instruction Cache) 8, which holds a copy of part of the instruction data in main memory, a prefetch control circuit (Prefetcher) 9, which controls operations such as fetching instruction data from main memory into the instruction cache memory 8, and related circuits. It is the same as in conventional designs.

The decode unit (DCU) 2 consists of an instruction decoder (Decoder) 10, which decodes instruction codes, a decoded instruction loop buffer (Decoded Instruction Loop Buffer) 11, which temporarily holds several entries of decoded instruction information, and related circuits. In this embodiment, decoded instruction information for two instructions can be read from the decoded instruction loop buffer 11 at a time (in one cycle) and transferred to the instruction issue unit (IIU) 3.

However, the decoded instruction loop buffer 11 and the ability to read two instructions at a time are not essential to the present invention.

The instruction issue unit (IIU) 3 consists of an instruction issue control circuit (Instruction Issue Logic) 12, which issues the instruction information sent from the decode unit 2 to the instruction execution unit (EXU) 4 or the memory management unit (MMU) 5 according to its type, a current file (Current File) 13 and a future file (Future File) 14, which hold general-purpose register values, a reorder buffer (Reorder Buffer) 15, and related circuits.

In addition to the functions of the pipeline control circuit found in an ordinary pipelined microprocessor (detecting hazards and controlling the state of each pipeline stage), the instruction issue control circuit (IIL) 12 has two further functions. First, it determines whether the received instruction information represents a basic instruction without a memory operand, a basic instruction with a memory operand, or a high-function instruction with complex processing, and controls the instruction execution sections described later so that these instructions execute in parallel. Second, it controls the reorder buffer 15 (setting and clearing information) so that the results of instructions that finish out of program sequence order in the instruction execution sections are put back into program sequence order. The current file 13 is updated in program sequence order, whereas the future file 14 is updated with each execution result immediately after the instruction finishes in the instruction execution unit (EXU) 4 described later, regardless of program sequence order. The reorder buffer 15 temporarily holds the results of instructions that finish out of program sequence order in the instruction execution sections of the instruction execution unit (EXU) 4, and is used to update the current file 13 in program sequence order.

That is, basic instructions and high-function instructions require different numbers of cycles to execute, and here instructions with different execution cycle counts are executed by their own corresponding instruction execution sections. As a result, instructions issued in program sequence order do not necessarily finish in program sequence order; the order may be reversed.

The reorder buffer 15 therefore updates the contents of the registers in the current file 13 in program sequence order, restoring the reversed order to program sequence order. In other words, it reorders instructions that finished out of order.

Consequently, when a program outside the main routine, such as an interrupt, is executed, instructions can be re-executed by referring to the contents of the current file 13.

The instruction issue unit (IIU) 3 also includes a branch prediction circuit (Branch Prediction Logic) 16 for fast execution of branch instructions, among other circuits.

The instruction issue control circuit 12, the current file 13, the future file 14, and the reorder buffer 15 are the elements needed to achieve the object of the present invention. However, the object can also be achieved without the reorder buffer 15; this is described later as another embodiment. The current file 13 and the future file 14 also need not be physically separate: they may be the two halves of a single register file divided into two parts. This too is described later as another embodiment.

The instruction execution unit (EXU) 4 executes instructions in parallel under hardwired control or microprogram control. In this embodiment it consists of three execution sections: a basic instruction execution section (Simple Execution Processor) 17, which executes basic instructions without a memory operand (compare, transfer, arithmetic, and logical operation instructions, for example) under hardwired control; an instruction execution section (Integer Execution Processor) 18, which executes basic instructions with a memory operand and high-function instructions with complex processing under microprogram control; and a floating-point execution section (Floating Execution Processor) 19, which executes floating-point arithmetic instructions.

The present invention is characterized by having multiple instruction execution sections corresponding to instruction types; there need not be exactly three. As a variation, the execution section for basic instructions without a memory operand may share hardware with the section that calculates operand effective addresses.

The memory management unit (MMU) 5 consists of an effective address generator (Operand Address Generator) 20, which generates the effective address of a memory operand, an address translation buffer (Translation Lookaside Buffer) 21, which translates the effective (logical) address into a physical address, a protection check circuit (Protection Logic) 22, which performs memory protection checks, and related circuits. It is the same as in conventional designs.

The cache control unit (CCU) 6 consists of a data cache memory (Data Cache) 23, which holds a copy of part of the operands in main memory, a store buffer (Store Buffer) 24, which temporarily holds write operand data, and related circuits. It is the same as in conventional designs.

The input/output unit (I/O) 7 controls data input and output between the microprocessor and the outside. It consists of a driver/receiver (Driver/Receiver) 25, a bus control section (Bus Control) 26, and related circuits, and is the same as in conventional designs.

FIG. 2 shows the main blocks, among the internal blocks of the microprocessor shown in FIG. 1, that are particularly relevant to the present invention.

In FIG. 2, buses are drawn as double lines and data lines as straight lines; control lines are omitted.

FIG. 3 shows the inside of each block of FIG. 2 in further detail.

In FIG. 3, the instruction issue control circuit (IIL) 12 consists of pipeline registers (OAGR30, MMUR31, CCUR32, IEPR33, and SEPR34), which hold information about the instruction being executed in each pipeline stage, and a control circuit (Control) 35, which controls the pipeline flow based on that information. The pipeline flow is described later with reference to FIGS. 7 and 8. The control circuit 35 also controls the reorder buffer RB15 (registering and deleting data, for example).

In this embodiment, the instruction issue control circuit 12 can receive information for two instructions per cycle from the decoded instruction loop buffer (DILB) 11 of the decode unit 2 (one of the two must, however, be a basic instruction without a memory operand).

SEPR34 is a register that holds information about the instruction currently being executed by the basic instruction execution section 17.

OAGR30 is a register that holds information about the instruction whose effective address is currently being calculated by OAG20.

MMUR31 is a register that holds information about the instruction whose address is currently being translated by MMU5.

CCUR32 is a register that holds information about the instruction currently accessing memory (operand read) through CCU6.

IEPR33 is a register that holds information about the instruction currently being executed by IEP18.

Information about operand writes is held in the store buffer 24 of CCU6, so IIL12 has no register for operand write information.

See FIG. 4 for the detailed blocks of IIL12.

The basic instruction execution section (SEP) 17 is a block containing an arithmetic unit (Adder) 36 for executing basic instructions without a memory operand under hardwired control. The arithmetic unit 36 is controlled directly by SEPR34 in IIL12.

The high-function instruction execution section (IEP) 18 consists of arithmetic units (ALU 37, Barrel Shifter 38, and Multiplier 39) for executing high-function instructions under microprogram control, a μROM40 that holds the microprograms, and a sequencer. RAL41 is a register that holds the μROM40 address, MIR42 is a register that holds a microinstruction, and ErrAdr43 is a register that holds the μROM40 address at the time an error occurs. SEL44 is a selector that chooses one of RAL41, ErrAdr43, and the value held in the op field 88 of CCUR32 in IIL12 (the address of the first microinstruction of the next instruction to be executed by IEP18).

The effective address generator (OAG) 20 consists of an adder (Address Generator) 47 that calculates the effective address of a memory operand.

The memory management unit (MMU) 5 consists of an address translation buffer (Translation Lookaside Buffer: TLB) 21, which holds address pairs for translating logical (effective) addresses into physical addresses, and an access right check circuit (Protection Logic) 22, which checks memory access rights.

The cache control unit (CCU) 6 consists of a data cache (Cache) 23, which holds a copy of part of the data in main memory, and a store buffer (Store Buffer) 24, which temporarily holds write data information. The data cache 23 consists of a data section (DATA) 48, which holds data, and a tag section (TAG) 49, which holds addresses and attributes. The store buffer 24 likewise consists of a data section (DATA) 50, which holds data, and an address section (ADDRESS) 51, which holds addresses. Write data sent from IEP18 is first stored in the store buffer 24 and then written to the data cache 23 and main memory.
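The write path just described can be sketched as a small queue. This is only an illustration: the capacity, the dictionary-based cache and memory, and the policy of always updating the cache on a drain are assumptions, and the `busy` property merely stands in for a "store buffer busy" indication.

```python
from collections import deque


class StoreBuffer:
    """Queue of pending writes, each with a data part and an address part."""

    def __init__(self, capacity=4):  # capacity is an assumed value
        self.capacity = capacity
        self.pending = deque()       # (address, data) pairs in arrival order

    @property
    def busy(self):
        """True when no further write can be accepted."""
        return len(self.pending) >= self.capacity

    def push(self, address, data):
        """Accept write data from the execution section."""
        if self.busy:
            raise BufferError("store buffer full; the writer must wait")
        self.pending.append((address, data))

    def drain_one(self, cache, memory):
        """Write the oldest pending store to the data cache and to main memory."""
        if not self.pending:
            return False
        address, data = self.pending.popleft()
        cache[address] = data
        memory[address] = data
        return True


store_buffer = StoreBuffer(capacity=2)
cache, memory = {}, {}
store_buffer.push(0x100, 7)
store_buffer.push(0x104, 9)
was_full = store_buffer.busy
store_buffer.drain_one(cache, memory)
```

Decoupling the write from the execution section in this way lets execution continue while the slower cache and memory updates proceed in the background.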

FIG. 4 is a detailed diagram of the IIL12, RB15, CF13, and FF14 portions of FIG. 3.

Instruction information sent from the decoded instruction loop buffer 11 of DCU2 is stored in SEPR34 or OAGR30. SEPR34 can store only the information of basic instructions without a memory operand, whereas OAGR30 can store the information of any instruction.
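One way to read this rule, together with the two-instructions-per-cycle limit, is sketched below. The issue policy shown (in-order, stop at the first instruction that does not fit, hazards ignored) is an assumption made for illustration; the patent's actual control logic is described with reference to later figures, and the dictionary keys are hypothetical.

```python
def is_simple(instr):
    """True for a basic instruction without a memory operand."""
    return instr["kind"] == "basic" and not instr["mem_operand"]


def issue(window):
    """Place up to two decoded instructions, in program order, into SEPR / OAGR.

    Returns (slots, issued_count). SEPR accepts only simple instructions;
    OAGR accepts any instruction.
    """
    slots = {"SEPR": None, "OAGR": None}
    issued = 0
    for instr in window[:2]:
        if is_simple(instr) and slots["SEPR"] is None:
            slots["SEPR"] = instr
        elif slots["OAGR"] is None:
            slots["OAGR"] = instr
        else:
            break  # keep program order: nothing younger may pass this one
        issued += 1
    return slots, issued


im = {"name": "Im", "kind": "basic", "mem_operand": True}
ir = {"name": "Ir", "kind": "basic", "mem_operand": False}
ic = {"name": "Ic", "kind": "complex", "mem_operand": True}

mixed_slots, mixed_count = issue([im, ir])   # both fit in one cycle
heavy_slots, heavy_count = issue([ic, im])   # only the first fits
```

With this policy a pair is issued together only when at least one of the two is a basic instruction without a memory operand, which matches the constraint stated for the embodiment.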

SEPR34 consists of the following fields.

OP 60: Indicates the type of basic instruction (compare, transfer, add, and so on) and controls the function of the SEP arithmetic unit.

R/I 61: Distinguishes whether the source operand is a register or immediate data.

#src 62: Specifies the register number of the source operand.

#dest 63: Specifies the register number of the destination operand.

Imm 64: Immediate data.

PC 65: Start address of the instruction.

V 66: Valid bit.

OAGR30 consists of the following fields.

OP 67: Indicates the type of instruction.

R/MI 68: Distinguishes whether the source operand is a register or memory.

#src 69: Specifies the register number of the source operand.

R/M 70: Distinguishes whether the destination operand is a register or memory.

#dest 71: Distinguishes whether the destination operand is a register or memory.

Imm 72: Immediate data.

Amode 73: Specifies the addressing mode of the memory operand.

Areg 74: Specifies the register number used by the addressing mode of the memory operand.

Disp 75: Displacement used by the addressing mode of the memory operand.

Ex. 76: Other.

PC 77: Start address of the instruction.

V 78: Valid bit.

As the instruction advances through the pipeline stages, the instruction information stored in OAGR30 is transferred in the order OAGR30 ==> MMUR31 ==> CCUR32 ==> IEPR33.

In the OAGR30 ==> MMUR31 transfer, OAG20 calculates the effective (logical) address from the information in Amode 73, Areg 74, and Disp 75.

In the MMUR31 ==> CCUR32 transfer, MMU5 translates the logical address into a physical address and checks memory access rights.
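The two steps above can be sketched in a few lines. Everything specific here is an assumption made for illustration: the source does not list the addressing modes, the page size, or the TLB entry format, so the sketch uses a single base-plus-displacement mode, 4 KiB pages, and a dictionary mapping page numbers to a (frame, writable) pair.

```python
PAGE_SIZE = 4096  # assumed page size; the source does not state one


def effective_address(registers, areg, disp):
    """OAG step: base register plus displacement (one assumed addressing mode)."""
    return registers[areg] + disp


def translate(tlb, logical, write):
    """MMU step: TLB lookup followed by an access-right check.

    tlb maps a logical page number to (physical frame number, writable flag).
    """
    page, offset = divmod(logical, PAGE_SIZE)
    if page not in tlb:
        raise LookupError("TLB miss")
    frame, writable = tlb[page]
    if write and not writable:
        raise PermissionError("write to a read-only page")
    return frame * PAGE_SIZE + offset


registers = {3: 0x1000}
tlb = {1: (7, False)}                      # logical page 1 -> frame 7, read-only
logical = effective_address(registers, areg=3, disp=0x10)
physical = translate(tlb, logical, write=False)
```

A write through the same entry would be rejected by the access-right check, which is the role the protection logic plays alongside translation.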

In the CCUR32 ==> IEPR33 transfer, the OP field 88 is used to access μROM40 (reading the first microinstruction of the routine that executes the instruction).

The control circuit (Control) 35 in the figure takes as inputs the instruction information held in SEPR34, OAGR30, MMUR31, CCUR32, and IEPR33, together with the signals listed below, and generates the pipeline state control, hazard detection, and control signals for the reorder buffer (RB) 15. Details of the control circuit 35 are given later with reference to FIGS. 5 and 6.

Store buffer busy signal (Store Buffer Busy) 102

μ-program end signal (μEND) 103

Cache miss signal (Cache miss) 104

Write signal to GR by a μ-instruction (μ-w-GR) 105

The current file (CF) 13 is a register file holding general-purpose register values that are updated in program sequence order. The future file (FF) 14 is a register file holding general-purpose register values that are updated immediately when an instruction finishes in SEP17/IEP18.

The reorder buffer (RB) 15 temporarily holds the results of instructions that finish out of program sequence order in the two instruction execution sections SEP17 and IEP18, and is used to update CF13 in program sequence order. In this embodiment RB15 has eight entries, each consisting of the following fields.

State 106: Indicates whether the entry is valid or invalid and whether the instruction is executing or finished.

R/M 107: Indicates whether the destination of the instruction is a register or memory.

#dest 108: Indicates the register number when the destination is a register.

Result 109: Holds the execution result of the instruction.

Flg 110: Holds the flags resulting from execution of the instruction.

Error 111: Indicates error information when execution of the instruction resulted in an error.

PC 112: Start address of the instruction.

Information is registered in RB15 either when the instruction held in SEPR34 is executed by SEP17 or when the instruction held in OAGR30 is transferred to MMUR31. In the figure, tail 113 and head 114 are registers that point, respectively, to the entry following the one that holds the newest instruction information registered in RB15 (newest entry + 1) and to the entry that holds the oldest instruction information. In one cycle, information for two instructions can be registered simultaneously in the entry pointed to by tail 113 and the entry pointed to by tail + 1. On the read side, if State 106 of the entry pointed to by head 114 indicates that execution has finished, CF13 and the Flg register 115 are updated according to the execution results held in Result 109 and Flg 110 of that entry. If Error 111 contains error information, an error signal is sent to the μ-program sequence control unit and the error-handling μ-program routine is started. At most one instruction's worth of data can be read from RB15 per cycle.

When instruction information is registered in RB15, tail 113 is counted up by 1 or 2. When data is read out of RB15, head 114 is counted down by 1.
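The registration and read-out rules for RB15 can be illustrated with a minimal sketch. It is not the circuit of the embodiment: the class and method names are hypothetical, a double-ended queue stands in for the head 114 and tail 113 pointers, the overflow behaviour is an assumption, and the choice to leave CF13 untouched when an entry carries error information is also an assumption.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

RB_ENTRIES = 8  # the embodiment describes an 8-entry reorder buffer


@dataclass
class Entry:
    pc: int                   # PC 112: start address of the instruction
    dest: Optional[int]       # #dest 108: register number, or None for a memory destination
    done: bool = False        # State 106: False = executing, True = finished
    result: int = 0           # Result 109
    error: bool = False       # Error 111


class ReorderBuffer:
    """Hypothetical model: left end of the deque is the head (oldest entry)."""

    def __init__(self) -> None:
        self.entries = deque()

    def register(self, *new_entries: Entry) -> None:
        # Up to two instructions may be registered in one cycle.
        if len(new_entries) > 2:
            raise ValueError("at most two registrations per cycle")
        if len(self.entries) + len(new_entries) > RB_ENTRIES:
            raise OverflowError("reorder buffer full (handling is an assumption)")
        self.entries.extend(new_entries)

    def finish(self, entry: Entry, result: int, error: bool = False) -> None:
        # An execution unit reports its result, possibly out of program order.
        entry.result, entry.error, entry.done = result, error, True

    def retire(self, current_file: list) -> Optional[Entry]:
        # At most one instruction leaves per cycle, and only from the head,
        # so the current file is updated in program sequence order.
        if not self.entries or not self.entries[0].done:
            return None
        entry = self.entries.popleft()
        if not entry.error and entry.dest is not None:
            current_file[entry.dest] = entry.result
        return entry
```

In this sketch a basic instruction that finishes early simply waits in the buffer until every older entry has been retired, which is the property the text attributes to RB15.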

FIG. 5 shows the internal blocks of the control circuit 35 in FIG. 4. The control circuit 35 consists of a section that performs hazard checks based on the register-related information in the pipeline registers, and a state control circuit (State Control Circuit) 120 that controls the pipeline state based on the pipeline register valid signals, the hazard check signals, and other inputs.

The hazard F/F121 in the figure is a 16-bit register. When an instruction in a pipeline register (MMUR31, CCUR32, IEPR33) will write its result to a general-purpose register, the bit corresponding to that register is set to 1, and hazards are detected on the basis of this information. The hazard F/F121 is set by the result of decoding R/M70 and #dest71 of OAGR30 with decoder 124, and is reset by the result of decoding R/M97 and #dest98 of IEPR33 with decoder 127.

The instruction held in SEPR34 can be executed by SEP17 only when there is no possibility that any register it uses as a source or destination will be rewritten, that is, when the corresponding bits of the hazard F/F121 are not set to 1. This condition is detected as follows: decoders 122 and 123 decode R/M61, #src62, and #dest63 of SEPR34, the comparison circuits CMP1 128 and CMP2 129 compare the decoded results with the value of the hazard F/F121, and the OR of the two comparison results is output as the signal hazard (SEP) 133. Hazard (SEP) 133 is 0 when the condition is satisfied and 1 when it is not.

Similarly, the comparison circuit CMP3 130 compares the result of decoding Amode73 and Areg74 of OAGR30 with decoder 4 125 against the value of the hazard F/F121. When its output signal (hazard (OAG)) 134 is 0, OAG20 can calculate the effective address.

Likewise, the comparison circuit CMP4 131 compares the result of decoding R/M89 and #src90 of CCUR32 with decoder 5 126 against the value of the hazard F/F121. When its output signal (hazard (CCU)) 135 is 0, the source operand (register) can be read.
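The three hazard checks share one mechanism: a 16-bit vector of pending register writes that is compared against the registers an instruction is about to use. The following is a minimal sketch of that mechanism only. The class and method names are hypothetical, and the sketch assumes that at most one in-flight instruction targets a given register at a time, which the text does not state.

```python
class HazardFF:
    """Hypothetical model of the 16-bit hazard F/F: one bit per general-purpose register."""

    def __init__(self) -> None:
        self.bits = 0

    def set_dest(self, dest_is_register: bool, dest: int) -> None:
        # Corresponds to setting from the decoded destination of the
        # instruction in OAGR (only when the destination is a register).
        if dest_is_register:
            self.bits |= 1 << dest

    def reset_dest(self, dest_is_register: bool, dest: int) -> None:
        # Corresponds to resetting from the decoded destination of the
        # instruction in IEPR.
        if dest_is_register:
            self.bits &= ~(1 << dest) & 0xFFFF

    def hazard(self, *registers: int) -> int:
        # Returns 1 if any register the instruction uses has a pending
        # write, otherwise 0 (the polarity used for signals 133 to 135).
        mask = 0
        for r in registers:
            mask |= 1 << r
        return 1 if self.bits & mask else 0
```

For the SEP check the arguments would be the source and destination register numbers; for the OAG check, the address register; for the CCU check, the source register.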

The state control circuit (State Control Circuit) 120 receives the three hazard signals described above (hazard (SEP) 133, hazard (OAG) 134, hazard (CCU) 135), the pipeline register valid signals (V (IEP) 101, V (CCU) 96, V (MMU) 87, V (OAG) 78), the register/memory signals (R/M (IEP) 97, R/M1 (MMU) 89, R/M2 (MMU) 91), and the signals from outside IIL12 (store buffer busy signal 102, μEND 103, Cache miss 104, μ-w-GR 105), and outputs the following signals, which control the pipeline state.

SEP-136: becomes 1 when the instruction held in SEPR can be executed by SEP.

OAG-MMU 137: becomes 1 when the instruction held in OAGR advances to MMUR in the next cycle.

MMU-CCU 138: becomes 1 when the instruction held in MMUR advances to CCUR in the next cycle.

CCU-IEP 139: becomes 1 when the instruction held in CCUR advances to IEPR in the next cycle.

IEP-SB 140: becomes 1 when the instruction held in IEPR transfers its information to the store buffer in the next cycle.

FIG. 6 is a specific circuit example of the state control circuit in FIG. 5.

Next, an outline of the pipeline processing operation of the microprocessor according to the present invention is described with reference to FIG. 1.

The pipeline processing operation is outlined as follows.

(1) IF (instruction fetch) stage
The stage in which the IFU fetches instructions from the instruction cache memory 8.

(2) ID (instruction decode) stage
In the DCU, the instruction decoder 10 decodes the instruction and converts it into the internal instruction format. Instructions converted into the internal instruction format are stored in the decoded instruction loop buffer 11. There are two kinds of internal instruction format: basic instructions without a memory operand, and basic instructions with a memory operand or high-performance instructions. Each kind is issued by the instruction issue control circuit (IIL) 12.

(3) OAG (operand effective address generation) stage
The stage in which the address generation circuit 47 of OAG20 calculates the memory operand effective address (logical address) of an instruction with a memory operand issued by the instruction issue control circuit (IIL) 12.

(4) MMU (address translation) stage
The stage in which the address translation buffer 21 of MMU5 translates the logical address of the memory operand into a physical address. The protection check circuit 22 also checks memory protection in this stage.

(5) OF (operand fetch) stage
The stage in which the memory operand is read from the data cache memory 23 of CCU6. Register operands are also read in this stage.

(6) IEP (instruction execution) stage
The stage in which the high-performance instruction execution unit (IEP) 18 of EXU4 executes, under μ-program control, a basic instruction with a memory operand or a high-performance instruction.

(7) OS (operand store) stage
The stage in which the execution result from IEP18 is written to the store buffer 24 of CCU6. This stage exists only when the destination of the instruction is memory. The result is written, via the store buffer 24, to the data cache memory of CCU6 and to the main memory outside the microprocessor, asynchronously with pipeline processing.

(8) SEP (instruction execution) stage
The stage in which the basic instruction execution unit (SEP) 17 of EXU4 executes, under hardwired control, a basic instruction issued by the instruction issue control circuit (IIL) 12. Only basic instructions without a memory operand are executed in SEP17.

(9) RE (reorder) stage
The stage in which the execution results from IEP18 and SEP17 are reordered by the reorder buffer (RB) 15 and written to the current file (CF) 13.
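The nine stages above form two paths through the pipeline, which can be summarized in a short sketch. The function name and parameters are hypothetical, and placing RE after OS for memory-destination instructions is an assumption of the sketch, since the text does not state the relative order of those two stages.

```python
def stage_path(uses_iep: bool, dest_is_memory: bool = False) -> list:
    """Return the sequence of stages an instruction passes through.

    uses_iep: True for a basic instruction with a memory operand or a
              high-performance instruction; False for a basic instruction
              without a memory operand, which is executed by SEP.
    dest_is_memory: the OS stage exists only for memory destinations.
    """
    if not uses_iep:
        # Short path: no address calculation, translation, or operand fetch.
        return ["IF", "ID", "SEP", "RE"]
    path = ["IF", "ID", "OAG", "MMU", "OF", "IEP"]
    if dest_is_memory:
        path.append("OS")
    path.append("RE")  # position relative to OS is an assumption
    return path
```

The difference in path length is what allows a later basic instruction to finish ahead of an earlier instruction on the long path.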

Of the pipeline stages above, every stage other than (6) IEP basically completes in one cycle. However, when a cache miss or TLB miss occurs, the (1) IF, (4) MMU, and (5) OF stages also require multiple cycles. In addition, when a hazard occurs (for example, when the result of the IEP stage is needed for an effective address calculation), a so-called wait arises and the stage no longer completes in one cycle.

A feature of the present invention is that it has a plurality of instruction execution units and can therefore execute instructions in parallel. In this embodiment, the main difference from the prior art is the addition of the SEP stage and the RE stage.

Next, the characteristic processing operation of the present invention is described in more detail with reference to FIGS. 7 and 8.

FIGS. 7 and 8 each show a pipeline timing example for the embodiment of the present invention (that is, with the basic instruction execution unit SEP17 present). The timing example of FIG. 7 corresponds to the conventional timing example shown in FIG. 12, and the example of FIG. 8 corresponds to the conventional timing example of FIG. 13.

For simplicity, FIGS. 7 and 8 show only the portion from the decoded instruction loop buffer (DILB) 11 onward, and the instructions are assumed to be already present in DILB11.

FIG. 7 shows pipeline timing example 1, for the instruction sequence Im1 ==> I_R2 ==> Im3 ==> I_R4 ==> Im5 ==> I_R6. A high-performance instruction (Im) and a basic instruction (I_R) are assumed to be read from the DILB simultaneously in one cycle. Im1 and Im3 have register destinations, Im5 has a memory destination, and no hazards are assumed to occur. In cycle 1, the information for the two instructions Im1 and I_R2 is read from DILB11 and set in the OAGR30 and SEPR34 registers of IIL12.

Instruction Im1 is issued by the instruction issue control unit 12 in cycle 2, and the effective address generation unit (OAG) 20 calculates its effective address. In cycle 3 the address translation buffer (TLB) 21 performs address translation, in cycle 4 the memory management unit (MMU) 5 performs the operand fetch, and in cycle 5 the high-performance instruction execution unit (IEP) 18 executes the instruction. Because the destination is a register, the execution result is written to the future file (FF) 14 in cycle 6 (see ↑Im1 in the FF column of FIG. 7).

In parallel with this, the basic instruction I_R2 is issued by the instruction issue control unit 12 and executed by the basic instruction execution unit (SEP) 17 in cycle 2, and its execution result is written to the future file 14 in cycle 3 (see ↑I_R2 in the FF column of FIG. 7).

Instruction information is registered in the reorder buffer (RB) 15 at the effective address calculation stage for a high-performance instruction Im, and at the execution stage in the basic instruction execution unit 17 for a basic instruction I_R. The information for Im1 and I_R2 is therefore registered in cycle 3, as illustrated. In the RB column of FIG. 7, the marks "×" and "●" above an instruction indicate that the instruction is "executing" and "execution finished", respectively.

The instruction execution result is written from the reorder buffer 15 to the current file (CF) 13 in the cycle in which the instruction information is deleted from the reorder buffer (RB) 15. For Im1, the instruction information is deleted from the reorder buffer 15 in cycle 7, so its execution result is written to the current file (CF) 13 in cycle 7. For I_R2, the instruction information is deleted from the reorder buffer 15 in cycle 8, so its execution result is written to the current file (CF) 13 in cycle 8.

In other words, the future file (FF) 14 is updated (written) immediately after an instruction executes and therefore does not follow program sequence order, whereas the current file (CF) 13 is updated when the instruction information is deleted from the reorder buffer 15 and therefore receives the instruction execution results in program sequence order.

Instructions Im3, I_R4, Im5, and I_R6 are processed in the same way as described above.

FIG. 8 shows pipeline timing example 2, for the instruction sequence Ic1 ==> I_R2 ==> I_R3 ==> I_R4 ==> Im5. In this example too, Ic1 takes a long time to execute, but in the meantime I_R2, I_R3, and I_R4 finish executing first in the basic instruction execution unit (SEP) 17.

That is, for the high-performance instruction Ic1, effective address calculation, address translation, and operand fetch are performed in cycles 2 to 4, the instruction is executed by the high-performance instruction execution unit (IEP) 18 in cycles 5 to 8, and the result is written to the future file (FF) 14 in cycle 9.

In parallel with this, the basic instruction I_R2 is executed by the basic instruction execution unit (SEP) 17 in cycle 2, and its execution result is written to the future file (FF) 14 in cycle 3.

Here too, the instruction execution result is written from the reorder buffer (RB) 15 to the current file (CF) 13 in the cycle in which the instruction information is deleted from the reorder buffer (RB) 15.

Therefore, as in the example of FIG. 7, the instruction execution results are stored in the current file (CF) 13 in program sequence order.

As the examples of FIGS. 7 and 8 show, the present invention has a plurality of instruction execution units and executes instructions in parallel. This suppresses the pipeline disturbance that occurred in the conventional examples and limits the drop in utilization of each pipeline stage, which results in a substantial performance improvement.

In the normal instruction execution state, the general-purpose register values held in the future file (FF) 14 differ from those held in the current file (CF) 13. This is because a basic instruction without a memory operand that comes later in program sequence order is executed by the basic instruction execution unit (SEP) 17, and updates the future file (FF) 14, before a high-performance instruction with a memory operand that comes earlier in program sequence order and is executed by the high-performance instruction execution unit (IEP) 18. However, when an error (interrupt) occurs in an instruction executed by the high-performance instruction execution unit (IEP) 18, the values in the future file (FF) 14 must be restored to the values in the current file (CF) 13 so that the instruction can be re-executed. A counter 119 is provided for this purpose. The interrupt-handling μ-program routine can use counter 119 to copy the values of the current file (CF) 13 into the future file (FF) 14.

FIG. 9 shows an example system configuration consisting of an MPU to which the embodiment of the invention is applied and peripheral LSIs. This example is a relatively simple system connected to a VME bus 200 and consists of the following LSIs and ICs.

MPU 201
ICT 202: interrupt controller
CG 203: clock generator
Memory: SRAM (0 wait, 32 Kbytes) 204; EPROM (0 wait, 32 Kbytes) 205; DRAM (3 waits, 4 Mbytes) 206
Communication interfaces: Centronics, 1 channel, 207; RS232C, 2 channels, 208
Others: T/R (transceiver/receiver) 209; Buf (buffers) 210, 211; Decode (address decoder) 212

A system built with an MPU that uses the present invention is no different in configuration from a system built with a conventional MPU. That is, using such an MPU requires no additional circuitry at the system level, and a high-performance system can be constructed.

Next, a second embodiment of the microprocessor according to the present invention is described with reference to FIGS. 10 and 11.

The first embodiment described above achieves the object of the present invention by using the reorder buffer (RB) 15. The second embodiment achieves it without using the reorder buffer (RB) 15.

FIG. 10 corresponds to FIG. 4 of the first embodiment, and elements identical to those of the first embodiment carry the same reference numbers. The first and second embodiments differ as follows.

First, the second embodiment omits the reorder buffer (RB) 15, which is a component of the first embodiment. The current file (CF) 13 and future file (FF) 14 of the first embodiment are replaced in the second embodiment by a single general-purpose register file 302. This file has 32 entries in total: 16 in the <X> part and 16 in the <Y> part. The second embodiment also adds a four-entry status file 301, which temporarily holds the flags (FLG0-FLG3), error information (Error0-3), and program counters (PC0-3), and a GR control circuit (GR Control) 303, which generates the write/read signals for the general-purpose register file (GR) 302.

The features of the second embodiment are described below.

As stated above, the general-purpose register file 302 of the second embodiment has 32 entries in total: 16 in the <X> part and 16 in the <Y> part. Like the current file (CF) 13 and future file (FF) 14 of the first embodiment, the <X> part and <Y> part provide two registers (Xi, Yi) for each general-purpose register Ri. Unlike the first embodiment, however, the <X> part is not fixed as the current file and the <Y> part is not fixed as the future file. If Xi of the <X> part is the register holding the current value, the corresponding Yi of the <Y> part is the register holding the future value; if Yi is the register holding the current value, the corresponding Xi is the register holding the future value. In other words, the roles of the two registers (Xi, Yi) for each general-purpose register Ri switch dynamically.

For example, at a given moment the roles of Xi and Yi in the <X> part and <Y> part might be as follows.

Current register values: X0 X1 X2 Y3 X4 X5 Y6 Y7 Y8 X9 X10 X11 X12 X13 Y14 Y15
Future register values: Y0 Y1 Y2 X3 Y4 Y5 X6 X7 X8 Y9 Y10 Y11 Y12 Y13 X14 X15

FIG. 11 shows the internal blocks of the GR control circuit (GR Control) 303 in FIG. 10. The circuit in FIG. 11 consists of four ID registers (304 to 307), a +1 circuit 308, and a GR address generation circuit (GR address generator) 309. The GR address generation circuit 309 receives the information on general-purpose register accesses from the pipeline registers (310 to 313) and the values of the ID registers (314 to 316), and outputs the read/write address signals (318 to 321) for the general-purpose register file 302. The GR address generation circuit 309 contains three groups of flip-flops (322 to 324) that indicate the state of the general-purpose register file 302.

In a pipeline configuration such as that of the first embodiment, the order of instruction execution departs from the program's instruction sequence in the case of "an I_R instruction sequence following an Ic instruction". Here Ic denotes an instruction with a memory operand or a complex instruction whose execution stage takes several cycles (a high-performance instruction), and I_R denotes a basic instruction without a memory operand. In such a pipeline configuration, an I_R instruction may overtake up to four Ic instructions and finish first. For example, when the instruction sequence is (head) and the instruction execution stage of Ic1 takes many cycles, I_R1 and I_R2 finish before Ic1, I_R3 finishes before Ic1 and Ic2, I_R4 finishes before Ic1 to Ic3, and I_R5 and I_R6 finish before Ic1 to Ic4 (provided no hazard occurs). I_R7 is not executed until Ic1 has finished executing.

The problem in this case is that if, for example, an exception occurs while Ic1 is executing, the general-purpose registers and the status (flags, PC, and so on) updated by executing I_R1 to I_R6 must be restored. For this reason, the result of each of I_R1 to I_R6 is first written to the partner (for example Yi) of the register that holds the current value in the general-purpose register file 302 (for example Xi), and the status (flags, PC, and so on) is temporarily written to the status file 301. Then, in the cycle in which the Ic instruction immediately preceding the I_R instruction finishes its execution stage, the roles of the Xi and Yi holding the result of that I_R instruction are switched. In this example, the roles of the Xi and Yi holding the results of I_R1 and I_R2 are switched in the cycle in which Ic1 finishes its execution stage.

The advantage of this method is that any number of I_R instructions following an Ic instruction can be executed ahead of it as long as no hazard occurs, without the limit imposed by the number of entries in the reorder buffer 15 in the first embodiment. In addition, because the F/F groups in the GR address generation circuit 309 select whether Xi or Yi holds the current value, when several I_R instructions have already finished at the time the Ic instruction finishes executing, their results can be committed in one cycle (that is, the roles of Xi and Yi can be switched in one cycle).

Next, the specific way in which the roles of Xi and Yi are switched and reads and writes of the general-purpose register file 302 are controlled is described.

まず、どのようにしてIc命令とそれに続くIR命令に対し
て０〜３のID番号を割り当てるかについて説明する。
First, how ID numbers 0 to 3 are assigned to an Ic instruction and the I_R instructions that follow it will be described (see the instruction sequence given above). The +1 circuit 308 in FIG. 11 is assumed to be a 2-bit counter. When an instruction is decoded in the ID stage and found to be an Ic instruction (Ic1), then at the time it is issued to the next stage, the OAG stage, the value of the +1 circuit 308 (0) is stored in the ID1 register 304 and the +1 circuit 308 is incremented, so that its value becomes 1. If the two instructions following Ic1 are I_R instructions (IR1, IR2), the value of the ID1 register 304 is not updated. If the instruction following IR2 is an Ic instruction (Ic2), then when Ic2 is issued to the OAG stage, the value of the ID1 register 304 (0) is transferred to the ID2 register 305; at the same time, the value of the +1 circuit 308 (1) is stored in the ID1 register 304 and the +1 circuit 308 is incremented, so that its value becomes 2. In this way, each time an Ic instruction is issued to the OAG stage, the value of the +1 circuit 308 is transferred to the ID1 register 304 and the values of the ID registers 304 to 307 are shifted as ID1 => ID2 => ID3 => ID4. This is equivalent to assigning ID numbers 0 to 3 to each Ic instruction and the I_R instructions that follow it.
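The ID assignment described above can be sketched as a small software model. This is only an illustration under stated assumptions: the class name `IdAllocator` and its method names are hypothetical, unloaded ID registers are modeled as `None`, and the idea that an I_R instruction shares the ID currently held in ID1 is a reading of the text, not something FIG. 11 is known to spell out in this form.

```python
class IdAllocator:
    """Toy model of the +1 circuit 308 and ID registers ID1..ID4 (304-307)."""

    def __init__(self):
        self.counter = 0            # 2-bit counter (+1 circuit 308)
        self.id_regs = [None] * 4   # [ID1, ID2, ID3, ID4]; None = not loaded yet

    def issue_ic(self):
        """An Ic instruction is issued to the OAG stage."""
        # Shift ID1 => ID2 => ID3 => ID4 and load the counter value into ID1.
        self.id_regs = [self.counter] + self.id_regs[:3]
        # A 2-bit counter wraps from 3 back to 0.
        self.counter = (self.counter + 1) % 4
        return self.id_regs[0]

    def issue_ir(self):
        """An I_R instruction is issued; the ID registers are not updated."""
        # Assumption: the I_R instruction shares the ID currently held in ID1.
        return self.id_regs[0]


alloc = IdAllocator()
# Sequence from the text: Ic1, IR1, IR2, Ic2
trace = [alloc.issue_ic(), alloc.issue_ir(), alloc.issue_ir(), alloc.issue_ic()]
print(trace)           # IDs associated with Ic1, IR1, IR2, Ic2
print(alloc.id_regs)   # contents of ID1..ID4 after Ic2 is issued
```

Running the sequence Ic1, IR1, IR2, Ic2 from the text leaves the counter at 2, with the older ID shifted from ID1 into ID2, which matches the walk-through in the paragraph.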

Accordingly, the ID registers (304 to 307) in FIG. 11 each hold the ID number of the instruction held in the corresponding pipeline register (30 to 33). In addition, the following three groups of flip-flops (F/Fs), each containing 16 F/Fs, are provided for the general-purpose registers.

These are the future F/F group 322, the valid F/F group 323, and the ID F/F group 324.

The future F/F group 322 consists of 16 F/Fs. When Xi of the general-purpose register file 302 holds the current value, the corresponding future F/Fi is 1; when Yi holds the current value, future F/Fi is 0.

The valid F/F group 323 consists of 16 F/Fs. Valid F/Fi is 1 when the future value (the value of Yi when future F/Fi is 1, or of Xi when it is 0) is valid, and 0 otherwise.

The ID F/F group 324 consists of 16 two-bit F/Fs. When the future value is valid, ID F/Fi indicates the ID number of the instruction that wrote that value.
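The relationship between the X and Y parts and the three F/F groups can be illustrated with a small data structure. This is a minimal sketch: the names `RegisterFileModel`, `current`, and `future` are hypothetical, and the reset state (all future F/Fs set to 1, so that X holds the current value) is an assumption made only so the example has a defined starting point.

```python
from dataclasses import dataclass, field

NUM_REGS = 16  # 16 general-purpose registers, each with an X part and a Y part


def _zeros():
    return [0] * NUM_REGS


@dataclass
class RegisterFileModel:
    x: list = field(default_factory=_zeros)          # X parts of register file 302
    y: list = field(default_factory=_zeros)          # Y parts of register file 302
    # future F/F group 322: 1 => Xi is current, Yi is future; 0 => the reverse
    future_ff: list = field(default_factory=lambda: [1] * NUM_REGS)  # assumed reset state
    # valid F/F group 323: 1 => the future value is valid
    valid_ff: list = field(default_factory=_zeros)
    # ID F/F group 324: 2-bit ID of the instruction that wrote the future value
    id_ff: list = field(default_factory=_zeros)

    def current(self, i):
        """Current value of Ri: Xi when future F/Fi is 1, Yi when it is 0."""
        return self.x[i] if self.future_ff[i] == 1 else self.y[i]

    def future(self, i):
        """Future value of Ri: Yi when future F/Fi is 1, Xi when it is 0."""
        return self.y[i] if self.future_ff[i] == 1 else self.x[i]


rf = RegisterFileModel()
rf.x[2], rf.y[2] = 5, 9
print(rf.current(2), rf.future(2))  # X is current, Y is future
rf.future_ff[2] = 0                 # inverting the F/F swaps the two roles
print(rf.current(2), rf.future(2))
```

Inverting a single future F/F swaps the roles of the two parts of that register without moving any data.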

Based on the values of these F/F groups (322 to 324), the general-purpose register access information in the pipeline registers (310 to 313), and the values of the ID registers (314 to 316), the GR address generation circuit 309 controls the read/write signals (318 to 321) of the general-purpose register file 302 and the updating of the F/F groups as follows.

1. For the source operand register Ri (specified by #src62 of SEPR34) needed to execute an I_R instruction in SEP17, the future value (Yi when future F/Fi is 1, Xi when it is 0) is used when the corresponding valid F/Fi = 1, and the current value (Xi when future F/Fi is 1, Yi when it is 0) is used when valid F/Fi = 0.

2. For the destination register Ri (specified by #dest63 of SEPR34) that stores the result of an I_R instruction executed in SEP17, the current part (Xi when future F/Fi is 1, Yi when it is 0) is used when there is no preceding instruction (that is, execution of all preceding instructions has finished; V78 = V87 = V96 = V101 = 0); otherwise the future part (Yi when future F/Fi is 1, Xi when it is 0) is used.

3. For the general-purpose register Ri (specified by Amode73 and Areg74 of OAGR30) needed in the OAG (effective address calculation) stage, the future value (Yi when future F/Fi is 1, Xi when it is 0) is used when the corresponding valid F/Fi = 1, and the current value (Xi when future F/Fi is 1, Yi when it is 0) is used when valid F/Fi = 0.

4. For the source operand (specified by R/M189 and #SRC90 of CCUR3) needed in the IEP (instruction execution) stage, the future value (Yi when future F/Fi is 1, Xi when it is 0) is used when the corresponding valid F/Fi = 1, and the current value (Xi when future F/Fi is 1, Yi when it is 0) is used when valid F/Fi = 0. However, even when valid F/Fi = 1, reading of the source operand is made to wait if the corresponding ID F/Fi differs from the value of the ID4 register 307.

5. When an Ic instruction completes in the IEP (instruction execution) stage, then for every general-purpose register Ri that has the same ID number as that Ic instruction and has valid F/Fi = 1, the value of future F/Fi is inverted and valid F/Fi is reset to 0.

6. If the future part (Yi when future F/Fi is 1, Xi when it is 0) of the result register Ri (specified by #dest63 of SEPR34) of an I_R instruction executed in SEP17 carries an ID number different from that of this I_R instruction (ID number ≠ ID F/Fi), execution of this I_R instruction is made to wait.

7. The flags (Flg), error information (Error), and PC resulting from an I_R instruction executed in SEP17 are temporarily written to the entry of the status file 301 indicated by the value 317 of the ID1 register 304. When the Ic instruction whose ID number equals that entry number finishes execution, these values are set in Flg115, Error116, and PC117, which are thereby updated.
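Rules 1 to 5 above can be sketched as a small behavioral model. It is a rough illustration, not the circuit of FIG. 11: the class `GRControlSketch` and its method names are hypothetical, the reset state of the F/Fs is assumed, and setting valid F/Fi and ID F/Fi when a result is written to the future part is inferred from the definitions of those F/F groups rather than stated in the rules themselves. Rules 6 and 7 are not modeled.

```python
class GRControlSketch:
    """Behavioral sketch of rules 1-5 for the X/Y register file 302."""

    def __init__(self, num_regs=16):
        # parts[0] is the X part, parts[1] is the Y part.
        self.parts = [[0] * num_regs, [0] * num_regs]
        self.future_ff = [1] * num_regs  # assumed reset state: X is current
        self.valid_ff = [0] * num_regs
        self.id_ff = [0] * num_regs

    # With future F/Fi = 1 the future part is Y (index 1) and the current
    # part is X (index 0); with future F/Fi = 0 the roles are reversed.
    def _future_part(self, i):
        return self.future_ff[i]

    def _current_part(self, i):
        return 1 - self.future_ff[i]

    def read_source(self, i):
        """Rules 1 and 3: read the future value if valid, else the current value."""
        part = self._future_part(i) if self.valid_ff[i] else self._current_part(i)
        return self.parts[part][i]

    def write_sep_result(self, i, value, instr_id, preceding_pending):
        """Rule 2: write to current if no preceding instruction is pending."""
        if preceding_pending:
            self.parts[self._future_part(i)][i] = value
            # Inferred bookkeeping: mark the future value valid and tag its writer.
            self.valid_ff[i] = 1
            self.id_ff[i] = instr_id
        else:
            self.parts[self._current_part(i)][i] = value

    def iep_read_must_wait(self, i, id4):
        """Rule 4: the IEP source read waits if valid and ID F/Fi != ID4."""
        return self.valid_ff[i] == 1 and self.id_ff[i] != id4

    def complete_ic(self, ic_id):
        """Rule 5: on Ic completion, promote matching future values to current."""
        for i in range(len(self.valid_ff)):
            if self.valid_ff[i] == 1 and self.id_ff[i] == ic_id:
                self.future_ff[i] ^= 1   # swap the roles of Xi and Yi
                self.valid_ff[i] = 0


gr = GRControlSketch()
gr.write_sep_result(3, 10, instr_id=0, preceding_pending=False)  # goes to current
gr.write_sep_result(3, 99, instr_id=0, preceding_pending=True)   # goes to future
print(gr.read_source(3))   # a following instruction sees the newest value
gr.complete_ic(0)          # Ic with ID 0 finishes: future becomes current
print(gr.read_source(3), gr.valid_ff[3])
```

In this model, promoting a speculative result is a single bit flip per register, which is the property the description relies on when it says no values need to be moved.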

By controlling the read/write signals (318 to 321) of the general-purpose register file 302 and the updating of the F/F groups in this manner, the object of the present invention can be achieved with relatively simple hardware.

In the first embodiment, when an interrupt occurs, the values in the future file 14 must be restored to the values in the current file 13. This takes at least 16 cycles (when there are 16 general-purpose registers), and the resulting overhead degrades performance. In the second embodiment, by contrast, everything is handled within the single general-purpose register file 302, so no values need to be transferred when an interrupt occurs and no performance degradation results.

Also, in the first embodiment, the number of instructions that can be executed ahead of an instruction preceding them in program sequence order is limited by the number of entries in the reorder buffer 15. That is, performance suffers if the number of entries is small, while increasing the number of entries increases the amount of hardware.

In the second embodiment, by contrast, reads and writes are controlled by switching the roles of the X part and the Y part within the single general-purpose register file 302, so the number of instructions that can be executed ahead of a preceding instruction can be made large.

Furthermore, when branch prediction is used as a fast-branch technique in the first embodiment, restoring the general-purpose register values after a misprediction takes at least 16 cycles, and this overhead degrades performance. In the second embodiment, the general-purpose register values do not need to be restored.

[Effect of the Invention]

As described above, according to the present invention, instructions of the first type and instructions of the second type are each executed independently and in parallel in a pipelined manner. This prevents a drop in utilization at particular stages of the pipeline and suppresses pipeline disturbance. A microprocessor with significantly improved performance can thereby be provided.

[Brief description of drawings]

FIG. 1 is a block diagram showing the overall internal structure of a microprocessor embodying the present invention.
FIG. 2 is a block diagram of the main part of the microprocessor shown in FIG. 1.
FIG. 3 is a block diagram showing in further detail the inside of each block of the block diagram shown in FIG. 2.
FIG. 4 is a detailed diagram of the IIL, RB, CF, and FF in FIG. 3.
FIG. 5 is a detailed diagram of the control circuit in FIG. 4.
FIG. 6 is a detailed diagram of the state control circuit shown in FIG. 5.
FIGS. 7 and 8 are timing diagrams of pipeline processing operation in the embodiment of the present invention.
FIG. 9 is a configuration diagram of a system consisting of an MPU to which the embodiment of the present invention is applied and peripheral LSIs.
FIG. 10 is a configuration diagram of the main part of a second embodiment of the microprocessor according to the present invention.
FIG. 11 is a detailed diagram of the GR control circuit in FIG. 10.
FIGS. 12 and 13 are timing diagrams of pipeline processing operation in the conventional example.

1 ... instruction fetch unit (IFU)
2 ... decode unit (DCU)
3 ... instruction issue unit (IIU)
4 ... instruction execution unit (EXU)
5 ... memory management unit (MMU)
6 ... cache control unit (CCU)
7 ... input/output unit (I/O)
10 ... instruction decoder
11 ... decoded instruction loop buffer (DIL)
12 ... instruction issue control circuit (IIL)
13 ... current file (CF)
14 ... future file (FF)
15 ... reorder buffer (RB)
17 ... basic instruction execution unit (SEP)
18 ... high-function instruction execution unit (IEP)
20 ... effective address generation unit (OAG)
21 ... address translation buffer (TLB)
23 ... data cache memory

Claims

[Claims]

1. A microprocessor characterized by comprising: first execution processing means for executing, under microprogram control, instructions of a first type among decoded instructions, the instructions of the first type being executed through the same processing steps; second execution processing means for executing, under hardwired control, instructions of a second type whose processing steps differ from those of the instructions of the first type; control means for issuing the decoded instructions in program sequence order, determining whether each issued instruction is to be executed by the first execution processing means or the second execution processing means, and operating the first execution processing means and the second execution processing means independently and in parallel; information holding means having a first information holding area into which the execution result of an instruction of the first type or of the second type is written immediately upon completion of execution of the instruction by the first execution processing means or the second execution processing means, and a second information holding area that holds execution results in program sequence order; and switching means for, after execution of the instruction of the first type immediately preceding an instruction of the second type has completed, transferring the execution result of that instruction of the first type and the execution result of the instruction of the second type held in the first information holding area into the second information holding area in program sequence order.