JPH09311786A - Data processor

Info

Publication number: JPH09311786A
Application number: JP5277297A
Authority: JP
Inventors: Masahiro Uminaga; 正博海永; Yasuhiko Saito; 靖彦斎藤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1996-03-18
Filing date: 1997-03-07
Publication date: 1997-12-02

Abstract

PROBLEM TO BE SOLVED: To reduce pipeline stalls caused by data hazards in a superscalar system, and thereby improve processing speed, by changing an instruction in a 1st instruction format stored in an instruction memory into an instruction in a 2nd instruction format. SOLUTION: An instruction is fetched from the instruction memory in a 1st stage 101 and decoded in a 2nd stage 103. The decoded instruction is executed in a 3rd stage, and the execution result is written to a register in a 4th stage 107. In this process, the instruction in the 1st instruction format stored in the instruction memory is changed into the instruction in the 2nd instruction format and then executed. Consequently, pipeline stalls caused by data hazards in the superscalar system are reduced and processing speed is improved.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to data processing devices such as microprocessors and microcomputers, and in particular to a technique that is effective when applied to data processing devices that perform parallel processing, such as superscalar processors.

[0002]

[Prior Art] A microprocessor (used below as a generic term for a CPU (Central Processing Unit), a microcomputer, and the like) sequentially fetches, decodes, and executes a sequence of instructions. Fixed-length instructions have become widespread for microprocessors, with the aim of simplifying the decoding circuit. A microprocessor that executes fixed-length instructions in a pipelined manner is called a RISC (Reduced Instruction Set Computer) processor.

[0003] FIG. 1 shows a pipelined implementation of a microprocessor. For simplicity, the memory access stage (MEM) that normally exists is omitted. Each stage (101, 103, 105, 107) executes in one unit of time (one clock), and processing of an individual instruction completes by passing sequentially from the first stage to the last stage through the latch groups (102, 104, 106). The first stage 101 fetches an instruction (IF). The second stage 103 decodes the instruction and reads the registers (ID). The third stage 105 executes the operation specified by the instruction (EX). The fourth stage 107 writes the operation result, via signal line 108, to the registers located in the second stage 103 (WB).

[0004] FIG. 2 is a conceptual diagram of four instructions being processed in the pipeline. When a subsequent instruction uses the contents of a register written by a preceding instruction, a bubble appears in the pipeline of the subsequent instruction (this is called a pipeline stall due to a data hazard). This situation is shown in FIG. 2(a). The two arrows pointing to the lower left in FIG. 2(a) indicate that the subsequent instruction reads the register after the preceding instruction has written it.

[0005] As a means of solving this problem, when a subsequent instruction uses the preceding operation result, the value is also sent via signal line 108 to the arithmetic unit in the third stage 105. The control lines for this are signal lines 109 and 110. This arrangement is known as forwarding, and it makes execution on every clock possible. The two arrows pointing to the lower left in FIG. 2(b) indicate forwarding. The number of clocks required to process an individual instruction is therefore, for example, four. However, because each stage processes a new instruction every clock, the instruction throughput is one instruction per clock. Consequently, since one instruction can be executed per clock, the fewer instructions a given process (program) executes, the shorter its execution time.
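The relationship between latency (four clocks per instruction) and throughput (one instruction per clock) can be checked with a small calculation. This is a minimal sketch that assumes an ideal pipeline in which forwarding removes every stall; the function name `pipeline_cycles` is illustrative and does not come from the patent.

```python
def pipeline_cycles(num_instructions: int, num_stages: int = 4) -> int:
    """Clocks needed to finish num_instructions on an ideal, stall-free pipeline."""
    if num_instructions <= 0:
        return 0
    # The first instruction needs num_stages clocks; every later
    # instruction completes one clock after its predecessor.
    return num_stages + (num_instructions - 1)
```

Under these assumptions the total grows by one clock per additional instruction, which is why reducing the number of executed instructions shortens execution time.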

[0006] Pipelining and forwarding are described in Hennessy et al., "Computer Organization and Design", Morgan Kaufmann Publishers, Inc., 1994, Chapter 6, "Enhancing Performance with Pipelining" (pages 362 to 450).

[0007] The superscalar approach is one example of a method for increasing microprocessor processing speed. A superscalar processor has multiple arithmetic units that can operate simultaneously, for example two, and correspondingly fetches and decodes two instructions at a time. In this case, as shown in FIG. 3(a) (no data dependence), ideally two instructions can be executed every clock, so the execution time is half that of an ordinary pipeline. The superscalar approach is described in Nikkei Electronics, November 27, 1989 (No. 487), pages 191 to 200, "Next-generation RISC: aiming for 100 MIPS in CMOS by introducing parallel processing".

[0008] Conventional RISC microprocessors that adopt the superscalar approach generally have a fixed instruction length of 4 bytes, and their arithmetic and other operation instructions generally take three operands. An example is described in Japanese Patent Laid-Open No. 2-130634. On the other hand, there are RISC microprocessors with 2-byte fixed-length instructions, intended to improve code efficiency (that is, to reduce the amount of memory used to store instructions). However, these 2-byte fixed-length RISC microprocessors do not adopt the superscalar approach. An example is described in Japanese Patent Laid-Open No. 5-197546.

[0009]

[Problems to Be Solved by the Invention] To clarify the problems that arise with the superscalar approach, an explanation is given with reference to FIG. 3. The operations of the instructions shown in FIG. 3 are as follows.

[0010]
(1) mov R3, R2 "Copy the contents of register R3 to register R2"
(2) mov #32, R5 "Copy the data 32 to register R5"
(3) add R4, R2 "Add the contents of register R4 and register R2 and store the result in R2"
(4) and R3, R5 "AND the contents of register R3 and register R5 and store the result in R5"

There is no data dependence (data flow) between instructions (1) and (2), or between instructions (3) and (4). However, there is a data dependence between instructions (1) and (3), and between instructions (2) and (4). That is, instructions (1) and (3) both use register R2, and instructions (2) and (4) both use register R5. Instruction (3) must therefore be executed after instruction (1), and instruction (4) after instruction (2).
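The dependences among the four instructions can be found mechanically. The sketch below is illustrative only: the `Instr` tuple, `has_data_flow`, and `dependent_pairs` are hypothetical names, and the check simply asks whether a later instruction touches the register that an earlier one writes.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    op: str
    src: str  # source operand: a register name or an immediate such as "#32"
    dst: str  # destination register (also the second input of add/and)

def has_data_flow(first: Instr, second: Instr) -> bool:
    """True if `second` touches the register that `first` writes."""
    return first.dst in (second.src, second.dst)

PROGRAM = [
    Instr("mov", "R3", "R2"),   # (1)
    Instr("mov", "#32", "R5"),  # (2)
    Instr("add", "R4", "R2"),   # (3)
    Instr("and", "R3", "R5"),   # (4)
]

def dependent_pairs(program):
    """Return 1-based (earlier, later) index pairs that have a data flow."""
    return [(i + 1, j + 1)
            for i in range(len(program))
            for j in range(i + 1, len(program))
            if has_data_flow(program[i], program[j])]
```

For `PROGRAM`, the only pairs reported are (1, 3) and (2, 4), matching the dependences described in the text.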

[0011] That is, when there is no data dependence between instructions that are executed simultaneously, there are no bubbles in the pipeline, as shown in FIG. 3(a), and the two instructions execute completely in parallel, giving twice the processing speed of the conventional case in which only one instruction executes at a time. When there is a data dependence between the simultaneously executed instructions, however, the pipeline is disturbed, as shown in FIG. 3(b), and the processing speed becomes the same as in the conventional case in which only one instruction executes at a time.

[0012] For this reason, as shown in FIG. 3(c), when there is a data dependence between instructions that would execute simultaneously, one conceivable method is to defer the subsequent instruction to the next pipeline slot and to execute a no-operation instruction nop alongside the preceding instruction in its place, thereby avoiding disturbance of the pipeline. However, this increases the number of useless instructions, which increases the total number of executed instructions and lengthens the execution time.
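The cost of nop padding can be illustrated with a simple in-order pairing model. This is a sketch under stated assumptions, not the scheme drawn in FIG. 3(c): instructions are `(op, src, dst)` tuples, issue is two-wide and strictly in order, and `pad_with_nops` and `has_data_flow` are hypothetical helper names.

```python
NOP = ("nop", None, None)

def has_data_flow(first, second):
    """True if `second` touches the register written by `first`."""
    return first[2] in (second[1], second[2])

def pad_with_nops(program):
    """Group instructions into two-wide issue slots.

    A dependent pair is split: the first instruction issues with a nop and
    the second moves to the next slot. A trailing unpaired instruction is
    also padded with a nop.
    """
    slots = []
    i = 0
    while i < len(program):
        first = program[i]
        if i + 1 < len(program) and not has_data_flow(first, program[i + 1]):
            slots.append((first, program[i + 1]))
            i += 2
        else:
            slots.append((first, NOP))
            i += 1
    return slots
```

In this model, issuing the four example instructions in the order (1), (3), (2), (4) takes three slots and executes two nops, whereas the order (1), (2), (3), (4) takes two slots with no nops.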

[0013] Next, to clarify the problems that arise from the instruction format and instruction set, an explanation is given below with reference to FIGS. 4 and 5.

[0014] FIG. 4 shows an example of the instruction format and instruction repertoire for a 4-byte, 3-operand instruction set (4-byte fixed-length instructions). In this figure, the OP field 401 specifies the instruction function. The S1 field 403 holds the register number specifying the first input (first operand), the S2 field 404 holds the register number specifying the second input (second operand), and the D field 402 holds the register number specifying the output (third operand). This instruction format can therefore specify three operands. The instruction functions include copy (data transfer), addition, subtraction, and so on. In addition, because the 4-byte instruction length leaves room to spare, compound instructions are also provided, such as the 1-bit left shift add instruction asl1add and the zero-extend add instruction zextadd. The asl1add instruction shifts the bit pattern of the first operand left by one bit and then performs an ordinary addition; the zextadd instruction sets the left half of the bit pattern of the first operand to zero and then performs an ordinary addition. For simplicity, memory access instructions, branch instructions, and other instructions that would normally exist are omitted here. For the copy instruction (data transfer instruction), the S2 field 404 is ignored, and the contents of the register specified by the S1 field 403 (the source register) are copied (transferred) unchanged to the register specified by the D field 402 (the destination register).
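The two compound instructions can be modeled in a few lines. This is a minimal sketch that assumes 32-bit registers and wrap-around addition; the patent text does not state the register width, so `WORD_BITS` is an assumption, and "left half" is taken to mean the upper half of the word.

```python
WORD_BITS = 32                            # assumed register width
MASK = (1 << WORD_BITS) - 1               # keeps results within one word
HALF_MASK = (1 << (WORD_BITS // 2)) - 1   # selects the lower (right) half

def asl1add(s1: int, s2: int) -> int:
    """Shift the first operand left by one bit, then add the second operand."""
    return (((s1 << 1) & MASK) + s2) & MASK

def zextadd(s1: int, s2: int) -> int:
    """Zero the upper half of the first operand, then add the second operand."""
    return ((s1 & HALF_MASK) + s2) & MASK
```

Each function performs in one step what would otherwise need a shift or zero-extend instruction followed by a separate add.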

[0015] FIG. 5 shows an example of the instruction format and instruction repertoire for a 2-byte, 2-operand instruction set (2-byte fixed-length instructions). In FIG. 5, the OP field 501 specifies the instruction function. The S1 field 503 holds the register number specifying the first input (first operand), and the D field 502 holds the register number specifying the second input (second operand), which is also the register number specifying the output. This instruction format can therefore specify two operands. The clear difference from the instruction format of FIG. 4 is that there is no S2 field; that is, there is one fewer operand. The remaining fields are also shorter than those of FIG. 4.

[0016] The instruction functions include, as one-input transfer instructions, the copy instruction (data transfer instruction), the zero-extend instruction, the sign-extend instruction, and the 1-bit left shift instruction, and, as two-input operation instructions, the add instruction, the subtract instruction, and so on. Of these, the 1-bit left shift instruction uses the same register number for the input register (source register) and the output register (destination register) because of the limited instruction length. In this case, therefore, the S1 field holds not a register number but an extended opcode that identifies the asl1 instruction.

[0017] To clarify the advantages and disadvantages of the 4-byte, 3-operand instruction set and the 2-byte, 2-operand instruction set, consider for example the following expression.

[0018]
a=b+c+d; (A)

Converting this into an instruction sequence of the 4-byte, 3-operand instruction set (instruction sequence (A1)) gives the following.

[0019]
add Rb,Rc,Ra
add Ra,Rd,Ra

Converting it instead into an instruction sequence of the 2-byte, 2-operand instruction set (instruction sequence (A2)) gives the following.

[0020]
mov Rb,Ra
add Rc,Ra
add Rd,Ra

With the 4-byte, 3-operand instruction set, the number of executed instructions is 2, but the number of bytes stored in instruction memory (and fetched for execution) is 8. With the 2-byte, 2-operand instruction set, the number of executed instructions increases to 3, but the number of bytes stored in instruction memory (and fetched for execution) decreases to 6. This tendency holds in general. It is generally accepted that the 4-byte, 3-operand instruction set executes about 10 to 20% fewer instructions than the 2-byte, 2-operand instruction set, but requires about 60% more bytes of storage.
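The trade-off for expression (A) can be reproduced with a toy code generator. The sketch below is illustrative only: `three_operand`, `two_operand`, and `code_bytes` are hypothetical helpers, and they assume a pure sum of at least two source registers whose destination does not appear among the later sources.

```python
def three_operand(sources, dest):
    """Sum `sources` into `dest` using 'add s1,s2,d' instructions."""
    code = [f"add {sources[0]},{sources[1]},{dest}"]
    code += [f"add {dest},{s},{dest}" for s in sources[2:]]
    return code

def two_operand(sources, dest):
    """Sum `sources` into `dest` using 'mov s,d' followed by 'add s,d' instructions."""
    code = [f"mov {sources[0]},{dest}"]
    code += [f"add {s},{dest}" for s in sources[1:]]
    return code

def code_bytes(code, bytes_per_instruction):
    """Storage needed for a sequence of fixed-length instructions."""
    return len(code) * bytes_per_instruction

a1 = three_operand(["Rb", "Rc", "Rd"], "Ra")  # instruction sequence (A1)
a2 = two_operand(["Rb", "Rc", "Rd"], "Ra")    # instruction sequence (A2)
```

Here `a1` has 2 instructions occupying 8 bytes, while `a2` has 3 instructions occupying 6 bytes, as stated in the text.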

[0021] However, the 2-byte, 2-operand instruction set has one problem. It concerns the extra data transfer instruction that a 2-operand instruction set requires. The problem could equally be explained with expression (A) above, but expression (B) below is used here.

[0022]
a=b+c; (B)

Converting this into an instruction sequence of the 4-byte, 3-operand instruction set (instruction sequence (B1)) gives the following.

[0023]
add Rb,Rc,Ra

Converting it instead into a 2-byte, 2-operand instruction sequence (instruction sequence (B2)) gives the following.

[0024]
mov Rb,Ra
add Rc,Ra

With the 4-byte, 3-operand instruction set, this can be executed in one clock using only one of the pipelines. With the 2-byte, 2-operand instruction set, there is a data flow between the two instructions: the extra copy (data transfer) instruction mov and the subsequent add instruction add. In other words, the subsequent instruction uses the result of the preceding instruction. The subsequent instruction add must therefore wait for the result of the preceding instruction mov, and execution takes two clocks. For the instruction sequence

mov Rb,Ra
add Rc,Rd

there is no data flow between the two instructions, so they can be executed in one clock using the two pipelines. In instruction sequence (B2), which corresponds to expression (B), the data flow makes the processing take extra time. When the superscalar approach is adopted, the 2-byte, 2-operand instruction set therefore tends to take longer to execute than the 4-byte, 3-operand instruction set by more than the increase in instruction count alone.

[0025] The problem of the 2-byte, 2-operand instruction set has been explained by comparison with the 4-byte, 3-operand instruction set. However, the 4-byte, 3-operand instruction set has the same problem: when an operation with four operands is executed, a data flow exists, as in instruction sequence (A1).

[0026] Existing microprocessors have an accumulated base of software assets, and the need to carry those assets forward makes it difficult to change the instruction format or instruction set. Processing speed must therefore be improved while the existing instruction format and instruction set are maintained.

[0027] An object of the present invention is to reduce pipeline stalls caused by data hazards in a superscalar system and thereby improve processing speed.

[0028] Another object of the present invention is to reduce the number of executed instructions and thereby improve processing speed.

[0029] A further object of the present invention is to improve the processing speed of a data processing device that executes a 2-byte, 2-operand instruction set.

[0030] The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.

[0031]

[Means for Solving the Problems] A brief outline of representative aspects of the invention disclosed in the present application is as follows.

[0032] A pipelined data processing device has a stage that reads fixed-length instructions stored in an instruction memory; a stage that, when there is a dependence in the data on which a plurality of read instructions operate and the instructions have a predetermined relationship, changes the instructions so that they can be executed in parallel in a plurality of pipelines; and a stage that executes the changed instructions in parallel.

[0033] Architecturally the instruction set is a 2-byte, 2-operand instruction set, but internally the instructions are processed as a 3-operand instruction set. Specifically, the instruction fetch stage fetches two instructions, the instruction decode stage decodes two adjacent instructions, and the execution stage is provided with two sets of arithmetic units. The instruction decoder is provided with means for detecting that two adjacent 2-operand instructions are equivalent to one 3-operand instruction and means for, in that case, merging the two instructions into one 3-operand instruction and sending it to the subsequent execution stage. The pair is thus sent to the execution stage as one 3-operand instruction and executed in one clock. In addition, means are provided for sending the source data of the preceding instruction to the arithmetic unit for the subsequent instruction when it is detected that two adjacent instructions have a data flow relationship but cannot be merged into one 3-operand instruction.
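The merging step can be sketched for the simplest case, where a copy is followed by a commutative operation on the copied register, as in instruction sequence (B2). This is not the patent's full conversion rule set; `fuse` and `COMMUTATIVE_OPS` are hypothetical names, instructions are `(op, src, dst)` tuples, and all operands are assumed to be registers.

```python
COMMUTATIVE_OPS = {"add", "and"}  # assumed subset; operand order does not matter

def fuse(first, second):
    """Try to merge two 2-operand instructions (op, src, dst) into one
    3-operand instruction (op, s1, s2, d). Return None if no merge applies."""
    op1, src1, dst1 = first
    op2, src2, dst2 = second
    if op1 != "mov" or op2 not in COMMUTATIVE_OPS or dst2 != dst1:
        return None
    # After 'mov src1,dst1', any read of dst1 sees the value of src1.
    s2 = src1 if src2 == dst1 else src2
    return (op2, src1, s2, dst2)
```

For example, `fuse(("mov", "Rb", "Ra"), ("add", "Rc", "Ra"))` yields `("add", "Rb", "Rc", "Ra")`, which corresponds to instruction sequence (B1). A pair such as `mov Rb,Ra` followed by `add Rc,Rd` is left unmerged because the second instruction does not write the copied register.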

[0034] This allows the two instructions to be executed simultaneously. With these two mechanisms, two instructions that would conventionally have taken two clocks because of the data flow between adjacent instructions can be processed in one clock. The overall number of execution clocks can therefore be reduced.

[0035]

[Embodiments of the Invention] A microprocessor according to an embodiment of the present invention is described below, item by item.

[0036] <<Microprocessor Pipeline Data Path>> FIG. 6 shows the data path of the pipeline of a microprocessor according to an embodiment of the present invention. In the following description, the microprocessor is assumed to fetch and execute instructions of a 2-byte, 2-operand instruction set such as that shown in FIG. 5.

[0037] The first stage 700 is the instruction fetch stage. The second stage 800 is the instruction decode stage. The third stage 900 is the execution stage. The fourth stage 1000 is the stage that writes to the registers and performs forwarding. Between the stages are a first latch group 750, a second latch group 850, and a third latch group 950. Note that the stages in FIG. 6 and the subsequent figures of the embodiment show the flow of data; they do not show the physical arrangement of the circuits drawn within each stage.

[0038] <<Instruction Fetch Stage>> FIG. 7 shows a detailed block diagram of the first stage 700 and the first latch group 750. The first stage 700 consists of a program counter (PC) 701, a fetch control unit 702, and an instruction memory 703. The role of the first stage 700, the instruction fetch stage, is to pass instructions in the instruction memory to the next stage, the instruction decode stage (second stage 800).

[0039] The address held in the program counter 701 is output on signal line 704, and 4 bytes of instructions (two instructions) in the instruction memory 703 are fetched into the fetch control unit 702 via signal line 705. The two instructions fetched into the fetch control unit 702 are output on signal lines 706 and 707 in accordance with signal line 803. The contents of signal line 706 are then stored in latch 751 of the first latch group 750, and the contents of signal line 707 are stored in latch 752. Latch 751 holds the first instruction and latch 752 holds the second instruction. Here, the first instruction precedes the second instruction in the instruction sequence. In this application, the first instruction is also called the preceding instruction and the second instruction the subsequent instruction.

[0040] The value of the program counter 701 plus 4 is then written back to the program counter 701. The first stage 700 operates so as to fetch 4 bytes of instructions (two instructions) from the instruction memory and latch them in the first latch group 750, under the constraint that the value of the program counter 701 (the address used to access the instruction memory) is a multiple of 2. However, the 4 bytes fetched from the instruction memory are not always latched in the first latch group 750 as they are. The instruction decode stage (second stage 800) sends information to the fetch control unit 702 of the first stage 700, via signal line 803, indicating how many bytes beyond the current instruction the next instruction it wants is located. In response, the fetch control unit 702 uses the buffer it contains to output the 4 bytes (two instructions) wanted by the instruction decode stage on signal lines 706 and 707, and stores them in latches 751 and 752 of the first latch group 750.
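The interaction between the decode stage's byte-offset request and the fetch buffer can be modeled roughly as follows. This is a sketch under stated assumptions rather than the circuit of FIG. 7: the class `FetchControl`, its `next_pair` method, and the `advance` argument (standing in for the information carried on signal line 803) are hypothetical, and the instruction memory is modeled as a `bytes` object.

```python
class FetchControl:
    """Toy model of a fetch unit with an internal buffer."""

    def __init__(self, memory: bytes):
        self.memory = memory       # instruction memory contents
        self.pc = 0                # program counter, advanced by 4 per fetch
        self.buffer = bytearray()  # bytes fetched but not yet consumed

    def _refill(self):
        # Fetch 4 bytes at a time until two instructions are available.
        while len(self.buffer) < 4 and self.pc < len(self.memory):
            self.buffer += self.memory[self.pc:self.pc + 4]
            self.pc += 4

    def next_pair(self, advance: int = 0):
        """Drop `advance` consumed bytes, then return the next two
        2-byte instructions (first, second)."""
        del self.buffer[:advance]
        self._refill()
        return bytes(self.buffer[0:2]), bytes(self.buffer[2:4])
```

When the decode stage consumed only one instruction (`advance=2`), the previously delivered second instruction becomes the new first instruction; when it consumed both (`advance=4`), the next aligned or unaligned pair is delivered from the buffer.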

[0041] <<Instruction Decode Stage>> FIG. 8 shows a detailed block diagram of the second stage 800 and the second latch group 850. The second stage 800 consists of a decode control unit 801 and a register file 802. The role of the second stage 800, the instruction decode stage, is as follows.

(1) It prepares the input data used by the two instructions and passes it to the next stage, the execution stage (third stage 900).

[0042] (2) It checks the data flow between the two instructions and, if the subsequent instruction (second instruction) does not use the execution result of the preceding instruction (first instruction), requests the execution stage to process both instructions.

[0043] (3) It checks the data flow between the two instructions and, if the subsequent instruction uses the execution result of the preceding instruction, changes the two instructions according to predetermined rules.

[0044] (4) It reports the number of instructions that it has asked the execution stage to process to the instruction fetch stage, in preparation for the next pipeline cycle.
【００４５】命令デコードステージ（第２ステージ８０
０）の動作を以下に説明する。図１２にはデコード制
御部８０１の一部の詳細ブロック図が示される。デコー
ド制御部８０１はデータフロー検出回路ＤＦＤＣ、命令
変換回路ＩＮＣＣ等を有する。命令変換回路ＩＮＣＣ
は、セレクタＳＥＬ１から４を有し、データフロー検出
回路ＤＦＤＣの制御に基づいてラッチ７５１、７５２の
内容を加工し、ラッチ８５１、８５２の内容に変換す
る。Instruction decode stage (second stage 80
The operation of 0) will be described below. FIG. 12 shows a detailed block diagram of a part of the decoding control unit 801. The decode control unit 801 has a data flow detection circuit DFDC, an instruction conversion circuit INCC, and the like. Instruction conversion circuit INCC
Has selectors SEL1 to SEL4, processes the contents of the latches 751 and 752 under the control of the data flow detection circuit DFDC, and converts them into the contents of the latches 851 and 852.

[0046] Let the OP field of the first instruction held in latch 751 be OP-1, its D field D-1, and its S1 field S1-1. Let the OP field of the second instruction held in latch 752 be OP-2, its D field D-2, and its S1 field S1-2. Let the OP field of the first instruction held in latch 851 be OPN-1, its D field DN-1, and its S1 field S1N-1. Let the OP field of the second instruction held in latch 852 be OPN-2, its D field DN-2, and its S1 field S1N-2. The second instruction held in latch 852 additionally has an S2 field, denoted S2N-2.

[0047] The decode control unit 801 takes in the two instructions, the preceding instruction and the subsequent instruction, from latches 751 and 752 of latch group 750 via signal lines 753 and 754. The data flow detection circuit DFDC then checks whether the register number in the D field of the preceding instruction (D-1) is equal to the register number in the S1 field (S1-2) or the D field (D-2) of the subsequent instruction.

【００４８】レジスタ番号が等しくない場合、データフ
ローは存在しないと判断できる。レジスタ番号が等しい
場合、データフローが存在すると判断できる。そうする
と、データフロー検出回路ＤＦＤＣは、制御信号８２１
から８２４を出力し、セレクタＳＥＬ１から４をそれぞ
れ切り替えて信号線８１３、８０４を介して、ラッチ８
５１、８５２に変換した第１命令、第２命令を格納す
る。なお、セレクタＳＥＬ１、ＳＥＬ２の一つの入力
にはＩＮＣＣで生成された無効命令ＮＯＰ８２０が常時
入力される。If the register numbers are not equal, it can be determined that no data flow exists. If the register numbers are equal, it can be determined that a data flow exists. In that case, the data flow detection circuit DFDC outputs the control signals 821 to 824 to switch the selectors SEL1 to SEL4, respectively, and the converted first and second instructions are stored in the latches 851 and 852 via the signal lines 813 and 804. The invalid instruction NOP 820 generated by the INCC is always supplied to one input of each of the selectors SEL1 and SEL2.

【００４９】さらに、セレクタＳＥＬ２には、信号線
８４０を介してデータフロー検出回路ＤＦＤＣにより生
成した新たな命令が入力される。信号線８４０によりセ
レクタＳＥＬ２に入力される新たな命令は、データフロ
ー検出回路ＤＦＤＣがラッチ７５１の０Ｐ−１とラッチ
７５２の０Ｐ−２に基づいて生成したものであり、ラッ
チ８５２の０Ｐ−２に格納される。生成される新たな命
令の一例としては、０Ｐ−１が１ビットシフト命令asl1
で０Ｐ−２が加算命令addのときに生成される１ビット
シフト加算命令asl1addがある。Furthermore, a new instruction generated by the data flow detection circuit DFDC is input to the selector SEL2 via the signal line 840. This new instruction is generated by the data flow detection circuit DFDC from OP-1 of the latch 751 and OP-2 of the latch 752, and is stored in the OP field (OPN-2) of the latch 852. One example is the 1-bit shift-and-add instruction asl1add, which is generated when OP-1 is the 1-bit shift instruction asl1 and OP-2 is the addition instruction add.

【００５０】セレクタＳＥＬ３は、Ｓ１−１またはＤ−
２の一方の値を選択し、Ｓ１Ｎ−２に格納するためのも
のである。The selector SEL3 selects either S1-1 or D-2 and stores the selected value in S1N-2.

【００５１】セレクタＳＥＬ４は、Ｓ１−１またはＳ１
−２の一方の値を選択し、Ｓ２Ｎ−２に格納するための
ものである。The selector SEL4 selects either S1-1 or S1-2 and stores the selected value in S2N-2.

【００５２】図１１には命令デコードステージの２つの
命令を演算ステージの２つの命令に変換する規則(条件
と演算ステージに渡る命令)が示されている。第１命令
は、無効命令nopに変換されるか又は変換されないかの
どちらかである。第２命令は命令形式を図５の２バイト
・２オペランド形式ものから図４の４バイト・３オペラ
ンド形式ものに変換されるか又は無効命令nopに変換さ
れる。図１１のALUは算術演算（加算、減算等）や論理
演算（論理積、論理和等）などの２入力演算命令を総称
する命令名である。前述したように、zextALU は演算器
への第１入力を０拡張し、そしてALU演算する命令であ
る。asl1ALUは演算器への第１入力を１ビット左シフト
し、そしてALU演算する命令である。FIG. 11 shows the rules for converting the two instructions in the instruction decode stage into the two instructions in the operation stage (the conditions and the instructions passed to the operation stage). The first instruction is either converted into the invalid instruction nop or left unconverted. The second instruction is either converted from the 2-byte, 2-operand format of FIG. 5 into the 4-byte, 3-operand format of FIG. 4, or converted into the invalid instruction nop. ALU in FIG. 11 is a generic name for two-input operation instructions such as arithmetic operations (addition, subtraction, etc.) and logical operations (logical product, logical sum, etc.). As described above, zextALU is an instruction that zero-extends the first input to the arithmetic unit and then performs the ALU operation, and asl1ALU is an instruction that shifts the first input to the arithmetic unit left by 1 bit and then performs the ALU operation.

【００５３】図１１の（１）は２オペランド形式の演算
命令で３オペランドの演算命令を実行するためには複写
命令movと演算命令ALUとの２命令必要であったものを１
つの３オペランドの演算命令ALUに変換するものであ
る。複写命令movのＤフィールドのレジスタ番号と演算
命令ALUのＤフィールドのレジスタ番号とが一致する場
合である。この場合、演算ステージには第１命令が無効
命令nopに、第２命令が３オペランドの演算命令に変換
されて渡される。Case (1) of FIG. 11 covers a 3-operand operation that, in the 2-operand format, required two instructions (a copy instruction mov and an operation instruction ALU); the pair is converted into a single 3-operand operation instruction ALU. This applies when the register number in the D field of the copy instruction mov matches the register number in the D field of the operation instruction ALU. In this case, the first instruction is converted into the invalid instruction nop and the second instruction into a 3-operand operation instruction, and both are passed to the operation stage.

【００５４】ラッチ８５１,８５２の各フィールドに格
納される値を要約すると以下のようになる。なお、
「←」は、「←」の右側の値を「←」の左側に格納する
ことを意味する。The values stored in the fields of the latches 851 and 852 are summarized as follows. Note that “←” means that the value on its right side is stored in the location on its left side.

【００５５】具体的には以下のようになる。ラッチ７５１のＯＰ−１
には「mov」が、Ｄ−１には「ＲＮ」が、Ｓ１−１には
「Ｒｍ」が格納されているとする。また、ラッチ７５２
のＯＰ−２には「ALU」が、Ｄ−２には「ＲＮ」が、Ｓ
１−２には「Ｒｌ」が格納されているとする。ここでＤ
−１とＤ−２が共に「ＲＮ」でレジスタ番号が一致する
ことをデータフロー検出回路ＤＦＤＣが検出する。する
とデータフロー検出回路ＤＦＤＣは、ＳＥL１がnop命令
８２０を選択するように８２１を介しセレクタＳＥＬ1
を制御し、nop命令８２０をラッチ８５１のＯＰN−１に
格納する。データフロー検出回路ＤＦＤＣは、ラッチ７
５１のＤ−１、Ｓ１−１をそのまま信号線７５３、８
１３を介してラッチ８５１のＤN−１、Ｓ１N−１に格
納する。[0055] Specifically, the conversion proceeds as follows. Assume that OP-1 of the latch 751 holds "mov", D-1 holds "RN", and S1-1 holds "Rm", and that OP-2 of the latch 752 holds "ALU", D-2 holds "RN", and S1-2 holds "Rl". The data flow detection circuit DFDC detects that D-1 and D-2 are both "RN", that is, that the register numbers match. The data flow detection circuit DFDC then controls the selector SEL1 via the control signal 821 so that SEL1 selects the nop instruction 820, and the nop instruction 820 is stored in OPN-1 of the latch 851. The data flow detection circuit DFDC stores D-1 and S1-1 of the latch 751 unchanged in DN-1 and S1N-1 of the latch 851 via the signal lines 753 and 813.

【００５６】またデータフロー検出回路ＤＦＤＣは、セ
レクタＳＥＬ２がラッチ７５２のＯＰ−２を選択するよ
うに制御信号８２２を介してセレクタＳＥＬ２を制御
し、ラッチ７５２のＯＰ−２をラッチ８５２のＯＰN−
２に格納する。さらにデータフロー検出回路ＤＦＤＣ
は、セレクタＳＥＬ３がラッチ７５１のＳ１−１を選択
するように制御信号８２３を介しセレクタＳＥＬ３を制
御し、ラッチ７５１のＳ１−１をラッチ８５２のＳ１Ｎ
−２に格納する。またデータフロー検出回路ＤＦＤＣは
ラッチ７５２のＤ−２を信号線７５４を介してそのまま
ラッチ８５２のＤＮ−２に格納する。さらにデータフロ
ー検出回路ＤＦＤＣは、セレクタＳＥＬ４がラッチ７５
２のＳ１−１を選択するように８３４を介してセレクタ
ＳＥＬ４を制御し、ラッチ７５２のＳ１−１をＳ２Ｎ−
２に格納する。The data flow detection circuit DFDC also controls the selector SEL2 via the control signal 822 so that SEL2 selects OP-2 of the latch 752, and OP-2 of the latch 752 is stored in OPN-2 of the latch 852. The data flow detection circuit DFDC further controls the selector SEL3 via the control signal 823 so that SEL3 selects S1-1 of the latch 751, and S1-1 of the latch 751 is stored in S1N-2 of the latch 852. D-2 of the latch 752 is stored unchanged in DN-2 of the latch 852 via the signal line 754. Finally, the data flow detection circuit DFDC controls the selector SEL4 via the control signal 824 so that SEL4 selects S1-2 of the latch 752, and S1-2 of the latch 752 is stored in S2N-2.
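As an illustration only, the field assignments for case (1) can be sketched in Python. The tuple types and the function name `convert_case1` are hypothetical and not part of the patent. The sketch assumes that S2N-2 receives the S1 field held in the latch 752 (S1-2), in line with the selector SEL4 choosing between S1-1 and S1-2.

```python
from typing import NamedTuple, Tuple


class Instr2(NamedTuple):
    """2-operand latch contents: OP, D, S1 (hypothetical model)."""
    op: str
    d: str
    s1: str


class Instr3(NamedTuple):
    """3-operand latch contents: OP, D, S1, S2 (hypothetical model)."""
    op: str
    d: str
    s1: str
    s2: str


def convert_case1(l751: Instr2, l752: Instr2) -> Tuple[Instr2, Instr3]:
    """Case (1): mov followed by an ALU instruction whose D field matches D-1."""
    if l751.op != "mov" or l751.d != l752.d:
        raise ValueError("case (1) requires mov with D-1 == D-2")
    # SEL1 selects the nop instruction; D-1 and S1-1 pass through unchanged.
    l851 = Instr2("nop", l751.d, l751.s1)
    # SEL2 selects OP-2, SEL3 selects S1-1, SEL4 selects S1-2 (assumed).
    l852 = Instr3(l752.op, l752.d, l751.s1, l752.s1)
    return l851, l852
```

With `mov RN, Rm` followed by `ALU RN, Rl`, the second latch ends up holding a 3-operand instruction with destination RN and sources Rm and Rl, so the copy no longer has to complete first.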

【００５７】図１１の（２）は複写命令movのDフィール
ドのレジスタ番号と演算命令ALUのＳ１フィールドのレ
ジスタ番号とが一致する場合である。この場合、演算ス
テージには第１命令はそのままで、第２命令が３オペラ
ンドの演算命令に変換されて渡される。Case (2) of FIG. 11 applies when the register number in the D field of the copy instruction mov matches the register number in the S1 field of the operation instruction ALU. In this case, the first instruction is passed to the operation stage unchanged, and the second instruction is converted into a 3-operand operation instruction and passed to the operation stage.

【００５８】ラッチ８５１,８５２の各フィールドに格
納される値を要約すると以下のようになる。具体的には以下のようである。ラッチ７５１のＯＰ−１
には「mov」が、Ｄ−１には「ＲＮ」が、Ｓ１−１には
「Ｒｍ」が格納されているとする。また、ラッチ７５２
のＯＰ−２には「ALU」が、Ｄ−２には「Ｒｘ」が、Ｓ
１−２には「ＲＮ」が格納されているとする。ここでＤ
−１とＳ１−２が共に「ＲＮ」でレジスタ番号が一致す
ることをデータフロー検出回路ＤＦＤＣが検出する。そ
してデータフロー検出回路ＤＦＤＣは、セレクタＳＥＬ
１がラッチ７５１のＯＰ−１(この場合mov命令)を選択
するように８２１を介しセレクタＳＥＬ１を制御し、mo
v命令をラッチ８５１のＯＰN−１に格納する。The values stored in the fields of the latches 851 and 852 are summarized as follows. Specifically, assume that OP-1 of the latch 751 holds "mov", D-1 holds "RN", and S1-1 holds "Rm", and that OP-2 of the latch 752 holds "ALU", D-2 holds "Rx", and S1-2 holds "RN". The data flow detection circuit DFDC detects that D-1 and S1-2 are both "RN", that is, that the register numbers match. The data flow detection circuit DFDC then controls the selector SEL1 via the control signal 821 so that SEL1 selects OP-1 of the latch 751 (the mov instruction in this case), and the mov instruction is stored in OPN-1 of the latch 851.

【００５９】データフロー検出回路ＤＦＤＣは、ラッチ
７５１のＤ−１、Ｓ１−１を、そのまま信号線７５３、
８１３を介してラッチ８５１のＤN−１、Ｓ１N−１に格
納する。またデータフロー検出回路ＤＦＤＣは、セレク
タＳＥＬ２がラッチ７５２のＯＰ−２を選択するように
制御信号８２２を介してセレクタＳＥＬ２を制御し、ラ
ッチ７５２のＯＰ−２をラッチ８５２のＯＰN−２に格
納する。なおデータフロー検出回路ＤＦＤＣは、ラッ
チ７５２のＤ−２をそのまま信号線７５４、８０４を介
してラッチ８５２のＤＮ−２に格納する。さらにデータ
フロー検出回路ＤＦＤＣは、セレクタＳＥＬ３がラッチ
７５１のＳ１−１を選択するように制御信号８２３を介
してセレクタＳＥＬ３を制御し、信号線８０４を介して
ラッチ８５２のＳ１Ｎ−２にラッチ７５１のＳ１−１を
格納する。なおデータフロー検出回路ＤＦＤＣは、ラッ
チ７５２のＳ１−２を信号線７５４、８０４を介して、
そのままラッチ８５２のＳ２Ｎ−２に格納する。The data flow detection circuit DFDC stores D-1 and S1-1 of the latch 751 unchanged in DN-1 and S1N-1 of the latch 851 via the signal lines 753 and 813. It also controls the selector SEL2 via the control signal 822 so that SEL2 selects OP-2 of the latch 752, and OP-2 of the latch 752 is stored in OPN-2 of the latch 852. D-2 of the latch 752 is stored unchanged in DN-2 of the latch 852 via the signal lines 754 and 804. The data flow detection circuit DFDC further controls the selector SEL3 via the control signal 823 so that SEL3 selects S1-1 of the latch 751, and S1-1 of the latch 751 is stored in S1N-2 of the latch 852 via the signal line 804. S1-2 of the latch 752 is stored unchanged in S2N-2 of the latch 852 via the signal lines 754 and 804.

【００６０】なお、ラッチ８５１、８５２に具体的に格
納される値を作っていく以上のような説明は図１１の
(２) 以降では省略する。図１１の(1),(2) と同様なや
り方でラッチ８５１,８５２に格納する値を作っていけ
るからである。Such step-by-step descriptions of how the values stored in the latches 851 and 852 are formed are omitted for the cases after (2) in FIG. 11, because those values can be formed in the same manner as in (1) and (2) of FIG. 11.

【００６１】図１１の（３）は１オペランド形式の１ビ
ット左シフト命令を２オペランド形式の１ビット左シフ
ト命令に変換するものである。複写命令movのＤフィー
ルドのレジスタ番号と１ビット左シフト命令asl1のＤ
フィールドのレジスタ番号とが一致する場合である。こ
の場合、演算ステージには第１命令が無効命令nopに、
第２命令が２オペランドの１ビット左シフト命令asl1に
変換されて渡される。Case (3) of FIG. 11 converts a 1-operand 1-bit left shift instruction into a 2-operand 1-bit left shift instruction. It applies when the register number in the D field of the copy instruction mov matches the register number in the D field of the 1-bit left shift instruction asl1. In this case, the first instruction is converted into the invalid instruction nop and the second instruction into the 2-operand 1-bit left shift instruction asl1, and both are passed to the operation stage.

【００６２】すなわち、各フィールドは下記のように変
換される。That is, each field is converted as follows.

【００６３】図１１の（４）は第１命令が複写命令movで、第２命令
又は条件が図１１の（１）、（２）、（３）に該当しな
かった場合である。この場合、演算ステージには第１命
令はそのままで、第２命令が無効命令nopに変換されて
渡される。「その他」の命令は１クロックずれた次のパ
イプラインで実行される。[0063] Case (4) of FIG. 11 applies when the first instruction is the copy instruction mov and the second instruction or the condition does not fall under (1), (2), or (3) of FIG. 11. In this case, the first instruction is passed to the operation stage unchanged and the second instruction is converted into the invalid instruction nop. The "other" instruction is executed in the next pipeline, shifted by one clock.

【００６４】すなわち、各フィールドは下記のように変
換される。That is, each field is converted as follows.

【００６５】ＯＰＮ−１←ＯＰ−１、ＤＮ−１←Ｄ−１、Ｓ１Ｎ−１←Ｓ１−１、ＯＰＮ−２←nop 図１１の（５）は０拡張命令zextと演算命令ALUとを０
拡張演算命令zextALUに複合するものである。０拡張命
令zextのＤフィールドのレジスタ番号と演算命令ALUの
Ｄフィールドのレジスタ番号とが一致する場合である。
この場合、演算ステージには第１命令が無効命令nop
に、第２命令が３オペランドの０拡張演算命令zextALU
に変換されて渡される。OPN-1 ← OP-1, DN-1 ← D-1, S1N-1 ← S1-1, OPN-2 ← nop. Case (5) of FIG. 11 combines the zero-extension instruction zext and the operation instruction ALU into the zero-extension operation instruction zextALU. It applies when the register number in the D field of the zero-extension instruction zext matches the register number in the D field of the operation instruction ALU. In this case, the first instruction is converted into the invalid instruction nop and the second instruction into the 3-operand zero-extension operation instruction zextALU, and both are passed to the operation stage.

【００６６】すなわち、各フィールドは下記のように変
換される。That is, each field is converted as follows.

【００６７】図１１の（６）は０拡張命令zextのＤフィールドのレ
ジスタ番号と加算命令addのS１フィールドのレジスタ番
号とが一致する場合である。この場合、演算ステージに
は第１命令はそのままで、第２命令が３オペランドの０
拡張加算命令zextaddに変換されて渡される。[0067] Case (6) of FIG. 11 applies when the register number in the D field of the zero-extension instruction zext matches the register number in the S1 field of the addition instruction add. In this case, the first instruction is passed to the operation stage unchanged, and the second instruction is converted into the 3-operand zero-extension addition instruction zextadd and passed to the operation stage.

【００６８】すなわち、各フィールドは下記のように変
換される。That is, each field is converted as follows.

【００６９】なお、加算命令add以外に可換な論理積命令andや論理和
命令or等も同様な変換を行っても良い。[0069] In addition to the addition instruction add, a commutative logical product instruction and, a logical sum instruction or, or the like may be converted in the same manner.

【００７０】図１１の（７）は第１命令が０拡張命令ze
xtで、第２命令又は条件が図１１の（５）又は（６）に
該当しない場合である。この場合、演算ステージには第
１命令はそのままで、第２命令が無効命令nopに変換さ
れて渡される。「その他」の命令は１クロックずれた次
のパイプラインで実行される。Case (7) of FIG. 11 applies when the first instruction is the zero-extension instruction zext and the second instruction or the condition does not fall under (5) or (6) of FIG. 11. In this case, the first instruction is passed to the operation stage unchanged and the second instruction is converted into the invalid instruction nop. The "other" instruction is executed in the next pipeline, shifted by one clock.

【００７１】すなわち、各フィールドは下記のように変
換される。That is, each field is converted as follows.

【００７２】ＯＰＮ−１←ＯＰ−１、ＤＮ−１←Ｄ−１、Ｓ１Ｎ−１←Ｓ１−１、ＯＰＮ−２←nop 図１１の（８）は１ビット左シフト命令asl1と演算命令
ALUとを１ビット左シフト演算命令asl1ALUに複合するも
のである。１ビット左シフト命令asl1のＤフィールドの
レジスタ番号と演算命令ALUのＤフィールドのレジスタ
番号とが一致する場合である。この場合、演算ステージ
には第１命令が無効命令nopに、第２命令が３オペラン
ドの１ビット左シフト演算命令asl1ALUに変換されて渡
される。OPN-1 ← OP-1, DN-1 ← D-1, S1N-1 ← S1-1, OPN-2 ← nop. Case (8) of FIG. 11 combines the 1-bit left shift instruction asl1 and the operation instruction ALU into the 1-bit left shift operation instruction asl1ALU. It applies when the register number in the D field of the 1-bit left shift instruction asl1 matches the register number in the D field of the operation instruction ALU. In this case, the first instruction is converted into the invalid instruction nop and the second instruction into the 3-operand 1-bit left shift operation instruction asl1ALU, and both are passed to the operation stage.

【００７３】すなわち、各フィールドは下記のように変
換される。That is, each field is converted as follows.

【００７４】図１１の（９）は１ビット左シフト命令asl1のＤフィ
ールドのレジスタ番号と加算命令addのS１フィールドの
レジスタ番号とが一致する場合である。この場合、演算
ステージには第１命令はそのままで、第２命令が３オペ
ランドの１ビット左シフト加算命令asl1addに変換され
て渡される。[0074] FIG. 11 (9) shows a case where the register number of the D field of the 1-bit left shift instruction asl1 and the register number of the S1 field of the add instruction add match. In this case, the second instruction is converted to the 3-operand 1-bit left shift addition instruction asl1add and passed to the operation stage without changing the first instruction.

【００７５】すなわち、各フィールドは下記のように変
換される。That is, each field is converted as follows.

【００７６】図１１の（１０）は第１命令が１ビット左シフト命令as
l1で、第２命令又は条件が図１１の（８）又は（９）に
該当しない場合である。この場合、演算ステージには第
１命令はそのままで、第２命令が無効命令nopに変換さ
れて渡される。「その他」の命令は１クロックずれた次
のパイプラインで実行される。[0076] Case (10) of FIG. 11 applies when the first instruction is the 1-bit left shift instruction asl1 and the second instruction or the condition does not fall under (8) or (9) of FIG. 11. In this case, the first instruction is passed to the operation stage unchanged and the second instruction is converted into the invalid instruction nop. The "other" instruction is executed in the next pipeline, shifted by one clock.

【００７７】すなわち、各フィールドは下記のように変
換される。That is, each field is converted as follows.

【００７８】ＯＰＮ−１←ＯＰ−１、ＤＮ−１←Ｄ−１、Ｓ１Ｎ−１←Ｓ１−１、ＯＰＮ−２←nop 図１１の（１１）は２つの命令間にデータフローがない
場合のもので、命令の変換は行わない。OPN-1 ← OP-1, DN-1 ← D-1, S1N-1 ← S1-1, OPN-2 ← nop (11) in FIG. 11 shows a case where there is no data flow between two instructions. No instruction conversion is performed.
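The conversion rules (1) to (11) described above can be gathered into a small Python sketch. It models only the opcode outcome of each case as stated in the text, not the operand fields or the PC update, and the following are assumptions made for illustration: the function name `classify`, the representative set `ALU_OPS`, the fused opcode spelling (first opcode followed by second, after zextadd and asl1add), the order in which overlapping conditions are tested, and the fallback for first instructions other than mov, zext, and asl1, which this part of the text does not spell out.

```python
ALU_OPS = {"add", "sub", "and", "or"}  # representative two-input ops (assumed set)


def classify(op1, d1, op2, d2, s1_2):
    """Return (case, new_op1, new_op2) for a decoded instruction pair.

    d1, d2 are the D fields; s1_2 is the S1 field of the second instruction
    (None for a 1-operand instruction such as asl1).
    """
    if d1 != d2 and d1 != s1_2:
        return 11, op1, op2                    # (11) no data flow: unchanged
    if op1 == "mov":
        if op2 in ALU_OPS and d1 == d2:
            return 1, "nop", op2               # (1) 3-operand ALU
        if op2 in ALU_OPS and d1 == s1_2:
            return 2, op1, op2                 # (2) mov kept, 3-operand ALU
        if op2 == "asl1" and d1 == d2:
            return 3, "nop", "asl1"            # (3) 2-operand asl1
        return 4, op1, "nop"                   # (4) second issued one clock later
    if op1 in ("zext", "asl1"):
        base = 5 if op1 == "zext" else 8
        if op2 in ALU_OPS and d1 == d2:
            return base, "nop", op1 + op2      # (5)/(8) fused zextALU / asl1ALU
        if op2 == "add" and d1 == s1_2:
            return base + 1, op1, op1 + op2    # (6)/(9) zextadd / asl1add
        return base + 2, op1, "nop"            # (7)/(10)
    # Not covered by the cases quoted above; assumed to issue the first alone.
    return None, op1, "nop"
```

For instance, `classify("asl1", 1, "add", 4, 1)` falls under case (9): the shift is kept and the add becomes asl1add.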

【００７９】デコード制御部８０１で変換された新しい
２つの命令は信号線８１３、８０４に送出され、それぞ
れ第２ラッチ群８５０内のラッチ８５１、８５２に格納
される。また、データフロー検出回路ＤＦＤＣにおける
先行命令と後続命令との関係の検査結果を図１１のＰＣ
更新の値に基づき命令フェッチステージ（第１ステージ
７００）に信号線８０３を介して連絡する。すなわち、
次のパイプラインでデコードする２つの命令を指定する
情報を命令フェッチステージに連絡する。The two new instructions produced by the decode control unit 801 are sent out on the signal lines 813 and 804 and stored in the latches 851 and 852 in the second latch group 850, respectively. In addition, the result of the check by the data flow detection circuit DFDC on the relationship between the preceding and succeeding instructions is reported to the instruction fetch stage (first stage 700) via the signal line 803, based on the PC update value in FIG. 11. That is, information specifying the two instructions to be decoded in the next pipeline is reported to the instruction fetch stage.

【００８０】さらにデコード制御部８０１は先行命令の
S１フィールド（Ｓ１−１）、Ｄフィールド（Ｄ−
１）、さらに後続命令のS１フィールド５０３（Ｓ１−
２）、Ｄフィールド５０２（Ｄ−２）の４つのレジスタ
番号を信号線８０５、８０６、８０７、８０８を介して
レジスタファイル８０２に送る。レジスタファイル８０
２内の４つのレジスタの内容は、信号線８０９、８１
０、８１１、８１２に読み出され、第２ラッチ群７４内
のラッチ８５３（第１−１入力）、ラッチ８５４（第１
−２入力）、ラッチ８５５（第２−１入力）、ラッチ８
５６（第２−２入力）に格納される。Further, the decode control unit 801 sends four register numbers, namely the S1 field (S1-1) and the D field (D-1) of the preceding instruction and the S1 field 503 (S1-2) and the D field 502 (D-2) of the succeeding instruction, to the register file 802 via the signal lines 805, 806, 807, and 808. The contents of the four registers in the register file 802 are read out onto the signal lines 809, 810, 811, and 812 and stored in the latch 853 (1-1 input), the latch 854 (1-2 input), the latch 855 (2-1 input), and the latch 856 (2-2 input) in the second latch group 850.

【００８１】図１５には、レジスタファイル８０２のブ
ロック図が示される。レジスタファイル８０２は、レジ
スタＲＧＳＴＲとレジスタ制御回路ＲＣＣと等で構成さ
れる。レジスタＲＧＳＴＲは、４本のリードポートと２
本のライトポートとがあり、それぞれ信号線８０９、８
１０、８１１、８１２、信号線９５５、９５６に接続さ
れる。従って、レジスタファイル８０２は４つのレジス
タの内容を同時に読み出すことができる。また、２つの
レジスタに同時に書き込むことができる。FIG. 15 shows a block diagram of the register file 802. The register file 802 includes a register RGSTR, a register control circuit RCC, and the like. The register RGSTR has four read ports and two write ports, which are connected to the signal lines 809, 810, 811, and 812 and to the signal lines 955 and 956, respectively. The register file 802 can therefore read the contents of four registers simultaneously and write to two registers simultaneously.
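A behavioural sketch of such a register file in Python is given below. It is a simplified model, not the circuit of FIG. 15: the class and method names are hypothetical, and the default of 16 registers of 32 bits is taken from the microcomputer application example later in the description.

```python
class RegisterFile:
    """Toy model with four read ports and two write ports (names are hypothetical)."""

    def __init__(self, count=16, width=32):
        self.mask = (1 << width) - 1
        self.regs = [0] * count

    def read4(self, a, b, c, d):
        """Read four registers in one step (modelling lines 809 to 812)."""
        return tuple(self.regs[n] for n in (a, b, c, d))

    def write2(self, writes):
        """Write up to two (register number, value) pairs in one step."""
        if len(writes) > 2:
            raise ValueError("only two write ports are modelled")
        for number, value in writes:
            self.regs[number] = value & self.mask
```

A typical step would call `write2` with the two results of the previous clock and `read4` with S1-1, D-1, S1-2, and D-2 of the pair being decoded.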

【００８２】図１１の（１）、（５）、（８）の場合
は、（Ｓ１−１）と（Ｓ１−２）で指定される２つのレ
ジスタの内容が信号線８１１、８１２に読み出され、ラ
ッチ８５５（第２−１入力）、ラッチ８５６（第２−２
入力）に格納される。In cases (1), (5), and (8) of FIG. 11, the contents of the two registers designated by (S1-1) and (S1-2) are read out onto the signal lines 811 and 812 and stored in the latch 855 (2-1 input) and the latch 856 (2-2 input).

【００８３】図１１の（２）、（６）、（９）の場合
は、（Ｓ１−１）で指定されるレジスタの内容が信号線
８０９、８１１に読み出され、ラッチ８５３（第１−１
入力）とラッチ８５５（第２−１入力）とに格納され
る。（Ｄ−２）で指定されるレジスタ内容が信号線８１
２に読み出され、ラッチ８５６（第２−２入力）に格
納される。In cases (2), (6), and (9) of FIG. 11, the contents of the register designated by (S1-1) are read out onto the signal lines 809 and 811 and stored in the latch 853 (1-1 input) and the latch 855 (2-1 input). The contents of the register designated by (D-2) are read out onto the signal line 812 and stored in the latch 856 (2-2 input).

【００８４】図１１の（３）の場合は、（Ｓ１−１）で
指定されるレジスタの内容が信号線８１１に読み出さ
れ、ラッチ８５５（第２−１入力）に格納される。In case (3) of FIG. 11, the contents of the register designated by (S1-1) are read out onto the signal line 811 and stored in the latch 855 (2-1 input).

【００８５】図１１の（４）、（７）、（１０）の場合
は、（Ｓ１−１）で指定されるレジスタの内容が信号線
８０９に読み出され、ラッチ８５３（第１−１入力）に
格納される。In cases (4), (7), and (10) of FIG. 11, the contents of the register designated by (S1-1) are read out onto the signal line 809 and stored in the latch 853 (1-1 input).

【００８６】図１１の（１１）の場合は、（Ｓ１−
１）、（Ｄ−１）、（Ｓ１−２）、（Ｄ−２）で指定さ
れる４つのレジスタの内容が信号線８０９、８１０、８
１１、８１２に読み出され、ラッチ８５３（第１−１入
力）、ラッチ８５４（第１−２入力）、ラッチ８５５
（第２−１入力）、ラッチ８５６（第２−２入力）に格
納される。In case (11) of FIG. 11, the contents of the four registers designated by (S1-1), (D-1), (S1-2), and (D-2) are read out onto the signal lines 809, 810, 811, and 812 and stored in the latch 853 (1-1 input), the latch 854 (1-2 input), the latch 855 (2-1 input), and the latch 856 (2-2 input).

【００８７】《実行ステージ》図９には第３ステージ９
００と第３ラッチ群９５０との詳細ブロック図が示され
る。第３ステージ９００は、演算制御部９０１とＡＬＵ
(Arithmetic Logic Unit)等を含む演算器９０２、９０
３と第１入力調整回路９０４、９０５、選択器９０６、
９０７とで構成される。第３ステージ９００である実行
ステージの役割は、２つの命令の演算を実行することで
ある。<< Execution Stage >> FIG. 9 shows a detailed block diagram of the third stage 900 and the third latch group 950. The third stage 900 comprises an arithmetic control unit 901, arithmetic units 902 and 903 each including an ALU (Arithmetic Logic Unit) and the like, first input adjustment circuits 904 and 905, and selectors 906 and 907. The role of the execution stage, which is the third stage 900, is to execute the operations of the two instructions.

【００８８】演算器９０２と第１入力調整回路９０４は
先行命令を演算するための回路で、第２ラッチ群８５０
内の２つのラッチ８５３、８５４から第１−１入力、第
１−２入力が信号線８５９、８６０を介して選択器９０
６に送られる。また、第３ラッチ群９５０内の２つのラ
ッチ９５３、９５４から第１出力、第２出力が信号線９
５５、９５６を介して選択器９０６に送られる。The arithmetic unit 902 and the first input adjustment circuit 904 are circuits for executing the preceding instruction. The 1-1 input and the 1-2 input are sent from the two latches 853 and 854 in the second latch group 850 to the selector 906 via the signal lines 859 and 860. The first output and the second output are also sent from the two latches 953 and 954 in the third latch group 950 to the selector 906 via the signal lines 955 and 956.

【００８９】選択器９０６は信号線８５９、９５５及び
９５６のうちの１つを信号線１００１に従い選択して第
１入力回路９０４及び信号線９１２を介して演算器９０
２にデータを送る。また、選択器９０６は信号線８６
０、９５５及び９５６のうちの１つを信号線１００１に
従い選択して信号線９１３を介して演算器９０２にデー
タを送る。The selector 906 selects one of the signal lines 859, 955, and 956 in accordance with the signal line 1001 and sends the data to the arithmetic unit 902 via the first input adjustment circuit 904 and the signal line 912. The selector 906 also selects one of the signal lines 860, 955, and 956 in accordance with the signal line 1001 and sends the data to the arithmetic unit 902 via the signal line 913.

【００９０】演算制御部９０１は第２ラッチ群８５０内
のラッチ８５１の命令を取り込み、その命令機能に従い
演算器９０２、第１入力調整回路９０４を信号線９１１
と９０８で制御し、先行命令のための演算を行う。そし
て結果の値（第１出力）は第３ラッチ群９５０内のラッ
チ９５３に信号線９１８を介して格納される。The arithmetic control unit 901 takes in the instruction from the latch 851 in the second latch group 850 and, according to the function of that instruction, controls the arithmetic unit 902 and the first input adjustment circuit 904 via the signal lines 911 and 908 to perform the operation for the preceding instruction. The resulting value (first output) is stored in the latch 953 in the third latch group 950 via the signal line 918.

【００９１】一方、演算器９０３と第１入力調整回路９
０５は後続命令を演算するための回路で、第２ラッチ群
８５０内の２つのラッチ８５５、８５６から第２−１入
力、第２−２入力が信号線８６１、８６２を介して選択
器９０７に送られる。また、第３ラッチ群９５０内の２
つのラッチ９５３、９５４から第１出力、第２出力が信
号線９５５、９５６を介して選択器９０７に送られる。Meanwhile, the arithmetic unit 903 and the first input adjustment circuit 905 are circuits for executing the succeeding instruction. The 2-1 input and the 2-2 input are sent from the two latches 855 and 856 in the second latch group 850 to the selector 907 via the signal lines 861 and 862. The first output and the second output are also sent from the two latches 953 and 954 in the third latch group 950 to the selector 907 via the signal lines 955 and 956.

【００９２】選択器９０７は信号線８６１、９５５及び
９５６のうちの１つを信号線１００２に従い選択して第
１入力回路９０５及び信号線９１４を介して演算器９０
３にデータを送る。また、選択器９０７は信号線８６
２、９５５及び９５６のうちの１つを信号線１００２に
従い選択して信号線９１５を介して演算器９０３にデー
タを送る。演算制御部９０１は第２ラッチ群８５０内の
ラッチ８５２の命令を取り込み、その命令機能に従い演
算器９０３、第１入力調整回路９０５を信号線９１０と
９０９とで制御し、後続命令のための演算を行う。そし
て結果の値（第２出力）は第３ラッチ群９５０内のラッ
チ９５４に信号線９１９を介して格納される。The selector 907 selects one of the signal lines 861, 955, and 956 in accordance with the signal line 1002 and sends the data to the arithmetic unit 903 via the first input adjustment circuit 905 and the signal line 914. The selector 907 also selects one of the signal lines 862, 955, and 956 in accordance with the signal line 1002 and sends the data to the arithmetic unit 903 via the signal line 915. The arithmetic control unit 901 takes in the instruction from the latch 852 in the second latch group 850 and, according to the function of that instruction, controls the arithmetic unit 903 and the first input adjustment circuit 905 via the signal lines 910 and 909 to perform the operation for the succeeding instruction. The resulting value (second output) is stored in the latch 954 in the third latch group 950 via the signal line 919.

【００９３】以上が、実行ステージ（第３ステージ９０
０）の処理であるが、s1add命令やzextadd命令について
補足説明しておく。asl1add命令やzextadd命令は、加算
を実現できる演算器９０２または９０３への第１入力を
微調整することで実現できる。すなわち第１入力を演算
器に直接入力するのでなく、第１入力調整回路９０４ま
たは９０５に入力しそれを演算制御部９０１が制御し、
１ビット左シフトや０拡張の調整を行っておき、それを
演算器９０２または９０３へ入力し、そこで通常の加算
をするよう制御することで実現できる。The above is the processing of the execution stage (third stage 900); a supplementary explanation of the asl1add and zextadd instructions follows. These instructions can be realized by slightly adjusting the first input to the arithmetic unit 902 or 903, which is capable of addition. That is, instead of being fed directly to the arithmetic unit, the first input is fed to the first input adjustment circuit 904 or 905, which the arithmetic control unit 901 controls to apply the 1-bit left shift or the zero extension; the adjusted value is then input to the arithmetic unit 902 or 903, which is controlled to perform an ordinary addition.
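The idea of adjusting only the first input and then performing an ordinary addition can be sketched in Python. The 32-bit data width follows the ALU width given in the application example; the 16-bit source width for the zero extension, and the names `adjust_first_input` and `fused_add`, are assumptions made for this illustration.

```python
MASK32 = 0xFFFFFFFF


def adjust_first_input(value, mode):
    """Model of the first input adjustment circuit 904/905 (hypothetical)."""
    if mode == "asl1":
        return (value << 1) & MASK32   # 1-bit left shift
    if mode == "zext":
        return value & 0xFFFF          # zero extension from 16 bits (assumed width)
    return value & MASK32              # no adjustment


def fused_add(mode, first, second):
    """asl1add / zextadd: adjust the first input, then add as usual."""
    return (adjust_first_input(first, mode) + second) & MASK32
```

The adder itself is unchanged; only the path in front of its first input differs between add, asl1add, and zextadd.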

【００９４】《書き込みステージ》図１０には第４ステ
ージ１０００の動作を説明するためのブロック図が示さ
れる。第４ステージ１０００は、レジスタ番号解読回路
１０１０とフォワーディング制御回路１０２０とで構成
される。第４ステージ１０００であるレジスタへの書き
込みとフォワーディングを行うステージの役割は以下の
通り。<< Write Stage >> FIG. 10 is a block diagram for explaining the operation of the fourth stage 1000. The fourth stage 1000 is composed of a register number decoding circuit 1010 and a forwarding control circuit 1020. The role of the fourth stage 1000, which is a stage that performs writing to a register and forwarding, is as follows.

【００９５】（１）２つの命令の演算結果を指定された
番号のレジスタに書き込む。(1) Write the operation results of two instructions to the register of the specified number.

【００９６】（２）２つの命令の演算結果が現クロック
での演算ステージ（次のパイプライン）で使用されるな
ら第２ラッチ群８５０内にラッチされている値でなく、
第３ラッチ群９５０内にラッチされている値を演算器に
入力するように調整する（フォーワーディング）。(2) If the operation results of the two instructions are used in the operation stage of the current clock (the next pipeline), adjust the inputs so that the values latched in the third latch group 950, rather than the values latched in the second latch group 850, are input to the arithmetic units (forwarding).

【００９７】まず、（１）の処理から説明する。第４ス
テージ１０００は、第３ラッチ群９５０内のラッチ９５
１、９５２から直前に演算された２つの命令を信号線９
５７、９５８を介してレジスタ番号解読回路１０１０に
取り込む。また第３ラッチ群９５０内のラッチ９５３、
９５４から直前の演算結果の値を信号線９５５、９５６
に送出する。そして、レジスタ番号解読回路１０１０は
直前に実行された命令の２つのＤフィールド内のレジス
タ番号を信号線１００３、１００４に送出して第２ステ
ージ８００のレジスタファイル８０２の書き込みレジス
タ番号を指定する。これで２つの演算結果の値がレジス
タファイル８０２に書き込まれることになる。First, process (1) is described. The fourth stage 1000 takes the two instructions executed immediately before from the latches 951 and 952 in the third latch group 950 into the register number decoding circuit 1010 via the signal lines 957 and 958. The values of the immediately preceding operation results are sent out from the latches 953 and 954 in the third latch group 950 onto the signal lines 955 and 956. The register number decoding circuit 1010 then sends the register numbers in the two D fields of the instructions executed immediately before onto the signal lines 1003 and 1004, thereby designating the write register numbers of the register file 802 in the second stage 800. The two operation result values are thus written into the register file 802.

【００９８】次に（２）の処理を説明する。第４ステー
ジ１０００は、第２ラッチ群８５０内のラッチ８５１、
８５２から今回演算すべき２つの命令を信号線８５７、
８５８を介してフォワーディング制御回路１０２０に取
り込む。また、第３ラッチ群９５０内のラッチ９５１、
９５２から直前に演算された２つの命令を信号線９５
７、９５８を介してフォワーディング制御回路１０２０
に取り込む。そして、フォワーディング制御回路１０２
０は直前に実行された命令の２つのＤフィールド内のレ
ジスタ番号と今回演算されるべき２つの命令のS１フィ
ールド、S２フィールドの番号に同じものがあるか検査
する。検査の結果同じものがあれば、その部分につい
て、第２ラッチ群８５０内のラッチ８５３、８５４、８
５５、８５６内の値でなく、第３ラッチ群９５０内のラ
ッチ９５３、９５４内の値（信号線９５５、９５６）が
演算器９０２、９０３に入力されるようにフォワーディ
ング制御回路１０２０は信号線１００１、１００２を送
出して２つの選択器９０６、９０７を制御する。Next, process (2) is described. The fourth stage 1000 takes the two instructions to be executed this time from the latches 851 and 852 in the second latch group 850 into the forwarding control circuit 1020 via the signal lines 857 and 858. It also takes the two instructions executed immediately before from the latches 951 and 952 in the third latch group 950 into the forwarding control circuit 1020 via the signal lines 957 and 958. The forwarding control circuit 1020 then checks whether either of the register numbers in the two D fields of the instructions executed immediately before matches any of the S1 or S2 field numbers of the two instructions to be executed this time. Where a match is found, the forwarding control circuit 1020 drives the signal lines 1001 and 1002 to control the two selectors 906 and 907 so that, for the matching input, the values in the latches 953 and 954 in the third latch group 950 (signal lines 955 and 956), rather than the values in the latches 853, 854, 855, and 856 in the second latch group 850, are input to the arithmetic units 902 and 903.
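The forwarding decision for a single operand can be sketched in Python as below. The function name `select_operand` and its argument layout are hypothetical. Two further assumptions are labelled in the code: a destination of `None` stands for an instruction that writes no register (such as nop), and when both previous instructions wrote the same register the result of the later (second) one is taken.

```python
def select_operand(src_reg, latched_value, prev_dests, prev_results):
    """Pick the value fed to an arithmetic unit input.

    src_reg       -- register number named by an S1 or S2 field this clock
    latched_value -- value read from the register file (latches 853 to 856)
    prev_dests    -- D fields of the two instructions executed just before
    prev_results  -- their results (latches 953 and 954)
    """
    # Assumption: the second previous instruction is newer, so test it first.
    for dest, result in reversed(list(zip(prev_dests, prev_results))):
        # Assumption: dest is None for instructions that write no register.
        if dest is not None and dest == src_reg:
            return result          # forward from the third latch group
    return latched_value           # no match: use the register file value
```

In hardware terms, the return value corresponds to what the selector 906 or 907 passes on under control of the signal lines 1001 and 1002.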

【００９９】《命令列の処理》図１３には本発明のスー
パスカラ処理において命令列が個別のクロックでどのよ
うに処理されていくかが示されている。また、比較のた
め２命令が並列に実行できないときに無効命令nopを挿
入するのみ場合に命令列が個別のクロックでどのように
処理されていくかも示されている。本発明では、１クロ
ック当たり２つの命令処理が可能となっている。また、
本発明では、２命令が並列に実行できないときに無効命
令nopを挿入する場合に比べて、実行命令数が６つ少な
く実行時間が短くなる（この命令列においては約４０％
実行命令が少なくなる）。<< Processing of Instruction Sequence >> FIG. 13 shows how an instruction sequence is processed clock by clock in the superscalar processing of the present invention. For comparison, it also shows how the same sequence is processed clock by clock when the invalid instruction nop is simply inserted whenever two instructions cannot be executed in parallel. In the present invention, two instructions can be processed per clock. Moreover, compared with simply inserting the invalid instruction nop when two instructions cannot be executed in parallel, the present invention executes six fewer instructions and shortens the execution time (for this instruction sequence, about 40% fewer instructions are executed).

【０１００】先行命令がmov, zext, asl1等の転送系命
令で後続命令がaddなどの加算命令であれば、２つの命
令を１つの命令に変換し、１つのクロックで実行するの
で、全体としてのクロック数を削減でき、高速化を図れ
る。また、先行命令が転送系命令で後続命令が演算命令
であり、さらに両者間にデータフローが存在する場合で
も、１つのクロックで実行するので、全体としてのクロ
ック数を削減でき、高速化を図れる。If the preceding instruction is a transfer-type instruction such as mov, zext, or asl1 and the succeeding instruction is an addition instruction such as add, the two instructions are converted into one instruction and executed in one clock, so the total number of clocks is reduced and processing is faster. Likewise, even when the preceding instruction is a transfer-type instruction, the succeeding instruction is an operation instruction, and a data flow exists between them, the two are executed in one clock, so the total number of clocks is reduced and processing is faster.

【０１０１】《マイクロコンピュータへの適用例》図１
４には本発明のスーパスカラ方式を用いたマイクロプロ
コンピュータシステムが示される。マイクロコンピュー
タＭＣＵは、中央処理装置ＣＰＵと、浮動小数点処理ユ
ニットＦＰＵと、積和演算機能を有する乗算器ＭＵＬＴ
と、論理アドレスを物理アドレスに変換するメモリ管理
ユニットＭＭＵと、命令及びデータのキャッシュメモリ
ＣＡＣＨＥと、キャッシュコントローラＣＣＮＴと、外
部バスインタフェースＥＢＩＦと、３２ビット論理アド
レスバスＬＡＢＵＳと、３２ビット物理アドレスデータ
バスＰＡＢＵＳと、３２ビットデータバスＤＢＵＳ、Ｄ
ＢＳとを単結晶シリコンのような半導体基板上に形成さ
れ、樹脂封止される（プラスチックパッケージに封止さ
れる）。<< Application Example to Microcomputer >> FIG. 14 shows a microcomputer system using the superscalar system of the present invention. The microcomputer MCU includes a central processing unit CPU, a floating point processing unit FPU, a multiplier MULT having a product-sum operation function, a memory management unit MMU that converts logical addresses into physical addresses, an instruction and data cache memory CACHE, a cache controller CCNT, an external bus interface EBIF, a 32-bit logical address bus LABUS, a 32-bit physical address data bus PABUS, and 32-bit data buses DBUS and DBS. These are formed on a semiconductor substrate such as single-crystal silicon and resin-sealed (sealed in a plastic package).

[0102] The microcomputer MCU is connected, via an external address bus EAB and a data bus EDB, to a main memory MM composed of semiconductor memory, such as DRAM, that uses dynamic storage elements as its memory cells.

[0103] The central processing unit CPU is built from the pipeline data path shown in FIG. 6, except that it has a memory access stage between the third stage and the fourth stage and so forms a so-called five-stage pipeline. The data memory and the instruction memory 703 correspond to the cache memory CACHE or the main memory MM and are not located inside the central processing unit CPU. The central processing unit CPU executes instructions of a 2-byte fixed-length instruction system, and the arithmetic units 902 and 903 each have a 32-bit ALU and the like. The register file 802 has sixteen 32-bit general-purpose registers. That is, the central processing unit CPU executes the instructions of the 2-byte, two-operand instruction system (instruction set) described in Japanese Patent Laid-Open No. 5-197546. The CPU described in that publication is not superscalar. In contrast, the central processing unit CPU described here is superscalar and can execute the same instruction system as that described in application number 1992/897457. It can therefore achieve high speed while maintaining compatibility (object code compatibility) with existing software. It also retains the high code efficiency that characterizes 2-byte fixed-length instructions.

[0104] The invention made by the present inventor has been described specifically on the basis of embodiments, but it goes without saying that the invention is not limited to them and can be modified in various ways without departing from its gist. For example, the embodiments of FIG. 6 onward describe the 2-byte, two-operand instruction system, but the invention is also applicable to a 4-byte, three-operand instruction system. Zero-extension instructions and zero-extension operation instructions have been described, but the invention applies equally to sign-extension instructions and sign-extension operation instructions. In addition, the S1 field of the transfer instruction serving as the first instruction has been described as specifying a register, but the invention is also applicable when that field holds immediate data.
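The difference between the zero-extension and sign-extension variants mentioned above can be illustrated with a short Python sketch. The 8-bit source width, the 32-bit result width, and the function names `zext`, `sext`, and `extend_add` are assumptions made for this example and are not taken from the embodiment.

```python
MASK32 = 0xFFFFFFFF  # results are kept to 32 bits, matching the register width


def zext(value: int, bits: int = 8) -> int:
    """Zero extension: keep the low `bits` bits, fill the upper bits with 0."""
    return value & ((1 << bits) - 1)


def sext(value: int, bits: int = 8) -> int:
    """Sign extension: copy bit `bits - 1` into all upper bits."""
    low = value & ((1 << bits) - 1)
    sign = 1 << (bits - 1)
    return ((low ^ sign) - sign) & MASK32


def extend_add(extend, src: int, other: int, bits: int = 8) -> int:
    """Hypothetical combined 'extension operation': extend src, then add."""
    return (extend(src, bits) + other) & MASK32
```

With a source value of 0xFF, `zext` yields 0x000000FF while `sext` yields 0xFFFFFFFF, so a combined extend-and-add gives different sums depending on which extension is selected. Everything else about the pairing of the two instructions is unchanged.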

[0105]

[Effects of the Invention] The effects obtained by representative ones of the inventions disclosed in this application are briefly described as follows.

[0106] By detecting the data flow between adjacent instructions and converting the instructions, the instructions can be executed in parallel. Processing of multiple instructions that conventionally took multiple clocks can therefore be executed in one clock, which reduces the total number of execution clocks.

[Brief description of drawings]

FIG. 1 is a diagram showing a pipelined implementation of a microprocessor.

FIG. 2 shows the concept of pipeline processing.

FIG. 3 shows the concept of superscalar processing.

FIG. 4 shows an example of the instruction format and instruction repertoire of a 4-byte instruction system.

FIG. 5 shows an example of the instruction format and instruction repertoire of a 2-byte instruction system.

FIG. 6 is a diagram showing the pipeline data path of a microprocessor according to an embodiment of the present invention.

FIG. 7 is a detailed block diagram of the first stage and the first latch group.

FIG. 8 is a detailed block diagram of the second stage and the second latch group.

FIG. 9 is a detailed block diagram of the third stage and the third latch group.

FIG. 10 is a block diagram illustrating the operation of the fourth stage.

FIG. 11 shows the rules for converting two instructions in the instruction decode stage into two instructions in the operation stage.

FIG. 12 is a detailed block diagram of part of the decode control unit.

FIG. 13 shows how an instruction sequence is processed clock by clock.

FIG. 14 is a diagram of a microcomputer system using the superscalar system of the present invention.

FIG. 15 is a block diagram of the register file.

[Explanation of symbols]

101 ... first stage; 103 ... second stage; 105 ... third stage; 107 ... fourth stage; 108, 109, 110 ... signal lines; 401 ... OP field; 402 ... D field; 403 ... S1 field; 404 ... S2 field; 501 ... OP field; 502 ... D field; 503 ... S1 field; 700 ... first stage; 800 ... second stage; 900 ... third stage; 1000 ... fourth stage; 701 ... program counter; 702 ... fetch control unit; 703 ... instruction memory; 704, 705, 706, 707 ... signal lines; 751, 752 ... latches; 801 ... decode control unit; 802 ... register file; 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813 ... signal lines; 851, 852, 853, 854, 855, 856 ... latches; 857, 858, 859, 860, 861, 862 ... signal lines; 901 ... operation control unit; 902 ... arithmetic unit; 903 ... arithmetic unit; 904 ... first input adjustment circuit; 905 ... first input adjustment circuit; 906 ... selector; 907 ... selector; 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919 ... signal lines; 951, 952, 953, 954 ... latches; 955, 956, 957, 958 ... signal lines; 1001, 1002, 1003, 1004 ... signal lines; 1010 ... register number control circuit; 1020 ... forwarding control circuit; INCC ... instruction conversion circuit; DFDC ... data flow detection circuit; MCU ... microcomputer; CPU ... central processing unit; FPU ... floating-point processing unit; MULT ... multiplier; MMU ... memory management unit; CACHE ... instruction and data cache memory; CCNT ... cache controller; EBIF ... external bus interface; LABUS ... 32-bit logical address bus; PABUS ... 32-bit physical address data bus; DBUS, DBS ... 32-bit data buses; EAB ... external address bus; EDB ... external data bus; MM ... main memory; RCC ... register control circuit; RGSTR ... register.

Claims

[Claims]

1. A data processing device that executes an instruction by dividing its execution into a plurality of stages, wherein the plurality of stages include at least a first stage for fetching an instruction from an instruction memory, a second stage for decoding the instruction fetched in the first stage, a third stage for executing the instruction decoded in the second stage, and a fourth stage for writing the result of execution in the third stage into a register, and wherein an instruction of a first instruction format stored in the instruction memory is changed into an instruction of a second instruction format and executed.

2. The data processing device according to claim 1, wherein the first instruction format is an instruction format in which an operation instruction operates on a first operand and a second operand and stores the operation result in the second operand, and the second instruction format is an instruction format in which an operation instruction operates on the first operand and the second operand and stores the operation result in a third operand.

3. The data processing device according to claim 2, wherein the second stage detects that the preceding instruction is a data transfer instruction between registers, detects that the subsequent instruction is an operation instruction, further detects that the transfer destination register number of the preceding instruction and the transfer destination register number of the subsequent instruction are the same, converts the instructions into an operation instruction of the second instruction format, and sends that operation instruction to the third stage.

4. The data processing device according to claim 3, wherein the data processing device is formed on a single semiconductor substrate.

5. The data processing apparatus according to claim 4, wherein the preceding instruction is a data transfer instruction that transfers the content of the transfer source register to the transfer destination register as it is.

6. The data processing apparatus according to claim 4, wherein the preceding instruction is a data transfer instruction that shifts the contents of a transfer destination register and transfers the contents to the transfer destination register.

7. The data processing apparatus according to claim 4, wherein the preceding instruction is a data transfer instruction that zero-extends or sign-extends the contents of the transfer source register and transfers the result to a transfer source register.

8. The data processing device according to claim 1, wherein the second instruction format includes an instruction in which a plurality of instructions of the first instruction format are combined.

9. The data processing device according to claim 8, wherein the second stage detects that the preceding instruction is a data transfer instruction between registers, detects that the subsequent instruction is a fixed-bit shift instruction, further detects that the transfer destination register number of the preceding instruction and the transfer source register number of the subsequent instruction are the same, converts the instructions into one shift instruction of the second instruction format, and sends that shift instruction to the third stage.

10. The data processing apparatus according to claim 2, wherein the second stage detects that the preceding instruction is a data transfer instruction between registers, detects that the subsequent instruction is an operation instruction, further detects that the transfer destination register number of the preceding instruction and the transfer source register number of the subsequent instruction are the same, converts the subsequent instruction into an operation instruction of the second instruction format that has no data flow relationship with the preceding instruction, and sends it to the third stage so that the two instructions can be executed in parallel in the same stage.

11. The data processing apparatus according to claim 10, wherein the first instruction format is a 2-byte fixed length instruction.

12. The data processing device according to claim 11, wherein the preceding instruction is a data transfer instruction for directly transferring the contents of the transfer source register to the transfer destination register.

13. The data processing apparatus according to claim 11, wherein the preceding instruction is a data transfer instruction that shifts the contents of a transfer destination register and transfers the contents to the transfer destination register.

14. The data processing apparatus according to claim 11, wherein the preceding instruction is a data transfer instruction for 0-extending or sign-extending the contents of the transfer source register and transferring the result to the transfer source register.

15. A pipeline-type data processing device comprising: a first stage that reads fixed-length instructions stored in an instruction memory; a second stage that, when a plurality of the read instructions have a dependency in the data on which they operate and have a predetermined relationship, changes the plurality of instructions so that they can be executed in parallel in a plurality of pipelines; and a third stage that executes the changed plurality of instructions in parallel.

16. The data processing device according to claim 15, wherein the first stage reads two instructions simultaneously, and the second stage changes the two instructions so that they can be executed in parallel by two pipelines.

17. The data processing device according to claim 16, wherein the first stage reads 2-byte fixed-length instructions.

18. A microcomputer in which a CPU and an instruction memory are formed on a single semiconductor substrate, wherein the CPU comprises: an instruction fetch unit that reads two 2-byte fixed-length instructions stored in the instruction memory; an instruction decoder that, when the two read instructions have a dependency in the data on which they operate and have a predetermined relationship, changes the two instructions so that they can be executed in parallel by two pipelines; and two 4-byte-wide arithmetic units that execute the two changed instructions in parallel.

19. The microcomputer according to claim 18, wherein the instruction decoder changes an operation instruction that operates on a first operand and a second operand and stores the operation result in the second operand into an instruction that operates on the first operand and the second operand and stores the operation result in a third operand.

20. The microcomputer according to claim 18, wherein the instruction decoder detects that the preceding instruction is a data transfer instruction between registers, detects that the subsequent instruction is an operation instruction, further detects that the transfer destination register number of the preceding instruction and the transfer source register number of the subsequent instruction are the same, and changes the subsequent instruction into an operation instruction having no data flow relationship with the preceding instruction.