JP2927281B2

JP2927281B2 - Parallel processing unit

Info

Publication number: JP2927281B2
Application number: JP29204097A
Authority: JP
Inventors: 憲一黒沢; 成弥田中; 康弘中塚; 忠秋坂東
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1997-10-24
Filing date: 1997-10-24
Publication date: 1999-07-28
Anticipated expiration: 2014-07-28
Also published as: JPH1083301A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Technical field of the invention] The present invention relates to CPUs for minicomputers, microcomputers and the like, and more particularly to a parallel processing device and a parallel processing method suited to high-speed operation.

【０００２】[0002]

[Prior art] Various techniques have long been used to speed up computers. A representative one is pipelining. Instead of finishing one instruction completely before starting the next, pipelining divides each instruction into multiple stages; when the first instruction reaches its second stage, processing of the first stage of the next instruction begins, so that instructions are handled in bucket-brigade fashion. The technique is discussed in detail in Shinji Tomita, "並列計算機構成論" (Parallel Computer Architecture), Shokodo, pp. 25-68. With an n-stage pipeline, each pipeline stage processes only one instruction at a time, but n instructions are processed simultaneously as a whole, and one instruction completes every pipeline pitch.
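The throughput claim above can be checked with a short calculation. The following is a minimal sketch, assuming an ideal pipeline with no stalls, in which k instructions finish in n + k - 1 pitches; the function names are illustrative and do not come from the patent.

```python
def pipelined_cycles(stages: int, instructions: int) -> int:
    """Pitches needed by an ideal n-stage pipeline (no stalls assumed)."""
    if instructions == 0:
        return 0
    # The first instruction needs `stages` pitches; each later one adds one.
    return stages + instructions - 1


def unpipelined_cycles(stages: int, instructions: int) -> int:
    """Pitches needed when each instruction finishes before the next starts."""
    return stages * instructions


if __name__ == "__main__":
    print(pipelined_cycles(6, 100), unpipelined_cycles(6, 100))
```

For long instruction sequences the pipelined figure approaches one instruction per pitch, which is the behaviour the paragraph describes.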

[0003] It is well known that a computer's instruction architecture strongly affects its processing scheme and its performance. Classified by instruction architecture, computers divide into CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer).

[0004] A CISC processes complex instructions by means of microinstructions. A RISC, by contrast, restricts its instructions to simple ones and, instead of using microinstructions, gains speed through hard-wired logic control. The hardware outline and pipeline operation of conventional CISC and RISC computers are described below.

[0005] FIG. 2 illustrates the general configuration of a CISC computer. 200 is a memory interface, 201 a program counter (PC), 202 an instruction cache, 203 an instruction register, 204 an instruction decoder, 205 an address calculation control circuit, 206 a control storage (CS) that holds microinstructions, 207 a microinstruction counter, 208 a microinstruction register, 209 a decoder, 210 a memory data register (MDR) that exchanges data with memory, 211 a memory address register (MAR) that holds the address of an operand in memory, 212 an address adder, 213 a register file, and 214 an ALU (Arithmetic and Logic Unit).

[0006] An outline of the operation follows. The instruction indicated by PC 201 is fetched from the instruction cache and set in instruction register 203 via signal 217. Instruction decoder 204 receives the instruction via signal 218 and sets the start address of the microinstruction in microinstruction counter 207 via signal 220. It also indicates the address calculation method to address calculation control circuit 205 via signal 219. Address calculation control circuit 205 reads the registers needed for the address calculation, controls address adder 212, and so on. The registers needed for the address calculation are sent from register file 213 to address adder 212 via buses 226 and 227. Meanwhile, a microinstruction is read from CS 206 every machine cycle, decoded by decoder 209, and used to control ALU 214 and register file 213; 224 denotes these control signals. ALU 214 operates on the data sent from the registers via buses 228 and 229 and stores the result back in register file 213. Memory interface 200 is the circuit that handles exchanges with memory, such as instruction fetches and operand fetches.

[0007] Next, the pipeline operation of the computer shown in FIG. 2 is described with reference to FIGS. 3, 4 and 5. The pipeline has six stages. In the IF (Instruction Fetch) stage, an instruction is read from instruction cache 202 and set in instruction register 203. In the D (Decode) stage, instruction decoder 204 decodes the instruction. In the A (Address) stage, address adder 212 calculates the operand address. In the OF (Operand Fetch) stage, the operand at the address held in MAR 211 is fetched through memory interface 200 and set in MDR 210. In the EX (Execution) stage, data are read from register file 213 and MDR 210, sent to ALU 214, and operated on. Finally, in the W (Write) stage, the result is stored in one of the registers of register file 213 via bus 230.

[0008] FIG. 3 shows ADD instructions, one of the basic instructions, being processed in succession. One instruction is processed per machine cycle, and ALU 214 and address adder 212 both operate in parallel every cycle.
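The overlap described for FIG. 3 can be pictured with a small timing chart. The sketch below is only an illustration of the six-stage flow, assuming one instruction enters the pipeline per machine cycle with no stalls; the helper names `pipeline_chart` and `active_stages` are hypothetical.

```python
CISC_STAGES = ["IF", "D", "A", "OF", "EX", "W"]


def pipeline_chart(stages, count):
    """Return one row per instruction; row i is shifted right by i cycles."""
    total = len(stages) + count - 1
    rows = []
    for i in range(count):
        idle_after = total - i - len(stages)
        rows.append(["--"] * i + list(stages) + ["--"] * idle_after)
    return rows


def active_stages(rows, cycle):
    """Stages occupied in a given cycle, oldest instruction first."""
    return [row[cycle] for row in rows if row[cycle] != "--"]


if __name__ == "__main__":
    chart = pipeline_chart(CISC_STAGES, 3)
    for row in chart:
        print(" ".join(f"{cell:>2}" for cell in row))
    # In cycle 4 one ADD is in EX (ALU) while a later one is in A (address adder).
    print(active_stages(chart, 4))
```

Reading down any column of the printed chart shows which units are busy in the same cycle.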

[0009] FIG. 4 shows the processing of the conditional branch instruction BRAcc. The flags are generated by a TEST instruction. FIG. 4 shows the flow when the condition holds. Because the flags are generated in the EX stage, three wait cycles occur before the jump-target instruction is fetched. The more pipeline stages there are, the more wait cycles there are, and this becomes a bottleneck to performance improvement. FIG. 5 shows the execution flow of a complex instruction. Instruction 1 is the complex instruction. A complex instruction is one that makes many memory accesses, such as a string copy, and is normally processed by extending the EX stage many times. The EX stage is controlled by microinstructions, and a microinstruction is accessed once per machine cycle. In other words, a complex instruction is processed by reading the microprogram multiple times. Because only one instruction can occupy the EX stage at a time, the next instruction (instruction 2 in FIG. 5) is made to wait. In this situation ALU 214 is always busy, but address adder 212 sits idle.
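The stall caused by an extended EX stage can be estimated with a simple in-order model. This is a sketch under stated assumptions rather than the patent's hardware: the four stages before EX (IF, D, A, OF) each take one cycle, only EX may take several cycles, and an instruction cannot enter EX until the previous one has left it. The function name and the example EX lengths are hypothetical.

```python
FRONT_STAGES = 4  # IF, D, A and OF, one cycle each (assumed)


def ex_wait_cycles(ex_lengths):
    """Cycles each instruction waits for EX, relative to an unstalled schedule."""
    waits = []
    ex_free_at = 0  # first cycle in which the EX stage is free
    for i, length in enumerate(ex_lengths):
        arrival = i + FRONT_STAGES          # cycle it would reach EX with no stall
        start = max(arrival, ex_free_at)    # wait while an earlier instruction holds EX
        waits.append(start - arrival)
        ex_free_at = start + length
    return waits


if __name__ == "__main__":
    # Instruction 1 holds EX for three cycles; the following ones need one each.
    print(ex_wait_cycles([3, 1, 1]))
```

Every instruction behind the complex one inherits the delay, which is why the units in the earlier stages fall idle.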

[0010] Next, the RISC computer is described. FIG. 6 illustrates the general configuration of a RISC computer. 601 is a memory interface, 602 a program counter, 603 an instruction cache, 604 a sequencer, 605 an instruction register, 606 a decoder, 607 a register file, 608 an ALU, 609 an MDR, and 610 an MAR.

[0011] FIG. 7 shows the processing flow of a basic instruction. In the IF (Instruction Fetch) stage, the instruction pointed to by program counter 602 is read from the instruction cache and set in instruction register 605. Sequencer 604 controls program counter 602 on the basis of instruction signal 615 and flag signal 616 from ALU 608. In the R (Read) stage, the registers specified by the instruction are transferred from register file 607 to ALU 608 via buses 618 and 619. In the E (Execution) stage, ALU 608 performs the operation. Finally, in the W (Write) stage, the result is stored in register file 607 via bus 620.

[0012] A RISC computer restricts its instructions to basic ones only. Operations are limited to register-to-register operations, and the only instructions that involve an operand fetch are the load and store instructions. Complex operations are realized by combining basic instructions. Microinstructions are not used; the contents of instruction register 605 are decoded directly by decoder 606 to control ALU 608 and the other units.

[0013] FIG. 7 shows the processing flow of a register-to-register operation. Because the instructions are simple, the pipeline needs only four stages.

[0014] FIG. 8 shows the processing flow for a conditional branch. Because there are fewer pipeline stages than in a CISC computer, there are fewer wait cycles; in the example of FIG. 8 there is only one. Moreover, RISC computers normally adopt a delayed branch scheme that puts even this one wait cycle to use. In this scheme, as shown in FIG. 9, the ADD instruction that follows the BRAcc instruction is executed during the wait cycle. By having the compiler place an instruction after the branch instruction in this way, wasted wait cycles can be eliminated entirely.
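The delayed branch behaviour described above can be reproduced with a toy interpreter. This is a minimal sketch, assuming a single delay slot and an always-taken branch written as `BRA` for simplicity; the program encoding and the function `run_delayed` are illustrative and are not taken from the patent.

```python
def run_delayed(program, max_steps=100):
    """Run a toy program with one branch delay slot; return executed addresses."""
    pc = 0
    pending = None  # branch target that takes effect after the delay slot
    executed = []
    while pc < len(program) and len(executed) < max_steps:
        op, target = program[pc]
        executed.append(pc)
        next_pc = pc + 1
        if pending is not None:
            # This instruction was the delay slot; now the branch takes effect.
            next_pc = pending
            pending = None
        if op == "BRA":
            pending = target
        pc = next_pc
    return executed


if __name__ == "__main__":
    program = [
        ("SUB", None),  # 0
        ("BRA", 4),     # 1: branch to address 4
        ("ADD", None),  # 2: delay slot, executed before the branch takes effect
        ("MUL", None),  # 3: skipped
        ("OR", None),   # 4: branch target
    ]
    print(run_delayed(program))
```

The instruction at address 2 runs even though the branch at address 1 is taken, so the cycle that would otherwise be a wait does useful work.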

[0015] However, even a RISC computer that executes this efficiently has the drawback that it can execute only one instruction per machine cycle.

[0016] For this reason, a scheme has recently been devised for RISC computers, as in Japanese Unexamined Patent Publication No. Sho 63-49843, "Reduced Instruction Set Computer", in which multiple arithmetic units sharing a register file are provided, instructions are simplified to reduce the number of pipeline stages, and multiple instructions are read in one machine cycle to control the multiple arithmetic units.

[0017] However, actual RISC computers process instructions sequentially using a single arithmetic unit, so identical behaviour cannot be guaranteed once multiple instructions are executed in parallel on multiple arithmetic units. For example, because m instructions are processed simultaneously, interrupts are accepted in units of m instructions, which differs from conventional sequential operation. In addition, software such as a debugger that executes instructions one at a time can no longer be used.

[0018] On the other hand, a scheme under which such special software can no longer be used, but under which most conventional software remains usable and runs fast, is still sufficiently useful. The most important point in such a scheme is to solve the problem of how m instructions, including the delayed branch instruction described with reference to FIG. 9, should be executed in parallel so as to obtain the same result as sequential execution.

【００１９】[0019]

[Problems to be solved by the invention] An object of the present invention is to raise processing capability while keeping parallel processing compatible with sequential processing.

[0020] Another object of the present invention is to allow most conventional software to operate correctly and run at high speed in parallel operation, even if certain special conventional software cannot operate correctly.

【００２１】[0021]

[Means for solving the problems] To achieve the above objects, the present invention comprises a program counter that indicates the instructions to be read from memory, a plurality of instruction registers that each store an instruction indicated by the program counter, a plurality of decoders that decode the instructions stored in the instruction registers, and a plurality of arithmetic units that execute operations, and is characterized in that at least one of the decoders invalidates the contents of an instruction register as a result of decoding an instruction.

【００２２】[0022]

[Embodiments of the invention] An embodiment of the present invention is described below.

[0023] FIG. 10 lists the instructions of the processor described in this embodiment. All basic instructions are register-to-register operations. There are four branch instructions: the unconditional branch instruction BRA, the conditional branch instruction BRAcc (cc denotes the branch condition), the subroutine call instruction CALL, and the return-from-subroutine instruction RTN. In addition there are the load instruction LOAD and the store instruction STOR. For convenience of explanation the only data type is the 32-bit integer, but the invention is not limited to this. Addresses are assumed to be assigned every 32 bits (4 bytes). The processing state flag change instructions are the parallelizing branch instruction PEXB, which starts simultaneous reading of multiple instructions from the branch target instruction, activates the multiple arithmetic units, and turns the processing state flag ON, and the serializing branch instruction SEXB, which starts reading one instruction at a time from the branch target instruction, activates the first arithmetic unit, and turns the processing state flag OFF. The number of instructions is restricted as above for simplicity, but this does not limit the invention; further instructions may be added as long as each can be processed in one machine cycle. FIG. 11 shows the instruction format. All instructions have a fixed length of 32 bits. The F, S1, S2 and D fields of a basic instruction are, respectively, a bit indicating whether the operation result is reflected in the flags, a field specifying the first source register, a field specifying the second source register, and a field specifying the destination register.
FIG. 1 shows the configuration of this embodiment. 100 is an instruction cache, 101 a program counter arithmetic unit that generates the 32-bit program counter, 102 a latch that holds the program counter value, 103 a processor status register that holds the processing state flag PE (116), 143 a selector that selects whether "1" or "2" is added to the program counter, 104 a 32-bit first instruction register, 105 a 32-bit second instruction register, 106 a first instruction decoder, 107 a second instruction decoder, 108 a first arithmetic unit, 109 a second arithmetic unit, 110 a register file, 111 a sequencer, 112 a memory address register MAR, 113 a memory data register MDR, 114 a memory write register MWR, and 115 a data cache.

[0024] In this embodiment, two instructions are read and executed in parallel in one machine cycle. FIGS. 12 to 15 show the basic pipeline operation of this embodiment. The pipeline has four stages: IF (Instruction Fetch), R (Read), EX (Execution) and W (Write).

[0025] The operation of this embodiment is described with reference to FIG. 1 again.

[0026] In the IF stage, when the processing state flag PE 116 in processor status register 103 is ON, the two instructions pointed to by the program counter are read and set in first instruction register 104 and second instruction register 105 via buses 117 and 118, respectively. When the PC is even, the instruction at address PC is stored in the first instruction register and the instruction at address PC + 1 in the second instruction register. When the PC is odd, a NOP instruction is set in the first instruction register and the instruction at address PC in the second instruction register. Sequencer 111 is the circuit that controls the program counter. When neither the first nor the second instruction register holds a branch instruction, it sets the previous program counter value + 2 in latch 102. On a branch, it calculates the branch address and sets it in the program counter. On a conditional branch, it determines whether the branch is taken from flag information 120 from first arithmetic unit 108 and flag information 119 from second arithmetic unit 109, and controls program counter arithmetic unit 101 using branch target address information 121 and branch control information 122.
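The even/odd fetch rule of the IF stage can be written down directly. The sketch below models only the selection of the two instruction register contents when PE is ON; the memory is a plain list of instruction names, and filling the second register with NOP when PC + 1 lies beyond the end of that list is an assumption made for the example, not something the description states.

```python
NOP = "NOP"


def fetch_pair(memory, pc):
    """Return (first_register, second_register) contents for one IF stage, PE = ON."""
    if pc % 2 == 0:
        # Even PC: instruction PC goes to the first register, PC + 1 to the second.
        second = memory[pc + 1] if pc + 1 < len(memory) else NOP  # bounds handling assumed
        return memory[pc], second
    # Odd PC: the first register gets NOP, the second gets instruction PC.
    return NOP, memory[pc]


if __name__ == "__main__":
    memory = ["I0", "I1", "I2", "I3"]
    print(fetch_pair(memory, 2))
    print(fetch_pair(memory, 3))
```

With this rule the second instruction register always holds the instruction from an odd address, so a branch to an odd address simply leaves the first slot empty for one cycle.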

[0027] Next, the operation of the R stage when processing basic instructions is described. In the R stage, the contents of first instruction register 104 are decoded by first instruction decoder 106, and the contents of second instruction register 105 are decoded by second instruction decoder 107. As a result, the contents of the register specified by the first source register field S1 of first instruction register 104 are sent to first arithmetic unit 108 via bus 127, and the contents of the register specified by its second source register field S2 via bus 128. Likewise, the contents of the register specified by the first source register field S1 of second instruction register 105 are sent to second arithmetic unit 109 via bus 129, and the contents of the register specified by its second source register field S2 via bus 130. Next, the operation of the EX stage is described.

[0028] In the EX stage, first arithmetic unit 108 performs the operation specified by the opcode of first instruction register 104 on the data sent via buses 127 and 128. In parallel, second arithmetic unit 109 performs the operation specified by the opcode of second instruction register 105 on the data sent via buses 129 and 130.

[0029] Finally, the operation of the W stage is described. In the W stage, the result from first arithmetic unit 108 is stored via bus 131 in the register specified by the destination field D of the first instruction register. Likewise, the result from second arithmetic unit 109 is stored via bus 132 in the register specified by the destination field D of the second instruction register.
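The R, EX and W stages for one pair of basic instructions can be sketched as three steps over a shared register file. This is an illustration under assumptions, not the patent's circuit: instructions are written as `(opcode, s1, s2, d)` tuples, the two instructions of a pair are assumed to be independent of each other, results are truncated to 32 bits, and `SUB` is a hypothetical opcode added alongside the `ADD` named in the text.

```python
import operator

# ADD appears in the description; SUB is an assumed second opcode for the example.
OPS = {"ADD": operator.add, "SUB": operator.sub}
MASK32 = 0xFFFFFFFF


def execute_pair(regs, first, second):
    """Run one (first, second) instruction pair through R, EX and W."""
    pair = (first, second)
    # R stage: both instructions read S1 and S2 from the shared register file.
    operands = [(regs[s1], regs[s2]) for (_, s1, s2, _) in pair]
    # EX stage: each arithmetic unit applies its own opcode in parallel.
    results = [OPS[op](a, b) & MASK32 for (op, _, _, _), (a, b) in zip(pair, operands)]
    # W stage: each result is written to the register named by its D field.
    for (_, _, _, d), value in zip(pair, results):
        regs[d] = value
    return regs


if __name__ == "__main__":
    regs = [0, 5, 7, 0, 0, 3, 0, 0]
    execute_pair(regs, ("ADD", 1, 2, 3), ("SUB", 2, 5, 4))
    print(regs)
```

Because both instructions read their sources before either writes, a pair in which the second instruction consumes the first one's result would not behave like sequential execution in this simple model.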

[0030] FIG. 32 shows FIG. 1 with the processing state flag changing means added. 144 and 145 are data lines that convey the flag value to the processing state flag PE 116 when a PEXB or SEXB instruction is executed in the first or second arithmetic unit, respectively. 146 is the selector needed when writing data to the processing state flag PE 116.
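The effect of PEXB and SEXB on the processing state flag can be summarized in a few lines. The sketch below is a toy model only: the class name, the initial OFF state, and the `fetch_width` helper are assumptions made for illustration, and the branch that both instructions also perform is left out.

```python
class ProcessorStatus:
    """Toy model of the processing state flag PE."""

    def __init__(self):
        self.pe = False  # assumed initial state: OFF (sequential processing)

    def execute(self, opcode):
        """Update PE for the two flag-changing branch instructions."""
        if opcode == "PEXB":
            self.pe = True    # parallel: read several instructions at once
        elif opcode == "SEXB":
            self.pe = False   # sequential: read one instruction at a time

    def fetch_width(self):
        """Instructions read per machine cycle in this two-unit embodiment."""
        return 2 if self.pe else 1


if __name__ == "__main__":
    status = ProcessorStatus()
    for opcode in ["ADD", "PEXB", "ADD", "SEXB"]:
        status.execute(opcode)
        print(opcode, status.fetch_width())
```

Other instructions leave the flag untouched, so a program stays in whichever mode the most recent PEXB or SEXB selected.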

[0031] FIG. 12 shows the flow of processing basic instructions in succession. Two instructions are processed per machine cycle. Of each pair of instructions in FIG. 12, the upper one shows the processing in the first arithmetic unit and the lower one the processing in the second arithmetic unit. In this example the first and second arithmetic units always operate in parallel.
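The two-per-cycle flow of FIG. 12 can be tabulated as follows. This is a minimal sketch, assuming the four stages IF, R, EX and W take one cycle each and that no instruction stalls; `dual_issue_schedule` and `total_cycles` are illustrative names.

```python
STAGES = ["IF", "R", "EX", "W"]


def dual_issue_schedule(count, width=2):
    """Map each instruction to {stage: cycle}, issuing `width` per machine cycle."""
    schedule = []
    for i in range(count):
        start = i // width  # instructions 0 and 1 start together, then 2 and 3, ...
        schedule.append({stage: start + offset for offset, stage in enumerate(STAGES)})
    return schedule


def total_cycles(count, width=2):
    """Cycles until the last instruction leaves the W stage."""
    if count == 0:
        return 0
    return (count - 1) // width + len(STAGES)


if __name__ == "__main__":
    for i, row in enumerate(dual_issue_schedule(4)):
        print(i, row)
    print(total_cycles(8), total_cycles(8, width=1))
```

Setting `width=1` gives the single-unit RISC pipeline described earlier, which makes the gain from the second arithmetic unit easy to compare.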

[0032] FIG. 13 shows the flow of successively processing a load or store instruction as the first instruction and a basic instruction as the second instruction. When a load instruction executes, in the R stage the contents of the register specified by the S2 field of the first instruction register are transferred to MAR 112 via bus 128. Next, in the EX stage, the operand is fetched from the data cache via bus 133. Finally, in the W stage, the fetched operand is stored via bus 134 in the register specified by the destination field D of the first instruction register. Fetching the operand within one machine cycle in the EX stage is possible if a high-speed data cache 115 is provided as in FIG. 1. It is especially easy when, as in FIG. 1, the whole computer is integrated on a semiconductor substrate and both the instruction cache and the data cache are on chip. Of course, when the cache misses, the operand fetch cannot finish in one machine cycle. In that case the system clock can be stopped to extend the EX stage, as is also done in conventional computers.

[0033] When a store instruction executes, in the R stage the contents of the register specified by the first source register field S1 of the first instruction register are transferred as data to MWR 114 via bus 135. At the same time, the contents of the register specified by the second source register field S2 of the first instruction register are transferred as the address to MAR 112 via bus 128. Next, in the EX stage, the data in MWR 114 are written to the address held in MAR 112. As FIG. 13 shows, load and store instructions can be processed at two instructions per machine cycle together with another instruction, such as the ADD instruction in the figure.
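The load and store flows through MAR, MWR and the data cache can be traced with a small model. The sketch below assumes a data cache that always hits and is represented as a plain list; the class `MemoryPort` is hypothetical, and placing the fetched operand in MDR during EX is an assumption, since the description names only the buses used.

```python
class MemoryPort:
    """Toy model of MAR 112, MDR 113, MWR 114 and an always-hit data cache 115."""

    def __init__(self, size):
        self.cache = [0] * size
        self.mar = 0
        self.mdr = 0
        self.mwr = 0


def load(regs, port, s2, d):
    port.mar = regs[s2]               # R stage: address from register S2 to MAR
    port.mdr = port.cache[port.mar]   # EX stage: operand fetched (held in MDR, assumed)
    regs[d] = port.mdr                # W stage: operand written to register D


def store(regs, port, s1, s2):
    port.mwr = regs[s1]               # R stage: data from register S1 to MWR
    port.mar = regs[s2]               # R stage: address from register S2 to MAR
    port.cache[port.mar] = port.mwr   # EX stage: MWR written to the address in MAR


if __name__ == "__main__":
    regs = [0] * 8
    regs[1], regs[2] = 42, 3
    port = MemoryPort(16)
    store(regs, port, s1=1, s2=2)
    load(regs, port, s2=2, d=5)
    print(regs[5], port.cache[3])
```

A cache miss would stretch the EX step over several cycles; that case is left out of the model.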

[0034] FIG. 14 shows the processing flow when an unconditional jump instruction BRA is executed as the second instruction. This figure is also used in the description of another embodiment below.

[0035] When the BRA instruction is read, sequencer 111 adds displacement field d to the program counter in the R stage and sets the result in program counter latch 102. Meanwhile, the instruction at the address following the BRA instruction and the instruction at the address after that (instruction 1 and instruction 2 in FIG. 14) are read. In the following cycle, the two instructions at the jump target are read. In this embodiment the hardware is able to execute both instruction 1 and instruction 2. That is, no wait cycle occurs even when a jump instruction is processed. This technique, called a delayed branch, is also used in conventional RISC computers. However, whereas a conventional RISC computer could execute only one instruction during the address calculation of the jump instruction, this embodiment processes two instructions simultaneously during that time, so processing capability is higher. The processing flows of the CALL and RTN instructions are similar. The compiler generates code so that useful instructions execute during the branch address calculation wherever possible, but when there is nothing useful to do, instructions 1 and 2 of FIG. 14 are made NOP instructions. In that case there is effectively a wait of one machine cycle. Even so, because the pipeline is shallow, the branch overhead is smaller than in the CISC computer described as prior art.
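The order in which instructions execute around a BRA in the second slot can be traced with a pair-based toy interpreter. This is a sketch under assumptions, not the embodiment's sequencer: a branch takes effect after the pair that follows it, an odd PC is taken to fetch a single instruction and then advance by one (the description does not spell out the update for that case), and the program encoding and the name `run_pairs` are hypothetical.

```python
def run_pairs(program, max_pairs=10):
    """Run a toy program two instructions per cycle; return executed addresses."""
    pc = 0
    pending = None  # branch target that takes effect after the following pair
    executed = []
    pairs = 0
    while pc < len(program) and pairs < max_pairs:
        # Even PC: fetch PC and PC + 1. Odd PC: only PC (first slot holds NOP).
        pair = [pc, pc + 1] if pc % 2 == 0 else [pc]
        pair = [addr for addr in pair if addr < len(program)]
        # A branch decoded in the previous pair redirects the fetch after this pair.
        next_pc = pending if pending is not None else pair[-1] + 1
        pending = None
        for addr in pair:
            executed.append(addr)
            op, target = program[addr]
            if op == "BRA":
                pending = target
        pc = next_pc
        pairs += 1
    return executed


if __name__ == "__main__":
    program = [
        ("ADD", None), ("BRA", 6),    # pair 0: BRA in the second slot
        ("I1", None), ("I2", None),   # pair 1: executed while the target is fetched
        ("X", None), ("Y", None),     # pair 2: skipped
        ("T1", None), ("T2", None),   # pair 3: jump target
    ]
    print(run_pairs(program))
```

Instructions 1 and 2 at addresses 2 and 3 run before control reaches the target, so the cycle spent on the address calculation is filled with two useful instructions when the compiler can find them.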

[0036] FIG. 15 shows the processing flow when the conditional branch instruction BRAcc is executed as the second instruction. The flags are set by the instruction shown as ADD, F, and whether the branch is taken is decided according to the result. Here too, as in the unconditional branch processing described with reference to FIG. 14, the instruction at the address following the BRAcc instruction (instruction 1 in FIG. 15) and the instruction after that (instruction 2 in FIG. 15) are read and processed, and in the W stage of these two instructions the results are written to the register file regardless of whether the branch condition of the BRAcc instruction holds.

【００３７】図１６は、第１命令として無条件分岐命令
ＢＲＡ命令実行時の処理フローを示したものである。Ｂ
ＲＡ命令と命令１が読み出されると、Ｒステージにおい
てシーケンサ１１１はディスプレースメントフィールド
ｄとプログラムカウンタとの加算を行い、プログラムカ
ウンタのラッチ１０２にセットするとともに命令１のオ
ペランドのリード並列処理する。この間に命令１の次の
番地の命令２とその次の番地の命令３が読み出される。
本実施例では、分岐命令と命令１を並列実行し、さらに
命令２と命令３とも実行可能なハードウエアとしてい
る。即ち、分岐命令を含む２命令を並列実行するととも
に、その次の２命令をも実行可能としている。通常のデ
ィレイド分岐命令では、分岐命令直後の１命令のみを並
列実行するが、本実施例の分岐命令は、図１４の場合に
は分岐命令直後の２命令を実行し、一方、図１６の場合
には、分岐命令直後の３命令を実行しており、通常のデ
ィレイド分岐とは異なる。すなわち、ディレイド分岐命
令を含むｍ命令は並列実行され、しかも引き続くｍ命令
が分岐時間を利用して実行される点が異なる。これによ
り、高度な並列処理が実現可能である。一方、図１７
は、第１命令として条件付分岐命令ＢＲＡｃｃ命令実行
時の処理フローを示したものである。図１６の処理フロ
ーと同様に、ＢＲＡｃｃ命令と命令１は並列実行され、
ジャンプ先命令１および２へ分岐する時間を利用して命
令２と命令３は、条件の成否にかかわらず実行される。
これにより高度な並列実行が可能となり、図１５と図１
７からわかるように分岐命令直後の命令はそれぞれ２命
令と３命令が実行される。このように分岐命令が第１命
令として存在するかまたは第２命令として存在するか、
その場所によって分岐時に実行される命令数が異なる。FIG. 16 shows the processing flow when an unconditional branch instruction BRA is executed as the first instruction. When the BRA instruction and instruction 1 are read, the sequencer 111 adds the displacement field d to the program counter in the R stage and sets the result in the latch 102 of the program counter, while the operands of instruction 1 are read in parallel. During this time, instruction 2 at the address following instruction 1 and instruction 3 at the address after that are read.
In this embodiment, the hardware executes the branch instruction and instruction 1 in parallel and can also execute both instruction 2 and instruction 3. That is, the two instructions including the branch instruction are executed in parallel, and the next two instructions can be executed as well. A conventional delayed branch executes only the one instruction immediately following the branch instruction. The branch instruction of this embodiment differs: in the case of FIG. 14 the two instructions immediately following the branch are executed, and in the case of FIG. 16 the three instructions immediately following it are executed. In other words, the m instructions including the delayed branch instruction are executed in parallel, and the subsequent m instructions are executed using the branch time. This makes a high degree of parallel processing possible. FIG. 17, in turn, shows the processing flow when a conditional branch instruction BRAcc is executed as the first instruction. As in the processing flow of FIG. 16, the BRAcc instruction and instruction 1 are executed in parallel, and instructions 2 and 3 are executed, regardless of whether the condition is satisfied, using the time taken to branch to jump destination instructions 1 and 2. This enables a high degree of parallel execution. As can be seen from FIGS. 15 and 17, two and three instructions, respectively, are executed immediately after the branch instruction. Thus the number of instructions executed at the time of a branch depends on whether the branch instruction is the first instruction or the second instruction.
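The dependence of the executed-instruction count on the branch position can be restated as a small sketch. The function name and the zero-based `slot` parameter are illustrative assumptions and do not appear in the embodiment; the arithmetic only restates the rule that the m instructions containing the delayed branch run in parallel and the following m instructions run during the branch time.

```python
def instructions_after_branch(slot: int, m: int = 2) -> int:
    """Count the instructions executed after a delayed branch, before the target.

    slot: zero-based position of the branch within its group of m
          instructions (0 = first instruction, m - 1 = last instruction).
    m:    number of instructions processed in parallel (2 in the embodiment).
    """
    if not 0 <= slot < m:
        raise ValueError("slot must lie within the parallel group")
    same_group = m - 1 - slot  # instructions issued alongside and after the branch
    next_group = m             # the following group runs during the branch time
    return same_group + next_group
```

With m = 2 this gives two instructions when the branch is the second instruction (FIGS. 14 and 15) and three when it is the first (FIGS. 16 and 17).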

【００３８】以上、図１２，図１３，図１４，図１５，
図１６，図１７を用いて説明したように、プロセッサス
テータスレジスタ１０３の処理状態フラグＰＥ１１６の
値がＯＮのときには、１マシンサイクルに２命令ずつ処
理するので、その処理能力が最大２倍に向上されるとい
う利点がある。As described above with reference to FIGS. 12 to 17, when the value of the processing state flag PE 116 of the processor status register 103 is ON, two instructions are processed per machine cycle, which has the advantage that the processing capability is improved by up to a factor of two.

【００３９】一方、プロセッサステータスレジスタ１０
３の処理状態フラグＰＥ１１６の値がＯＦＦのときに
は、制御信号１３６を介してプログラムカウンタは＋１
だけ増加するように制御すると共に、命令キャッシュ１
００は、３２ビット長の１個の命令をバス１１７を介し
て第１命令レジスタ１０４へ読み出すように、制御信号
１３７によって制御される。また、制御信号１３６は、
第１命令デコーダ１０６と第２命令デコーダ１０７へ入
っており、この結果第１命令デコーダは第１命令レジス
タ１０４の命令を第１演算ユニット１０８で処理するよ
うに動作すると共に、第２命令デコーダは第２演算ユニ
ットを止めるように動作する。この結果、第１演算ユニ
ットによる逐次処理を行うことができる。On the other hand, when the value of the processing state flag PE 116 of the processor status register 103 is OFF, the program counter is controlled via the control signal 136 so that it is incremented by 1, and the instruction cache 100 is controlled by the control signal 137 so that a single 32-bit instruction is read out to the first instruction register 104 via the bus 117. The control signal 136 is also input to the first instruction decoder 106 and the second instruction decoder 107. As a result, the first instruction decoder operates so that the instruction in the first instruction register 104 is processed by the first arithmetic unit 108, and the second instruction decoder operates so as to stop the second arithmetic unit. Sequential processing by the first arithmetic unit can thus be performed.
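The mode selection by the PE flag can be pictured as a small lookup. This is only a sketch: the `FetchControl` record and its field names are hypothetical, and the +2 increment for the ON case is an assumption inferred from the two-instruction fetch rather than something stated in this paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchControl:
    pc_increment: int          # amount added to the program counter per fetch
    instructions_fetched: int  # 32-bit instructions read from the instruction cache
    second_unit_enabled: bool  # whether the second arithmetic unit operates


def fetch_control(pe_on: bool) -> FetchControl:
    """Return the control settings selected by the processing state flag PE."""
    if pe_on:
        # Parallel mode: two instructions are processed per machine cycle.
        return FetchControl(pc_increment=2, instructions_fetched=2,
                            second_unit_enabled=True)
    # Sequential mode: one instruction per cycle, first arithmetic unit only.
    return FetchControl(pc_increment=1, instructions_fetched=1,
                        second_unit_enabled=False)
```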

【００４０】次に、図３２を用いて、プロセッサステー
タスレジスタ１０３の処理状態フラグＰＥ１１６の値が
ＯＦＦの時のパイプライン動作について詳しく説明す
る。Next, with reference to FIG. 32, the pipeline operation when the value of the processing state flag PE 116 of the processor status register 103 is OFF will be described in detail.

【００４１】ＩＦステージでは、プログラムカウンタに
よって指される１つの命令が読み出され、バス１１７を
通して、第１命令レジスタ１０４にセットされる。な
お、バス１１８は、処理状態フラグＰＥ１１６の値がＯ
ＦＦの時、有効な命令は出力されない。つまり、シーケ
ンサ１１１はプログラムカウンタを制御する回路であ
る。第１命令レジスタが分岐命令でないときには、プロ
グラムカウンタには前プログラムカウンタ値＋１の値を
ラッチ１０２へセットする。分岐時には、分岐アドレス
を計算してプログラムカウンタにセットする。条件分岐
時には、第１演算ユニット１０８よりのフラグ情報１２
０より、分岐の成否を判定し、分岐先アドレス情報１２
１と分岐制御情報１２２を用いてプログラムカウンタ演
算器１０１を制御する。In the IF stage, the one instruction pointed to by the program counter is read out and set in the first instruction register 104 via the bus 117. Note that no valid instruction is output on the bus 118 when the value of the processing state flag PE 116 is OFF. The sequencer 111 is the circuit that controls the program counter. When the instruction in the first instruction register is not a branch instruction, the previous program counter value plus 1 is set in the latch 102 as the new program counter value. At the time of a branch, the branch address is calculated and set in the program counter. At the time of a conditional branch, whether the branch is taken is determined from the flag information 120 from the first arithmetic unit 108, and the program counter arithmetic unit 101 is controlled using the branch destination address information 121 and the branch control information 122.
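The sequencer's choice of the next program counter value in sequential mode can be sketched as follows. The function and parameter names are hypothetical, the delay-slot timing is deliberately ignored, and the base address to which the displacement is added (assumed here to be the branch's own address) is not specified in this paragraph.

```python
def next_pc(pc: int, is_branch: bool, displacement: int = 0,
            conditional: bool = False, condition_met: bool = True) -> int:
    """Return the next program counter value in sequential (PE = OFF) mode."""
    if not is_branch:
        return pc + 1             # ordinary instruction: previous PC + 1
    if conditional and not condition_met:
        return pc + 1             # conditional branch not taken: fall through
    return pc + displacement      # taken branch: PC plus displacement field d
```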

【００４２】次に基本命令処理時のＲステージの動作に
ついて説明する。Ｒステージでは、第１命令レジスタ１
０４の内容が第１命令デコーダ１０６でデコードされ
る。その結果、第１命令レジスタ１０４の第１ソースレ
ジスタフィールドＳ１で指されるレジスタの内容がバス
１２７を通して、第２ソースレジスタフィールドＳ２で
指されるレジスタの内容がバス１２８を通して、第１演
算ユニット１０８へ送出される。Next, the operation of the R stage during basic instruction processing will be described. In the R stage, the contents of the first instruction register 104 are decoded by the first instruction decoder 106. As a result, the contents of the register pointed to by the first source register field S1 of the first instruction register 104 are sent to the first arithmetic unit 108 through the bus 127, and the contents of the register pointed to by the second source register field S2 are sent through the bus 128.

【００４３】次にＥＸステージの動作について説明す
る。Next, the operation of the EX stage will be described.

【００４４】ＥＸステージでは、第１命令レジスタ１０
４のオペコードの内容に従って第１演算ユニット１０８
において、バス１２７，１２８により送られてきたデー
タ間の演算を行う。In the EX stage, the first arithmetic unit 108 performs an operation on the data sent via the buses 127 and 128, in accordance with the operation code in the first instruction register 104.

【００４５】最後にＷステージの動作を説明する。Ｗス
テージでは第１演算ユニット１０８の演算結果がバス１
３１を通して、第１命令レジスタのディスティネーショ
ンフィールドＤで指されるレジスタに格納される。Finally, the operation of the W stage will be described. In the W stage, the operation result of the first arithmetic unit 108 is stored, via the bus 131, in the register pointed to by the destination field D of the first instruction register.

【００４６】図１８は、基本命令を連続して処理するフ
ローを示したものである。１マシンサイクルに２命令ず
つ処理される能力はあるが、１命令ずつ処理される。FIG. 18 shows the flow when basic instructions are processed in succession. Although the hardware is capable of processing two instructions per machine cycle, the instructions are processed one at a time.

【００４７】図１９はロード命令，ストア命令を連続し
て処理するフローを示したものである。ロード命令実行
時には、Ｒステージで、第１命令レジスタのＳ２フィー
ルドで指されるレジスタの内容が、バス１２８を通し
て、MAR112へ転送される。次に、ＥＸステージで、デー
タキャッシュ１１５を通して、オペランドをMDR113にフ
ェッチする。最後に、Ｗステージでフェッチされたオペ
ランドが、バス１３４を通して、第１命令レジスタのデ
ィスティネーションフィールドＤで指されるレジスタに
格納される。FIG. 19 shows a flow for processing a load instruction and a store instruction continuously. When the load instruction is executed, the contents of the register pointed to by the S2 field of the first instruction register are transferred to the MAR 112 via the bus 128 in the R stage. Next, in the EX stage, the operand is fetched to the MDR 113 through the data cache 115. Finally, the operand fetched at the W stage is stored in the register pointed to by the destination field D of the first instruction register via the bus 134.

【００４８】次にストア命令実行時には、Ｒステージに
おいて、第１命令レジスタの第１ソースレジスタフィー
ルドＳ１で指されるレジスタの内容がデータとして、バ
ス１３５を通してMWR114に転送される。また同時に、第
１命令レジスタの第２ソースレジスタフィールドＳ２で
指されるレジスタの内容がアドレスとして、バス１２８
と１３１を通してMAR112に転送される。次にＥＸステー
ジで、MAR112で指される番地に、MWR114内のデータが書
き込まれる。図１９に示すように、ロード命令，ストア
命令が連続しても、１マシンサイクルに２命令ずつ処理
する能力はあるが、１命令ずつ処理することができる。Next, when a store instruction is executed, in the R stage the contents of the register pointed to by the first source register field S1 of the first instruction register are transferred as data to the MWR 114 through the bus 135. At the same time, the contents of the register pointed to by the second source register field S2 of the first instruction register are transferred as an address to the MAR 112 through the buses 128 and 131. Next, in the EX stage, the data in the MWR 114 is written to the address pointed to by the MAR 112. As shown in FIG. 19, even when load and store instructions follow one another, the instructions are processed one at a time, although the hardware is capable of processing two instructions per machine cycle.
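The register transfers of the load and store flows can be modelled stage by stage. The following is a minimal sketch, assuming a list-based register file and a dictionary-based data memory; the class `MemoryPath` and its method names are hypothetical, while MAR, MDR and MWR correspond to elements 112, 113 and 114.

```python
class MemoryPath:
    """Toy model of the MAR/MDR/MWR data path used by load and store."""

    def __init__(self, registers, memory):
        self.reg = registers  # register file, indexed by register number
        self.mem = memory     # data memory (stands in for data cache 115)
        self.mar = 0          # memory address register (112)
        self.mdr = 0          # memory data register (113)
        self.mwr = 0          # memory write register (114)

    def load(self, s2: int, d: int) -> None:
        self.mar = self.reg[s2]        # R stage: address from field S2
        self.mdr = self.mem[self.mar]  # EX stage: fetch operand into MDR
        self.reg[d] = self.mdr         # W stage: write to destination D

    def store(self, s1: int, s2: int) -> None:
        self.mwr = self.reg[s1]        # R stage: data from field S1
        self.mar = self.reg[s2]        # R stage: address from field S2
        self.mem[self.mar] = self.mwr  # EX stage: write MWR to memory
```

A store has no W-stage register write in this model, which matches the two-step description of the store flow.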

【００４９】図２０は、無条件ジャンプＢＲＡ命令実行
時の処理フローを示したものである。ＢＲＡ命令が読み
出されると、Ｒステージにおいてシーケンサ１１１はデ
ィスプレースメントフィールドｄとプログラムカウンタ
との加算を行い、プログラムカウンタのラッチ１０２に
セットする。この間にＢＲＡ命令の次の番地の命令が読
み出される。その次のサイクルに、ジャンプ先の命令が
読み出される。本実施例では、命令１を実行可能なハー
ドウエアとしている。即ち、ジャンプ命令処理時も、待
ちサイクルが発生しない。FIG. 20 shows a processing flow when the unconditional jump BRA instruction is executed. When the BRA instruction is read, in the R stage, the sequencer 111 adds the displacement field d and the program counter, and sets the result in the latch 102 of the program counter. During this time, the instruction at the address next to the BRA instruction is read. In the next cycle, the instruction at the jump destination is read. In this embodiment, hardware capable of executing the instruction 1 is used. That is, a wait cycle does not occur during the jump instruction processing.

【００５０】プロセッサステータスレジスタ１０３の処
理状態フラグＰＥ１１６の値がOFFのときについて述べ
てきたが、ＯＮのときに比べてみると、本実施例ではデ
ィレイド分岐中に行われる命令２，命令３は実行できな
くなったが、従来のＲＩＳＣ型計算機と同じくジャンプ
命令のアドレス計算中に１命令実行できるようになっ
た。このように、本実施例の処理状態フラグＰＥ１１６
の値がＯＦＦのとき、従来と互換を保つ効果がある。Ｃ
ＡＬＬ命令，ＲＴＮ命令の処理フローも同様である。コ
ンパイラにより、分岐命令のアドレス計算中にできる限
り有効な命令を実行できるようにコード生成するが、何
もすることが無い時には図２０命令１をＮＯＰ命令とし
ておく。このときには、実質的には１マシンサイクルの
待ちが生ずる。The case where the value of the processing state flag PE 116 of the processor status register 103 is OFF has been described above. Compared with the ON case, instructions 2 and 3, which are executed during the delayed branch in this embodiment, can no longer be executed, but one instruction can be executed during the address calculation of the jump instruction, just as in a conventional RISC computer. Thus, when the value of the processing state flag PE 116 of this embodiment is OFF, compatibility with conventional machines is maintained. The same applies to the processing flows of the CALL instruction and the RTN instruction. The compiler generates code so that useful instructions are executed wherever possible during the address calculation of the branch instruction; when there is nothing useful to do, instruction 1 in FIG. 20 is set to a NOP instruction. In that case, a wait of effectively one machine cycle occurs.

【００５１】図２１は条件分岐命令ＢＲＡｃｃの処理フ
ローを示したものである。ＡＤＤ，Ｆと示した命令で、
フラグのセットが行われ、その結果に従い分岐の成否が
決められる。このときも、図２０を用いて説明した無条
件分岐命令と同様にBRAcc 命令の置かれている番地の次
の命令、図２１の命令１が読み出されて処理され、この
命令の処理フロー中Ｗステージにおいて、ＢＲＡｃｃ命
令の分岐条件の成否にかかわらず演算結果のレジスタフ
ァイルへの書き込みが行われる。FIG. 21 shows the processing flow of the conditional branch instruction BRAcc. The instruction shown as ADD,F sets the flags, and whether the branch is taken is determined according to the result. In this case too, as with the unconditional branch instruction described with reference to FIG. 20, the instruction at the address following the BRAcc instruction (instruction 1 in FIG. 21) is read and processed, and in the W stage of its processing flow the operation result is written to the register file regardless of whether the branch condition of the BRAcc instruction is satisfied.

【００５２】以上、図１８〜図２１を用いて説明したよ
うに、プロセッサステータスレジスタ１０３の処理状態
フラグＰＥ１１６の値がＯＦＦのときには、１命令ずつ
処理させ、従来のソフトウエアと互換性を保つという利
点がある。As described above with reference to FIGS. 18 to 21, when the value of the processing state flag PE 116 of the processor status register 103 is OFF, instructions are processed one at a time, which has the advantage of maintaining compatibility with conventional software.

【００５３】以上、高度な並列処理手段と従来のソフト
ウエア互換を保つ逐次処理手段を有し、処理状態フラグ
に基づく処理手段切換え方式の実施例を示した。本実施
例の逐次処理手段は、１命令ずつ読み出して第１演算ユ
ニットで実行する方式であったが、図３２からわかるよ
うに、２つの命令レジスタ１０４，１０５が存在するた
め、プログラムカウンタは＋２ずつ増加させるように制
御して、第１命令レジスタ１０４、及び、第２命令レジ
スタ１０５へ２個の命令を読み出して保存し、第１命令
レジスタ１０４の命令を第１演算ユニット１０８で実行
し、続いて、第２命令レジスタ１０５の命令を第２演算
ユニット１０９で実行する手段を設けることによっても
実現できる。すなわち、命令キャッシュは、分岐命令を
除き、２回に１回の割合で動作すれば良い。The above is an embodiment of a scheme that has both a highly parallel processing means and a sequential processing means that maintains compatibility with conventional software, and that switches between them on the basis of the processing state flag. The sequential processing means of this embodiment reads out one instruction at a time and executes it in the first arithmetic unit. However, as can be seen from FIG. 32, there are two instruction registers 104 and 105, so the same result can also be achieved by controlling the program counter so that it is incremented by 2, reading two instructions into the first instruction register 104 and the second instruction register 105 and holding them there, executing the instruction in the first instruction register 104 in the first arithmetic unit 108, and then executing the instruction in the second instruction register 105 in the second arithmetic unit 109. In that case, except for branch instructions, the instruction cache needs to operate only once every two cycles.

【００５４】そこで再び、図３２を用いてプロセッサス
テータスレジスタ１０３の処理状態フラグＰＥ１１６の
値がＯＦＦの時の“ｍ命令を読み出して逐次処理する手
段”の動作を説明する。The operation of the "means for reading m instructions and processing them sequentially" when the value of the processing state flag PE 116 of the processor status register 103 is OFF will now be described, again with reference to FIG. 32.

【００５５】ＩＦステージでは、プログラムカウンタに
よって指される２つの命令が読み出され、バス１１７，
１１８を通して、それぞれ第１命令レジスタ１０４と第
２命令レジスタ１０５にセットされる。ＰＣが偶数のと
きには、ＰＣ番地の命令が第１命令レジスタに、ＰＣ＋
１番地の命令が第２命令レジスタに格納される。また、
ＰＣが奇数のときには、第１命令レジスタにはＮＯＰ命
令が、第２命令レジスタにはＰＣ番地の命令がセットさ
れる。つまり、シーケンサ１１１はプログラムカウンタ
を制御する回路である。第１命令レジスタ，第２命令レ
ジスタ共に分岐命令でないときには、プログラムカウン
タには前プログラムカウンタ値＋２の値をラッチ１０２
へセットする。分岐時には、分岐アドレスを計算してプ
ログラムカウンタにセットする。条件分岐時には、第１
演算ユニット１０８よりのフラグ情報１２０、及び、第
２演算ユニット１０９よりのフラグ情報１１９より、分
岐の成否を判定し、分岐先アドレス情報１２１と分岐制
御情報１２２を用いてプログラムカウンタ演算器１０１
を制御する。なお、後述するように第１命令レジスタと
第２命令レジスタに保存されたそれぞれの命令は、後の
ステージで逐次的に処理されるため、各マシンサイクル
ごとに命令キャッシュを動作させるのではなく、２マシ
ンサイクル１度動作させれば良い。In the IF stage, the two instructions pointed to by the program counter are read out and set in the first instruction register 104 and the second instruction register 105 via the buses 117 and 118, respectively. When the PC is even, the instruction at address PC is stored in the first instruction register and the instruction at address PC+1 in the second instruction register. When the PC is odd, a NOP instruction is set in the first instruction register and the instruction at address PC in the second instruction register. The sequencer 111 is the circuit that controls the program counter. When neither the first nor the second instruction register holds a branch instruction, the previous program counter value plus 2 is set in the latch 102 as the new program counter value. At the time of a branch, the branch address is calculated and set in the program counter. At the time of a conditional branch, whether the branch is taken is determined from the flag information 120 from the first arithmetic unit 108 and the flag information 119 from the second arithmetic unit 109, and the program counter arithmetic unit 101 is controlled using the branch destination address information 121 and the branch control information 122. As described later, the instructions held in the first and second instruction registers are processed sequentially in the later stages, so the instruction cache does not need to operate every machine cycle; it is sufficient for it to operate once every two machine cycles.
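The even/odd rule for loading the two instruction registers can be written down directly. This sketch assumes the instruction cache can be modelled as a list indexed by instruction address; `fetch_pair` and the `NOP` placeholder are hypothetical names.

```python
NOP = "NOP"  # placeholder for the NOP instruction


def fetch_pair(icache, pc: int):
    """Return (first_register, second_register) contents after the IF stage."""
    if pc % 2 == 0:
        # Even PC: PC goes to the first register, PC + 1 to the second.
        return icache[pc], icache[pc + 1]
    # Odd PC: the first register gets a NOP, the second gets address PC.
    return NOP, icache[pc]
```

For example, with `icache = ["i0", "i1", "i2", "i3"]`, an even PC of 2 yields the pair at addresses 2 and 3, while an odd PC of 1 yields a NOP followed by the instruction at address 1.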

【００５６】次に基本命令処理時のＲステージの動作に
ついて説明する。Ｒステージでは、第１命令レジスタ１
０４の内容が第１命令デコーダ１０６でデコードされ、
続いて次のステージで、第２命令レジスタ１０５の内容
が第２命令デコーダ１０７でデコードされる。その結
果、第１命令レジスタ１０４の第１ソースレジスタフィ
ールドＳ１で指されるレジスタの内容がバス１２７を通
して、第２ソースレジスタフィールドＳ２で指されるレ
ジスタの内容がバス１２８を通して、第１演算ユニット
１０８へ送出される。また、続いて次のステージで、第
２命令レジスタ１０５の第１ソースレジスタフィールド
Ｓ１で指されるレジスタの内容がバス１２９を通して、
第２ソースレジスタフィールドＳ２で指されるレジスタ
内容がバス１３０を通して、第２演算ユニット１０９へ
送出される。Next, the operation of the R stage during basic instruction processing will be described. In the R stage, the contents of the first instruction register 104 are decoded by the first instruction decoder 106, and then, in the next stage, the contents of the second instruction register 105 are decoded by the second instruction decoder 107. As a result, the contents of the register pointed to by the first source register field S1 of the first instruction register 104 are sent to the first arithmetic unit 108 through the bus 127, and the contents of the register pointed to by its second source register field S2 through the bus 128. Then, in the next stage, the contents of the register pointed to by the first source register field S1 of the second instruction register 105 are sent to the second arithmetic unit 109 through the bus 129, and the contents of the register pointed to by its second source register field S2 through the bus 130.

【００５７】次にＥＸステージの動作について説明す
る。Next, the operation of the EX stage will be described.

【００５８】ＥＸステージでは、第１命令レジスタ１０
４のオペコードの内容に従って第１演算ユニット１０８
において、バス１２７，１２８により送られてきたデー
タ間の演算を行う。続いて次のステージで、第２命令レ
ジスタ１０５のオペコードの内容に従って第２演算ユニ
ット１０９において、バス１２９，１３０により送られ
てきたデータ間の演算を行う。最後にＷステージの動作
を説明する。Ｗステージでは第１演算ユニット１０８の
演算結果がバス１３１を通して、第１命令レジスタのデ
ィスティネーションフィールドＤで指されるレジスタに
格納される。また、続いて次のステージで、第２演算ユ
ニット１０９の演算結果がバス１３２を通して、第２命
令レジスタのディスティネーションフィールドＤで指さ
れるレジスタに格納される。In the EX stage, the first arithmetic unit 108 performs an operation on the data sent via the buses 127 and 128, in accordance with the operation code in the first instruction register 104. Then, in the next stage, the second arithmetic unit 109 performs an operation on the data sent via the buses 129 and 130, in accordance with the operation code in the second instruction register 105. Finally, the operation of the W stage will be described. In the W stage, the operation result of the first arithmetic unit 108 is stored, via the bus 131, in the register pointed to by the destination field D of the first instruction register. Then, in the next stage, the operation result of the second arithmetic unit 109 is stored, via the bus 132, in the register pointed to by the destination field D of the second instruction register.

【００５９】図２２は、基本命令ＡＤＤを連続して処理
するフローを示したものである。１マシンサイクルに２
命令ずつ処理できる能力があるが、１命令ずつ処理され
る。すなわち、２つのＡＤＤ命令は同時にフェッチされ
るが、最初のＡＤＤ命令のみがＲステージの処理を実行
する。一方、２番目のＡＤＤ命令は、１マシンサイクル
待った後にＲステージの処理を実行する。ここで図２２
で２命令ずつ処理される内の上の方が第１演算ユニット
の処理を、下の方が第２演算ユニットの処理を示してい
る。FIG. 22 shows the flow when the basic instruction ADD is processed in succession. Although the hardware is capable of processing two instructions per machine cycle, the instructions are processed one at a time. That is, the two ADD instructions are fetched at the same time, but only the first ADD instruction performs its R-stage processing; the second ADD instruction performs its R-stage processing after waiting one machine cycle. In FIG. 22, of each pair of instructions, the upper row shows the processing of the first arithmetic unit and the lower row the processing of the second arithmetic unit.
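The staggered timing of a fetched pair can be tabulated with a short sketch. It assumes cycle numbering from 0, a four-stage IF, R, EX, W pipeline, and one pair fetch every two machine cycles as stated for the instruction cache; the function name and the dictionary layout are hypothetical.

```python
def sequential_pair_schedule(num_pairs: int):
    """Return, per instruction, the machine cycle at which each stage runs."""
    schedule = []
    for pair in range(num_pairs):
        fetch_cycle = 2 * pair           # the cache operates every second cycle
        for slot in (0, 1):              # 0 = first register, 1 = second register
            r_cycle = fetch_cycle + 1 + slot  # the second instruction waits one cycle
            schedule.append({
                "IF": fetch_cycle,
                "R": r_cycle,
                "EX": r_cycle + 1,
                "W": r_cycle + 2,
            })
    return schedule
```

Under these assumptions exactly one instruction enters the R stage in each cycle, which is the one-at-a-time behaviour described for FIG. 22.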

【００６０】図２３はロード命令，ストア命令を連続し
て処理するフローを示したものである。ロード命令実行
時には、Ｒステージで、第１命令レジスタのＳ２フィー
ルドで指されるレジスタの内容が、バス１２８を通し
て、MAR112に転送される。次に、ＥＸステージで、デー
タキャッシュからバス１３３を通して、オペランドをフ
ェッチする。最後に、Ｗステージでフェッチされたオペ
ランドが、バス１３４を通して、第１命令レジスタのデ
ィスティネーションフィールドＤで指されるレジスタに
格納される。ＥＸステージで１マシンサイクルでオペラ
ンドをフェッチすることは、図１Ａの如く高速データキ
ャッシュ１１５を備えていれば、可能である。FIG. 23 shows a flow for processing load instructions and store instructions successively. When the load instruction is executed, the contents of the register pointed to by the S2 field of the first instruction register are transferred to the MAR 112 via the bus 128 in the R stage. Next, in the EX stage, the operand is fetched from the data cache through the bus 133. Finally, the operand fetched at the W stage is stored in the register pointed to by the destination field D of the first instruction register via the bus 134. Fetching an operand in one machine cycle in the EX stage is possible if the high-speed data cache 115 is provided as shown in FIG. 1A.

【００６１】次にストア命令実行は、ロード命令のＲス
テージ実行後、Ｒステージにおいて第２命令レジスタの
第２ソースレジスタフィールドＳ１で指されるレジスタ
の内容がデータとして、バス１３５を通してMWR114に転
送される。また同時に、第２命令レジスタの第２ソース
レジスタフィールドＳ２で指されるレジスタの内容がア
ドレスとして、バス１２９を通してMAR112に転送され
る。次にＥＸステージで、MAR112で指される番地に、MW
R114内のデータが書き込まれる。図２３に示すように、
ロード命令，ストア命令が連続しても、１マシンサイク
ルに２命令ずつ処理することができる能力はあるが、１
命令ずつ処理することができる。Next, the store instruction is executed as follows. After the R stage of the load instruction has been executed, in the R stage of the store instruction the contents of the register pointed to by the first source register field S1 of the second instruction register are transferred as data to the MWR 114 through the bus 135. At the same time, the contents of the register pointed to by the second source register field S2 of the second instruction register are transferred as an address to the MAR 112 through the bus 129. Next, in the EX stage, the data in the MWR 114 is written to the address pointed to by the MAR 112. As shown in FIG. 23, even when load and store instructions follow one another, the instructions are processed one at a time, although the hardware is capable of processing two instructions per machine cycle.

【００６２】図２４から図２７は、無条件ジャンプＢＲ
Ａ命令と引き続く番地の命令１の実行時の処理フローを
示したものである。特に、図２４と図２５は第１命令
に、図２６と図２７は第２命令にそれぞれ無条件ジャン
プＢＲＡ命令が存在しているときのパイプライン処理フ
ローを示しており、さらに、図２４と図２６はジャンプ
先命令が第１命令に相当する番地に有るとき、図２５と
図２７はジャンプ先命令が第２命令に相当する番地に有
る場合である。ＢＲＡ命令が命令レジスタから読み出さ
れると、Ｒステージにおいてシーケンサ１１１はディス
プレースメントフィールドｄとプログラムカウンタとの
加算を行い、プログラムカウンタのラッチ１０２にセッ
トする。この間にＢＲＡ命令の次の番地の命令が次のサ
イクルで実行される。そして、次の次のサイクルに、ジ
ャンプ先の２命令が読み出される。ここで、無条件ジャ
ンプＢＲＡ命令が第２命令に有るとき（図２６，図２
７）、ＢＲＡ命令の次の番地の命令を含む２命令をＩＦ
ステージで命令キャッシュから読み出すが、第１命令は
実行するが、第２命令は実行せずにジャンプ先命令を実
行するように制御されている。つまり、分岐命令の次の
命令より後の命令が命令レジスタに保持されていてもそ
れらは、実行されずに無効化される。FIGS. 24 to 27 show the processing flows when an unconditional jump BRA instruction and instruction 1 at the following address are executed. FIGS. 24 and 25 show the pipeline processing flow when the unconditional jump BRA instruction is the first instruction, and FIGS. 26 and 27 when it is the second instruction. In addition, FIGS. 24 and 26 show the case where the jump destination instruction is at an address corresponding to the first instruction, and FIGS. 25 and 27 the case where it is at an address corresponding to the second instruction. When the BRA instruction is read from the instruction register, the sequencer 111 adds the displacement field d to the program counter in the R stage and sets the result in the latch 102 of the program counter. Meanwhile, the instruction at the address following the BRA instruction is executed in the next cycle, and in the cycle after that the two instructions at the jump destination are read. When the unconditional jump BRA instruction is the second instruction (FIGS. 26 and 27), two instructions including the instruction at the address following the BRA instruction are read from the instruction cache in the IF stage; control is such that the first of these is executed, while the second is not executed and the jump destination instruction is executed instead. That is, even if instructions later than the instruction following the branch instruction are held in the instruction register, they are invalidated without being executed.

【００６３】さらに、ジャンプ先命令が第２命令に相当
する番地に有るとき（図２５，図２７）、ジャンプ先命
令を含む２命令をＩＦステージで命令キャッシュから読
み出すが、ジャンプ先の第１命令は実行せずにジャンプ
先の第２命令のみを実行するように制御されている。つ
まり、ジャンプ先命令より前の命令が命令レジスタに保
持されていてもそれらは、実行されずに無効化される。
なお、ＣＡＬＬ命令，ＲＴＮ命令の処理フローも同様で
ある。Furthermore, when the jump destination instruction is at an address corresponding to the second instruction (FIGS. 25 and 27), two instructions including the jump destination instruction are read from the instruction cache in the IF stage, but control is such that the first instruction of the pair is not executed and only the second instruction, the jump destination, is executed. That is, even if instructions preceding the jump destination instruction are held in the instruction register, they are invalidated without being executed.
The same applies to the processing flows of the CALL instruction and the RTN instruction.
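The two invalidation rules for an unconditional jump can be combined in one sketch. It assumes that instruction pairs are aligned on even addresses, so an odd address corresponds to the second instruction of a pair; the function `jump_trace` and its return format are hypothetical.

```python
def jump_trace(branch_pc: int, target_pc: int):
    """Return (executed, invalidated) addresses around one unconditional jump."""
    # The branch and its single delay-slot instruction are always executed.
    executed = [branch_pc, branch_pc + 1]
    invalidated = []
    if branch_pc % 2 == 1:
        # Branch in the second slot: the delay slot is the first instruction
        # of the next pair, and its partner is fetched but discarded.
        invalidated.append(branch_pc + 2)
    if target_pc % 2 == 1:
        # Target in the second slot: the first instruction of the target
        # pair is fetched but discarded.
        invalidated.append(target_pc - 1)
    executed.append(target_pc)
    return executed, invalidated
```

A branch at an even address jumping to an even target (the FIG. 24 situation) invalidates nothing, whereas an odd branch address with an odd target (the FIG. 27 situation) invalidates one instruction on each side of the jump.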

【００６４】図２８から図３１は、条件分岐命令ＢＲＡ
ｃｃ命令と命令１の実行時の処理フローを示したもので
ある。ここで、図３０と図３１は第１命令に、図２８と
図２９は第２命令にそれぞれ条件分岐命令ＢＲＡｃｃ命
令が存在しているときの処理フローであり、また、図２
８と図３０はジャンプ先命令が第１命令に相当する番地
に有るとき、図２９と図３１はジャンプ先命令が第２命
令に相当する番地に有る場合の処理フローである。図２
８から図３１はＡＤＤ，Ｆと示した命令でフラグのセッ
トが行われ、その結果に従い分岐の成否が決められる。
このときも、図２４から図２７を用いて説明した無条件
分岐命令処理時と同様にＢＲＡｃｃ命令に置かれている
番地の次の命令１が実行され、命令１の処理フロー中Ｗ
ステージにおいては、ＢＲＡｃｃ命令の分岐条件の成否
にかかわらず演算結果のレジスタファイルへの書き込み
が行われる。FIGS. 28 to 31 show the processing flows when a conditional branch instruction BRAcc and instruction 1 are executed. FIGS. 30 and 31 show the processing flow when the conditional branch instruction BRAcc is the first instruction, and FIGS. 28 and 29 when it is the second instruction. In addition, FIGS. 28 and 30 show the processing flow when the jump destination instruction is at an address corresponding to the first instruction, and FIGS. 29 and 31 when it is at an address corresponding to the second instruction. In FIGS. 28 to 31, the instruction shown as ADD,F sets the flags, and whether the branch is taken is determined according to the result. In this case too, as in the unconditional branch processing described with reference to FIGS. 24 to 27, instruction 1 at the address following the BRAcc instruction is executed, and in the W stage of the processing flow of instruction 1 the operation result is written to the register file regardless of whether the branch condition of the BRAcc instruction is satisfied.

【００６５】図３０と図３１のごとくＢＲＡ命令が第１
命令として存在する場合には、ＢＲＡｃｃ命令が命令レ
ジスタから読み出されると、Ｒステージにおいてシーケ
ンサ１１１はディスプレースメントフィールドｄとプロ
グラムカウンタとの加算を行い、プログラムカウンタの
ラッチ１０２にセットするとともに命令１のオペランド
のリードを並列処理する。この間に命令１の次の番地の
命令が次のサイクルで実行される。そして、次の次のサ
イクルに、ジャンプ先の２命令が読み出される。When the BRAcc instruction is the first instruction, as in FIGS. 30 and 31, and the BRAcc instruction is read from the instruction register, the sequencer 111 adds the displacement field d to the program counter in the R stage and sets the result in the latch 102 of the program counter, while the operands of instruction 1 are read in parallel. Meanwhile, the instruction at the address following instruction 1 is executed in the next cycle, and in the cycle after that the two instructions at the jump destination are read.

【００６６】一方、条件分岐命令ＢＲＡｃｃ命令が第２
命令に有るとき（図２８，図２９）、ＢＲＡｃｃ命令の
次の番地の命令を含む２命令をＩＦステージで命令キャ
ッシュから読み出すが、第１命令は実行するが、第２命
令は実行せずにジャンプ先命令を実行するように制御さ
れている。つまり、条件分岐命令の次の命令より後の命
令が命令レジスタに保持されていても、それらは条件成
立のときは、実行されずに無効化される。On the other hand, when the conditional branch instruction BRAcc is the second instruction (FIGS. 28 and 29), two instructions including the instruction at the address following the BRAcc instruction are read from the instruction cache in the IF stage; control is such that the first of these is executed, while the second is not executed and the jump destination instruction is executed instead. That is, even if instructions later than the instruction following the conditional branch instruction are held in the instruction register, they are invalidated without being executed when the condition is satisfied.

【００６７】さらに、条件分岐命令が実行され条件成立
したときジャンプする。ジャンプ先命令が第２命令に相
当する番地に有るとき（図２９，図３１）、ジャンプ先
命令を含む２命令をＩＦステージで命令キャッシュから
読み出すが、第１命令は実行せずに第２命令のジャンプ
先命令を実行するように制御されている。つまり、ジャ
ンプ先命令より前の命令が命令レジスタに保持されてい
てもそれらは、実行されずに無効化される。Furthermore, when the conditional branch instruction is executed and the condition is satisfied, a jump takes place. When the jump destination instruction is at an address corresponding to the second instruction (FIGS. 29 and 31), two instructions including the jump destination instruction are read from the instruction cache in the IF stage, but control is such that the first instruction of the pair is not executed and the second instruction, the jump destination, is executed. That is, even if instructions preceding the jump destination instruction are held in the instruction register, they are invalidated without being executed.

【００６８】以上、“ｍ命令を同時に読み出してｍ個の
演算ユニットで逐次処理する手段”の動作を説明した
が、結果的には、プログラムカウンタは＋２ずつ増加さ
せるように制御して、第１命令レジスタ１０４、及び、
第２命令レジスタ１０５へ２個の命令を読み出して保存
し、第１命令レジスタ１０４の命令を第１演算ユニット
１０８で実行し、続いて、第２命令レジスタ１０５の命
令を第２演算ユニット１０９で実行する手段（逐次処
理）を設けるようにすることである。これによって、命
令キャッシュは、分岐命令を除き、２回に１回の割合で
動作すれば良い。以上、高速な並列処理手段と従来のソ
フトウエア互換を保つ逐次処理手段を有し、処理状態フ
ラグに基づく処理手段切換え方式の実施例を示した。し
かしながら、本実施例の並列実行処理手段は、図１のプ
ロセッサステータスレジスタ103の処理状態フラグＰＥ
１１６の値がＯＮの時、１マシンサイクルに２命令ずつ
処理させるので、その処理能力を最大２倍に向上できた
が、図１４から図１７に示すように、ディレイド分岐命
令を拡張したために、従来ソフトウエアとの互換性を失
っている。そこでディレイド分岐命令の後続の一命令の
みを実行する制御手段を設けることによって大部分のソ
フトウエアの互換を保つ方法を述べる。図３３は図３２
に制御信号線１４７を加えたものである。つまり、第２
命令デコーダ１０７にてディレイド分岐命令を解読して
いる時は、後続のディレイスロット命令は第１命令レジ
スタ１０４に存在する。しかし、第２命令レジスタ１０
５に保持している命令は、実行してはならない命令であ
る。そこで第２命令デコーダ１０７がディレイド分岐命
令を検出時に制御信号線１４７を介して第２命令レジス
タ１０５の内容を無効化することにより、ディレイド分
岐命令に後続する１命令のみを実行する。The operation of the "means for simultaneously reading m instructions and processing them sequentially in m arithmetic units" has been described above. In short, the program counter is controlled so that it is incremented by 2, two instructions are read into the first instruction register 104 and the second instruction register 105 and held there, and a means (sequential processing) is provided that executes the instruction in the first instruction register 104 in the first arithmetic unit 108 and then executes the instruction in the second instruction register 105 in the second arithmetic unit 109. As a result, except for branch instructions, the instruction cache needs to operate only once every two cycles. The above is an embodiment of a scheme that has both a high-speed parallel processing means and a sequential processing means that maintains compatibility with conventional software, and that switches between them on the basis of the processing state flag. However, although the parallel execution means of this embodiment processes two instructions per machine cycle when the value of the processing state flag PE 116 of the processor status register 103 in FIG. 1 is ON, and thereby improves the processing capability by up to a factor of two, it loses compatibility with conventional software because the delayed branch instruction has been extended as shown in FIGS. 14 to 17. A method of maintaining compatibility with most software, by providing a control means that executes only the one instruction following a delayed branch instruction, is therefore described next. FIG. 33 is FIG. 32 with a control signal line 147 added. When the second instruction decoder 107 is decoding a delayed branch instruction, the delay slot instruction that follows it is in the first instruction register 104, whereas the instruction held in the second instruction register 105 is one that must not be executed. Therefore, when the second instruction decoder 107 detects a delayed branch instruction, it invalidates the contents of the second instruction register 105 via the control signal line 147, so that only the one instruction following the delayed branch instruction is executed.

[0069] When the first instruction decoder 106 is decoding a delayed branch instruction, the subsequent delay slot instruction is being decoded in the second instruction decoder 107, so executing the two in parallel causes no problem. As described above, compatibility with most software can be maintained by using the control signal line 147 to invalidate the contents of the second instruction register 105.

[0070] Next, an embodiment of a "scheme that can correctly execute most conventional software" is described. It does not use the processing state flag and is based on always performing parallel processing.

[0071] In this embodiment, processing other than branch instructions is basically performed two instructions at a time. For a branch instruction, means is provided that executes only the one instruction that follows it (only instruction 1 in FIGS. 14 to 17) and suppresses execution of the remaining instructions.

[0072] FIG. 34 shows a configuration based on always performing parallel processing. That is, the program counter arithmetic unit 101 always adds +2 (143). Software compatibility can nevertheless be maintained by using the control signal line 147 to invalidate the contents of the second instruction register 105. The operation of the configuration shown in FIG. 34 is described below with reference to FIGS. 14 to 17. FIG. 14 is the same figure used for the embodiment described earlier.

[0073] FIG. 14 shows the processing flow when the unconditional jump instruction BRA is executed as the second instruction. When the BRA instruction is read out, the sequencer 111 adds the displacement field d to the program counter in the R stage and sets the result in the program counter latch 102. Meanwhile, instruction 1 and instruction 2 at the addresses following the BRA instruction are read out, and in the next cycle the two instructions at the jump destination are read out. In this embodiment, only instruction 1 is executed and execution of instruction 2 is suppressed. That is, to remain compatible with conventional software, control is performed so that only one instruction following the branch instruction BRA can be executed. This is achieved either by controlling the second instruction decoder 107 in FIG. 34 via the signal line 147 so that instruction 2 in FIG. 14 is processed as the equivalent of a NOP instruction, or by suppressing the write to the register file by the second instruction. The compiler generates code so that, as far as possible, a useful instruction is executed during the address calculation of the branch instruction; when there is nothing useful to do, instruction 1 in FIG. 14 is made a NOP instruction. In that case a wait of effectively one machine cycle occurs.

[0074] FIG. 15 shows the processing flow when the conditional branch instruction BRAcc is executed as the second instruction. The instruction marked ADD, F sets the flags, and whether the branch is taken is decided according to the result. As with the unconditional branch instruction described with reference to FIG. 14, the instructions at the addresses following the BRAcc instruction, namely instruction 1 and instruction 2 in FIG. 15, are read out. In the W stage of the processing flow of instruction 1, the operation result is written to the register file regardless of whether the branch condition of the BRAcc instruction holds, whereas execution of instruction 2 is suppressed. This is achieved either by controlling the second instruction decoder 107 in FIG. 34 so that instruction 2 in FIG. 15 is processed as the equivalent of a NOP instruction, or by suppressing the write to the register file by the second instruction. In this case a wait of effectively one machine cycle occurs.

[0075] FIG. 16 shows the processing flow when the unconditional jump instruction BRA is executed as the first instruction. When the BRA instruction and instruction 1 are read out, the sequencer 111 adds the displacement field d to the program counter in the R stage and sets the result in the program counter latch 102, and at the same time the operands of instruction 1 are read. Meanwhile, the next instructions, instruction 2 and instruction 3, are read out. In the cycle after that, instruction 1 and instruction 2 at the jump destination are read out. However, to remain compatible with conventional software, the branch instruction BRA and the following instruction 1 are executed in parallel, while execution of instruction 2 and instruction 3 is suppressed. This is achieved either by controlling the first instruction decoder 106 and the second instruction decoder 107 in FIG. 34 so that instruction 2 and instruction 3 in FIG. 16 are processed as the equivalent of NOP instructions, or by suppressing the writes to the register file by the second and third instructions. The compiler generates code so that, as far as possible, a useful instruction is executed during the address calculation of the branch instruction; when there is nothing useful to do, instruction 1 in FIG. 16 is made a NOP instruction. In that case a wait of effectively one machine cycle occurs.

[0076] FIG. 17 shows the processing flow when the conditional branch instruction BRAcc is executed as the first instruction. The instruction marked ADD, F sets the branch state flag, and whether the branch is taken is decided according to the result. As with the unconditional branch instruction described with reference to FIG. 16, the BRAcc instruction and instruction 1 at the following address are read out simultaneously, and in the W stage of the processing flow of instruction 1 the operation result is written to the register file regardless of whether the branch condition of the BRAcc instruction holds. Compatibility is then achieved in one of three ways: by controlling the first instruction decoder 106 and the second instruction decoder 107 in FIG. 34 so that instruction 2 and instruction 3 in FIG. 17 are processed as the equivalent of NOP instructions; by suppressing the writes to the register file by the second and third instructions; or, when the branch instruction is the first instruction, by controlling the flow so that the branch to jump-destination instruction 1 is taken after instruction 1 has been executed in parallel.

[0077] The operation of a scheme that allows most software to execute correctly while gaining speed through parallel execution has been described above with reference to FIG. 34. In effect, it amounts to suppressing the execution of instruction 2 in FIGS. 14 and 15 and of instruction 2 and instruction 3 in FIGS. 16 and 17. This preserves compatibility with the conventional delayed branch scheme, which makes effective use of the one wait cycle, while all other instructions can basically be executed two at a time in parallel. The scheme therefore provides both compatibility with conventional software and a processing performance improvement of between one and two times.

[0078] The parallel execution processing means has been described above with a focus on branch instructions. Naturally, however, depending on the combination of the first instruction and the second instruction, the two instructions may not be executable at the same time. This is called a conflict. Conflicts are described below.

[0079] 1. A combination of load and store instructions.

[0080] 2. The register designated by the destination register field D of the first instruction matches the register designated by the first source register field S1 of the second instruction, or the register designated by the second source register field S2 of the second instruction.

[0081] Of the above conflicts, item 1 is a problem specific to this embodiment, arising because the data cache cannot be accessed by a plurality of instructions simultaneously. It can be solved, for example, by giving the data cache two ports. Item 2 can be handled by having the first instruction decoder and the second instruction decoder in FIG. 34 compare each other's source register fields and destination register fields and, when they match, changing the second instruction into a NOP instruction. That is, when the register designated by the destination register field D of the first instruction matches a register designated by either of the two source register fields of the second instruction, the second instruction is changed to a NOP instruction and the first instruction and the NOP instruction are executed in parallel; in the next cycle, the first instruction is changed to a NOP instruction and the NOP instruction and the second instruction are executed in parallel.

[0082] The conflict problem in parallel execution has been described above.

[0083] All the embodiments of the present invention have been described for the case of two instruction decoders and two arithmetic units, but clearly the number can be increased to four or eight without any problem.

[0084] A final embodiment of the present invention is now described. It concerns the processing state flag PE 116 of the processor status register 103 in FIG. 33. Originally, in a system requiring compatibility with conventional software, the processing state flag PE 116 served as a switchable means that supplies the information on which the hardware is switched, and it was switched by an instruction provided for that purpose.

[0085] However, in a dedicated system, or in a system that only has to run newly written software, only one of the two functions may be used once the system is assembled. The data processing device should therefore provide both the parallel execution processing means and the sequential execution processing means, together with means for incorporating only one of them according to the system being built. One way to realize this is to set the processing state flag PE 116 of the processor status register 103 to one of the two states by an instruction at initialization or at reset. In the case of an LSI such as a microprocessor, another option is a selection means that switches between the two processing means using a pin that exchanges signals between the LSI and the outside. As is well known, pins are the leads that extend from the LSI.

[0086]

[Effects of the Invention] According to the present invention, all software that runs on a conventional sequential-processing computer can operate correctly and, moreover, can be executed faster by using advanced parallel processing functions, so processing time can be shortened.

[0087] Furthermore, most conventional software can operate correctly and can be executed faster by using advanced parallel processing functions.

[Brief description of the drawings]

FIG. 1 is an overall block diagram showing one embodiment of the present invention.

FIG. 2 is an overall block diagram of a conventional example.

FIG. 3 is a timing chart explaining the operation of the configuration shown in FIG. 2.

FIG. 4 is a timing chart explaining the operation of the configuration shown in FIG. 2.

FIG. 5 is a timing chart explaining the operation of the configuration shown in FIG. 2.

FIG. 6 is an overall block diagram of another conventional example.

FIG. 7 is a timing chart explaining the operation of the configuration shown in FIG. 6.

FIG. 8 is a timing chart explaining the operation of the configuration shown in FIG. 6.

FIG. 9 is a processing flow diagram of a delayed branch instruction in a RISC-type computer.

FIG. 10 is a diagram showing the instruction list of the present invention.

FIG. 11 is a diagram showing the instruction format of the present invention.

FIG. 12 is a timing chart explaining the operation in parallel processing of the present invention.

FIG. 13 is a timing chart explaining the operation in parallel processing of the present invention.

FIG. 14 is a timing chart explaining the operation in parallel processing.

FIG. 15 is a timing chart explaining the operation in parallel processing.

FIG. 16 is a timing chart explaining the operation in parallel processing.

FIG. 17 is a timing chart explaining the operation in parallel processing.

FIG. 18 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 19 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 20 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 21 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 22 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 23 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 24 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 25 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 26 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 27 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 28 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 29 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 30 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 31 is a timing chart explaining the operation in sequential processing of the present invention.

FIG. 32 is an overall block diagram showing another embodiment of the present invention.

FIG. 33 is an overall block diagram showing another embodiment of the present invention.

FIG. 34 is an overall block diagram showing another embodiment of the present invention.

[Explanation of symbols]

103: processor status register, 104: first instruction register, 105: second instruction register, 106: first instruction decoder, 107: second instruction decoder, 108: first arithmetic unit, 109: second arithmetic unit, 110: register file.

Continuation of front page

(72) Inventor: Tadaaki Bando, c/o Hitachi Research Laboratory, Hitachi, Ltd., 4026 Kuji-cho, Hitachi-shi, Ibaraki

(56) References: Morihiro Kuga and four others, "Low-level parallel processing algorithm of the 'Shinpu' processor based on the SIMP (single instruction stream / multiple pipeline) scheme", IPSJ Transactions, Vol. 30, No. 12, December 15, 1989, pp. 1603-1611

(58) Field searched (Int. Cl.6, DB name): G06F 9/38

Claims

(57) [Claims]

1. A parallel processing device comprising: a program counter for designating an instruction to be read from a memory; a plurality of instruction registers for respectively storing instructions designated by the program counter; a plurality of instruction decoders for decoding the instructions stored in the instruction registers; a plurality of arithmetic units for executing operations; and a control device for controlling, based on the operation results of the arithmetic units, whether the plurality of arithmetic units perform operations in parallel or perform operations sequentially.

2. A parallel processing device comprising: a program counter for designating an instruction to be read from a memory; a plurality of instruction registers for respectively storing instructions designated by the program counter; a plurality of instruction decoders for decoding the instructions stored in the instruction registers; a plurality of arithmetic units for executing operations; and a control device for controlling, based on the operation results of the arithmetic units, whether or not the plurality of arithmetic units are operated in parallel.

3. The parallel processing device according to claim 1, wherein the contents of at least one of the instruction registers are invalidated by decoding the instruction.