JP2861234B2

JP2861234B2 - Instruction processing unit

Info

Publication number: JP2861234B2
Application number: JP9450790A
Authority: JP
Inventors: 由美子牛丸
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1989-08-18
Filing date: 1990-04-10
Publication date: 1999-02-24
Anticipated expiration: 2014-02-24
Also published as: JPH03163627A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Application Field]

The first and second inventions relate to an instruction processing device of an information processing apparatus. In particular, the first invention relates to a RISC-type microprocessor that has a pipelined instruction processing mechanism and executes instructions in a single machine cycle. The second invention relates to a parallel instruction processing device that executes a plurality of instructions in parallel, and to a pipelined instruction processing device that achieves high-speed processing by using a pipeline mechanism.

[Conventional technology]

(1) In the prior art relating to the first invention, computers with various pipeline configurations have been developed as computer systems have grown more powerful. To shorten the machine cycle, a pipeline scheme is often used in which operand read and operand write are each assigned to their own pipeline stage. In this type of pipeline, as shown in the timing chart of the conventional instruction processing device in FIG. 3(a), the pipeline consists of the following four stages:

・instruction fetch from memory (IF stage)
・operand fetch from the general-purpose register file (OF stage)
・instruction execution (EX stage)
・operand write to the general-purpose register file (OW stage)

Because each pipeline stage completes in one machine cycle, one instruction can be executed every machine cycle. Furthermore, because the processing is subdivided, the machine cycle itself can be shortened. A high-performance instruction processing device can therefore be provided.
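The overlap of the four stages can be illustrated with a short sketch. It is not taken from the patent: it assumes an ideal pipeline with no stalls, and the function names `timing_chart` and `render` are hypothetical.

```python
# Stage names taken from the description above (IF, OF, EX, OW).
STAGES = ["IF", "OF", "EX", "OW"]

def timing_chart(num_instructions, stages=STAGES):
    """Map each instruction number to {machine cycle: stage name}.

    Assumes an ideal pipeline: instruction i enters at cycle i and
    advances exactly one stage per machine cycle, with no stalls.
    """
    chart = {}
    for i in range(1, num_instructions + 1):
        chart[i] = {i + offset: name for offset, name in enumerate(stages)}
    return chart

def render(chart):
    """Format the chart as text rows, one row per instruction."""
    last_cycle = max(max(row) for row in chart.values())
    lines = []
    for instr, row in chart.items():
        cells = [row.get(t, "--") for t in range(1, last_cycle + 1)]
        lines.append(f"instruction {instr}: " + " ".join(cells))
    return lines

if __name__ == "__main__":
    for line in render(timing_chart(3)):
        print(line)
```

Each row is shifted one machine cycle to the right of the previous one, which is the sense in which one instruction completes per cycle once the pipeline is full.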

(2) The prior art relating to the second invention includes the following techniques.

(a) VLIW parallel computer

In the VLIW (Very Long Instruction Word) scheme, as shown in FIG. 9, a relatively long instruction is divided into many fields, and each field independently controls one of many arithmetic units, registers, interconnection networks, memories, and so on, thereby achieving parallel processing.

In the VLIW scheme, parallelism among operations is extracted at compile time, and the compiler combines operations that can run in parallel into a single instruction. High-speed processing is achieved when the degree of parallelism obtained is close to the number of parallel arithmetic units. When the degree of parallelism is low, however, instruction fields are left empty and the bit usage efficiency of the instruction falls. How many instruction fields can be filled depends on the capability of the compiler and on the source program.
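A minimal sketch of the efficiency loss described above, assuming every field has the same width so that bit usage efficiency reduces to the fraction of fields that hold a real operation. The `NOP` marker, the operation names, and the example program are hypothetical.

```python
NOP = None  # hypothetical marker for an empty instruction field

def field_utilization(program):
    """Fraction of instruction fields that hold a real operation.

    `program` is a list of VLIW instructions, each a list of fields.
    Assumes all fields are equally wide.
    """
    total = sum(len(word) for word in program)
    used = sum(1 for word in program for op in word if op is not NOP)
    return used / total if total else 0.0

# Two four-field instructions: the first fully packed, the second
# carrying a single operation because no parallelism was found.
program = [
    ["add", "mul", "load", "store"],
    ["sub", NOP, NOP, NOP],
]

if __name__ == "__main__":
    print(field_utilization(program))
```

Here five of the eight fields are used, so three fields' worth of instruction bits carry no work.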

Because the VLIW scheme extracts program parallelism at compile time, the hardware need not perform complex processing such as detecting data dependencies. The hardware configuration can therefore be kept simple.

The VLIW scheme is based on an idea derived from horizontal microinstructions and is suited to fine-grained parallel processing (low-level parallel processing) by arithmetic units with a low functional level.

(b) Instruction pipeline processing

In a computer system, a machine instruction is executed by proceeding sequentially through instruction fetch (IF), instruction decode (ID), operand address generation (OA), operand fetch (OF), execution (EX), and write-back of the result (WB). In the instruction pipeline scheme, these stages of instruction execution are overlapped. When every stage takes the same time and that time equals the machine cycle, the instruction pipeline delivers its maximum performance and a result is obtained every machine cycle.
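The benefit of overlapping the six stages can be put in numbers with a small sketch. The cycle-count formulas are the standard ones for an ideal pipeline with equal stage times and no stalls; they are an illustration, not something stated in the patent.

```python
# The six stages named in the description above.
STAGES = ["IF", "ID", "OA", "OF", "EX", "WB"]

def sequential_cycles(n, k=len(STAGES)):
    """Cycles for n instructions when each one runs all k stages alone."""
    return n * k

def pipelined_cycles(n, k=len(STAGES)):
    """Cycles for n instructions in an ideal k-stage pipeline.

    The first instruction needs k cycles to drain through the pipeline;
    each later instruction completes one cycle after its predecessor.
    """
    return k + n - 1 if n > 0 else 0

if __name__ == "__main__":
    n = 100
    print(sequential_cycles(n), pipelined_cycles(n))
```

For long instruction streams the pipelined count approaches one cycle per instruction, which matches the statement that a result is obtained every machine cycle.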

Factors that disturb the flow of the instruction pipeline include the following:

・a subsequent instruction needs the result of a preceding instruction
・a preceding instruction determines the operand address of a subsequent instruction
・a branch occurs
・memory accesses conflict
・a preceding instruction rewrites the contents of a subsequent instruction
・an interrupt or exception occurs
・an instruction is complex and needs multiple machine cycles to execute

Various measures have been devised to minimize these disturbances. For example, known measures for suppressing pipeline disturbance caused by conditional branches include the loop buffer scheme, which uses an instruction buffer large enough to hold a program loop; the multiple instruction stream scheme, which processes the instruction sequences for both the taken and the not-taken case; and the branch prediction scheme, which predicts a branch from the history of the branch instruction.

In the field of recent high-performance microprocessors, the RISC (Reduced Instruction Set Computer) approach, which seeks to achieve high-speed processing by simplifying the machine instruction set, has been attracting attention.

The RISC approach grew out of the analysis of traces of high-level language programs and the success of the hardwired logic of the Cray-1 supercomputer. Its characteristics are:

・a simple instruction set based on register-to-register operations
・emphasis on pipelining
・single-machine-cycle execution
・application of the latest compiler technology

An instruction set based on register-to-register operations makes operand address generation (OA) unnecessary and also removes the need to access main memory during operand fetch (OF). A simple instruction set also simplifies instruction decoding, which can be sped up with hardwired logic, so that instruction decode (ID) can be folded into the operand fetch (OF) stage. Furthermore, taking the balance of work among the stages into account, an instruction pipeline consisting of the following three stages has been developed, as shown in FIG. 10:

・instruction fetch and operand fetch (IF/OF)
・instruction execution (EX)
・operand write (OW)

In this instruction pipeline, if instruction 1 is a branch instruction, instruction 2 can be fetched only after the execution stage (EX) of instruction 1 has finished. The instruction fetched while instruction 1 is executing must therefore be invalidated, which leaves a one-machine-cycle bubble in the instruction pipeline and degrades performance.

A delayed branch mechanism is used to minimize this loss of performance. As shown in FIG. 11, a branch instruction is treated as a delayed instruction that takes effect one machine cycle after it is issued, and the compiler's instruction scheduling fills the instruction slot immediately after the branch with a useful instruction, so that the pipeline is not disturbed and performance is maintained. If no useful instruction can be placed in the slot immediately after the branch, a NOP instruction must be placed there, and in that case performance naturally drops.

How many delay slots can be filled with useful instructions depends on the quality of the compiler. At present, with the latest compiler technology, about 80 to 90 percent of delay slots can be used effectively.
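To give a feel for what an 80 to 90 percent fill rate means, the following sketch estimates the average cycles per instruction under a simple model. The model is an assumption made here for illustration: every unfilled delay slot costs exactly one wasted cycle, and the branch fraction of 0.2 used in the example is hypothetical, not a figure from the patent.

```python
def effective_cpi(branch_fraction, fill_rate, delay_slots=1):
    """Average cycles per useful instruction with delayed branches.

    branch_fraction: share of instructions that are branches (assumed).
    fill_rate: share of delay slots the compiler fills with useful work.
    Each unfilled slot holds a NOP and wastes one machine cycle.
    """
    wasted = branch_fraction * delay_slots * (1.0 - fill_rate)
    return 1.0 + wasted

if __name__ == "__main__":
    for fill in (0.8, 0.9):
        print(fill, effective_cpi(0.2, fill))
```

With a single delay slot the penalty stays small, which is why the delayed branch works well for the three-stage RISC pipeline described above.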

[Problems to be solved by the invention]

(1) In a conventional instruction processing device that adopts the pipeline configuration described above for the first invention, as shown in FIG. 3(b), when instruction 1 is a branch instruction, the first stage (IF stage) of the following instruction, instruction 2, is made to wait until machine cycle t4, at which the EX stage of the branch instruction ends. This is because the taken/not-taken decision and the calculation of the branch target address are both performed in the EX stage.

Consequently, in the conventional instruction processing device, a bubble appears in the instruction execution pipeline every time a branch instruction is executed. In other words, the device has the drawback that the maximum execution speed cannot be obtained, because a branch instruction and the instruction that follows it are not executed in parallel.

(2) For the second invention, consider building a parallel pipelined instruction processing device by combining the VLIW scheme and the instruction pipeline scheme described above. As an example, take a parallel pipelined instruction processing device in which four pipelined instruction processing devices are arranged in parallel and which executes VLIW instructions having four fields.

Suppose that the instruction pipeline of this parallel pipelined device has the same one-machine-cycle branch delay as the instruction pipeline of the RISC microprocessor described above. The device then has one delay slot, but because one instruction consists of four instruction fields, this amounts in effect to delay slots for four instructions. Moreover, once instruction dependencies are taken into account, the other three fields of the instruction that contains the branch must also be treated in the same way as delay slots. This four-way parallel pipelined device can therefore be regarded as equivalent to a serial pipelined instruction processing device with seven delay slots.
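The count of seven follows from two terms, which the sketch below makes explicit. The generalization to other field counts and branch delays is an extrapolation of the argument in the text, and the function name is hypothetical.

```python
def equivalent_delay_slots(n_fields, branch_delay_cycles=1):
    """Delay slots of the equivalent serial pipeline.

    n_fields * branch_delay_cycles: fields of the instructions issued
        during the branch delay.
    n_fields - 1: the other fields of the instruction that contains
        the branch, which must be treated like delay slots.
    """
    following = n_fields * branch_delay_cycles
    same_instruction = n_fields - 1
    return following + same_instruction

if __name__ == "__main__":
    print(equivalent_delay_slots(4))
```

For four fields and a one-cycle branch delay this gives 4 + 3 = 7, the figure quoted above; for a single-field machine it reduces to the one delay slot of the RISC pipeline.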

Scheduling useful instructions into so many empty instruction slots is extremely difficult, and most of the slots end up holding NOP instructions. As noted earlier, even a single empty slot is used only 80 to 90 percent of the time, so making effective use of seven empty slots is a formidable task. A parallel pipelined instruction processing device that uses the conventional pipeline configuration with a one-machine-cycle branch delay therefore has the drawback that its processing performance drops sharply whenever a branch instruction is executed.
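A rough calculation shows how quickly the odds fall as slots are added. It assumes, purely for illustration, that each slot is filled independently with the single-slot rate of 80 to 90 percent. The patent makes no such assumption and argues that filling many slots is harder than this, so the numbers below should be read as an optimistic bound.

```python
def prob_all_filled(fill_rate, slots):
    """Probability that every slot holds a useful instruction,
    assuming slots are filled independently (an idealization)."""
    return fill_rate ** slots

def expected_nops(fill_rate, slots):
    """Expected number of NOP-filled slots under the same assumption."""
    return slots * (1.0 - fill_rate)

if __name__ == "__main__":
    for rate in (0.8, 0.9):
        print(rate, prob_all_filled(rate, 7), expected_nops(rate, 7))
```

Even under this favorable model, fewer than half of all branches would have all seven slots usefully filled.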

[Means for solving the problem]

The instruction processing device of the first invention has an instruction set whose instructions can be executed in a single machine cycle, and is a computer system comprising: first storage means for storing the instructions; second storage means for storing operands; instruction read means for reading an instruction from the first storage means; operand read means for reading, from the second storage means, the operands needed to execute the instruction that was read; instruction execution means for executing the instruction using the operands that were read; and operand write means for writing the operand obtained as a result of instruction execution into the second storage means; the computer system being provided with a pipelined instruction processing mechanism made up of a first stage that performs the instruction read and the operand read, a second stage that performs the processing of the instruction, and a third stage that performs the operand write. The device is characterized in that it further comprises branch address generation means for generating a branch target address and, when the instruction read from the first storage means by the instruction read means is a branch instruction, performs the operand read and the generation of the branch address by the address generation means simultaneously in the first stage, thereby eliminating disturbance of pipeline operation during branch execution and speeding up pipeline operation.

The second invention is an instruction processing device that processes n instructions in parallel (n being a natural number with n ≥ 2) and comprises: first storage means for storing instruction sequences, each consisting of n instructions arranged in parallel; instruction sequence read means for reading an instruction sequence from the first storage means; n instruction processing means, one for each of the n instructions in the instruction sequence that was read, each processing the instruction designated by its instruction; and second storage means that stores the operands used by the n instruction processing means and can be read and written independently by the n instruction processing means. Of the instruction processing means, n − 1 each comprise operand read means for reading, from the second storage means, the operands needed to execute the designated instruction, instruction execution means for executing the instruction using the operands that were read, and operand write means for writing the operand obtained as a result of execution back to the second storage means; these n − 1 means execute instructions other than branch instructions with a pipelined instruction processing mechanism made up of a first stage that performs the instruction sequence read and the operand read, a second stage that performs the processing of the instruction, and a third stage that performs the operand write. The one remaining instruction processing means comprises operand read means for reading, from the second storage means, the operands needed to execute the designated conditional branch instruction, and address generation means that generates the address of the next instruction sequence and the branch target address in parallel. The device is characterized in that, when the instruction read from the first storage means by the instruction read means is a branch instruction, the operand read and the branch target address generation are performed simultaneously in the first stage, thereby eliminating the disturbance of pipeline operation that branch delay causes during branch execution and suppressing the growth in empty instruction slots caused by branch delay.

[Example]

Next, the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of one embodiment of the first invention, and FIG. 2 is an execution timing chart for FIG. 1.

In FIG. 1, 11 is instruction fetch means; 12 is operand fetch means; 13 is instruction execution means; 14 is operand write means; 15 is an instruction memory; 16 is a general-purpose register file with two read ports and one write port; 17 is an incrementer that increments the instruction fetch address by one; 18 is a multiplexer; 19 is branch address generation means that calculates the branch target address when the instruction fetched by instruction fetch means 11 is a branch instruction; 101, 102, 105, 109, and 110 are instruction buses that transfer fetched instructions; 103, 104, and 113 are source operand buses that transfer fetched operands; 106 is a destination operand bus that transfers the result of instruction execution; 107, 108, 111, and 112 are instruction address buses; 114 and 116 are register address buses; 115 is a register read bus used for operand fetch; and 117 is a register read/write bus that is time-shared between operand fetch and operand write.

In FIG. 2, IF is the instruction fetch cycle, OF/BA is the operand fetch / branch address generation cycle, EX is the instruction execution cycle, and OW is the operand write cycle.

The flow of instruction processing in this embodiment is described with reference to FIGS. 1 and 2, following instruction 1 in FIG. 2.

In the first half of machine cycle t1, instruction fetch means 11 outputs an instruction address to multiplexer 18 over instruction address bus 108, and fetches over instruction bus 109 the instruction that is read from instruction memory 15 at the instruction address the multiplexer sends out on instruction address bus 107. The instruction address sent out by multiplexer 18 is whichever of the two instruction addresses arriving on instruction address buses 108 and 111 is selected by the contents of source operand bus 113.

The fetched instruction is transferred to operand fetch means 12 over instruction bus 101 at the start of the next machine cycle t2; if it is a branch instruction, it is also transferred to branch address generation means 19 over instruction bus 110 at the start of the second half of machine cycle t1. In addition, incrementer 17 adds one to the instruction address on instruction address bus 107 and sends the result to instruction fetch means 11 over instruction address bus 112.

In the second half of machine cycle t1, operand fetch means 12 sends, on register address buses 114 and 116, the addresses of the registers from which operands are to be fetched, based on the instruction transferred over instruction bus 101, and fetches the operands from general-purpose register file 16 over two buses, register read bus 115 and register read/write bus 117.

When the operand fetch is complete, the instruction is transferred over instruction bus 102, and the two fetched operands over source operand buses 103 and 104, to instruction execution means 13 at the start of the next machine cycle t2. If the instruction is a branch instruction, the operand information that decides whether the branch is taken is transferred to multiplexer 18 over source operand bus 113.

In the second half of machine cycle t1, branch address generation means 19 sends to multiplexer 18 the branch target address it generates from the instruction transferred over instruction bus 110.

Instruction execution means 13 executes the instruction transferred over instruction bus 102 in one machine cycle (t2), using the operands transferred over source operand buses 103 and 104. At the start of the next machine cycle t3, the executed instruction is transferred over instruction bus 105, and the data obtained by executing it over destination operand bus 106, to operand write means 14.

In the first half of machine cycle t3, operand write means 14 sends, on register address bus 116, the address of the register to be written, based on the instruction transferred over instruction bus 105, and sends the operand transferred over destination operand bus 106 out on register read/write bus 117 to write it into general-purpose register file 16. This register write takes place within the first half of machine cycle t3; operand write means 14 is idle during the second half of machine cycle t3.

Each individual instruction is executed as described above. These operations are overlapped from one machine cycle to the next to form the instruction pipeline.

Now, in FIG. 2, when instruction 1 is a branch instruction, branch address generation means 19 generates the branch target address in the second half of t1 (the OF/BA stage), based on the instruction transferred from instruction fetch means 11 over instruction bus 110, and sends it to multiplexer 18 over instruction address bus 111 at the start of t2. In the conventional example of FIG. 3(b), generation of the branch target address could not begin until the EX stage, when the arithmetic unit became available; because this embodiment has branch address generation means 19, generation of the branch address can be completed in the OF/BA stage. Operand fetch means 12 also transfers the operand information that decides whether the branch is taken to multiplexer 18 over source operand bus 113.

In the first half (the IF stage) of the first machine cycle (t2) of instruction 2, the instruction that follows the branch, multiplexer 18 receives two instruction addresses, one transferred from instruction fetch means 11 over instruction address bus 108 and one transferred from branch address generation means 19 over instruction address bus 111, selects the appropriate one according to the operand information transferred from operand fetch means 12, and sends it out on instruction address bus 107. Instruction fetch means 11 fetches over instruction bus 109 the instruction read from instruction memory 15 at the instruction address that the multiplexer sent out on instruction address bus 107. In other words, the instruction fetch cycle of instruction 2 can be executed in machine cycle t2.
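The address selection performed by multiplexer 18 can be modeled in a few lines. This is a behavioral sketch only: the dictionary-based instruction format, the field names `op`, `offset`, and `cond`, and the PC-relative target calculation are all hypothetical, since the text does not say how branch address generation means 19 computes the target or how the condition is encoded.

```python
def next_fetch_address(sequential_addr, branch_target, branch_taken):
    """Model of multiplexer 18: choose between the address from
    instruction fetch means 11 (bus 108) and the address from branch
    address generation means 19 (bus 111), using the taken/not-taken
    information from operand fetch means 12 (bus 113)."""
    return branch_target if branch_taken else sequential_addr

def step(pc, instr):
    """Address of the instruction to fetch after the one at `pc`."""
    sequential = pc + 1  # incrementer 17 adds one to the fetch address
    if instr["op"] == "branch":
        # Hypothetical PC-relative target; generated alongside the
        # operand fetch, so it is ready before the next IF stage.
        target = pc + instr["offset"]
        return next_fetch_address(sequential, target, instr["cond"])
    return sequential

if __name__ == "__main__":
    print(step(10, {"op": "branch", "offset": 5, "cond": True}))
    print(step(10, {"op": "branch", "offset": 5, "cond": False}))
```

Because both candidate addresses and the selecting condition are available at the start of t2, the choice reduces to a single selection and needs no execution-stage arithmetic.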

Consequently, no bubble appears in the instruction execution pipeline even between a branch instruction and the instruction that follows it, so the two can be executed in parallel.

In the instruction pipeline of this embodiment, as shown in FIG. 2, the OW cycle of instruction 1 and the OF/BA cycle of instruction 3 operate at mutually exclusive times, so the buses that connect general-purpose register file 16 to operand fetch means 12 and operand write means 14 can be shared.
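The half-cycle timing given in the walkthrough (OF/BA in the second half of an instruction's first machine cycle, OW in the first half of its third) can be checked mechanically. The sketch below encodes that timing as read from the text and assumes one instruction is issued per machine cycle; the function names are hypothetical.

```python
def register_file_accesses(i):
    """Half-cycle slots in which instruction i (1-based) uses the
    register file buses: (machine cycle, half)."""
    return {
        "OF/BA": (i, "second"),   # operand fetch, second half of cycle i
        "OW": (i + 2, "first"),   # operand write, first half of cycle i+2
    }

def bus_conflicts(num_instructions):
    """List pairs of accesses that fall in the same half-cycle."""
    seen = {}
    conflicts = []
    for i in range(1, num_instructions + 1):
        for stage, slot in register_file_accesses(i).items():
            if slot in seen:
                conflicts.append((seen[slot], (i, stage)))
            else:
                seen[slot] = (i, stage)
    return conflicts

if __name__ == "__main__":
    print(register_file_accesses(1)["OW"], register_file_accesses(3)["OF/BA"])
    print(bus_conflicts(10))
```

Instruction 1 writes in the first half of t3 and instruction 3 reads in the second half of t3, so the two accesses share a machine cycle but never a half-cycle, which is what allows one bus to serve both.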

It goes without saying that the present invention is not limited to the embodiment described above and can also be realized with other suitable configurations.

Next, the second invention will be described with reference to the drawings.

FIG. 4 is a block diagram of one embodiment of the present invention for n = 4. It shows the configuration of a parallel pipelined instruction processing device that executes four instructions in parallel from a VLIW-type parallel instruction sequence made up of four instructions.

In FIG. 4, 411 is an instruction sequence memory; 412 is an instruction sequence fetch means; 413 is a data register with eight read ports and four write ports; 414 to 417 are operand fetch means; 418 is an address generation means that generates the address of the instruction sequence to be fetched next; 419 to 421 are instruction execution means; 422 to 425 are operand write means; 4101 is an instruction sequence bus for fetching instruction sequences; 4102 is an address bus; 4103 to 4106 are instruction buses that transfer the fetched instructions; 4107 to 4110 are register read buses, two lines each, used to fetch the operands needed to execute the instructions; 4111 to 4114 are source operand buses, two lines each, that transfer the fetched operands; 4115 to 4118 are destination operand buses that transfer instruction execution results; and 4119 to 4122 are register write buses used to write operands.
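The port counts of the data register 413 follow from the bus arrangement. As a rough check, the sketch below assumes that each of the n instruction fields reads up to two source operands and writes at most one result, which matches the two-line register read buses and the single register write bus per field in FIG. 4. The function name `register_ports` and its parameters are hypothetical and do not appear in the patent.

```python
def register_ports(n, sources_per_instruction=2):
    """Return (read_ports, write_ports) for n parallel instruction fields.

    Assumption: every field reads `sources_per_instruction` operands and
    writes one result per instruction sequence.
    """
    if n < 1 or sources_per_instruction < 0:
        raise ValueError("n must be >= 1 and sources_per_instruction >= 0")
    read_ports = n * sources_per_instruction
    write_ports = n
    return read_ports, write_ports
```

For n = 4 this gives eight read ports and four write ports, the figures stated for the data register 413.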

FIG. 5 shows the instruction format of the parallel pipeline instruction processing device having the configuration of FIG. 4.

FIG. 6 shows the structure of the pipeline of the second invention. The abbreviations in the figure have the following meanings.

IF: instruction sequence fetch; OF: operand fetch; AG: address generation; EX: instruction execution; OW: write back.

FIG. 7 shows an example of a program sequence containing a conditional branch instruction. FIG. 8 shows the operation of the instruction pipeline in the second invention when the program sequence shown in FIG. 7 is executed.

In FIG. 8, IF denotes instruction sequence fetch, OF denotes operand fetch, AG denotes branch address generation, operations 1 to 6 and operations 10 to 12 denote the execution of the respective instructions, and WB denotes operand write back.

First, the operation when an instruction sequence is executed will be described with reference to FIGS. 4 and 5.

The instruction sequence fetch means 412 fetches the instruction sequence specified on the address bus 4102 from the instruction sequence memory 411 via the instruction sequence bus 4101. It then transfers instruction 1, instruction 2, instruction 3, and the branch instruction shown in FIG. 5 over the instruction buses 4103, 4104, 4105, and 4106, respectively, to the operand fetch means 414 to 417 and the address generation means 418.
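The dispatch step described above can be sketched as follows. This is a minimal illustrative Python model, not the patented circuit: the names `InstructionSequence` and `dispatch`, and the representation of instructions as strings, are assumptions introduced here. Only the bus numbers 4103 to 4106 and the field order (instruction 1, instruction 2, instruction 3, branch instruction) come from the description.

```python
from dataclasses import dataclass

# Instruction buses named in the description, in field order.
INSTRUCTION_BUSES = (4103, 4104, 4105, 4106)


@dataclass(frozen=True)
class InstructionSequence:
    """One instruction sequence for n = 4 (hypothetical model of FIG. 5)."""
    instruction1: str
    instruction2: str
    instruction3: str
    branch: str


def dispatch(seq):
    """Map each field of the fetched sequence onto its own instruction bus."""
    fields = (seq.instruction1, seq.instruction2, seq.instruction3, seq.branch)
    return dict(zip(INSTRUCTION_BUSES, fields))
```

In this model the branch field always travels on bus 4106, which is the bus from which the address generation means 418 receives the branch instruction.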

Next, the operand fetch means 414 to 417 decode the transferred instructions and fetch the operands used by each instruction from the data register 413 via the register read buses 4107 to 4110, respectively. The operand fetch means 414 to 416 transfer the fetched operands to the instruction execution means 419 to 421 via the source operand buses 4112 to 4114, respectively. The operand fetch means 417, on the other hand, transfers its fetched operand to the address generation means 418 via the source operand bus 4111. The operation up to this point is the same for all instruction processing.

The instruction execution means 419 to 421 execute their respective instructions using the operands transferred via the source operand buses 4112 to 4114, and transfer the execution results to the operand write means 423 to 425 via the destination operand buses 4116 to 4118. The operand write means 423 to 425 write each result operand back to the register designated by the corresponding instruction via the register write buses 4119 to 4121.

Meanwhile, the address generation means 418 increments the instruction sequence address it holds internally to generate the next address. At the same time, it decodes the branch instruction supplied via the instruction bus 4106 and generates the branch destination address. It then refers to the operand supplied via the source operand bus 4111 to determine whether the branch condition is satisfied, and outputs to the address bus 4102 the branch destination address if the branch is taken, or the next address if it is not. If the branch instruction also stores the next address in a register, that is, if it is a branch-and-link instruction, the branch destination address is output to the address bus 4102 and the next address is transferred to the operand write means 422 via the destination operand bus 4115. The operand write means 422 then writes the next address back to the data register 413 as an operand via the register write bus 4122.
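The selection performed by the address generation means 418 can be illustrated with a short sketch. The function and type names, the `step` parameter, and the use of plain integers for addresses are assumptions made for illustration. The sketch also assumes that a branch-and-link instruction saves the next address only when the branch is taken, which the description does not state explicitly.

```python
from typing import NamedTuple, Optional


class AddressResult(NamedTuple):
    address_bus: int            # value driven onto address bus 4102
    link_value: Optional[int]   # next address passed to operand write means 422


def generate_address(current, target, condition_met, link=False, step=1):
    """Select the address of the instruction sequence to fetch next.

    `step` (the increment between consecutive sequences) is an assumption;
    the description only says the held address is incremented.
    """
    next_address = current + step  # generated regardless of the branch outcome
    if condition_met:
        # Taken branch: output the target. A branch-and-link also hands the
        # next address over for write-back to a register.
        return AddressResult(target, next_address if link else None)
    return AddressResult(next_address, None)
```

Because both candidate addresses are available before the condition is known, the only work left once the operand arrives is the final selection.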

The timing of the processing described above will now be explained with reference to FIG. 6, which shows the flow of processing when one instruction sequence is executed. The processing on lines 1 to 3 corresponds to instructions I to III. The processing spanning the bottom two lines corresponds to the branch instruction. The timing at which the instruction sequence fetch means 412 of FIG. 4 fetches an instruction sequence corresponds to IF. Likewise, the fetching of operands from registers by the operand fetch means 414 to 417, the generation of the next address and the branch destination address by the address generation means 418, the execution of instructions by the instruction execution means 419 to 421, and the writing of operands to registers by the operand write means 422 to 425 correspond to OF, AG, EX, and WB, respectively.

Now consider the case in which the parallel pipeline instruction processing device of this embodiment processes the program sequence shown in FIG. 7. In this sequence, instruction sequence 2 contains a conditional branch instruction, and when the condition is satisfied the sequence branches from instruction sequence 2 to instruction sequence A.

FIG. 8 shows the operation of the pipeline when the sequence of FIG. 7 is executed. In processing instruction sequence 2, the pipeline that handles the branch field generates the next address and the branch destination address in parallel, at the same time as the operand fetch. Using the contents of the fetched operand, it can decide at the end of cycle t2 whether to use the next address or the branch destination address, and output the chosen address to the address bus 4102. Processing of instruction sequence A can therefore start in cycle t3.
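The cycle-level consequence can be sketched with a small scheduling model. It rests on assumptions that are not taken from FIG. 8 itself: cycles are numbered from t1, one instruction sequence enters the first pipeline stage per cycle, and instruction sequence fetch and operand fetch or address generation fall within that same first stage, as the claims describe. The function name `fetch_schedule` and its parameters are hypothetical.

```python
def fetch_schedule(program, branch_at, target, branch_delay=0):
    """Return {cycle: sequence name} for the first pipeline stage.

    program      -- sequence names in fall-through order
    branch_at    -- index in `program` of the sequence holding a taken branch
    target       -- name of the branch-target sequence
    branch_delay -- extra cycles before the target can enter the pipeline
                    (0 models this embodiment, 1 a conventional delayed branch)
    """
    schedule = {}
    cycle = 1
    # Sequences up to and including the branch enter one per cycle.
    for name in program[:branch_at + 1]:
        schedule[cycle] = name
        cycle += 1
    # Any delay cycles are occupied by the fall-through (delay) sequences.
    for name in program[branch_at + 1:branch_at + 1 + branch_delay]:
        schedule[cycle] = name
        cycle += 1
    schedule[cycle] = target
    return schedule
```

With `branch_delay=0` and the branch in the second sequence, the target enters the pipeline in cycle 3, matching the t3 start described above. With `branch_delay=1`, a delay sequence occupies cycle 3 and the target moves to cycle 4.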

Accordingly, even when an instruction sequence containing a branch is executed, no gap arises in the pipeline, and the number of empty instruction slots that the compiler must fill is smaller than in conventional devices. An efficient parallel pipeline instruction processing device can thus be realized.

For example, in a conventional device with a branch delay of one machine cycle, achieving maximum performance requires the compiler to place valid instructions in six instruction slots in total: the three slots of instructions 4 to 6 and three slots in the delay instruction sequence that follows. In contrast, in the parallel pipeline instruction processing device of this embodiment, only the three slots of instructions 4 to 6 need to be filled with valid instructions.
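The slot count in this example can be reproduced with a one-line formula. The generalization to arbitrary n and arbitrary branch delay is an assumption made here for illustration; the description gives only the n = 4 case with delays of one and zero cycles. The function name `slots_to_fill` is hypothetical.

```python
def slots_to_fill(n, branch_delay):
    """Non-branch instruction slots the compiler must fill around one branch.

    n            -- instructions per instruction sequence (one is the branch field)
    branch_delay -- number of delay sequences executed after the branch
    """
    if n < 2 or branch_delay < 0:
        raise ValueError("need n >= 2 and branch_delay >= 0")
    # n - 1 slots beside the branch itself, plus n - 1 per delay sequence.
    return (n - 1) * (1 + branch_delay)
```

For n = 4, a one-cycle branch delay gives six slots and a zero-cycle delay gives three, the two figures quoted in the text.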

Needless to say, the present invention is not limited to the embodiment described above and can also be realized by other suitable configurations.

[Effects of the invention]

As described above, in the first invention, instructions in a pipelined computer are executed in parallel without disturbing the pipeline operation even when a branch instruction is executed, so instruction execution can be sped up. In addition, the bus used for reading and writing the general-purpose registers can be shared on a time-division basis, which reduces the amount of hardware.

Also as described above, in the parallel pipeline instruction processing device of the second invention, an instruction sequence containing a branch instruction does not produce empty instruction slots. This makes it possible to realize an efficient parallel pipeline instruction processing device that combines VLIW-type parallel processing, which achieves parallelism through simple hardware and compile-time parallel instruction scheduling, with the high-speed processing of an instruction pipeline. Furthermore, because the number of empty instruction slots the compiler must fill is reduced, parallel instruction scheduling becomes easier.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of one embodiment of the first invention. FIG. 2 is a timing diagram of the pipeline of FIG. 1. FIG. 3(a) is a timing diagram of the pipeline of a conventional instruction processing device. FIG. 3(b) is a timing diagram of the pipeline for a branch instruction in a conventional instruction processing device. FIG. 4 is a block diagram showing the configuration of one embodiment of the second invention for the case n = 4. FIG. 5 shows the instruction format of the parallel pipeline instruction processing device having the configuration of FIG. 4. FIG. 6 shows the pipeline of the second invention. FIG. 7 shows a program sequence containing a conditional branch instruction. FIG. 8 shows the operation of the instruction pipeline when the instructions shown in FIG. 7 are executed. FIG. 9 shows the principle of a VLIW parallel computer. FIG. 10 shows the instruction pipeline of a conventional serial instruction processing device. FIG. 11 shows the operation of a conventional pipeline when a branch occurs. FIG. 12 shows the operation of a delayed branch instruction in a conventional pipeline.

AG: address generation; EX: instruction execution cycle; IF: instruction fetch cycle; OF: register fetch cycle; OW: register write cycle; 11: instruction fetch means; 12: operand fetch means; 13: instruction execution means; 14: operand write means; 15: instruction memory; 16: general-purpose register file; 17: incrementer; 18: multiplexer; 19: branch address generation means; 101 to 102: instruction bus; 108 to 104: source operand bus; 105: instruction bus; 106: destination operand bus; 107 to 108: instruction address bus; 109 to 110: instruction bus; 111 to 112: instruction address bus; 113: source operand bus; 114: register address bus; 115: register read bus; 116: register address bus; 117: register read/write bus; 411: instruction sequence memory; 412: instruction sequence fetch means; 413: data register; 414 to 417: operand fetch means; 418: address generation means; 419 to 421: instruction execution means; 422 to 425: operand write means; 4101: instruction sequence bus; 4102: address bus; 4103 to 4106: instruction bus; 4107 to 4110: register read bus; 4111 to 4114: source operand bus; 4115 to 4118: destination operand bus; 4119 to 4122: register write bus.

Claims

(57) [Claims]

1. An instruction processing device in a computer system that has an instruction set executable in a single machine cycle and comprises: first storage means for storing instructions; second storage means for storing operands; instruction reading means for reading an instruction from the first storage means; operand reading means for reading, from the second storage means, the operands necessary to execute the read instruction; instruction executing means for executing the instruction using the read operands; and operand writing means for writing the operand obtained as a result of execution to the second storage means, the computer system including a pipeline instruction processing mechanism consisting of a first stage that reads the instruction and reads the operands, a second stage that executes the instruction, and a third stage that writes the operand, wherein the device further comprises branch address generating means for generating a branch destination address, and, when the instruction read from the first storage means by the instruction reading means is a branch instruction, the reading of the operands and the generation of the branch address by the branch address generating means are performed simultaneously in the first stage, thereby eliminating disturbance of the pipeline operation when a branch instruction is executed and speeding up the pipeline operation.

2. An instruction processing device that processes n instructions in parallel, comprising: first storage means for storing instruction sequences each consisting of n instructions arranged in parallel (n being a natural number, n ≥ 2); instruction sequence reading means for reading an instruction sequence from the first storage means; n instruction processing means, corresponding to the n instructions in the read instruction sequence, each processing the instruction assigned to it; and second storage means that stores the operands used by the instruction processing means and can be read and written independently by the n instruction processing means, wherein n-1 of the instruction processing means each include operand reading means for reading, from the second storage means, the operands necessary to execute the assigned instruction, instruction executing means for executing the instruction using the read operands, and operand writing means for writing the operand obtained as a result of instruction execution back to the second storage means, and execute instructions other than branch instructions by a pipeline instruction processing mechanism consisting of a first stage that reads the instruction sequence and reads the operands, a second stage that executes the instruction, and a third stage that writes the operand; the remaining instruction processing means includes operand reading means for reading, from the second storage means, the operands necessary to execute the assigned conditional branch instruction, and address generating means for generating the address of the next instruction sequence and the branch destination address in parallel; and, when the instruction sequence read from the first storage means includes a branch instruction, the reading of the operands and the generation of the branch destination address are performed simultaneously in the latter half of the first stage, thereby eliminating disturbance of the pipeline operation when a branch instruction is executed and suppressing an increase in empty instruction slots caused by branch delay.