JPH03163627A

JPH03163627A - Instruction processor

Info

Publication number: JPH03163627A
Application number: JP9450790A
Authority: JP
Inventors: Yumiko Ushimaru; 牛丸　由美子
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1989-08-18
Filing date: 1990-04-10
Publication date: 1991-07-15
Anticipated expiration: 2014-02-24
Also published as: JP2861234B2

Abstract

PURPOSE: To eliminate pipeline disturbance when a branch instruction is executed, by adding a branch address generating means to a computer system that has a pipelined instruction processing mechanism and generating the branch address in the same machine cycle in which the operand is read. CONSTITUTION: When instruction 1 is a branch instruction, a branch address generating means 19 generates the branch destination address, in the OF/BA stage of machine cycle t1, from the instruction transferred by an instruction fetching means 11, and sends it to a multiplexer 18 at the start of t2. An operand fetching means 12 then sends the branch taken/not-taken information to the multiplexer. In the IF stage of the succeeding instruction 2, in t2, the multiplexer 18 selects the appropriate one of the two instruction addresses transferred from the means 11 and 19 according to the information sent from the means 12; the instruction is read from a memory 15 and fetched into the means 11. No gap therefore arises in the execution pipeline even between the branch instruction and the succeeding instruction, and the two can be executed in parallel. In addition, because the OW cycle of instruction 1 and the OF/BA cycle of instruction 3 are activated at mutually exclusive timings, a register file 16, the fetching means 12 and a writing means 14 can share a bus.

Description

Detailed Description of the Invention

[Industrial Application Field]

The first and second inventions relate to an instruction processing device for an information processing apparatus. In particular, the first invention relates to a RISC-type microprocessor that has a pipelined instruction processing mechanism and executes instructions in a single machine cycle. The second invention relates to a parallel instruction processing device that executes multiple instructions in parallel, and to a pipelined instruction processing device that uses a pipeline mechanism to achieve high-speed processing.

[Conventional technology]

(1) In the prior art relating to the first invention, computers with various pipeline configurations have been developed as computer systems have grown in performance. To shorten the machine cycle, a pipeline scheme is often used in which operand read and operand write are each assigned a pipeline stage of their own. In this type of pipeline, as shown in the pipeline timing chart of a conventional instruction processing device in FIG. 3(a), the pipeline consists of four stages: instruction fetch from memory (the IF stage), operand fetch from the general-purpose register file (the OF stage), instruction execution (the EX stage), and operand write to the general-purpose register file (the OW stage).

Because the processing of each pipeline stage completes in one machine cycle, one instruction can be executed every machine cycle. Moreover, because the processing is finely subdivided, the machine cycle itself can be shortened. A high-performance instruction processing device can therefore be provided.
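
The one-instruction-per-cycle behaviour described above can be illustrated with a minimal sketch. It assumes an ideal pipeline with no stalls, and the helper names `timing_chart`, `completion_cycles` and `render` are hypothetical; none of them appears in the patent.

```python
# Illustrative sketch only: an ideal four-stage pipeline with no stalls.
# The helper names are hypothetical and are not taken from the patent.
STAGES = ("IF", "OF", "EX", "OW")


def timing_chart(num_instructions, stages=STAGES):
    """Map each instruction number to {machine cycle: stage}.

    Instruction i enters the pipeline in machine cycle i, and every
    stage is assumed to take exactly one machine cycle.
    """
    chart = {}
    for i in range(1, num_instructions + 1):
        chart[i] = {i + offset: stage for offset, stage in enumerate(stages)}
    return chart


def completion_cycles(chart):
    """Return the machine cycle in which each instruction leaves its last stage."""
    return [max(chart[i]) for i in sorted(chart)]


def render(chart):
    """Draw the chart as text, one row per instruction, one column per cycle."""
    last_cycle = max(max(cycles) for cycles in chart.values())
    rows = []
    for i in sorted(chart):
        cells = [chart[i].get(c, "--") for c in range(1, last_cycle + 1)]
        rows.append(f"instr {i}: " + " ".join(cells))
    return "\n".join(rows)


if __name__ == "__main__":
    chart = timing_chart(4)
    print(render(chart))
    print(completion_cycles(chart))
```

For four instructions the completion cycles are consecutive (4, 5, 6, 7): once the pipeline has filled, one instruction finishes in every machine cycle.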

(2) The prior art relating to the second invention includes the following techniques.

(a) VLIW parallel computer

The VLIW (Very Long Instruction Word) scheme, as shown in FIG. 8, achieves parallel processing by dividing a relatively long instruction into many fields and using each field to control, independently, one of many arithmetic units, registers, interconnection networks, memories and so on.

In the VLIW scheme, the parallelism among operations is extracted at compile time, and the compiler combines operations that can run in parallel into a single instruction. High-speed processing is achieved when the degree of parallelism obtained is close to the number of parallel arithmetic units.

When the degree of parallelism is low, however, some instruction fields are left empty and the bit-usage efficiency of the instruction falls.
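
As a rough illustration of how empty fields lower efficiency, the sketch below counts the fraction of fields in a sequence of VLIW words that hold a real operation. The representation of a program as lists of operation names and the `NOP` marker are assumptions made for this example, not the patent's instruction format.

```python
# Illustrative sketch only: measures how full the fields of VLIW words are.
NOP = "nop"  # hypothetical marker for an empty instruction field


def field_utilization(program):
    """Return the fraction of instruction fields that hold a real operation.

    `program` is a list of VLIW words; each word is a list of field contents.
    """
    total = sum(len(word) for word in program)
    used = sum(1 for word in program for op in word if op != NOP)
    return used / total if total else 0.0


if __name__ == "__main__":
    program = [
        ["add", "mul", "load", "store"],  # all four fields in use
        ["add", NOP, NOP, NOP],           # only one independent operation found
        ["sub", NOP, NOP, NOP],
    ]
    print(field_utilization(program))
```

In this example only 6 of the 12 fields carry useful work, so half of the instruction bits are spent on empty fields.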

How far the instruction fields can be filled depends on the capability of the compiler and on the source program.

Because the VLIW scheme extracts program parallelism at compile time, there is no need to perform complex processing such as data-dependency detection at run time. The hardware configuration can therefore be kept simple.

The VLIW scheme is based on ideas derived from horizontal microinstructions and is suited to fine-grained parallel processing (low-level parallel processing) using arithmetic units of a low functional level.

(b) Instruction pipeline processing

In a computer system, a machine instruction is executed by proceeding sequentially through instruction fetch (IF), instruction decode (ID), operand address generation (OA), operand fetch (OF), operation execution (EX), and result write-back (WB). In the instruction pipeline scheme, these stages of instruction execution are overlapped. The instruction pipeline delivers its maximum performance when every stage has the same execution time and that time equals one machine cycle; an operation result is then obtained every machine cycle.

Factors that disturb the flow of the instruction pipeline include the following: a subsequent instruction needs the operation result of a preceding instruction; a preceding instruction determines the operand address of a subsequent instruction; a branch occurs; memory accesses conflict; a preceding instruction rewrites the contents of a subsequent instruction; an interrupt or exception occurs; and an instruction is so complex that its execution requires multiple machine cycles.

Various measures have been devised to minimize these factors. For example, known measures for suppressing the pipeline disturbance caused by conditional branches include the loop buffer scheme, which uses an instruction buffer large enough to hold a program loop; the multiple instruction stream scheme, which processes the instruction sequences for both the condition-satisfied and condition-not-satisfied cases; and the branch prediction scheme, which predicts branches from the history information of branch instructions.

Recently, in the field of high-performance microprocessors, the RISC (Reduced Instruction Set Computer) approach, which seeks to achieve high-speed processing by simplifying the machine instruction set, has been attracting attention.

The RISC approach grew out of the analysis of traces of high-level language programs and the success of the hardwired logic of the Cray-1 supercomputer. It is characterized by a simple instruction set based on register-to-register operations, an emphasis on pipelining, single-machine-cycle execution, and the application of the latest compiler technology.

An instruction set based on register-to-register operations makes operand address generation (OA) unnecessary. A simple instruction set also simplifies instruction decoding, so that instruction decode (ID) can be included in the operand fetch (OF) stage. Furthermore, with the balance of processing among the stages taken into account, an instruction pipeline consisting of three stages has been developed, as shown in FIG. 10: instruction fetch and operand fetch (IF/OF), instruction execution (EX), and operand write (OW).

In this instruction pipeline, when instruction 1 is a branch instruction, instruction 2 can be fetched only after the execution stage (EX) of instruction 1 has finished. The instruction fetched while instruction 1 is executing must therefore be invalidated, which leaves a gap of one machine cycle in the instruction pipeline and lowers performance.

A delayed branch mechanism is used to minimize this loss of performance. As shown in FIG. 11, a branch instruction is regarded as a delayed instruction that takes effect one machine cycle after it is issued, and the compiler's instruction scheduling fills the instruction slot immediately after the branch with a useful instruction, so that the pipeline is not disturbed and performance is maintained. When no useful instruction can be placed in the slot immediately after the branch, a NOP instruction must be placed there instead, and performance of course falls.
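
The delayed-branch behaviour can be sketched with a toy interpreter in which the instruction after a jump always executes before control transfers. The tuple encoding of instructions and the function names are hypothetical, chosen only for this illustration, and conditional branches are left out for brevity.

```python
# Illustrative sketch only: a toy machine with ONE branch delay slot.
# The instruction encoding here is hypothetical, not the patent's.


def run(program, max_steps=100):
    """Execute `program` and return the trace of executed instructions.

    Each instruction is a tuple: ("op", name), ("nop",) or ("jump", target),
    where `target` is an index into `program`. A jump takes effect only
    after the instruction in the slot that follows it has executed.
    A jump placed inside a delay slot is not modelled meaningfully.
    """
    trace = []
    pc = 0
    pending_target = None  # set while a jump's delay slot is outstanding
    while 0 <= pc < len(program) and len(trace) < max_steps:
        instr = program[pc]
        if pending_target is not None:
            # This instruction sits in the delay slot: run it, then jump.
            next_pc, pending_target = pending_target, None
        else:
            next_pc = pc + 1
        if instr[0] == "jump":
            pending_target = instr[1]
            trace.append("jump")
        elif instr[0] == "nop":
            trace.append("nop")
        else:
            trace.append(instr[1])
        pc = next_pc
    return trace


def wasted_slots(trace):
    """Count executed NOPs, i.e. delay slots the scheduler could not fill."""
    return trace.count("nop")


if __name__ == "__main__":
    filled = [("op", "a"), ("jump", 4), ("op", "b"), ("op", "skipped"), ("op", "c")]
    unfilled = [("jump", 3), ("nop",), ("op", "skipped"), ("op", "c")]
    print(run(filled), wasted_slots(run(filled)))
    print(run(unfilled), wasted_slots(run(unfilled)))
```

In the first program the scheduler has moved the useful operation `b` into the delay slot, so no cycle is lost; in the second the slot holds a NOP and one cycle does no useful work.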

How many of the delay slots can be filled with useful instructions depends on the performance of the compiler. With current compiler technology, about 80 to 90 percent of delay slots can be used effectively.
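
To put the 80 to 90 percent figure in perspective, the following sketch estimates the average number of NOP cycles per branch. It rests on assumptions that are not in the patent: every delay slot is filled at the same average rate, an unfilled slot costs exactly one machine cycle, and there are no other stalls. The `branch_fraction` parameter and the value 0.2 used for it are purely hypothetical.

```python
# Illustrative sketch only, under the simplifying assumptions stated above.


def expected_unfilled_slots(delay_slots, fill_rate):
    """Average number of delay slots per branch that end up holding a NOP."""
    return delay_slots * (1.0 - fill_rate)


def effective_cpi(branch_fraction, delay_slots, fill_rate):
    """Cycles per useful instruction when only unfilled delay slots cost time.

    `branch_fraction` is the (hypothetical) share of instructions that branch.
    """
    return 1.0 + branch_fraction * expected_unfilled_slots(delay_slots, fill_rate)


if __name__ == "__main__":
    for rate in (0.8, 0.9):
        print(rate, expected_unfilled_slots(1, rate), effective_cpi(0.2, 1, rate))
```

With a single delay slot the loss is small, roughly 0.1 to 0.2 wasted cycles per branch under these assumptions.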

[Problem to be solved by the invention]

(1) In the conventional instruction processing device that adopts the pipeline configuration described above for the first invention, when instruction 1 is a branch instruction, as shown in FIG. 3(b), the first stage (the IF stage) of the succeeding instruction 2 is made to wait until machine cycle t4, when the EX stage of the branch instruction is finished. This is because both the taken/not-taken decision and the calculation of the branch destination address are performed in the EX stage.

Consequently, in the conventional instruction processing device a gap appears in the instruction execution pipeline every time a branch instruction is executed. That is, the branch instruction and the succeeding instruction are not executed in parallel, so the device has the drawback that the maximum execution speed cannot be obtained.

(2) For the second invention, consider building a parallel pipelined instruction processing device by combining the VLIW scheme and the instruction pipeline scheme described above. For example, consider a parallel pipelined instruction processing device in which four pipelined instruction processing devices are arranged in parallel to execute VLIW instructions that have four fields.

Suppose that the instruction pipeline of this parallel pipelined instruction processing device has the same one-machine-cycle branch delay as the instruction pipeline of the RISC microprocessor described above. The device then has one delay slot, but because one instruction consists of four instruction fields, this amounts in effect to four instructions' worth of delay slots. In addition, the other three instruction fields of the instruction that contains the branch must, once instruction dependencies are taken into account, be treated in the same way as delay slots. This four-way parallel pipelined instruction processing device can therefore be regarded as equivalent to a serial pipelined instruction processing device with seven delay slots.
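
The count of seven can be written out as a small calculation. The text gives only the case of four fields and a one-cycle branch delay; the generalization to other values in the sketch below is an extrapolation made for illustration, and the function name is hypothetical.

```python
# Illustrative sketch only: generalizes the 4 + 3 = 7 count given in the text.


def equivalent_serial_delay_slots(fields_per_instruction, branch_delay_cycles=1):
    """Number of delay slots of an equivalent serial pipeline.

    Slots after the branch: one full instruction word per delay cycle.
    Slots beside the branch: the other fields of the word holding the branch.
    """
    if fields_per_instruction < 1 or branch_delay_cycles < 0:
        raise ValueError("need at least one field and a non-negative delay")
    slots_after_branch = fields_per_instruction * branch_delay_cycles
    slots_beside_branch = fields_per_instruction - 1
    return slots_after_branch + slots_beside_branch


if __name__ == "__main__":
    print(equivalent_serial_delay_slots(4, 1))  # the case discussed in the text
    print(equivalent_serial_delay_slots(1, 1))  # a single-issue RISC pipeline
```

For one field per instruction the formula reduces to the single delay slot of the RISC pipeline described earlier.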

Instruction scheduling that fills so many empty instruction slots with useful instructions is extremely difficult, and NOP instructions end up being placed in most of them. As noted above, even a single empty slot can be used only 80 to 90 percent of the time, so making effective use of seven empty slots is a formidable task. A parallel pipelined instruction processing device with the conventional pipeline configuration, in which the branch delay is one machine cycle, therefore has the drawback that its processing performance falls markedly when branch instructions are executed.

[Means to solve the problem]

The instruction processing device of the first invention is a computer system that has an instruction set executable in a single machine cycle and comprises first storage means for storing the instructions, second storage means for storing operands, instruction reading means for reading an instruction from the first storage means, operand reading means for reading from the second storage means the operands needed to execute the instruction that was read, instruction execution means for executing the instruction using the operands that were read, and operand writing means for writing the operand obtained as a result of execution into the second storage means, and that is provided with a pipelined instruction processing mechanism consisting of the instruction read, the operand read, the instruction execution and the operand write. The device further comprises branch address generation means for generating a branch destination address. When the instruction read from the first storage means by the instruction reading means is a branch instruction, the branch address generation means generates the branch address concurrently, in the machine cycle in which the operands are read, thereby eliminating the pipeline disturbance that occurs when a branch instruction is executed and speeding up pipeline operation.

The instruction processing device of the second invention has an instruction string consisting of n instructions arranged in parallel (n is a natural number, n ≥ 2) and processes n instructions in parallel. It comprises first storage means for storing the instruction string, instruction string reading means for reading the instruction string from the first storage means, n instruction processing means that correspond to the n instructions in the instruction string that was read and process the instructions they specify, and second storage means that stores the operands used by the n instruction processing means and can be read and written independently by each of them.

Of the instruction processing means, n-1 each comprise operand reading means for reading from the second storage means the operands needed to execute the specified instruction, instruction execution means for executing the instruction using the operands that were read, and operand writing means for writing the operand obtained as a result of execution back to the second storage means. They execute instructions other than branch instructions by a pipelined instruction processing mechanism consisting of a first stage that reads the instruction string and the operands, a second stage that processes the instruction, and a third stage that writes the operand.

The one remaining instruction processing means comprises operand reading means for reading from the second storage means the operands needed to execute the specified conditional branch instruction, and address generation means for generating the address of the instruction string to be executed next. It performs the operand read and the address generation in parallel, and executes branch instructions by a branch control mechanism that performs the instruction string read, the operand read and the address generation in a single machine cycle, thereby suppressing the increase in empty instruction slots caused by branch delay.

[Example]

Next, the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an embodiment of the first invention, and FIG. 2 is the execution timing chart for FIG. 1.

In FIG. 1, 11 is an instruction fetch means; 12 is an operand fetch means; 13 is an instruction execution means; 14 is an operand write means; 15 is an instruction memory; 16 is a general-purpose register file with two read ports and one write port; 17 is an incrementer that increments the instruction fetch address by 1; 18 is a multiplexer; and 19 is a branch address generation means that calculates the branch destination address when the instruction fetched by the instruction fetch means 11 is a branch instruction. 101, 102, 105, 109 and 110 are instruction buses that transfer fetched instructions; 103, 104 and 113 are source operand buses that transfer fetched operands; 106 is a destination operand bus that transfers instruction execution results; 107, 108, 111 and 112 are instruction address buses; 114 and 116 are register address buses; 115 is a register read bus used for operand fetch; and 117 is a register read/write bus that is time-shared between operand fetch and operand write.

In FIG. 2, IF is the instruction fetch cycle, OF/BA is the operand fetch/branch address generation cycle, EX is the instruction execution cycle, and OW is the operand write cycle.

The flow of instruction processing in this embodiment is described with reference to FIGS. 1 and 2, taking the flow of instruction 1 in FIG. 2 as the example.

In the first half of machine cycle t1, the instruction fetch means 11 outputs an instruction address to the multiplexer via instruction address bus 108, and fetches, via instruction bus 109, the instruction read from the instruction memory 15 at the instruction address that the multiplexer sends out on instruction address bus 107. The instruction address sent out by the multiplexer 18 is whichever of the two instruction addresses obtained from instruction address buses 108 and 111 is selected according to the contents of source operand bus 113.

The fetched instruction is transferred to the operand fetch means 12 via instruction bus 101 at the start of the next machine cycle t2; if it is a branch instruction, it is also transferred to the branch address generation means 19 via instruction bus 110 at the start of the second half of machine cycle t1. Meanwhile, the incrementer 17 adds 1 to the instruction address on instruction address bus 107 and sends the result to the instruction fetch means 11 via instruction address bus 112.

In the second half of machine cycle t1, the operand fetch means 12, acting on the instruction transferred via instruction bus 101, sends the addresses of the registers from which operands are to be fetched onto register address buses 114 and 116, and fetches the operands from the general-purpose register file 16 via two buses, register read bus 115 and register read/write bus 117.

When the operand fetch is complete, the instruction is transferred via instruction bus 102, and the two fetched operands via source operand buses 103 and 104, to the instruction execution means 13 at the start of the next machine cycle t2. If the instruction is a branch instruction, the operand information that determines whether the branch is taken is transferred to the multiplexer 18 via source operand bus 113.

In the second half of machine cycle t1, the branch address generation unit 19 sends to the multiplexer 18 the branch destination address it has generated from the instruction transferred via instruction bus 110.

The instruction execution means 13 executes the instruction transferred via instruction bus 102 in one machine cycle (t2), using the operands transferred via source operand buses 103 and 104. At the start of the next machine cycle t3, the executed instruction is transferred via instruction bus 105, and the data obtained as the result of execution via destination operand bus 106, to the operand write means 14.

In the first half of machine cycle t3, the operand write means 14, acting on the instruction transferred via instruction bus 105, sends the address of the register to be written onto register address bus 116, sends the operand transferred via destination operand bus 106 out on register read/write bus 117, and writes it into the general-purpose register file 16. This register write takes place within the first half of machine cycle t3; the operand write means 14 is idle during the second half of t3.

Each instruction is executed by the operations described above. These operations are overlapped machine cycle by machine cycle to form the instruction pipeline.

Now, in FIG. 3(b), when instruction 1 is a branch instruction, the branch address generation means 19 generates the branch destination address in the second half of t1 (the OF/BA stage) from the instruction transferred from the instruction fetch means 11 via instruction bus 110, and sends it to the multiplexer 18 via instruction address bus 111 at the start of t2. The operand fetch means 12, for its part, transfers the operand information that determines whether the branch is taken to the multiplexer 18 via source operand bus 113.

In the first half (the IF stage) of t2, which is the first machine cycle of instruction 2, the instruction that follows the branch, the multiplexer 18 chooses between two instruction addresses: the one transferred from the instruction fetch means 11 via instruction address bus 108 and the one transferred from the branch address generation means 19 via instruction address bus 111. It selects the appropriate address according to the operand information transferred from the operand fetch means 12 and sends it out on instruction address bus 107.
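
The address selection just described can be sketched as follows. The text does not say how the operand information on bus 113 is turned into a taken/not-taken decision, so the sketch simply assumes a boolean `branch_taken`; the function names and the `branches` table are likewise hypothetical.

```python
# Illustrative sketch only: next-address selection around the multiplexer.
# Bus numbers in the comments refer to FIG. 1; all names are hypothetical.


def incrementer(address):
    """Incrementer 17: the address of the next sequential instruction."""
    return address + 1


def multiplexer(sequential_address, branch_target, branch_taken):
    """Multiplexer 18: choose the address driven onto address bus 107.

    sequential_address: from the instruction fetch means (bus 108)
    branch_target:      from the branch address generation means (bus 111)
    branch_taken:       assumed boolean derived from bus 113
    """
    return branch_target if branch_taken else sequential_address


def fetch_addresses(start, branches, count):
    """Return the first `count` fetch addresses.

    `branches` maps the address of a branch instruction to a
    (target, taken) pair; every other address is a non-branch instruction.
    """
    addresses = []
    address = start
    for _ in range(count):
        addresses.append(address)
        sequential = incrementer(address)
        target, taken = branches.get(address, (None, False))
        address = multiplexer(sequential, target, taken)
    return addresses


if __name__ == "__main__":
    print(fetch_addresses(0, {1: (10, True)}, 4))   # taken branch at address 1
    print(fetch_addresses(0, {1: (10, False)}, 4))  # same branch, not taken
```

Because both candidate addresses are already available when the selection is made, the fetch that follows the branch needs no extra cycle in either case.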

The instruction fetch means 11 fetches, via instruction bus 109, the instruction read from the instruction memory 15 at the instruction address that the multiplexer has sent out on instruction address bus 107.

That is, the instruction fetch cycle of instruction 2 can be executed in machine cycle t2.

Consequently, no gap arises in the instruction execution pipeline even between a branch instruction and the succeeding instruction, and the two can be executed in parallel.
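
The difference from the conventional pipeline can be summarized numerically. The sketch below assumes that the instruction after a branch can be fetched in the machine cycle following the one in which the branch outcome and target become known; the function names are hypothetical. In the conventional four-stage pipeline a branch fetched in t1 resolves in its EX stage (t3), whereas in this embodiment it resolves in the OF/BA stage within t1.

```python
# Illustrative sketch only: when can the instruction after a branch be fetched?


def next_fetch_cycle(resolve_cycle):
    """Earliest machine cycle in which the following instruction can be fetched,
    assuming it can start in the cycle after the branch has been resolved."""
    return resolve_cycle + 1


def bubble_cycles(branch_fetch_cycle, resolve_cycle):
    """Machine cycles lost compared with fetching in the very next cycle."""
    return next_fetch_cycle(resolve_cycle) - (branch_fetch_cycle + 1)


if __name__ == "__main__":
    # Conventional pipeline: IF in t1, OF in t2, branch resolved by EX in t3.
    print(next_fetch_cycle(3), bubble_cycles(1, 3))
    # This embodiment: branch resolved by OF/BA in the second half of t1.
    print(next_fetch_cycle(1), bubble_cycles(1, 1))
```

Under these assumptions the conventional pipeline fetches instruction 2 in t4 and loses two cycles, while the embodiment fetches it in t2 and loses none.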

In the instruction pipeline of this embodiment, as shown in FIG. 2, the OW cycle of instruction 1 and the OF/BA cycle of instruction 3 operate at mutually exclusive timings, so the bus that connects the general-purpose register file 16 to the operand fetch means 12 and the operand write means 14 can be shared.
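
The claim that the shared bus never sees two users at once can be checked with a small schedule model. It follows the half-cycle timing given for instruction 1 (operand fetch in the second half of its fetch cycle, operand write in the first half two cycles later) and assumes that each later instruction starts one machine cycle after its predecessor; the function names are hypothetical.

```python
# Illustrative sketch only: who drives register read/write bus 117, and when.
from collections import defaultdict


def bus117_usage(num_instructions):
    """Map (machine cycle, half) to the list of (instruction, stage) users.

    Instruction k is fetched in the first half of cycle k, performs OF/BA
    in the second half of cycle k, and OW in the first half of cycle k + 2.
    """
    usage = defaultdict(list)
    for k in range(1, num_instructions + 1):
        usage[(k, "second")].append((k, "OF/BA"))  # operand fetch reads via 117
        usage[(k + 2, "first")].append((k, "OW"))  # operand write drives 117
    return dict(usage)


def conflicts(usage):
    """Return the half-cycle slots in which more than one stage needs the bus."""
    return {slot: users for slot, users in usage.items() if len(users) > 1}


if __name__ == "__main__":
    usage = bus117_usage(5)
    print(usage[(3, "first")], usage[(3, "second")])
    print(conflicts(usage))
```

In machine cycle t3, instruction 1 writes in the first half and instruction 3 reads in the second half, so the two accesses never overlap.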

It goes without saying that the present invention is not limited to the embodiment described above and can also be realized with other suitable configurations.

Next, the second invention will be described with reference to the drawings.

FIG. 4 is a block diagram of an embodiment of the present invention for the case n = 4. It shows the configuration of a parallel pipelined instruction processing device that executes four instructions in parallel by means of a VLIW-type parallel instruction string made up of four instructions.

In FIG. 4, 411 is an instruction string memory; 412 is an instruction string fetch means; 413 is a data register with eight read ports and four write ports; 414 to 417 are operand fetch means; 418 is an address generation means that generates the address of the instruction string to be fetched next; 419 to 421 are instruction execution means; and 422 to 425 are operand write means. 4101 is an instruction string bus for fetching instruction strings; 4102 is an address bus; 4103 to 4106 are instruction buses that transfer fetched instructions; 4107 to 4110 are pairs of register read buses used to fetch the operands needed to execute instructions; 4111 to 4114 are pairs of source operand buses that transfer fetched operands; 4115 to 4118 are destination operand buses that transfer instruction execution results; and 4119 to 4122 are register write buses used to write operands.

FIG. 5 shows the instruction format of the parallel pipeline instruction processing device configured as in FIG. 4.

FIG. 7 is a diagram showing the structure of the pipeline of the second invention. The abbreviations in the figure have the following meanings.

IF: instruction string fetch
OF: operand fetch
AG: address generation
EX: instruction execution
OW: write back

FIG. 8 is a diagram showing an example of a program sequence that includes a conditional branch instruction. FIG. 9 is a diagram showing the operation of the instruction pipeline in the second invention, namely the pipeline operation when the program sequence shown in FIG. 8 is executed.

In FIG. 9, IF denotes instruction string fetch, OF denotes operand fetch, AG denotes branch address generation, operations 1 to 6 and operations 10 to 12 denote execution of the respective instructions, and WB denotes operand write back.

First, the operation when an instruction string is executed will be described with reference to FIGS. 4 and 5.

The instruction string fetch means 412 fetches the instruction string specified on the address bus 4102 from the instruction string memory 411 via the instruction string bus 4101. The instruction string fetch means 412 then transfers instruction 1, instruction 2, instruction 3 and the branch instruction shown in FIG. 5 over the instruction buses 4103, 4104, 4105 and 4106, respectively, to the operand fetch means 414 to 417 and the address generation means 418.
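The routing of the four fields of one instruction string onto the instruction buses can be sketched as below. This is a minimal model under stated assumptions: the field names, the `dispatch` helper and the mnemonics in the usage lines are hypothetical, and only the bus numbers 4103 to 4106 come from the description.

```python
from typing import Dict, NamedTuple


class InstructionString(NamedTuple):
    """One VLIW-type instruction string for n = 4 (field names are hypothetical)."""
    instruction_1: str
    instruction_2: str
    instruction_3: str
    branch: str


# Instruction buses named in the description, in field order.
INSTRUCTION_BUSES = (4103, 4104, 4105, 4106)


def dispatch(string: InstructionString) -> Dict[int, str]:
    """Map each field of the fetched string to the instruction bus that carries it."""
    return dict(zip(INSTRUCTION_BUSES, string))


# Hypothetical mnemonics, used only to make the routing visible.
routed = dispatch(InstructionString("add", "sub", "mul", "beq"))
for bus, field in routed.items():
    print(bus, field)
```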

Next, the operand fetch means 414 to 417 decode the instructions transferred to them and fetch the operands used by each instruction from the data register 413 via the register read buses 4107 to 4110, respectively.

The operand fetch means 414 to 416 transfer the fetched operands over the source operand buses 4112 to 4114, respectively, to the instruction execution means 419 to 421. The operand fetch means 417, on the other hand, transfers its fetched operand over the source operand bus 4111 to the address generation means 418. The operation up to this point is the same for all instruction processing.

The instruction execution means 419 to 421 each execute their instruction using the operands transferred over the source operand buses 4112 to 4114, and transfer the respective execution results over the destination operand buses 4116 to 4118 to the operand write means 423 to 425.

The operand write means 423 to 425 write each result operand back to the register specified by the corresponding instruction via the register write buses 4119 to 4121, respectively.

Meanwhile, the address generation means 418 increments the instruction string address that it holds internally to generate the next address. At the same time, it decodes the branch instruction supplied over the instruction bus 4106 and generates the branch destination address. It then refers to the operand supplied over the source operand bus 4111 to determine whether the branch condition holds, and outputs to the address bus 4102 the branch destination address if the branch is taken, or the next address if it is not. If the branch instruction also stores the next address in a register, that is, if it is a branch-and-link instruction, the branch destination address is output to the address bus 4102 and the next address is transferred over the destination operand bus 4115 to the operand write means 422. The operand write means 422 then writes the next address back to the data register 413 as an operand via the register write bus 4122.
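The behavior of the address generation means 418 described above can be sketched as follows. This is a simplified model under explicit assumptions, none of which is stated in the description: the branch field layout, the use of an absolute target address, the condition test (operand nonzero), and the choice to produce the link value only when the branch is taken are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BranchField:
    """Hypothetical decoded branch field of an instruction string."""
    is_branch: bool        # False models a string whose branch field is unused
    target: int = 0        # assumed: absolute branch destination address
    link: bool = False     # True models a branch-and-link instruction


@dataclass
class AddressGenResult:
    address_bus: int             # value driven onto address bus 4102
    link_value: Optional[int]    # next address sent toward operand write means 422


def generate_address(current: int, field: BranchField, operand: int,
                     step: int = 1) -> AddressGenResult:
    """Model of address generation means 418 for one instruction string."""
    next_address = current + step        # increment the internally held address
    if not field.is_branch:
        return AddressGenResult(next_address, None)
    taken = operand != 0                 # assumed condition: operand is nonzero
    if not taken:
        return AddressGenResult(next_address, None)
    # Assumption: the link value is produced only when the branch is taken.
    link_value = next_address if field.link else None
    return AddressGenResult(field.target, link_value)


# Hypothetical values: string at address 10, branch-and-link to address 40.
print(generate_address(10, BranchField(True, target=40, link=True), operand=1))
```

In the device, the next address and the branch destination address are produced in parallel with the operand fetch; the sequential code above only models which value ends up on the address bus.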

The timing of the processing described above is explained with reference to FIG. 6. FIG. 6 shows the flow of processing when one instruction string is executed. The first to third lines correspond to the processing of instructions 1 to 3. The bottom two lines correspond to the processing of the branch instruction. The timing at which the instruction string fetch means 412 of FIG. 4 fetches the instruction string corresponds to IF. Likewise, the timings of operand fetch from the registers by the operand fetch means 414 to 417, generation of the next address and the branch destination address by the address generation means 418, instruction execution by the instruction execution means 419 to 421, and operand write to the registers by the operand write means 422 to 425 correspond to OF, AG, EX and WB, respectively.

Now consider the case where the parallel pipeline instruction processing device of this embodiment processes the program sequence shown in FIG. 7. In this sequence, instruction string 2 contains a conditional branch instruction, and when the condition holds the sequence branches from instruction string 2 to instruction string A.

FIG. 8 shows the operation of the pipeline when the sequence of FIG. 7 is executed. In processing instruction string 2, the pipeline that handles the branch field generates the next address and the branch destination address in parallel with the operand fetch. Using the contents of the fetched operand, it can therefore decide at the end of cycle t2 whether to use the next address or the branch destination address, and output the chosen address to the address bus 4102.

Processing of instruction string A can therefore start from cycle t3.
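The resulting cycle-by-cycle occupancy can be sketched as a small schedule table. This is an illustration under assumptions rather than a reproduction of the figure: it assumes the three-stage organization recited in claim 2 (a first stage combining instruction string fetch with operand fetch and address generation, a second stage for execution, a third stage for write back), one machine cycle per stage, and a taken branch in instruction string 2. The stage labels and the `schedule` helper are hypothetical.

```python
# Assumed stage organization: fetch plus operand fetch/address generation,
# then execution, then write back, one machine cycle each.
STAGES = ("IF+OF/AG", "EX", "WB")


def schedule(strings, first_cycle=1):
    """Return {string name: {stage: cycle}} with one string entering per cycle."""
    table = {}
    for index, name in enumerate(strings):
        start = first_cycle + index
        table[name] = {stage: start + offset for offset, stage in enumerate(STAGES)}
    return table


# Strings in execution order when the branch in string 2 is taken.
executed = ["string 1", "string 2", "string A"]
for name, row in schedule(executed).items():
    print(name, row)
```

Under these assumptions, instruction string A enters the first stage in cycle t3, immediately after instruction string 2 occupied it in cycle t2, so no cycle is left idle.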

Therefore, no bubble arises in the pipeline even when an instruction string containing a branch is executed, and the number of empty instruction slots that the compiler must fill is smaller than in conventional devices, so a highly efficient parallel pipeline instruction processing device can be realized.

For example, in a conventional device with a branch delay of one machine cycle, achieving maximum performance requires the compiler to fill a total of six instruction slots with useful instructions: the three slots of instructions 4 to 6, plus the three instruction slots in the delay instruction string that follows. In the parallel pipeline instruction processing device of this embodiment, by contrast, only the three slots of instructions 4 to 6 need to be filled with useful instructions.
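The slot counts in this comparison follow from simple arithmetic, sketched below. The function name and parameters are hypothetical; the model assumes that every instruction string carries the same number of non-branch slots and that each cycle of branch delay adds one full delay instruction string.

```python
def slots_to_fill(slots_per_string: int, branch_delay_cycles: int) -> int:
    """Non-branch slots the compiler must fill around one branch.

    Counts the slots in the string that holds the branch plus the slots in
    each delay instruction string that follows it.
    """
    return slots_per_string * (1 + branch_delay_cycles)


conventional = slots_to_fill(slots_per_string=3, branch_delay_cycles=1)
this_embodiment = slots_to_fill(slots_per_string=3, branch_delay_cycles=0)
print(conventional, this_embodiment)
```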

It goes without saying that the present invention is not limited to the embodiment described above and can also be realized with other suitable configurations.

[Effects of the Invention]

As described above, in the first invention, instructions in a pipelined computer are executed in parallel without disturbing pipeline operation even when a branch instruction is executed, which speeds up instruction execution. In addition, the bus used for reading and writing the general-purpose registers can be shared by time division, which reduces the amount of hardware.

Also as described above, in the parallel pipeline instruction processing device of the second invention, an instruction string containing a branch instruction does not produce empty instruction slots. This makes it possible to realize a highly efficient parallel pipeline instruction processing device that combines VLIW-type parallel processing, which achieves parallelism with simple hardware and compile-time parallel instruction scheduling, with the high-speed processing of instruction pipelining. Furthermore, because the compiler has fewer empty instruction slots to fill, parallel instruction scheduling becomes easier.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the first invention; FIG. 2 is a timing diagram of the pipeline of FIG. 1; FIG. 3(a) is a pipeline timing diagram of a conventional instruction processing device; FIG. 3(b) is a pipeline timing diagram for a branch instruction in a conventional instruction processing device; FIG. 4 is a block diagram showing the configuration of an embodiment of the second invention for n = 4; FIG. 5 shows the instruction format of the parallel pipeline instruction processing device configured as in FIG. 4; FIG. 6 shows the pipeline of the second invention; FIG. 7 shows a program sequence including a conditional branch instruction; FIG. 8 shows the operation of the instruction pipeline when the instructions shown in FIG. 7 are executed; FIG. 9 shows the principle of a VLIW parallel computer; FIG. 10 shows the instruction pipeline of a conventional serial instruction processing device; FIG. 11 shows the operation of a conventional pipeline when a branch occurs; and FIG. 12 shows the operation of a delayed branch instruction in a conventional pipeline.

AG: address generation; EX: instruction execution cycle; IF: instruction fetch cycle; OF: register fetch cycle; OW: register write cycle; 11: instruction fetch means; 12: operand fetch means; 13: instruction execution means; 14: operand write means; 15: instruction memory; 16: general-purpose register file; 17: incrementer; 18: multiplexer; 19: branch address generation means; 101 to 102: instruction buses; 103 to 104: source operand buses; 105: instruction bus; 106: destination operand bus; 107 to 108: instruction address buses; 109 to 110: instruction buses; 111 to 112: instruction address buses; 113: source operand bus; 114: register address bus; 115: register read bus; 116: register address bus; 117: register read/write bus; 411: instruction string memory; 412: instruction string fetch means; 413: data register; 414 to 417: operand fetch means; 418: address generation means; 419 to 421: instruction execution means; 422 to 425: operand write means; 4101: instruction string bus; 4102: address bus; 4103 to 4106: instruction buses; 4107 to 4110: register read buses; 4111 to 4114: source operand buses; 4115 to 4118: destination operand buses; 4119 to 4122: register write buses.

Claims

[Claims]

1. An instruction processing device for a computer system that has a set of instructions executable in a single machine cycle and that comprises: first storage means for storing the instructions; second storage means for storing operands; instruction reading means for reading an instruction from the first storage means; operand reading means for reading, from the second storage means, the operands needed to execute the read instruction; instruction execution means for executing the instruction using the read operands; and operand writing means for writing the operand obtained as a result of the instruction execution into the second storage means, the computer system having a pipeline instruction processing mechanism that performs the instruction read, the operand read, the instruction execution and the operand write; the instruction processing device being characterized by comprising branch address generation means which, when the instruction read by the instruction reading means is a branch instruction, generates the branch address concurrently in the machine cycle in which the operands are read, thereby eliminating disturbance of the pipeline during execution of the branch instruction and speeding up pipeline operation.

2. An instruction processing device that has instruction strings each consisting of n instructions arranged in parallel (n being a natural number with n ≥ 2) and that processes the n instructions in parallel, comprising: first storage means for storing the instruction strings; instruction string reading means for reading an instruction string from the first storage means; n instruction processing means, corresponding to the n instructions in the read instruction string, each of which performs the processing specified by its instruction; and second storage means that stores the operands used by the n instruction processing means and that can be read and written independently by each of the n instruction processing means; wherein n−1 of the instruction processing means each comprise operand reading means for reading, from the second storage means, the operands needed to execute the instruction specified for it, instruction execution means for executing the instruction using the read operands, and operand writing means for writing the operand obtained as a result back to the second storage means, and execute instructions other than branch instructions by a pipeline instruction processing mechanism consisting of a first stage that reads the instruction string and reads the operands, a second stage that executes the instruction, and a third stage that writes the operand; and wherein the remaining one instruction processing means comprises operand reading means for reading, from the second storage means, the operands needed to execute the conditional branch instruction specified for it, and address generation means for generating the address of the instruction string to be executed next; the instruction processing device being characterized in that a branch control mechanism performs the operand read and the address generation in parallel, and performs the instruction string read, the operand read and the address generation in a single machine cycle, whereby disturbance of the pipeline is suppressed.