JPH04215130A

JPH04215130A - Instruction execution processing system for parallel pipeline instruction processing device

Info

Publication number: JPH04215130A
Application number: JP6820391A
Authority: JP
Inventors: Isako Ishikawa; 石川　功子; Yumiko Ushimaru; 牛丸　由美子
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-04-06
Filing date: 1991-04-01
Publication date: 1992-08-05
Anticipated expiration: 2012-07-09
Also published as: JP2629474B2

Abstract

PURPOSE: To improve the processing performance of a parallel pipeline instruction processing device by suppressing the increase of empty fields in instruction blocks that contain a branch instruction. CONSTITUTION: The parallel pipeline instruction processing device has a plurality of instruction execution units and executes a plurality of instructions in parallel so that no branch delay occurs. An instruction, the unit of processing of the device, is one instruction block made up of a plurality of instruction fields executed in parallel. When the branch instruction field of such a block contains a conditional branch instruction, an instruction execution selecting means (branch instruction executing unit) 6 selects, for each instruction in the block other than the conditional branch instruction, whether that instruction is executed at the time of the conditional branch. Providing this means 6 reduces the number of empty instruction fields in instruction blocks that contain a conditional branch instruction.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to an information processing apparatus, and more particularly to an instruction execution selection processing method used when a branch instruction is executed in a parallel instruction processing apparatus that executes a plurality of instructions in parallel.

[0002]

[Prior Art] Conventionally, one method of improving the performance of computer systems is the parallel pipeline instruction processing method, which combines two processing methods: instruction pipeline processing and VLIW (Very Long Instruction Word) parallel instruction processing. To explain this method, instruction pipeline processing and VLIW parallel instruction processing are described first.

[0003] Instruction pipeline processing divides instruction execution into multiple stages, provides a separate hardware unit for each stage, and executes the stages of successive instructions in an overlapping manner. If every stage takes the same time and that time equals one machine cycle, one result is obtained every machine cycle. Instruction pipeline processing therefore reaches maximum performance when instructions and data are supplied steadily and no stage sits idle.
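The throughput claim in paragraph [0003] can be checked with a small calculation. The sketch below is illustrative only: the function names and the five-stage example are assumptions, not part of the patent, and it models an ideal pipeline with no stalls.

```python
def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles to finish a stream on an ideal pipeline with no stalls."""
    if num_instructions <= 0:
        return 0
    # The first instruction needs num_stages cycles to pass through every
    # stage; each later instruction completes one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles when each instruction finishes all stages before the next starts."""
    return max(num_instructions, 0) * num_stages


if __name__ == "__main__":
    # Hypothetical workload: 100 instructions on a 5-stage pipeline.
    print(pipeline_cycles(100, 5), sequential_cycles(100, 5))
```

Once the pipeline is full, the overlapped version retires one instruction per cycle, which is the steady-state behavior the paragraph describes.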

[0004] One factor that disturbs instruction pipeline processing is the execution of conditional branch instructions. Whether the branch condition of a conditional branch instruction is satisfied is not known until late in the pipeline. If the condition is satisfied, the instructions already taken into the pipeline must be invalidated, and processing must restart from the fetch of the branch target instruction. Thus, when a branch condition is satisfied and the branch is taken, the pipeline is delayed (branch delay) and overall processing performance drops.
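The cost of branch delay described in paragraph [0004] can be approximated with a simple model. This is a sketch under stated assumptions: a fixed `flush_penalty` per taken branch and the function name are illustrative choices, and the patent gives no concrete penalty value.

```python
def cycles_with_branch_delay(num_instructions: int, num_stages: int,
                             taken_branches: int, flush_penalty: int) -> int:
    """Ideal pipeline time plus a fixed refill cost for every taken branch.

    flush_penalty is an assumed number of cycles lost each time instructions
    already in the pipeline are invalidated and the target is refetched.
    """
    if num_instructions <= 0:
        return 0
    ideal = num_stages + (num_instructions - 1)
    return ideal + taken_branches * flush_penalty


if __name__ == "__main__":
    # Hypothetical numbers: 100 instructions, 5 stages, 10 taken branches,
    # 2 cycles lost per taken branch.
    print(cycles_with_branch_delay(100, 5, 10, 2))
```

With frequent taken branches the penalty term quickly dominates the gain from pipelining, which is the motivation for the dedicated branch unit introduced later in the text.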

[0005] VLIW parallel instruction processing, on the other hand, processes a relatively long instruction made up of multiple instruction fields (hereinafter called an instruction block) as a single instruction; each instruction field can independently control a large number of arithmetic units, registers, interconnection networks, memories, and so on. In the VLIW method, operations that can run in parallel are extracted from the source program at compile time and combined into one instruction block, so high-speed processing is achieved when the degree of instruction parallelism obtained is close to the number of parallel arithmetic units. When instruction parallelism is low, however, instruction fields are left empty and processing performance drops.
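To make the link between instruction parallelism and empty fields concrete, the following sketch packs operations into fixed-width instruction blocks and reports field utilization. It is a simplified greedy packer written for illustration; the function names are assumptions, only true data dependencies are modeled, and it is not the compiler algorithm of the patent.

```python
from typing import Dict, List, Set


def pack_vliw(ops: List[str], deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedily place each operation in the earliest block that follows all
    of its producers and still has a free field.

    ops must be in program order, and every dependency must appear earlier.
    """
    placed: Dict[str, int] = {}
    blocks: List[List[str]] = []
    for op in ops:
        earliest = 0
        for producer in deps.get(op, set()):
            earliest = max(earliest, placed[producer] + 1)
        idx = earliest
        while idx < len(blocks) and len(blocks[idx]) >= width:
            idx += 1  # block is full, try the next one
        while len(blocks) <= idx:
            blocks.append([])
        blocks[idx].append(op)
        placed[op] = idx
    return blocks


def utilization(blocks: List[List[str]], width: int) -> float:
    """Fraction of instruction fields that hold a real operation."""
    if not blocks:
        return 0.0
    return sum(len(b) for b in blocks) / (len(blocks) * width)


if __name__ == "__main__":
    ops = ["a", "b", "c", "d"]
    independent = pack_vliw(ops, {}, 4)
    chained = pack_vliw(ops, {"b": {"a"}, "c": {"b"}, "d": {"c"}}, 4)
    print(utilization(independent, 4), utilization(chained, 4))
```

Four independent operations fill one block completely, while a four-operation dependency chain needs four blocks with three empty fields each.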

[0006] The parallel pipeline instruction processing method is an instruction processing method that combines the VLIW parallel instruction processing and the instruction pipeline processing described above.

[0007] This method speeds up the processing of branch instructions by providing a field dedicated to branch instructions in the instruction format and separating the branch instruction processing unit from the other instruction processing units. Branch instructions are thus processed in parallel with other instructions.

[0008] Because parallel processing of arithmetic instructions and load/store instructions had already been realized, the method in practice allows three units to operate in parallel: a processing unit for arithmetic instructions, a processing unit for load/store instructions, and a processing unit for branch instructions.

[0009] Adding a unit dedicated to branch instruction processing lets conditional branch instructions execute quickly; even when a conditional branch is taken, no branch delay occurs and the flow of instruction pipeline processing is not disturbed. To control the independent instruction processing units in parallel, the instruction format is a VLIW instruction made up of multiple instruction fields, each controlling one instruction unit.

[0010] The performance of a parallel instruction processing device that uses this parallel pipeline instruction processing method depends on how many instruction functions can be packed into one instruction block. Program optimization techniques fall into two kinds, local optimization and global optimization. Consider a sequence of basic operations that contains no branch except at its exit and is never branched into from outside except at its entry (hereinafter called a basic block). Local optimization examines the data dependencies within each basic block, detects basic operations that can execute in parallel, and combines them into VLIW instructions. Global optimization moves basic operations between basic blocks. In a parallel instruction processing device that uses the parallel pipeline instruction processing method, however, conditional branch instructions are very frequent and basic blocks are short, so these optimizations do not yield a large benefit.

[0011] Conventionally, parallel instruction processing devices that use this method have relied on an optimization technique called trace scheduling, which is particularly effective when conditional branches are heavily biased (for example, in applications such as chemical engineering calculations).

[0012] FIGS. 21 and 22 are schematic diagrams outlining the trace scheduling method.

[0013] The boxes in FIGS. 21(a) to 21(d) are basic blocks, each containing a sequence of basic operation instructions. FIG. 21(a) shows the flow structure of a program, for which the optimization procedure is as follows.

[0014] Step 1: Analyze the flow graph of FIG. 21(a) using data such as runs of sample programs, and find a path that is executed with high probability (hereinafter called a trace). Here, the path shown hatched in FIG. 21(b) is taken as the trace.

[0015] Step 2: Focusing on the data dependencies within the trace found in Step 1, detect operations that can execute in parallel and combine them one after another into VLIW instructions, thereby optimizing within the trace (FIG. 21(c)). In this process, basic operation instructions within the trace are moved.

[0016] Step 3: Because basic operation instructions have moved within the trace, correct the code outside the trace as shown in FIG. 21(d). In the figure, blocks marked R are blocks added at join points, and blocks marked S are blocks added at branch points.

[0017] Step 4: Repeat the above steps for the off-trace code corrected in Step 3.

[0018] The correction of off-trace code in Step 3 is explained next.

[0019] FIG. 22(a) shows the handling of a join point. Suppose that the code motion on the trace in Step 2 changes m11, m12, m13, m31, m32, m33, m34 (x1 in FIG. 22(a)) into m11, m12, m32, m33, m13, m31, m34 (x2 in FIG. 22(a)). The new join point is placed at a point downstream of which there is no "operation on the trace that was upstream of the old join point"; in this figure, series A joins at the new join point m31. However, operations m32 and m33, which were downstream of the old join point, have moved upstream of the new join point, so as things stand they would not be executed on the path from series A. Operations m32 and m33 (x3) are therefore added to series A.

[0020] At a branch point, the handling is as shown in FIG. 22(b). Suppose that the code motion on the trace in Step 2 changes m11, m12, m13, m14, m21, m22, m23 (x4 in FIG. 22(b)) into m11, m13, m14, m12, m21, m22, m23 (x5 in FIG. 22(b)). The branch point is still m14, but operation m12, which was upstream of the branch point, has moved downstream of it, so as things stand m12 would not be executed on the path of series B. Operation m12 (x6) is therefore added to series B.
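The join-point and branch-point corrections of paragraphs [0019] and [0020] can be reproduced with list operations on the operation sequences quoted in the text. The sketch below is one possible reading of those rules; the function names are hypothetical, and a real trace scheduler works on a full flow graph rather than on flat lists.

```python
from typing import List, Optional, Tuple


def join_compensation(original: List[str], reordered: List[str],
                      first_after_join: str) -> Tuple[Optional[str], List[str]]:
    """Return the new join operation and the operations to copy onto the
    joining path (series A in FIG. 22(a))."""
    k = original.index(first_after_join)
    upstream = set(original[:k])  # operations above the old join point
    # The new join point follows the last operation that used to be upstream.
    last_upstream = max(
        (i for i, op in enumerate(reordered) if op in upstream), default=-1)
    new_join = last_upstream + 1
    # Operations hoisted above the new join must be copied to the joining path.
    copies = [op for op in reordered[:new_join] if op not in upstream]
    new_join_op = reordered[new_join] if new_join < len(reordered) else None
    return new_join_op, copies


def branch_compensation(original: List[str], reordered: List[str],
                        branch_op: str) -> List[str]:
    """Return the operations to copy onto the off-trace path (series B in
    FIG. 22(b)) because they moved below the branch."""
    upstream = set(original[:original.index(branch_op)])
    after_branch = reordered[reordered.index(branch_op) + 1:]
    return [op for op in after_branch if op in upstream]


if __name__ == "__main__":
    x1 = ["m11", "m12", "m13", "m31", "m32", "m33", "m34"]
    x2 = ["m11", "m12", "m32", "m33", "m13", "m31", "m34"]
    print(join_compensation(x1, x2, "m31"))
    x4 = ["m11", "m12", "m13", "m14", "m21", "m22", "m23"]
    x5 = ["m11", "m13", "m14", "m12", "m21", "m22", "m23"]
    print(branch_compensation(x4, x5, "m14"))
```

For the sequences in the text this yields m32 and m33 as the copies for series A (with m31 as the new join point) and m12 as the copy for series B, matching x3 and x6.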

[0021]

[Problems to Be Solved by the Invention] In the parallel pipeline processing method described above, the instructions other than the branch instruction in an instruction block that contains a conditional branch instruction are executed whether or not the branch condition is satisfied. The non-branch instructions placed in such a block must therefore be instructions that can be executed regardless of the outcome of the branch condition.

[0022] For example, consider a parallel pipeline instruction processing device in which four pipelined instruction processing units are arranged in parallel and which executes VLIW instructions with four fields. One of the stages of this parallel pipeline is a dedicated stage that executes only branch instructions, so the device incurs no delay from conditional branches.

[0023] In this case, an instruction block that contains a conditional branch instruction has three instruction fields besides the conditional branch, and these three fields must be filled with instructions that can be executed regardless of whether the branch condition is satisfied. If no such instructions can be found, up to three fields are left empty.

[0024] Even in the delayed branch mechanism of a non-parallel instruction pipeline, filling the single instruction slot immediately after a branch instruction with a useful instruction succeeds only about 80 to 90 percent of the time, even with the latest compiler technology. Given this, instruction scheduling that fills the three instruction fields of a block containing a conditional branch instruction with executable instructions is extremely difficult in a parallel pipeline processing device, and NOP instructions have to be placed in most of those fields.

[0025] Consequently, even in a parallel pipeline instruction processing device that incurs no branch delay, instruction blocks containing conditional branch instructions accumulate empty instruction fields, and processing performance drops significantly when such blocks are executed.

[0026] As described above, trace scheduling optimizes the basic operations along a long trace. It is therefore highly effective, but it has the drawback that blocks are copied frequently, so the code size of the generated program becomes very large. In addition, the other instruction fields of an instruction block containing a conditional branch instruction in the synthesized VLIW code must be filled with instructions that can be executed regardless of whether the branch condition is satisfied.

[0027] Yet even in the delayed branch mechanism of a non-parallel instruction pipeline, filling the single slot immediately after a branch instruction with a useful instruction succeeds only about 80% to 90% of the time with the latest compiler technology. Instruction scheduling that fills the several fields other than the branch instruction is therefore very difficult, and NOP instructions end up in most fields. Instruction blocks containing conditional branch instructions thus accumulate empty instruction fields, and processing performance drops significantly when such blocks are executed.

[0028] An object of the present invention is to provide an instruction processing method for a parallel pipeline instruction processing device that gives the instructions other than the conditional branch in an instruction block containing a conditional branch instruction an instruction execution selection function, which selects whether each instruction is executed according to whether the branch condition is satisfied, thereby preventing the increase of empty instruction fields in such blocks and improving processing performance.

[0029] Another object of the present invention is to provide an instruction processing method for a parallel pipeline instruction processing device in which effective code can be generated using execution selection processing, so that loop execution time is shortened and processing performance is improved.

[0030]

[Means for Solving the Problems] The present invention is an instruction execution processing method for a parallel pipeline instruction processing device that has a plurality of instruction execution units and executes a plurality of instructions in parallel so that no branch delay occurs. An instruction, the unit of processing of the device, is one instruction block made up of a plurality of instruction fields executed in parallel. When the branch instruction field of such a block contains a conditional branch instruction, instruction execution selection means selects, for each instruction in the block other than the conditional branch instruction, whether that instruction is executed at the time of the conditional branch. Providing this means reduces the number of empty instruction fields in instruction blocks that contain conditional branch instructions.

[0031] In the present invention, when the number of instructions in the header block of a natural loop formed of linked instruction blocks (its field count excluding empty instruction fields) is smaller than the number of empty fields in the tail block of that loop, the instructions of the entry block of the header block can be moved into the empty fields of the tail block, the moved instructions copied into a new basic block, and the entry block deleted from the header block to reorganize the loop. This reduces the number of instruction blocks in the loop and optimizes the code.

[0032]

[Embodiment] FIG. 1 is a block diagram of a parallel pipeline instruction processing device according to one embodiment of the present invention, showing the execution of an instruction block.

[0033] Instruction string fetch means 2 fetches the instruction block specified on address bus 32 from instruction block memory 1 via instruction block bus 31. Instruction (1), instruction (2), instruction (3), and the branch instruction shown in FIG. 2 are transferred via instruction buses 33 to 36 to instruction execution units 3 to 5 and branch instruction execution unit 6, respectively. Each of instruction execution units 3 to 5 decodes the instruction transferred to it and fetches the operands it uses from data register 7 via register read buses 37 to 40. Each unit then executes its instruction and writes the result back to data register 7 via register write buses 49 to 52.

[0034] Branch instruction execution unit 6, meanwhile, decodes the transferred branch instruction and generates the branch target address. At the same time, it increments the instruction block address it holds internally to generate the next address. It then refers to the operand supplied via register read bus 40 to determine whether the branch condition is satisfied, and outputs the branch target address to address bus 32 if the branch is taken, or the next address if it is not.

[0035] When the branch instruction also stores the next address in a register, that is, when it is a branch-and-link instruction, the branch target address is output to address bus 32 and the next address is written back to data register 7 via register write bus 52. The fourth instruction execution unit of this device is dedicated to branch instructions and can output the next address or the branch target address to the address bus without any delay caused by a conditional branch.
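The address selection performed by the branch instruction execution unit in paragraphs [0034] and [0035] can be sketched as a small function. The names `branch_unit`, `BranchResult`, and `stride` are illustrative assumptions; the sketch also assumes that the link value is saved only when the branch is actually taken, which the text does not state explicitly.

```python
from typing import NamedTuple, Optional


class BranchResult(NamedTuple):
    address_out: int            # value driven onto the address bus
    link_value: Optional[int]   # next address saved by branch-and-link, if any


def branch_unit(current_address: int, target_address: int,
                condition_met: bool, link: bool = False,
                stride: int = 1) -> BranchResult:
    """Select the next instruction block address.

    The next address and the branch target are both prepared, and the branch
    condition picks one of them. stride is an assumed block-address increment.
    """
    next_address = current_address + stride
    address_out = target_address if condition_met else next_address
    link_value = next_address if (link and condition_met) else None
    return BranchResult(address_out, link_value)


if __name__ == "__main__":
    print(branch_unit(100, 200, condition_met=True))
    print(branch_unit(100, 200, condition_met=False))
    print(branch_unit(100, 200, condition_met=True, link=True))
```

Because both candidate addresses are ready before the condition is evaluated, choosing between them costs no extra cycle, which is how the text explains the absence of branch delay.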

[0036] The flow of a program sequence that contains a conditional branch instruction and instructions with the execution selection function is described next for the parallel pipeline instruction processing device described above, which incurs no branch delay.

[0037] As shown in FIG. 2, an instruction in this device is an instruction block consisting of four instructions. The four instruction fields in each block are called field 1, field 2, field 3, and field 4, starting from the MSB side; field 4 is the branch instruction field.

[0038] In the program sequence of this case, shown in FIG. 3, instruction block 1 has no branch instruction, and a NOP instruction is placed in instruction field 4. Instruction block 1 is therefore executed as is. Instruction block 2 has a conditional branch instruction in instruction field 4. If the branch condition is satisfied, the program sequence moves to instruction block A; if not, the sequence continues to the next block, instruction block 3. Normally, therefore, instructions 4 to 6 are executed both when the branch is taken and when it is not.

[0039] Suppose now that instruction block 2 of FIG. 3 is the instruction block shown in FIG. 4. In this block, instruction fields 1 and 3 hold logical shift instructions (lsl: Logical Shift Left), instruction field 2 holds an add instruction, and instruction field 4 holds a conditional branch instruction (the branch condition is satisfied when the value of register "r23" is "0").

[0040] Here, a "+" sign is prefixed to the add instruction in instruction field 2. FIG. 5 is a schematic layout of the object format of this add instruction. The object code is 32 bits long and contains fields for the opcode, source register 1, source register 2, and the destination register. Bits 0 and 1 in FIG. 5 form the instruction execution selection field, which selects whether the instruction is executed according to the state of the condition of the conditional branch instruction in the same instruction block.

[0041] In the instruction block of FIG. 4, instruction field 4 holds a conditional branch instruction, and the instruction in field 2 is to be executed only when the branch condition is satisfied. For this reason a "+" sign is prefixed to the instruction in field 2, and the code "01" is placed in the instruction execution selection field of its object code. As a result, when this instruction block is executed, the instruction in field 2 is executed if the branch condition is satisfied and is not executed if it is not.

[0042] Conversely, to execute an instruction only when the branch condition is not satisfied, a "-" sign is prefixed to it; the code "10" is then placed in the instruction execution selection field of the object code. An instruction that may be executed regardless of the branch outcome is written without a prefix, and the code "00" is placed in the instruction execution selection field.
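The prefix-to-code mapping of paragraphs [0041] and [0042] can be summarized in a few lines. This is a minimal sketch: the function names and the example mnemonics are assumptions, and since the text does not describe the code "11", the sketch rejects it rather than guessing a meaning.

```python
# Two-bit instruction execution selection codes as given in the text.
SELECT_CODES = {"": 0b00, "+": 0b01, "-": 0b10}


def selection_code(assembly_line: str) -> int:
    """Derive the selection code from the optional '+' or '-' prefix."""
    prefix = assembly_line[:1] if assembly_line[:1] in ("+", "-") else ""
    return SELECT_CODES[prefix]


def should_execute(code: int, branch_taken: bool) -> bool:
    """Decide whether an instruction runs, given its block's branch outcome."""
    if code == 0b00:
        return True              # executed regardless of the branch
    if code == 0b01:
        return branch_taken      # '+': only when the condition is satisfied
    if code == 0b10:
        return not branch_taken  # '-': only when the condition is not satisfied
    raise ValueError("selection code 11 is not described in the text")


if __name__ == "__main__":
    # Hypothetical mnemonics; the register operands are not from the patent.
    for line in ("lsl r1, r2, r3", "+add r4, r5, r6", "-add r7, r8, r9"):
        code = selection_code(line)
        print(line, format(code, "02b"),
              should_execute(code, True), should_execute(code, False))
```

A compiler using this scheme can place an instruction from either successor path into a block that contains a conditional branch, instead of leaving the field as a NOP.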

[0043] The procedure by which the instruction block of FIG. 4 is executed is described next.

[0044] Instruction block fetch means 12 fetches the instruction block specified on address bus 32 from instruction block memory 1 via instruction block bus 31. It then transfers the instructions in the fields of FIG. 4 (instruction 4, instruction 5, instruction 6, and the branch instruction) to operand fetch means 14 to 17 via instruction buses 33 to 36. Because the instruction execution selection field of instruction field 2 in FIG. 4 is "+", execution selection plus signal 53 is output to instruction execution means 20. For the other instruction fields, the execution selection field holds the code "00", so no execution selection signal is output.

[0045] Operand fetch means 14 to 17 then decode the transferred instructions and fetch the required operands from data register 13 via register read buses 37 to 40. The operation up to this point is the same for the four instruction execution units 3 to 5.

[0046] Address generating means 18 of branch instruction execution unit 6 increments the instruction block address it holds internally to generate the next address. At the same time, it decodes the branch instruction supplied via instruction bus 36 and generates the branch target address. It then refers to the operand supplied via source operand bus 41 to determine whether the branch condition is satisfied, and outputs the branch target address to address bus 32 if the branch is taken, or the next address if it is not.

【００４７】If the branch condition is satisfied and the branch is taken, the branch condition satisfied signal 54 is output as "1" to the instruction execution means 19 to 21; if the branch is not taken, the branch condition satisfied signal 54 is output as "0". If the branch instruction is a branch-and-link instruction, which requires the next address to be saved, the branch destination address is output to the address bus 32, the branch condition satisfied signal "1" is output, and the next address is output to the operand write means 22 via the destination operand bus 45. The operand write means 22 writes the next address as an operand to the data register 13 via the register write bus 52.
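The address selection described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of the embodiment: the names `resolve_branch` and `BranchOutcome` are hypothetical, the block address is assumed to advance by a stride of 1, and a branch-and-link instruction is assumed to be always taken, which is how the paragraph reads.

```python
from typing import NamedTuple, Optional


class BranchOutcome(NamedTuple):
    fetch_address: int          # value driven onto the address bus
    condition_met_signal: int   # 1 if the branch is taken, else 0
    link_value: Optional[int]   # next address saved to a register, if any


def resolve_branch(block_address: int,
                   target_address: int,
                   condition_met: bool,
                   is_branch_and_link: bool = False,
                   block_stride: int = 1) -> BranchOutcome:
    """Model the next-address choice of the branch unit (assumed stride of 1)."""
    next_address = block_address + block_stride
    if is_branch_and_link:
        # Assumption: branch-and-link is taken and saves the next address.
        return BranchOutcome(target_address, 1, next_address)
    if condition_met:
        return BranchOutcome(target_address, 1, None)
    return BranchOutcome(next_address, 0, None)
```

The link value is kept separate from the fetch address because the text routes it over a different bus (the destination operand bus) than the address bus.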

【００４８】The instruction execution means 19 and 21 execute their instructions using the operands transferred via the source operand buses 42 and 44, and transfer the results to the operand write means 23 and 25 via the destination register buses 46 and 48. The operand write means 23 and 25 write each result operand to the register specified by the respective instruction via the register write buses 49 and 51.

【００４９】Meanwhile, the execution selection plus signal 53 is active for the instruction execution means 20. If the branch condition satisfied signal 54 is also active at this time, the instruction execution means 20 executes its instruction using the operands transferred via the source operand bus 43 and transfers the result to the operand write means 24 via the destination register bus 47. The operand write means 24 then writes the result operand, via the register write bus 50, to the register in the data register 13 specified by the instruction.

【００５０】If, however, the branch condition satisfied signal 54 is inactive, the instruction is not executed because the execution selection plus signal 53 is active. In this case, neither the instruction execution means 20 nor the operand write means 24 does anything.

【００５１】The above embodiment described the case where the instruction execution selection field is "+". When the field is "-", the instruction block fetch means likewise outputs an execution selection minus signal to the corresponding instruction execution means. In that case, the instruction is not executed if the branch condition satisfied signal is active, and is executed if the signal is inactive. This is summarized in Tables 1 and 2.

【００５２】Table 1 shows the relationship between the instruction execution selection field and the execution selection signal 53. When the field is "+", the instruction is executed only if the branch condition is satisfied; when it is "-", the instruction is executed only if the branch condition is not satisfied. When nothing is specified in the field, the instruction is executed regardless of whether the branch condition is satisfied.
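The rule for the three field values can be written as a small predicate. The following Python sketch is illustrative only; the function name `result_is_written` and the use of an empty string for "nothing specified" are assumptions made for the example, not the encoding used by the device.

```python
def result_is_written(select_field: str, branch_taken: bool) -> bool:
    """Return True if an instruction with this selection field takes effect.

    select_field: "+" (only when the branch is taken),
                  "-" (only when the branch is not taken),
                  ""  (assumed stand-in for "nothing specified": always).
    """
    if select_field == "+":
        return branch_taken
    if select_field == "-":
        return not branch_taken
    if select_field == "":
        return True
    raise ValueError(f"unknown execution selection field: {select_field!r}")
```

For example, `result_is_written("+", False)` is `False`, which corresponds to the case where the instruction execution means and operand write means do nothing.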

【００５３】[0053]

【００５４】Table 2 shows the relationship between the values of the execution selection signal and the branch condition satisfied signal and the output of the result by the instruction execution means. That the result of the instruction execution means is not output is equivalent to the instruction not being executed.

【００５５】[0055]

【００５６】As described above, this method provides, for each instruction in the other instruction fields of an instruction block containing a conditional branch instruction, an instruction execution selection function that selects whether the instruction is executed depending on whether the branch condition is satisfied. Instructions to be executed only when the branch condition is satisfied, or only when it is not, can therefore be embedded in an instruction block containing a conditional branch instruction, which prevents an increase in empty fields.

【００５７】In this VLIW-type parallel instruction processing method, the top-align and middle-align instruction parallelization methods are available as methods that parallelize instructions so that as many instructions as possible are embedded in one instruction block.

【００５８】The top-align instruction parallelization method embeds instructions in order, starting from the instruction block with the earliest execution cycle. That is, as many instructions as possible are first embedded in the instruction block of cycle 0; when no more instructions can be embedded there, instructions are embedded in the instruction block of cycle 1; when that block can hold no more, the instruction block of cycle 2 is filled, and so on.

【００５９】When embedding instructions in the instruction block of cycle n, an instruction that can be embedded in the instruction block of that cycle is called an instruction in the data-ready state.

【００６０】An instruction is in the data-ready state if every earlier instruction that directly evaluates the operand data it references has already been placed in an instruction block of a cycle before the current cycle. In other words, an instruction can reference its operand data, and is therefore data-ready, only once that operand data has been evaluated in an earlier cycle and holds the correct value.

【００６１】Thus, among the other instructions that have a data dependency on the operands handled by a given instruction, one that must be executed in a cycle before that instruction is called a predecessor (the "former") of that instruction, and one that must be executed in a later cycle or in the same cycle is called a successor (the "latter") of that instruction. An instruction in the data-ready state is therefore one whose predecessors have all been placed.

【００６２】In the top-align method, instruction blocks are built by choosing, from the data-ready instructions, instructions such that as many as possible fit in the block. If there are many data-ready instructions that could be placed in the instruction block of the current cycle, the top-align method places them in descending order of instruction height.

【００６３】The height of an instruction is the number of cycles on the longest path (the path with the most cycles) obtained by following, within the basic block, the instruction's successor, that successor's successor, and so on, until an instruction with no successor is reached.

【００６４】With the top-align method, the instruction fields are filled relatively well in instruction blocks near the entry of the basic block, and the fill rate worsens toward the exit of the basic block.
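The top-align procedure can be sketched as a simple list scheduler. The Python below is a minimal illustration under stated assumptions, not the method of the source: every dependency is assumed to cost one cycle, an instruction with no successor is given height 1, each block is assumed to have three ordinary fields, and the dependency graph is assumed to be acyclic. The names `instruction_heights` and `top_align` are hypothetical.

```python
from typing import Dict, List, Set


def instruction_heights(successors: Dict[str, Set[str]]) -> Dict[str, int]:
    """Longest path, in cycles, from each instruction to one with no successor."""
    memo: Dict[str, int] = {}

    def height(name: str) -> int:
        if name not in memo:
            # Assumption: a successor-less instruction counts as height 1.
            memo[name] = 1 + max((height(s) for s in successors[name]), default=0)
        return memo[name]

    for name in successors:
        height(name)
    return memo


def top_align(successors: Dict[str, Set[str]],
              fields_per_block: int = 3) -> List[List[str]]:
    """Fill blocks from cycle 0 onward with data-ready instructions, tallest first."""
    height = instruction_heights(successors)
    predecessors: Dict[str, Set[str]] = {name: set() for name in successors}
    for name, succs in successors.items():
        for s in succs:
            predecessors[s].add(name)

    placed: Set[str] = set()
    blocks: List[List[str]] = []
    while len(placed) < len(successors):
        # Data-ready: every predecessor sits in an earlier block.
        ready = [n for n in successors
                 if n not in placed and predecessors[n] <= placed]
        ready.sort(key=lambda n: (-height[n], n))  # tallest first, name as tie-break
        block = ready[:fields_per_block]
        blocks.append(block)
        placed.update(block)
    return blocks


# Hypothetical dependency graph: a and b feed c, c feeds d; e and f are independent.
example = {"a": {"c"}, "b": {"c"}, "c": {"d"}, "d": set(), "e": set(), "f": set()}
schedule = top_align(example)  # [['a', 'b', 'e'], ['c', 'f'], ['d']]
```

In this small example the first block is full while the last holds a single instruction, which matches the observation that the fill rate drops toward the exit of the basic block.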

【００６５】Next, the middle-align instruction parallelization method is described.

【００６６】In the middle-align instruction parallelization method, the greatest instruction height in the basic block is first divided by 2, and the resulting value is taken as the basic height.

【００６７】The block in which the instructions having the basic height are first placed is called instruction block 0; instruction blocks executed before instruction block 0 are called minus blocks, and instruction blocks executed after it are called plus blocks. As shown in FIG. 6, the instruction blocks are filled in the following order: first instruction block 0, then instruction block 1 (a minus block), then instruction block 2 (a plus block), then instruction block 3 (a minus block), and so on.

【００６８】In this middle-align method, the instructions having the basic height are first placed in the instruction block of cycle 0. Starting from the instructions placed there, as many instructions as possible are then embedded in instruction blocks as close as possible to the instruction block of cycle 0. As described above, the target instruction block alternates between minus blocks and plus blocks.

【００６９】Basically, an instruction block that is a minus block can receive an instruction that is a predecessor of an instruction already placed in a block executed after that cycle and whose height is 1 greater than that of the already placed instruction. If many instructions can be placed in the target minus block, they are placed in descending order of instruction depth.

【００７０】The depth of an instruction is the number of cycles on the longest path (the path with the most cycles) obtained by following, within the basic block, the instruction's predecessor, that predecessor's predecessor, and so on, until an instruction with no predecessor is reached.

【００７１】Similarly, an instruction block that is a plus block can receive an instruction that is a successor of an instruction already placed in a block executed before that cycle and whose depth is 1 greater than that of the already placed instruction. If many instructions are candidates for placement, they are placed starting from the one with the greatest height.

【００７２】FIGS. 7(a) and 7(b) show an example of loop optimization using the instruction execution selection function. First, consider the basic blocks of the program flow shown in FIG. 7(a). In the first instruction block of basic block 1, valid instructions are embedded only in instruction fields (1) and (2). Suppose that in the final instruction block of basic block 2 only the branch field (3) is occupied, by a conditional branch instruction. The program flow forms a loop that returns to basic block 1 when the branch condition of the final instruction block of basic block 2 is satisfied.

【００７３】In this case, the instruction execution selection function can be used to move instructions as shown in FIG. 7(b). That is, instructions (1) and (2) embedded in the first instruction block of basic block 1 are moved, marked with "+", into the empty instruction fields of the final block of basic block 2, and the first instruction block of basic block 1 is deleted. At the same time, a new basic block 0 is created and inserted immediately before basic block 1. These operations reduce the number of instruction blocks in basic block 1, and as a result the number of clocks the loop takes is reduced without changing the meaning of the program. For optimization using the instruction execution selection function, it is desirable that the first and last blocks of a basic block each have, as far as possible, three empty fields.

【００７４】If the number of instructions embedded in the first instruction block of basic block 1 is greater than the number of empty instruction fields in the final instruction block of basic block 2, not all instructions of the first instruction block of basic block 1 can be moved to basic block 2, so the number of instruction blocks in basic block 1 cannot be reduced and the optimization cannot be performed. To make optimization with the instruction execution selection function effective, it is therefore preferable to parallelize with the middle-align method rather than the top-align method.
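The move and its feasibility condition can be sketched as follows. This Python model is an illustration under assumptions, not the transformation as implemented in the source: a block is modelled as a list of ordinary fields only (the branch field is left out), `fold_loop_head` is a hypothetical name, and the new basic block 0 is assumed to hold a copy of the moved instructions so that they still run once before the loop is entered; FIG. 7(b) is not reproduced here, so that last point is an inference.

```python
from typing import List, Optional, Tuple

# An ordinary field holds (instruction, execution selection field) or None if empty.
Field = Optional[Tuple[str, str]]


def fold_loop_head(head_block: List[Field],
                   tail_block: List[Field]) -> Optional[Tuple[List[Field], List[Field]]]:
    """Move the loop's first block into the empty fields of its last block.

    Returns (new basic block 0, new last block), or None when the last block
    does not have enough empty fields and the first block cannot be removed.
    """
    moving = [field for field in head_block if field is not None]
    free_slots = [i for i, field in enumerate(tail_block) if field is None]
    if len(moving) > len(free_slots):
        return None  # not every instruction fits, so no block is saved

    new_tail = list(tail_block)
    for slot, (instruction, _old_select) in zip(free_slots, moving):
        # "+" makes the moved instruction take effect only when the loop branch is taken.
        new_tail[slot] = (instruction, "+")

    preheader = list(head_block)  # assumed content of the new basic block 0
    return preheader, new_tail
```

With a head block `[("i1", ""), ("i2", ""), None]` and an empty three-field tail block, the call returns a tail of `[("i1", "+"), ("i2", "+"), None]`; with three head instructions and only two free tail fields it returns `None`, the case in which the optimization cannot be performed.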

【００７５】Now consider the case where the final instruction of the basic block to be parallelized is not a branch-type instruction. In this case, no instructions are moved with "+" or "-" from other basic blocks into the final instruction block of this basic block, so the final instruction block does not need many empty instruction fields. Conversely, the instructions placed in the first instruction block of this basic block may be moved with "+" or "-" into a basic block executed before it. Therefore, when the final instruction of a basic block is not a branch-type instruction, optimization using the instruction execution selection function is more likely to be applicable if the basic block is parallelized so that its first block has many empty instruction fields.

【００７６】In this conventional middle-align parallelization method, depending on the number of instructions in the basic block and the data dependencies of each instruction, the final block may end up with more empty instruction fields than the first block. Consider a basic block that contains no branch-type instruction, has seven instructions, and has no data dependencies at all between them. If the instructions are placed in the order base block, minus block, plus block, the empty instruction fields accumulate in the final instruction block, as shown in Table 3(a). If they are placed in the order base block, plus block, minus block, the empty instruction fields accumulate in the first instruction block, as shown in Table 3(b).
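The seven-instruction case can be checked with a few lines of Python. The sketch assumes three ordinary instruction fields per block (four fields with one reserved for the branch instruction) and instructions with no dependencies, so only the fill order matters; the function name `instructions_per_block` is hypothetical, and Table 3 itself is not reproduced here.

```python
from typing import List, Sequence


def instructions_per_block(num_instructions: int,
                           fill_order: Sequence[str],
                           fields_per_block: int = 3) -> List[int]:
    """Count instructions per block, listed in execution order (minus, base, plus).

    fill_order names the blocks in the order they are filled,
    e.g. ("base", "minus", "plus").
    """
    remaining = num_instructions
    placed = {"minus": 0, "base": 0, "plus": 0}
    for name in fill_order:
        placed[name] = min(fields_per_block, remaining)
        remaining -= placed[name]
    if remaining:
        raise ValueError("more instructions than three blocks can hold")
    return [placed["minus"], placed["base"], placed["plus"]]


# Base, minus, plus: the last block ends up with two empty fields.
print(instructions_per_block(7, ("base", "minus", "plus")))   # [3, 3, 1]
# Base, plus, minus: the first block ends up with two empty fields.
print(instructions_per_block(7, ("base", "plus", "minus")))   # [1, 3, 3]
```

The two outputs differ only in which end of the basic block receives the leftover instruction, which is the asymmetry the paragraph describes.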

【００７７】[0077]

【００７８】Thus, with the middle-align method, depending on the number of instructions in the basic block and the data dependencies between them, it is difficult to place instructions so that the first instruction block has many empty instruction fields, even when the basic block is known to contain no branch instruction.

【００７９】In other words, even with the conventional middle-align instruction parallelization method, optimization using the instruction execution selection function may in some cases not be effective when parallelizing a basic block that contains no branch-type instruction.

【００８０】FIG. 8 is a flowchart illustrating a second embodiment of the present invention. This embodiment concerns a VLIW-type parallel pipeline processing device in which four instruction-pipelined instruction processing devices are arranged in parallel and every instruction can be executed in one clock. One of the instruction pipeline stages is a dedicated branch stage that executes only branch instructions, so no delay arises from conditional branches and, as a result, every instruction can be executed in one clock.

【００８１】In step S1 of FIG. 8, it is checked whether the final instruction of the basic block to be parallelized is a branch-type instruction. If it is, the block is parallelized by the middle-align method in step S2. If it is not, the block is parallelized by the bottom-align method in step S3.

【００８２】FIGS. 9 and 10 are flowcharts of the middle-align parallelization method and the bottom-align parallelization method of FIG. 8.

【００８３】First, the middle-align method is briefly described with reference to the flowchart of FIG. 9. Table 4 shows the sequence of instruction statements of a basic block, Table 5 shows the height and depth of each instruction in Table 4, and Table 6 shows, for each instruction, its predecessors and successors and the data dependency between each such pair of instructions.

【００８４】[0084]

【００８５】[0085]

【００８６】[0086]

【００８７】Here, the following three kinds of data dependency are considered.

【００８８】1. Instructions connected by a WR (Write-Read) edge: the data written by the former instruction is read by the latter instruction.

【００８９】2. Instructions connected by an RW (Read-Write) edge: the latter instruction writes to data that the former instruction reads.

【００９０】3. Instructions connected by a WW (Write-Write) edge: the latter instruction also writes to data that the former instruction has written.

【００９１】For instructions connected by a WR edge or a WW edge, the latter instruction must be executed in the clock following the former instruction or later. For instructions connected by an RW edge, however, the latter instruction may be executed in the same clock as the former instruction or later.
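The three edge kinds and their timing constraints can be derived from the registers each instruction reads and writes. The following Python sketch is illustrative: the `Instr` record, the register names, and the function names are assumptions made for the example, and the only rule taken from the text is that WR and WW edges require at least one clock of separation while an RW edge allows the same clock.

```python
from typing import FrozenSet, NamedTuple, Optional, Set


class Instr(NamedTuple):
    reads: FrozenSet[str]    # registers the instruction reads
    writes: FrozenSet[str]   # registers the instruction writes


def edge_kinds(former: Instr, latter: Instr) -> Set[str]:
    """Classify the dependencies from an earlier to a later instruction."""
    kinds: Set[str] = set()
    if former.writes & latter.reads:
        kinds.add("WR")
    if former.reads & latter.writes:
        kinds.add("RW")
    if former.writes & latter.writes:
        kinds.add("WW")
    return kinds


def min_clock_gap(former: Instr, latter: Instr) -> Optional[int]:
    """Minimum number of clocks by which the latter must follow the former.

    Returns None when the two instructions are independent.
    """
    kinds = edge_kinds(former, latter)
    if not kinds:
        return None
    return 1 if kinds & {"WR", "WW"} else 0


add = Instr(frozenset({"r1", "r2"}), frozenset({"r3"}))
use = Instr(frozenset({"r3"}), frozenset({"r4"}))        # reads what add wrote
overwrite = Instr(frozenset({"r5"}), frozenset({"r1"}))  # writes what add read
```

Here `min_clock_gap(add, use)` is 1 (WR edge) and `min_clock_gap(add, overwrite)` is 0 (RW edge only), so `overwrite` could share an instruction block with `add`.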

【００９２】In step S11 of FIG. 9, the basic height is determined from the heights of all instructions in the basic block. It is the greatest instruction height in the basic block divided by 2, with fractions rounded up. For the basic block of Table 4, the basic height is 3 (see Table 5).

【００９３】Hereafter, the block in which the instructions having the basic height are first placed is called the base block; instruction blocks executed before the base block are called minus blocks, and blocks executed after it are called plus blocks. As shown in FIG. 6, instruction blocks are filled in the following order: first instruction block 0 (the base block), then instruction block 1 (a minus block), then instruction block 2 (a plus block), then instruction block 3 (a minus block), and so on.
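The basic height and the alternating fill order can be expressed compactly. This Python sketch is only an illustration: `basic_height` and `block_fill_order` are hypothetical names, and the "relative cycle" (negative for minus blocks, positive for plus blocks) is a convenience introduced here to show where each numbered block falls in execution order.

```python
from itertools import islice
from typing import Iterator, Tuple


def basic_height(max_height: int) -> int:
    """Greatest instruction height divided by 2, rounded up."""
    return (max_height + 1) // 2


def block_fill_order() -> Iterator[Tuple[int, int]]:
    """Yield (instruction block number, relative cycle) in fill order.

    Relative cycle 0 is the base block, negative values are minus blocks
    (executed earlier) and positive values are plus blocks (executed later).
    """
    yield 0, 0
    block_number = 1
    distance = 1
    while True:
        yield block_number, -distance      # minus block
        yield block_number + 1, distance   # plus block
        block_number += 2
        distance += 1


first_five = list(islice(block_fill_order(), 5))
# [(0, 0), (1, -1), (2, 1), (3, -2), (4, 2)]
```

Sorting the yielded pairs by relative cycle gives the execution order; for the first four blocks that is 3, 1, 0, 2.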

【００９４】Next, in step S12, since the last instruction of the basic block of Table 4 is a branch instruction, the branch instruction, statement 9, is set aside. A branch instruction must always be executed in the last instruction block of its basic block, so setting it aside excludes it from the subsequent placement logic.

【００９５】Then, in step S13, instructions are placed in block 0, the base block. The instructions that can be placed in the base block in step S13 are those that fall under one of the following three conditions.

【００９６】(1) An instruction having the basic height.
(2) An instruction connected by an RW edge to an already placed instruction and having no dependency on any other placed instruction.
(3) An instruction whose height is no greater than the basic height and all of whose ancestors are still unplaced.
In placing instructions, all instructions that fall under (1) are placed first; if empty instruction fields remain, all instructions that fall under (2) are placed; and if empty instruction fields still remain, instructions that fall under (3) are placed. When several instructions satisfy the same condition (1), (2) or (3), the one with the greatest depth is placed first.

【００９７】For the basic block of Table 5, statements (5) and (6), which have the basic height and so satisfy condition (1), are first placed in the base block in order of depth, that is, (6) and then (5). Instruction (2), which satisfies condition (2), is then placed. No empty instruction field for a non-branch instruction now remains, so placement in block 0 ends. The arrangement of the instruction blocks at this point is shown in Table 7(a). Since unplaced instructions remain at step S14, the process proceeds to step S15.

【００９８】[0098]

【００９９】The instructions that can be placed in a minus block in step S15 are unplaced instructions that fall under one of the following four conditions.

【０１００】(1) An instruction having the basic height.
(2) A predecessor of an instruction placed in an instruction block executed in the next block or later, whose height is (height of the successor instruction + 1).
(3) A predecessor connected by an RW edge to an instruction already placed in the current block, which is not a predecessor of any other instruction in that block.
(4) An instruction whose height is no greater than the basic height and all of whose ancestors are still unplaced.
In placing instructions, all instructions that fall under (1) are placed first; if empty instruction fields remain, all instructions that fall under (2) are placed; if empty instruction fields still remain, instructions that fall under (3) are placed; and if empty instruction fields remain after that, instructions that fall under (4) are placed. When several instructions satisfy the same condition, the one with the greatest depth is placed first.

【０１０１】For the basic block of Table 4, no unplaced instruction satisfies condition (1), so instructions (1) and (4), which satisfy condition (2), are placed in instruction block 1 in descending order of depth, that is, (4) and then (1). Instruction block 1 still has an empty instruction field, but no instruction satisfies condition (3) or (4), so placement in instruction block 1 ends. The arrangement of the instruction blocks at this point is shown in Table 7(b). Since unplaced instructions remain at step S16, the process proceeds to step S17. The instructions that can be placed in a plus block in step S17 are unplaced instructions that fall under one of the following four conditions.

【０１０２】(1) An instruction having the basic height.
(2) A successor of an instruction placed in an instruction block executed in the previous block or earlier, whose depth is (depth of the predecessor instruction + 1).
(3) A successor connected by an RW edge to an instruction already placed in the current block, which is not a successor of any other instruction in that block.
(4) An instruction whose height is no greater than the basic height and all of whose ancestors have already been placed.
In placing instructions, all instructions that fall under (1) are placed first; if empty instruction fields remain, all instructions that fall under (2) are placed; if empty instruction fields still remain, instructions that fall under (3) are placed; and if empty instruction fields remain after that, instructions that fall under (4) are placed. When several instructions satisfy the same condition, those under (2), (3) and (4) are placed starting from the greatest height, and those under (1) starting from the greatest depth.

【０１０３】For the basic block of Table 4, no unplaced instruction satisfies condition (1), so instructions (7) and (8), which satisfy condition (2), are placed in instruction block 2. Instruction block 2 still has an empty instruction field, but no instruction satisfies condition (3) or (4), so placement in instruction block 2 ends. The arrangement of the instruction blocks at this point is shown in Table 7(c). Since unplaced instructions remain at step S18, the process returns to step S15.

【０１０４】In step S15, the unplaced instruction (3) of the basic block of Table 4 satisfies condition (2) and is therefore placed in instruction block 3. The arrangement of the instruction blocks at this point is shown in Table 7(d). Since no unplaced instructions remain at step S16, the process proceeds to step S19.

【０１０５】In step S19, the instruction (9) that was set aside is placed in the appropriate location. If the set-aside instruction is not a successor of any instruction placed in the final block at that point, it is placed in the branch instruction field of that final block. If it is a successor of an instruction placed in the final block, a new instruction block is added as the final block and the branch instruction is placed in the branch instruction field of that new block.
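The placement rule of step S19 can be sketched as below. The Python is a simplified illustration: each instruction block is modelled as a dictionary with a list of ordinary fields and one branch field, the function name `place_branch` and the instruction names in the usage are hypothetical, and the set of the branch's predecessors is taken as given rather than derived from a dependency table.

```python
from typing import Dict, List, Set


def place_branch(blocks: List[Dict], branch: str,
                 predecessors_of_branch: Set[str]) -> List[Dict]:
    """Place the set-aside branch in the last block, or in a newly added block.

    blocks are listed in execution order; each has "fields" (ordinary
    instructions) and "branch" (the branch instruction field, None if empty).
    """
    result = [dict(block) for block in blocks]  # copy so the input is untouched
    if set(result[-1]["fields"]) & predecessors_of_branch:
        # The branch depends on an instruction in the final block:
        # add a new final block that holds only the branch.
        result.append({"fields": [], "branch": branch})
    else:
        result[-1]["branch"] = branch
    return result


blocks = [{"fields": ["a"], "branch": None},
          {"fields": ["x", "y"], "branch": None}]
extended = place_branch(blocks, "br", {"x"})   # branch depends on x: new block added
in_place = place_branch(blocks, "br", {"a"})   # no dependency on the final block
```

In the first call the result has three blocks with the branch alone in the last one; in the second the branch occupies the branch field of the existing final block.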

【０１０６】For the basic block of Table 4, instruction (9) is a successor of an instruction placed in the current final block, so a new instruction block is added and instruction (9) is placed in its branch instruction field. The parallelization process ends at this point, and the resulting instruction blocks are as shown in Table 7(e).
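The rule of step S19 can be illustrated with a minimal Python sketch. The data layout (a list of dictionaries with `ops` and `branch` keys), the function name `place_saved_branch`, and the sample instruction numbers are assumptions made for illustration only; the sketch also assumes the branch field of the final block is still empty.

```python
def place_saved_branch(blocks, branch, predecessors):
    """Place the saved branch instruction as described for step S19.

    blocks:       list of dicts {"ops": [...], "branch": None} (hypothetical layout).
    branch:       the saved branch instruction.
    predecessors: set of instructions that the branch depends on.
    """
    # If the branch is a successor of something in the final block,
    # it cannot share that block, so a new final block is appended.
    if predecessors & set(blocks[-1]["ops"]):
        blocks.append({"ops": [], "branch": None})
    blocks[-1]["branch"] = branch
    return blocks


# Case 1: the branch depends on instruction 3, which sits in the final block.
dependent = place_saved_branch(
    [{"ops": [1, 2], "branch": None}, {"ops": [3], "branch": None}], 9, {3})

# Case 2: the branch depends only on instruction 1, placed in an earlier block.
independent = place_saved_branch(
    [{"ops": [1, 2], "branch": None}, {"ops": [3], "branch": None}], 9, {1})
```

In the first case a third block holding only the branch is created; in the second the branch shares the existing final block.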

【０１０７】Next, the bottom-align parallelization method is described with reference to the flowchart of FIG. 10. For this description, Table 8 shows the instruction sequence corresponding to a certain basic block, Table 9 shows the height and depth of each instruction in Table 8, and Table 10 shows, for each instruction, its predecessor and successor instructions together with the data dependence between each such pair.

【０１０８】[0108]

【０１０９】[0109]

【０１１０】[0110]

【０１１１】In the bottom-align parallelization method, instructions are filled in starting from the instruction block with the latest execution cycle. That is, the instruction block of the final cycle is designated instruction block 0, and as many instructions as possible are placed in it. When no more instructions can be placed there, placement moves to instruction block 1, the block of the preceding cycle. When instruction block 1 can accept no more instructions, placement moves to the instruction block of the cycle before that, and so on.

【０１１２】First, in step S21 of FIG. 10, instructions that have no successor are placed in instruction block 0 in descending order of depth. In Table 8, instructions (3), (4), (6), and (7) have no successor (Table 10), and they are placed in instruction block 0 deepest first. As Table 9 shows, instructions (3) and (4) both have depth 1, whereas (6) has depth 2 and (7) has depth 3. Accordingly, once instructions (7), (6), and (1) have been placed in instruction block 0, no empty instruction field remains. The process therefore advances from step S22 to step S25. Unplaced instructions still remain at this point, so in step S26 placement into the current target block, instruction block 0, ends, the placement target is changed to the next instruction block 1, and the process returns to step S21. The arrangement of the instruction blocks at this time is shown in Table 11(a).

【０１１３】[0113]

【０１１４】In step S21, the instructions that at this point have no successor, or whose successors have all been placed, are instructions (2) to (5). Placing these in order of depth puts instructions (5), (2), and (3) in instruction block 1. No empty instruction field then remains, so the process advances from step S22 to step S25. Because instruction (4) is still unplaced, the process proceeds to step S26, where placement into instruction block 1 ends and the placement target is changed to the next instruction block 2. The arrangement of the instruction blocks at this time is shown in Table 11(b).

【０１１５】Next, in step S21, instruction (4), all of whose successors have been placed, is placed in instruction block 2. Empty instruction fields still remain, but no unplaced instruction is left at step S23, so placement of all instructions ends here. The final arrangement of the instructions is shown in Table 11(c). For comparison, parallelizing the instructions of the basic block of Table 8 with the middle-align instruction parallelization method gives Table 12. Compared with the bottom-align method, the middle-align result clearly has fewer empty instruction fields in the first instruction block of the basic block.
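The bottom-align walkthrough above can be sketched as a greedy loop. The following Python is a minimal illustration under stated assumptions: the function name `bottom_align`, the `width` parameter (ordinary instruction fields per block), and the small dependence graph are hypothetical and do not reproduce Tables 8 to 11, which are not available in this text. Branch instruction handling is omitted.

```python
def bottom_align(successors, depth, width):
    """Greedy bottom-align placement: fill the last-cycle block first.

    successors: dict mapping each instruction to the set of its successors.
    depth:      dict mapping each instruction to its depth.
    width:      ordinary instruction fields per block (assumed parameter).
    Returns a list of blocks; index 0 is the final execution cycle.
    """
    placed = set()
    blocks = []
    while len(placed) < len(successors):
        # Candidates are fixed when a block is opened (step S21):
        # instructions with no successor, or whose successors are all placed.
        ready = [i for i in successors
                 if i not in placed and successors[i] <= placed]
        # Deepest first; ties broken by instruction number for determinism.
        ready.sort(key=lambda i: (-depth[i], i))
        block = ready[:width]
        placed.update(block)
        blocks.append(block)
    return blocks


# Hypothetical dependence graph (not the patent's Table 8).
succ = {1: {3}, 2: {3}, 3: {5}, 4: {5}, 5: set(), 6: set()}
dep = {1: 1, 2: 1, 3: 2, 4: 1, 5: 3, 6: 1}
schedule = bottom_align(succ, dep, width=2)
# Execution order is the reverse of the placement order.
execution_order = list(reversed(schedule))
```

Taking the candidate set once per block means an instruction is never placed in the same block as its successor, which matches the cycle-by-cycle description above.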

【０１１６】[0116]

【０１１７】The pipeline instruction processing device with this parallel instruction execution selection scheme addresses a problem in code generation for parallel pipeline instruction processing devices. However, the code generation method described so far cannot produce optimized code that makes effective use of the execution selection scheme. To optimize code for such a device, effective code must be generated for loops, where most of a program's execution time is spent.

【０１１８】FIGS. 11 and 12 are flowcharts illustrating the code optimization method of the third embodiment of the present invention and its loop optimization section. FIGS. 13 and 14 are program sequence diagrams before and after optimization on a pipeline instruction processing device with the parallel instruction execution selection scheme. This embodiment also uses the pipeline instruction processing device of FIG. 1. The program is assumed to have already undergone the local and global optimizations available from current optimizing-compiler technology, so the program of FIG. 13 is a sequence to which optimizations such as moving loop-invariant code out of the loop and induction-variable optimization (an induction variable being one that changes by a constant amount on each loop iteration) have already been applied.

【０１１９】In FIG. 11, step 101 performs "division into basic blocks". A "basic block" is a sequence of basic operations executed one after another from the first instruction to the last. Because a branch instruction in the branch field terminates a basic block, no branch instruction occurs partway through. The "entry of the basic block" is its first instruction block, and the "exit of the basic block" is its last instruction block. To simplify the explanation, the instructions of the program are assumed to be arranged so that the entry and exit instruction blocks of each basic block have as many empty fields as possible.

【０１２０】Step 102 is "flow analysis", which analyzes label definitions and references to determine the flow of control between basic blocks. This flow is expressed in terms of "predecessors" and "successors": when control passes from basic block A to basic block B (written "basic block A → basic block B"), basic block A is a predecessor of basic block B and basic block B is a successor of basic block A.

【０１２１】Step 103 performs "depth-first order calculation". The "depth-first order" is the ordering of all basic blocks of the program obtained by searching the program from its first basic block and visiting basic blocks so as to move away from the first basic block as quickly as possible (depth first).
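As a rough illustration of the depth-first order, the Python sketch below lists basic blocks in the order a depth-first search first visits them. The function name `depth_first_order` and the sample control-flow graph are hypothetical. Compiler textbooks often define the depth-first ordering as reverse postorder instead; the text does not say which variant is meant, so the visiting order used here is an assumption.

```python
def depth_first_order(successors, entry):
    """Return reachable basic blocks in depth-first visiting order."""
    order, seen = [], set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        order.append(block)
        # Push successors in reverse so the first-listed one is visited first.
        stack.extend(reversed(successors[block]))
    return order


# Hypothetical control-flow graph: block -> list of successor blocks.
cfg = {"A": ["B"], "B": ["C", "D"], "C": ["E"],
       "D": ["E"], "E": ["B", "G"], "G": []}
dfo = depth_first_order(cfg, "A")
```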

【０１２２】Step 104 performs "dominator detection". A "dominator" of a basic block is a basic block that is always executed before that basic block is executed.
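One common way to compute dominators is the iterative set-intersection method, sketched below in Python. The patent does not state which algorithm step 104 uses, so this is an assumed technique; the function name and sample graph are hypothetical. Following the usual convention, each block is counted as dominating itself.

```python
def dominators(successors, entry):
    """Iteratively compute the dominator set of every basic block."""
    blocks = list(successors)
    preds = {b: set() for b in blocks}
    for b, succs in successors.items():
        for s in succs:
            preds[s].add(b)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            incoming = [dom[p] for p in preds[b]]
            new = set.intersection(*incoming) if incoming else set()
            new = new | {b}
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


# Hypothetical control-flow graph: block -> list of successor blocks.
cfg = {"A": ["B"], "B": ["C", "D"], "C": ["E"],
       "D": ["E"], "E": ["B", "G"], "G": []}
dom = dominators(cfg, "A")
```

Here block B dominates block E, so the edge E → B qualifies as a back edge in the sense used for natural loops.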

【０１２３】Step 105 performs "natural loop construction". For a back edge "A → B" (an edge "A → B" in which basic block B dominates basic block A), the "natural loop" is the collection of basic blocks consisting of A, B, and every basic block that can reach A without passing through B. Basic block B is called the header block of the natural loop, and basic block A is called the tail block.
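The definition of a natural loop translates directly into a backward search from the tail block that stops at the header. The Python below is a minimal sketch; the function name `natural_loop` and the sample graph are hypothetical.

```python
def natural_loop(successors, tail, header):
    """Blocks of the natural loop for the back edge tail -> header."""
    preds = {b: set() for b in successors}
    for b, succs in successors.items():
        for s in succs:
            preds[s].add(b)

    loop = {header, tail}
    worklist = [tail] if tail != header else []
    while worklist:
        block = worklist.pop()
        # Walk predecessors; the header is already in the set, so the
        # search never continues past it.
        for p in preds[block]:
            if p not in loop:
                loop.add(p)
                worklist.append(p)
    return loop


# Hypothetical control-flow graph with the back edge E -> B.
cfg = {"A": ["B"], "B": ["C", "D"], "C": ["E"],
       "D": ["E"], "E": ["B", "G"], "G": []}
loop_blocks = natural_loop(cfg, tail="E", header="B")
```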

【０１２４】Step 106 performs "live variable analysis". For a register a and a point p in the program, the analysis determines whether the value of register a at point p can be used along some path starting at p. If it can, register a is said to be "live" at p; otherwise register a is said to be "dead" at p. The purpose of live variable analysis is to find the set of variables that are live at the exit of each basic block.
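Live variable analysis is usually solved as a backward dataflow problem. The sketch below uses the standard equations live_in = use ∪ (live_out − def) and live_out = union of live_in over successors; this formulation is an assumption, since the patent does not give the equations. The block names, register names, and the `use`/`define` sets (registers read before any write in the block, and registers written) are hypothetical.

```python
def live_out(successors, use, define):
    """Return the set of registers live at the exit of every basic block."""
    live_in = {b: set() for b in successors}
    out = {b: set() for b in successors}
    changed = True
    while changed:
        changed = False
        for b in successors:
            new_out = set()
            for s in successors[b]:
                new_out |= live_in[s]
            # A register is live on entry if it is read before being written,
            # or if it is live on exit and not overwritten in the block.
            new_in = use[b] | (new_out - define[b])
            if new_out != out[b] or new_in != live_in[b]:
                out[b], live_in[b] = new_out, new_in
                changed = True
    return out


# Hypothetical program: B1 initializes r1 and r2, B2 loops, B3 reads r2.
cfg = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
use = {"B1": set(), "B2": {"r1", "r2"}, "B3": {"r2"}}
define = {"B1": {"r1", "r2"}, "B2": {"r2"}, "B3": set()}
live = live_out(cfg, use, define)
```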

【０１２５】Step 107 performs "loop optimization". Code optimization is applied to each natural loop constructed in step 105 to generate code suited to a pipeline instruction processing device with the parallel instruction execution selection scheme. The optimization shortens the loop length (the number of instruction blocks making up the loop) and thereby the time needed to execute the loop.

【０１２６】In the loop optimization section of step 107 shown in FIG. 12, steps 111 to 113 make a preliminary determination of whether loop optimization is possible.

【０１２７】First, step 111 obtains the number y of empty fields at the exit of the tail block.

【０１２８】Step 112 determines from the value of y obtained in step 111 whether optimization is possible. If the number of empty fields y = 0, optimization is not possible and the process ends.

【０１２９】Step 113 determines whether optimization is possible from the register definition/reference data of the instruction block at the entry of the header block and the instruction block at the exit of the tail block. If a register defined or referenced in the entry instruction block of the header block is defined in the exit instruction block of the tail block (that is, a definition/reference dependence exists), optimization is not possible and the process ends.

【０１３０】Steps 114 to 119 perform the actual optimization.

【０１３１】First, step 114 obtains the number x of instructions in the entry instruction block of the header block. Step 115 then determines, among those x instructions, the instructions that can be moved into the tail block without execution selection (z instructions) and the empty instruction fields m into which they can be moved. An instruction is movable if the register it defines is not among the live variables at the exit of the header block. An empty instruction field is eligible if its instruction block is the instruction block in which the register defined by the movable instruction is referenced, or a later one, and is also at or after the instruction block in which the register referenced by the movable instruction is referenced.

【０１３２】Step 116 determines from the values of x (step 114), y (step 111), and z (step 115) whether optimization is possible. If (x - z) > y, optimization is not possible and the process ends.
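The three feasibility tests of steps 112, 113, and 116 can be collected into one predicate, sketched below in Python. The function name `can_optimize` and the register sets are hypothetical. In the calls, x = 3 and z = 1 follow the FIG. 13 example (instructions 5 to 7, of which instruction 6 is movable), while y = 2 and the register names are assumed values, since the figure itself is not reproduced here.

```python
def can_optimize(x, y, z, header_entry_regs, tail_exit_defs):
    """Feasibility checks for one natural loop.

    x: number of instructions in the header block's entry instruction block.
    y: number of empty fields at the exit of the tail block.
    z: number of instructions movable without execution selection.
    header_entry_regs: registers defined or referenced in the header's entry block.
    tail_exit_defs:    registers defined in the tail's exit block.
    """
    if y == 0:                              # step 112: nowhere to move anything
        return False
    if header_entry_regs & tail_exit_defs:  # step 113: definition/reference dependence
        return False
    if (x - z) > y:                         # step 116: too many instructions need [+]
        return False
    return True


feasible = can_optimize(3, 2, 1, {"r1", "r2"}, {"r9"})
too_few_fields = can_optimize(3, 1, 1, {"r1", "r2"}, {"r9"})
```

The last test reflects that the x - z instructions requiring execution selection must all fit into the y empty fields of the tail block's exit instruction block.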

【０１３３】Steps 117 to 119 move and copy code and restructure the loop. In step 117, the instructions of the header block's entry instruction block that were counted in z (step 115) are moved into the empty fields m of the tail block. The instructions of that entry instruction block that were not counted in z are moved into empty fields at the exit of the tail block with execution selection [+] attached.

【０１３４】Next, step 118 creates a new basic block and copies into it the instructions of the header block's entry instruction block that were moved in step 117.

【０１３５】Finally, step 119 restructures the loop. Specifically, the first instruction block of the header block is deleted from the header block. This shortens the loop length (the number of instruction blocks making up the loop) by one instruction block. Flow information and branch targets are also updated as follows.

【０１３６】a) The predecessors of the header block become the tail block and the new basic block F created in step 118.

【０１３７】b) For every basic block other than the tail block that has the header block as a successor, that successor is changed from the header block to the new basic block F of step 118.

【０１３８】c) If a basic block whose successor was changed in b) had the header block as its branch target, the branch target is changed to the new basic block of step 118.

【０１３９】d) The predecessors of the new basic block of step 118 become all the basic blocks whose successor was changed in b).

【０１４０】e) The successor of the new basic block of step 118 becomes the header block.
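Updates a), b), d), and e) amount to splicing the new basic block between the header block and its predecessors outside the loop. The Python sketch below shows this on predecessor and successor lists; the function name `restructure_loop`, the dictionary representation, and the sample graph are hypothetical. Update c), the rewriting of branch target labels, is not modelled.

```python
def restructure_loop(preds, succs, header, tail, new_block):
    """Splice new_block in front of header for all non-tail predecessors."""
    outside = [b for b in preds[header] if b != tail]
    # a) The header's predecessors become the tail block and the new block.
    preds[header] = [tail, new_block]
    # b) Blocks other than the tail that led to the header now lead to the new block.
    for b in outside:
        succs[b] = [new_block if s == header else s for s in succs[b]]
    # d) The new block's predecessors are the blocks changed in b).
    preds[new_block] = outside
    # e) The new block's successor is the header.
    succs[new_block] = [header]


# Hypothetical flow: A -> B -> E, with back edge E -> B and loop exit E -> G.
preds = {"A": [], "B": ["A", "E"], "E": ["B"], "G": ["E"]}
succs = {"A": ["B"], "B": ["E"], "E": ["B", "G"], "G": []}
restructure_loop(preds, succs, header="B", tail="E", new_block="F")
```

After the call, control enters the loop through F, which holds the copied instructions, while the back edge from the tail block still leads directly to the shortened header block.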

【０１４１】Optimizing the program sequence shown in FIG. 13 yields the program sequence of FIG. 14. Suppose the natural loop with the back edge "E → B" can be optimized. The entry instruction block of header block B consists of instructions 5 to 7. In step 117, instruction 6, which was counted in z, is first moved into tail block E, and the remaining instructions (instruction 5 and instruction 7) are moved into the exit instruction block of tail block E with execution selection [+] attached. Next, step 118 creates a basic block F containing a single instruction block made up of instructions 5 to 7.

【０１４２】Finally, step 119 restructures the loop: the first instruction block of header block B (instruction 5, instruction 6, instruction 7, NOP) is deleted from the header block, giving the flow shown in FIG. 14.

【０１４３】In this case, the first instruction block of header block B (instruction 5 through NOP) is removed from the loop, making the loop one instruction block shorter, so the time needed to process each iteration of the loop of basic blocks B to F is reduced.

【０１４４】FIGS. 15 and 16 are flowcharts of the fourth embodiment of the present invention. Whereas the third embodiment addressed natural loops, this embodiment addresses the case in which a branch is not taken. FIGS. 17 to 20 show examples of this embodiment's code optimization applied to program sequences for a pipeline instruction processing device with the parallel instruction execution selection scheme. In each of FIGS. 17 to 20, (a) is the program sequence before the optimization of this embodiment and (b) is the program sequence after it. The program in (a) is assumed to have already undergone the local and global optimizations provided by current optimizing-compiler technology.

【０１４５】FIG. 15 differs from the third embodiment in that step 106 is absent and steps 102a and 107a have been modified.

【０１４６】Step 102a is "flow analysis", which analyzes label definitions and references to determine the flow of control between basic blocks. The flow of control is expressed by "predecessor", "successor", and "edge type" (branch-taken edge, branch-not-taken edge, or function-call edge). An edge along which control transfers from the predecessor to the successor when a branch is taken is a branch-taken edge. An edge for which the predecessor's branch field holds a function call instruction, and along which control passes to the successor after return from the function, is a function-call edge. Every other edge is a branch-not-taken edge. Step 102a thus classifies every control flow between basic blocks by edge type.
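The three edge types can be illustrated with a small classifier. Everything in the Python sketch below is an assumed model: the `EdgeKind` names, the representation of a branch field as an `(opcode, target)` pair or `None`, and the opcode strings `"call"` and `"beq"` are hypothetical and are not taken from the patent's instruction set.

```python
from enum import Enum


class EdgeKind(Enum):
    BRANCH_TAKEN = "branch-taken"
    BRANCH_NOT_TAKEN = "branch-not-taken"
    FUNCTION_CALL = "function-call"


def classify_edge(branch_field, successor):
    """Classify the edge from a predecessor block to `successor`.

    branch_field: None, or an (opcode, target_label) pair taken from the
                  predecessor's branch field (hypothetical representation).
    """
    if branch_field is not None:
        opcode, target = branch_field
        if opcode == "call":
            # Control reaches the successor after the callee returns.
            return EdgeKind.FUNCTION_CALL
        if target == successor:
            return EdgeKind.BRANCH_TAKEN
    # Every other edge, including plain fall-through.
    return EdgeKind.BRANCH_NOT_TAKEN


taken = classify_edge(("beq", "D"), "D")
fall_through = classify_edge(("beq", "D"), "B")
after_call = classify_edge(("call", "f"), "B")
```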

【０１４７】Step 107a is "code optimization". The branch-not-taken edges are extracted from the result of the flow analysis of step 102a, and code optimization is applied to each of them to shorten the execution time of the flow in which the branch is not taken. FIG. 16 shows the detailed flow of the code optimization of step 107a. The branch-not-taken edge being optimized is written "A → B" (A: predecessor block, B: successor block).

【０１４８】Steps 110 to 113 make a preliminary determination of whether optimization is possible. Step 110 makes this determination from the number of instruction blocks in basic block B, the successor on the branch-not-taken edge, and from the contents of the branch field of basic block B. Optimization is not possible if the following condition (written as condition α in FIG. 16) holds.

【０１４９】Condition:
1) [basic block B = 1 instruction block]
2) [the branch field of basic block B contains a branch instruction]

Step 111 obtains the number y of empty fields at the exit of basic block A, the predecessor on the branch-not-taken edge. Step 112 determines from the value of y obtained in step 111 whether optimization is possible. If [number of empty fields y = 0], optimization is not possible and the process ends.

【０１５０】Step 113 determines whether optimization is possible from the register definition/reference data of the entry instruction block of basic block B and the exit instruction block of basic block A. If a register defined or referenced in the entry instruction block of basic block B is defined in the exit instruction block of basic block A (that is, a definition/reference dependence exists), optimization is not possible and the process ends.

【０１５１】Steps 114 to 119a perform the actual optimization. First, step 114 obtains the number x of instructions in the entry instruction block of basic block B.

【０１５２】Step 115 determines, among the x instructions in the entry instruction block of basic block B, the instructions that can be moved into basic block A without execution selection [-] (z instructions) and the empty fields m into which they can be moved. An instruction in the entry instruction block of basic block B is movable if the register it defines is not among the live variables at the exit of successor block B. An empty field is eligible if its instruction block is the instruction block in basic block A in which the register defined by the movable instruction is referenced, or a later one, and is also at or after the instruction block in which the register referenced by the movable instruction is defined.

【０１５３】Step 116 determines from the values of x (step 114), y (step 111), and z (step 115) whether optimization is possible. If [(x - z) > y], optimization is not possible and the process ends.

【０１５４】From step 117a onward, code is moved and copied and the program flow is changed. First, in step 117a, the instructions of the entry instruction block of basic block B that were counted in z (step 115) are moved into basic block A (the predecessor block), into the empty fields m. The instructions of the entry instruction block of successor block B that were not counted in z are moved into empty fields at the exit of predecessor block A with execution selection [-] attached (step 117a). If no branch-taken edge has basic block A as its predecessor, all instructions of the entry instruction block of basic block B can be moved without execution selection.

【０１５５】Next, in step 118a, if a branch-taken edge has basic block B as its successor, a new basic block C is created and the instructions of the entry instruction block of basic block B that were moved in step 117a are copied into it.

【０１５６】Finally, the program flow is changed (step 119a). Specifically, the first instruction block of basic block B, the successor on the branch-not-taken edge, is deleted from basic block B, and flow information and branch targets are updated (FIGS. 17 to 20). As a result, the execution time of the flow in which the branch is not taken is shortened by one instruction block.

【０１５７】The changes to flow information and branch targets are described below with reference to FIGS. 17 to 20. In each case in the figures, (a) shows the program sequence before the optimization of this embodiment and (b) shows the program sequence after it. The following abbreviations are used in the figures to simplify the description.

【０１５８】FIG. 17 is the program sequence diagram for the case in which (no branch-taken edge from X to B exists) and (before optimization: B = 1 instruction block).

【０１５９】a) Basic block D, the successor of basic block B, is added to the successors of basic block A.

【０１６０】b) Basic block B is removed from the successors of basic block A.

【０１６１】c) In basic block D, [predecessor: basic block B] is replaced with basic block A.

【０１６２】FIG. 18 is the program sequence diagram for the case in which (no branch-taken edge from X to B exists) and (before optimization: B ≠ 1 instruction block). Nothing in particular is done in this case.

【０１６３】FIG. 19 is the diagram for the case in which (a branch-taken edge to B exists) and (before optimization: B = 1 instruction block).

【０１６４】a) For each basic block X (X1, …, Xn) other than basic block A that has basic block B as a successor, [successor: basic block B] is replaced with basic block C.

【０１６５】b) [Branch target: basic block B] of basic block X is changed to basic block C.

【０１６６】c) The successors of basic block A become "block D, the successor of basic block B, and the branch-target block of basic block A".

【０１６７】d) The predecessors of basic block C become the basic blocks X.

【０１６８】e) The successor of basic block C becomes basic block D.

【０１６９】f) An unconditional branch instruction to basic block D is inserted in the branch field of basic block C.

【０１７０】g) Let Y (Y1, …, Yn) denote the predecessors of basic block D other than basic block B. The predecessors of basic block D become "basic blocks Y, basic block A, and basic block C".

【０１７１】FIG. 20 is the program sequence diagram for the case in which (a branch-taken edge to B exists) and (before optimization: B ≠ 1 instruction block).

【０１７２】a) For each basic block X (X1, …, Xn) other than basic block A that has basic block B as a successor, [successor: basic block B] is replaced with basic block C.

【０１７３】b) [Branch target: basic block B] of basic block X is changed to basic block C.

【０１７４】c) The predecessors of basic block A become "basic block A and basic block C".

【０１７５】d) The predecessors of basic block C become the basic blocks X.

【０１７６】e) The successor of basic block C becomes basic block B.

【０１７７】f) An unconditional branch instruction to basic block B is inserted in the branch field of basic block C.

【０１７８】[0178]

[Effects of the invention]

As described above, in a parallel pipeline processing scheme free of branch delay, the present invention allows each instruction other than the branch instruction in an instruction block containing a conditional branch instruction to be selectively executed or not, depending on whether the branch condition holds. This suppresses the growth of empty fields in branch instruction blocks and improves the processing performance of the parallel pipeline instruction processing device. In addition, effective code using execution selection can be generated for loops, where most of a program's execution time is spent, so loop execution time is shortened and the processing capability of the pipeline instruction processing device is improved.

[Brief description of the drawings]

FIG. 1 is a block diagram of a system illustrating one embodiment of the present invention.

FIG. 2 is a layout diagram of an instruction block used in the system of FIG. 1.

FIG. 3 is a layout diagram of a program sequence used in the system of FIG. 1.

FIG. 4 is a layout diagram of an instruction block as a specific example of instruction block 2 in FIG. 3.

FIG. 5 is a schematic layout diagram of the object format of the add instruction in FIG. 3.

FIG. 6 is a layout diagram of instruction blocks in a general middle-align instruction parallelization method.

FIG. 7 is a block layout diagram illustrating loop optimization using the instruction execution selection function.

FIG. 8 is a flowchart illustrating a second embodiment of the present invention.

FIG. 9 is a flowchart illustrating the middle-align parallelization method of FIG. 8.

FIG. 10 is a flowchart illustrating the bottom-align parallelization method of FIG. 8.

FIG. 11 is a flowchart illustrating a third embodiment of the present invention.

FIG. 12 is a flowchart illustrating the loop optimization section of FIG. 11.

FIG. 13 is a program sequence diagram before the optimization of FIG. 12.

FIG. 14 is a program sequence diagram after the optimization of FIG. 12.

FIG. 15 is a flowchart illustrating a fourth embodiment of the present invention.

FIG. 16 is a flowchart illustrating the code optimization section of FIG. 15.

FIG. 17 is a program sequence diagram before and after optimization in a first case in which the program flow is changed in FIG. 16.

FIG. 18 is a program sequence diagram before and after optimization in a second case in which the program flow is changed in FIG. 16.

FIG. 19 is a program sequence diagram before and after optimization in a third case in which the program flow is changed in FIG. 16.

FIG. 20 is a program sequence diagram before and after optimization in a fourth case in which the program flow is changed in FIG. 16.

FIG. 21 is a schematic diagram giving an overview of the trace scheduling method.

FIG. 22 is a schematic diagram illustrating how joins and branches are handled in the trace scheduling method.

[Explanation of symbols]

1: Instruction block memory
2: Instruction block fetch means
3 to 5: Instruction execution units
6: Branch instruction execution unit
7: Data register
14 to 17: Operand fetch means
18: Address generation means
19 to 21: Instruction execution means
22 to 25: Operand write means
31: Instruction block bus
32: Address bus
37 to 40: Register read buses
41 to 44: Source operand buses
45 to 48: Destination operand buses
49 to 52: Register write buses
53: Execution selection plus signal
54: Branch condition satisfied signal
101 to 107, 111 to 119, S1 to S3, S11 to S19, S21 to S26: Processing steps

Claims

[Claims]

1. An instruction execution processing method for a parallel pipeline instruction processing device that is equipped with a plurality of instruction execution units and executes a plurality of instructions in parallel so that no branch delay occurs, characterized in that, when a conditional branch instruction is present in the branch instruction field of one instruction block, the instruction block being the processing unit of the device and consisting of a plurality of instruction fields executed in parallel, an instruction execution selection means is provided for selecting whether or not to execute each instruction other than the conditional branch instruction in that instruction block at the time of the conditional branch, thereby reducing the empty instruction fields in the instruction block containing the conditional branch instruction.

2. The instruction execution processing method for a parallel pipeline instruction processing device according to claim 1, comprising: first and second analysis means for analyzing the control flow of a program and the data flow of the program, respectively; and a first parallelizing means for parallelizing the instructions in the program, based on the data flow information obtained by the second analysis means, for each basic block detected by the first analysis means, a basic block being a group of consecutive instruction statements in which the program does not stop or branch at any point other than the entrance and exit, wherein the first parallelizing means arranges the instructions in the basic block so that cycles near the center of the basic block have a high instruction filling rate and many empty instruction fields remain near the entrance and exit of the basic block.

3. The instruction execution processing method for a parallel pipeline instruction processing device according to claim 2, further comprising: a branch checking means for checking whether the final instruction of the basic block is a branch instruction; and a second parallelizing means for parallelizing so as to improve the instruction filling rate, wherein, based on the information obtained by the branch checking means, the second parallelizing means parallelizes the instructions if the final instruction of the basic block is not a branch instruction, and the first parallelizing means parallelizes the instructions if the final instruction of the basic block is a branch instruction.

4. The instruction execution processing method for a parallel pipeline instruction processing device according to claim 1, wherein, in a natural loop in which instruction blocks are concatenated, if the number of instructions in the header block of the loop, excluding free instruction fields, is smaller than the number of free fields in the tail block, the instructions of the entry instruction block of the header block are moved to free fields of the tail block, the moved instructions are copied to a new basic block, and the entry instruction block is deleted from the header block, so that the loop is reorganized, the number of instruction blocks in the loop is reduced, and the code is optimized.

5. The instruction execution processing method for a parallel pipeline instruction processing device according to claim 4, wherein, when the first analysis means detects a flow in which no branch occurs, an instruction in the program is moved, or moved with execution selection, using the data flow information; a new instruction block is generated according to the state of the flow in which the movement was performed; the moved instruction is copied to the new instruction block; and the branch destination or flow information is changed for the flow in which no branch occurs.
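The loop reorganization recited in claim 4 can be sketched in Python under strong simplifying assumptions. The loop is modeled as a plain list of instruction blocks, the first block stands in for the entry instruction block of the header, and the last block stands in for the tail block. The function name `move_header_into_tail`, the string representation of instructions, and the `[if taken]` marker (standing for an instruction moved with execution selection so that it runs only when the loop-back branch is taken) are all hypothetical. Data dependences and the retargeting of the loop-back branch are ignored.

```python
from typing import List, Optional, Tuple

# One instruction block: a fixed number of fields, None marks a free field.
Block = List[Optional[str]]


def move_header_into_tail(loop: List[Block]) -> Tuple[List[Block], List[Block]]:
    """Return (new_basic_block, new_loop) for a simplified loop model.

    If the entry block holds fewer instructions than the tail block has
    free fields, the entry instructions are placed into those free fields,
    a copy of the entry block becomes a new basic block executed before
    the loop, and the entry block is removed from the loop.
    Otherwise the loop is returned unchanged.
    """
    if len(loop) < 2:
        return [], loop
    entry, tail = loop[0], loop[-1]
    moved = [ins for ins in entry if ins is not None]
    free_slots = [i for i, ins in enumerate(tail) if ins is None]
    if not len(moved) < len(free_slots):
        return [], loop
    new_tail = list(tail)
    for slot, ins in zip(free_slots, moved):
        # Assumed marker: execute only when the loop-back branch is taken.
        new_tail[slot] = ins + " [if taken]"
    return [list(entry)], loop[1:-1] + [new_tail]


LOOP = [
    ["ld a", None, None, None],      # entry instruction block of the header
    ["add", "mul", "sub", None],
    ["st", None, None, "br L"],      # tail block with the loop-back branch
]
new_block, new_loop = move_header_into_tail(LOOP)
```

With this input the loop shrinks from three instruction blocks to two, which is the kind of reduction the claim describes. A real implementation would also have to check data dependences and redirect the loop-back branch to the new first block.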