JP2006012185A

JP2006012185A - Processor

Info

Publication number: JP2006012185A
Application number: JP2005224048A
Authority: JP
Inventors: Takehito Heiji; 岳人瓶子; Shuichi Takayama; 秀一高山; Tetsuya Tanaka; 哲也田中; Hajime Ogawa; 一小川; Nobuo Higaki; 信生桧垣
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-08-02
Filing date: 2005-08-02
Publication date: 2006-01-12

Abstract

PROBLEM TO BE SOLVED: When the condition of a conditional execution instruction is not satisfied, the instruction is executed as a no-operation instruction, so hardware is used inefficiently and effective performance drops.

SOLUTION: A processor decodes at least as many instructions as it has arithmetic units. An instruction issuance control part 31 evaluates the execution conditions before the execution stage, invalidates any instruction whose condition is false, and allocates the arithmetic units (hardware) to the succeeding valid instructions so that they are used effectively. A compiler schedules instructions so that the number of instructions whose execution conditions become true does not exceed the upper limit of hardware parallelism. The number of instructions arranged in parallel in each cycle may itself exceed the hardware parallelism.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a processor, and more particularly to a technique for improving performance in parallel processing through efficient use of arithmetic units.

As microprocessor-based products have gained functionality and speed in recent years, microprocessors (hereinafter simply “processors”) with high processing performance have come to be desired. In general, to raise instruction throughput, processors adopt pipelining: each instruction is divided into several processing units (here called “stages”), and each stage is executed by separate hardware so that multiple instructions can be processed concurrently. In addition to the spatial parallelism of pipelining, the VLIW (Very Long Instruction Word) and superscalar approaches, which perform parallel processing at the instruction level in time, are used to improve performance further.

One of the main factors that hinder improvement in processor performance is the overhead of branch processing. The instruction-supply penalty of this overhead grows as the number of pipeline stages increases. Furthermore, when instructions are processed in parallel, the higher the degree of parallelism, the more frequently branch instructions occur and the more apparent the overhead becomes.

As a conventional technique for eliminating this overhead, there is a conditional execution method in which information indicating an execution condition is added to each instruction, and the operation indicated by the instruction is executed only when that condition is satisfied. In this method, the condition flag corresponding to the execution condition added to each instruction is referred to at execution time. If the condition is not satisfied, the execution result of the instruction is invalidated; that is, the instruction is executed as a no-operation instruction.

For example, if the flow containing a conditional branch shown in FIG. 10 is written using the method of adding execution condition information to each instruction, the program shown in FIG. 11 is obtained. In FIG. 11, C0 and C1 denote the conditions attached to the instructions. When the value of the corresponding condition flag is true, the instruction is executed; when it is false, the instruction is executed as a no-operation instruction. In this example, the comparison result of instruction 1 (a comparison instruction) is first stored in C0. At the same time, the opposite condition is set in C1. Therefore, the operation is actually performed for only one of instruction 2 and instruction 3, and the other is executed as a no-operation instruction. As a result, branch processing becomes unnecessary, and the branch overhead is eliminated.
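The conventional behaviour described above can be sketched in a few lines of Python. This is an illustrative model only: the tuple layout, the register name `r1`, and the written values are assumptions, since the contents of FIG. 11 are not reproduced in this text. The sketch shows that both slots are occupied even though only one instruction does useful work.

```python
def run_conventional(program, flags, regs):
    """Execute every instruction; a false predicate turns it into a nop
    that still occupies an execution slot (conventional scheme)."""
    busy_slots = 0
    useful_ops = 0
    for pred, dest, value in program:
        busy_slots += 1              # the slot is consumed either way
        if flags[pred]:
            regs[dest] = value       # condition holds: result is written
            useful_ops += 1
        # condition false: result discarded, i.e. a no-operation instruction
    return busy_slots, useful_ops


# Hypothetical stand-ins for instructions 2 and 3 of the example.
flags = {"C0": True, "C1": False}    # C1 holds the inverse of C0
regs = {}
program = [("C0", "r1", 10),         # instruction 2, guarded by C0
           ("C1", "r1", 20)]         # instruction 3, guarded by C1
print(run_conventional(program, flags, regs), regs)
```

Two slots are consumed for one useful operation, which is the inefficiency the invention sets out to address.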

In the conventional conditional execution method described above, an instruction whose condition is not satisfied behaves as a no-operation instruction and in effect performs no operation. Therefore, even though two instructions are written in parallel and two arithmetic units are occupied, only one arithmetic unit is actually put to effective use. As a result, effective performance is lower than the degree of parallelism written in the program would suggest.

The present invention has been made in view of this problem, and its object is to provide a processor that makes effective use of hardware and thereby achieves improved performance.

To achieve the above object, a processor having the first feature of the present invention comprises: instruction supply means for supplying a plurality of instructions; decoding means for decoding each of the plurality of instructions; instruction issuance control means for determining, by referring to the condition specified by execution condition information that is specified in the plurality of instructions and indicates whether each instruction is to be executed, the instruction or set of instructions that will perform valid operations; and execution means for executing one or more operations based on the operation specified for each instruction in the plurality of instructions. By referring to the condition specified by the execution condition information, the instruction issuance control means determines whether each instruction is a valid instruction that needs to be executed or an invalid instruction that does not. For an instruction determined to be invalid, it deletes the instruction itself before the instruction is issued to the execution means, and issues to the execution means, in its place, a valid instruction that follows it. As a result, even when the condition of a conditional instruction is not satisfied, no no-operation instruction is executed and the arithmetic units in the execution means are put to use by subsequent instructions, which raises the utilization efficiency of the arithmetic units and improves effective performance.
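As a rough illustration of this first feature, the following Python sketch drops instructions whose condition flag is false before issue and fills the arithmetic units with the valid instructions that follow. The function name, the tuple layout, and the instruction names are hypothetical, and the model deliberately ignores parallel execution boundaries and unresolved flags.

```python
def issue(window, flags, num_units=2):
    """Return (issued, remaining).

    Instructions whose predicate is false are deleted before issue, so
    later valid instructions can take the free arithmetic units.
    """
    issued = []
    consumed = 0
    for name, pred in window:
        if len(issued) == num_units:
            break                    # all arithmetic units are allocated
        consumed += 1
        if flags[pred]:
            issued.append(name)      # valid instruction: issue it
        # false predicate: the instruction is deleted, no unit is used
    return issued, window[consumed:]


flags = {"C0": True, "C1": False, "C2": True}
window = [("i2", "C0"), ("i3", "C1"), ("i4", "C2")]
print(issue(window, flags))          # i3 is deleted, i4 takes its place
```

With two units, `i2` and `i4` are issued in the same cycle, whereas the conventional scheme would have spent one unit on the nullified `i3`.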

In the processor having the first feature of the present invention, the execution means has execution result invalidation means for invalidating an execution result after the operation corresponding to an instruction has been executed, and the processor further comprises instruction invalidation method selection means for selecting, for each instruction, whether to delete the instruction itself before it is issued to the execution means or to have the execution result invalidation means invalidate its execution result. This removes the need to stall the processor pipeline even when the condition flag used to evaluate the execution condition has not yet been determined, and thus improves performance.

In the processor having the first feature of the present invention, the instruction invalidation method selection means decides which invalidation method to select by referring to condition flag validity information, which indicates whether the value of each condition flag has been determined. For a given condition flag, the validity information is set to false when the decoding means decodes an instruction that updates that flag, and is set to true when that instruction has been executed by the execution means and the value of the flag has been determined.
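The lifecycle of the condition flag validity information can be sketched as follows. This is a minimal model, not the circuit of the embodiment: the class and method names are hypothetical, and the count of eight flags follows the condition flags C0 to C7 mentioned later in the description.

```python
class ConditionFlags:
    """Condition flags plus one validity bit per flag (illustrative model)."""

    def __init__(self, count=8):
        self.value = [False] * count
        self.valid = [True] * count   # True: the flag value is determined

    def on_decode_update(self, idx):
        # An instruction that updates flag idx was decoded: value is pending.
        self.valid[idx] = False

    def on_execute_update(self, idx, result):
        # The updating instruction has executed: the value is determined.
        self.value[idx] = result
        self.valid[idx] = True

    def invalidation_method(self, idx):
        # Determined flag: an invalid instruction can be deleted before issue.
        # Pending flag: issue anyway and invalidate the result afterwards.
        if self.valid[idx]:
            return "delete_before_issue"
        return "invalidate_result_after_execution"


cf = ConditionFlags()
cf.on_decode_update(0)
print(cf.invalidation_method(0))      # flag C0 is pending
cf.on_execute_update(0, False)
print(cf.invalidation_method(0))      # flag C0 is determined again
```

Falling back to result invalidation while a flag is pending is what lets the pipeline keep moving instead of stalling.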

In the processor having the first feature of the present invention, the instruction issuance control means further has a function of detecting a combination of instructions whose functions can be realized by a single instruction and combining them so that they are handled as a single instruction. Instructions that would originally have used multiple arithmetic units can thus be executed by a single arithmetic unit, which raises the utilization efficiency of the arithmetic units and improves effective performance.

In the processor having the first feature of the present invention, the combining of instructions is applied after instructions have been deleted prior to issue to the execution means.

In the processor having the first feature of the present invention, when instructions having the same execution condition information are arranged consecutively in each cycle, the instruction issuance control means classifies the instructions decoded by the decoding means by execution condition in advance and refers to the condition flag once per class to determine whether the instructions are valid instructions that need to be executed or invalid instructions that do not. This keeps references to the condition flags to a minimum and shortens the time needed to decide which instructions to delete.
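A small Python sketch can make the saving concrete. It groups consecutive instructions that share a predicate and looks up each flag once per group; the function name and the data layout are assumptions made for illustration.

```python
from itertools import groupby


def classify(instrs, flags):
    """Return (valid instruction names, number of flag lookups).

    instrs is a list of (name, predicate) pairs in program order.
    Consecutive instructions with the same predicate form one class.
    """
    valid = []
    lookups = 0
    for pred, group in groupby(instrs, key=lambda ins: ins[1]):
        lookups += 1                     # one flag reference per class
        if flags[pred]:
            valid.extend(name for name, _ in group)
    return valid, lookups


flags = {"C0": True, "C1": False}
instrs = [("a", "C0"), ("b", "C0"), ("c", "C1"), ("d", "C1")]
print(classify(instrs, flags))           # two lookups instead of four
```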

In the processor having the first feature of the present invention, parallel execution boundary information indicating whether each instruction is a boundary of parallel execution is specified in the plurality of instructions, and the instruction issuance control means further has a function of detecting, by referring to the parallel execution boundary information of each instruction, the group of instructions to be executed in the current cycle.

In the processor having the first feature of the present invention, when every instruction up to and including the boundary instruction detected from the parallel execution boundary information has been deleted as an invalid instruction that does not need to be executed, the instruction issuance control means invalidates the parallel execution boundary information of that boundary instruction and detects a new parallel execution boundary for the current cycle by referring to the parallel execution boundary information of the instructions that follow. Thus, when all instructions placed in a cycle have been deleted, that cycle itself is skipped and the instructions of the next cycle are executed, which reduces the number of execution cycles.
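The cycle-skipping rule can be sketched as a scan over the decoded instructions. The function below is a simplified model under stated assumptions: each instruction is a hypothetical `(name, predicate, end_bit)` tuple, all flags are already determined, and the limit imposed by the number of arithmetic units is ignored.

```python
def select_execution_unit(instrs, flags):
    """Return (valid instruction names for this cycle, instructions consumed).

    A boundary (end_bit == 1) is honoured only if at least one valid
    instruction has been seen; otherwise it is ignored and the scan
    continues to the next boundary, skipping the empty cycle.
    """
    valid = []
    for i, (name, pred, end_bit) in enumerate(instrs):
        if flags[pred]:
            valid.append(name)
        if end_bit and valid:
            return valid, i + 1
        # end_bit set but everything so far was deleted: boundary invalidated
    return valid, len(instrs)


flags = {"C0": True, "C1": False}
instrs = [("a", "C1", 0), ("b", "C1", 1),    # whole cycle is deleted
          ("c", "C0", 0), ("d", "C0", 1)]    # next cycle moves up
print(select_execution_unit(instrs, flags))
```

Here the first boundary is ignored because `a` and `b` are both deleted, so `c` and `d` are selected in the same cycle.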

A processor having the second feature of the present invention comprises: instruction supply means for supplying a plurality of instructions; decoding means for decoding each of the plurality of instructions; instruction issuance control means for determining the instruction or set of instructions that will perform valid operations; and execution means for executing one or more operations based on the operation specified for each instruction in the plurality of instructions. The instruction issuance control means has a function of detecting, among the instructions decoded by the decoding means, a combination of instructions whose functions can be realized by a single instruction, and combining them so that they are handled as a single instruction. Instructions that would originally have used multiple arithmetic units in the execution means can thus be executed by a single arithmetic unit, which raises the utilization efficiency of the arithmetic units and improves effective performance.

In the processor having the second feature of the present invention, the instruction issuance control means further has a function of detecting the instructions that remain in the current cycle without having been executed, deleted, or combined, and controlling them so that they become candidates for issue in the next and subsequent cycles. Thus, even if instructions remain unissued in some cycle because of an exception or a defect in the compiling device, execution can continue correctly without malfunction.

According to the present invention, it is possible to provide a processor that makes effective use of hardware and achieves improved performance.

Embodiments of a processor, a compiling device, and a compiling method according to the present invention are described below in detail with reference to the drawings.

[Embodiment 1: Processor]
(Instruction format and architecture overview)
First, the structure of the instructions decoded and executed by the processor according to the present invention is described with reference to FIGS. 1(a) to 1(c), which show the instruction format of the processor. Each instruction of this processor has a fixed length of 32 bits and holds 1 bit of parallel execution boundary information (E: end bit) 10. This information indicates whether a parallel execution boundary exists between the instruction and the instruction that follows it. Specifically, when the parallel execution boundary information E is “1”, a parallel execution boundary exists between the instruction and its successor; when E is “0”, no such boundary exists. How this information is used is described later.

Each instruction also holds 3 bits of execution condition information (P: predicate) 11. The execution condition information P designates which of the eight condition flags C0 to C7 (311) in FIG. 5, described later, stores the condition that determines whether the instruction is executed. When the value of the condition flag designated by P is “1”, the operation specified by the instruction is executed; when the value is “0”, the operation is not executed.

The operation is specified in the remaining 28 bits of each instruction, that is, the instruction length minus the parallel execution boundary information E and the execution condition information P. Specifically, the “Op1”, “Op2”, and “Op3” fields specify an opcode indicating the type of operation, the “Rs” field specifies the register number of the register used as the source operand, and the “Rd” field specifies the register number of the register used as the destination operand. The “imm” field specifies a constant operand for the operation, and the “disp” field specifies a displacement.
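The field widths given above (1 bit E, 3 bits P, 28 bits of operation) can be illustrated with a short decoding sketch. The bit positions are an assumption: the text states only the widths, so placing E in bit 31 and P in bits 30 to 28 is a hypothetical layout chosen for illustration, and the function name is likewise invented.

```python
def split_instruction(word):
    """Split a 32-bit instruction word into (E, P, operation).

    Assumed layout (not confirmed by the text): E in bit 31,
    P in bits 30..28, operation in bits 27..0.
    """
    if not 0 <= word < (1 << 32):
        raise ValueError("instruction word must fit in 32 bits")
    end_bit = (word >> 31) & 0x1          # parallel execution boundary E
    predicate = (word >> 28) & 0x7        # selects one of C0..C7
    operation = word & 0x0FFFFFFF         # opcode and operand fields
    return end_bit, predicate, operation


word = (1 << 31) | (5 << 28) | 0x123      # E=1, P=C5, arbitrary operation
print(split_instruction(word))
```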

Next, an outline of the architecture of the processor is given with reference to FIGS. 2(a) and 2(b). This processor is premised on static parallel scheduling.

As shown in FIG. 2(a), instructions are supplied every cycle in fixed-length 128-bit instruction supply units (here called “packets”), four instructions at a time. As shown in FIG. 2(b), the instructions up to a parallel execution boundary (here called an “execution unit”) are executed simultaneously in one cycle. In other words, in each cycle the instructions up to and including the one whose parallel execution boundary information E is “1” are executed in parallel. Instructions that were supplied but not executed remain in the instruction buffer and become candidates for execution in the next and subsequent cycles.
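The relationship between fixed-size packets and variable-size execution units can be sketched by splitting a stream of E bits at each boundary. The function name and the sample bit pattern are assumptions for illustration; the sketch covers only the boundary rule, not predicate-based deletion.

```python
def execution_units(end_bits):
    """Group instruction indices into execution units.

    end_bits[i] is the E bit of instruction i; a value of 1 closes the
    current execution unit. Returns (units, leftover), where leftover
    holds instructions still waiting for a boundary.
    """
    units, current = [], []
    for index, e in enumerate(end_bits):
        current.append(index)
        if e == 1:
            units.append(current)        # boundary reached: unit complete
            current = []
    return units, current


# Two hypothetical 4-instruction packets (indices 0-3 and 4-7).
print(execution_units([0, 1, 1, 0, 1, 1, 0, 1]))
```

Note that the unit made of instructions 3 and 4 straddles the two packets, which is why unexecuted instructions must stay in the instruction buffer.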

In other words, this architecture supplies instructions in fixed-length packets and, based on statically derived information, issues in each cycle the number of instructions appropriate to the degree of parallelism. With this approach, the no-operation instructions (nop instructions) that arise in the ordinary fixed-length-instruction VLIW method disappear entirely, and code size can be reduced.

(Processor hardware configuration)
FIG. 3 is a block diagram showing the hardware configuration of the processor according to the present invention. This processor is a parallel execution processor with two arithmetic units and consists broadly of an instruction supply unit 20, a decoding unit 30, and an execution unit 40.

The instruction supply unit 20 supplies instructions from an external memory (not shown) and outputs them to the decoding unit 30. It consists of an instruction fetch unit 21, an instruction buffer 22, and an instruction register 23.

The instruction fetch unit 21 fetches blocks of instructions from the external memory (not shown) through a 32-bit IA (instruction address) bus and a 128-bit ID (instruction data) bus, holds them in an internal instruction cache, and supplies the instructions corresponding to the address output from the PC (program counter) unit 42 to the instruction buffer 22.

The instruction buffer 22 has two 128-bit buffers and is used to accumulate the instructions supplied by the instruction fetch unit 21. Packets are supplied from the instruction fetch unit 21 to the instruction buffer 22 in units of 128 bits. The instructions accumulated in the instruction buffer 22 are output to the appropriate registers of the instruction register 23.

The instruction register 23 consists of four 32-bit registers 231 to 234 and holds the instructions sent from the instruction buffer 22. The configuration around the instruction register 23 is shown in more detail in a separate drawing.

The decoding unit 30 decodes the instructions held in the instruction register 23 and outputs control signals corresponding to the decoding results to the execution unit 40. It consists broadly of an instruction issuance control unit 31, an instruction decoder 32, and an instruction invalidation method selection unit 38.

For the instructions held in the four registers 231 to 234 of the instruction register 23, the instruction issuance control unit 31 refers to the execution condition information P in each instruction and to the corresponding condition flag, and in effect deletes any instruction whose condition flag is false. It does so only when the instruction invalidation method selection unit 38 has selected invalidation in the decoding unit 30. The instruction issuance control unit 31 also performs issue control by referring to the parallel execution boundary information E in each instruction, invalidating the issue of instructions that lie beyond the parallel execution boundary. The operation of the instruction issuance control unit 31 is described in more detail with reference to a separate drawing.

The instruction decoder 32 decodes the instructions stored in the instruction register 23 and consists of a first instruction decoder 33, a second instruction decoder 34, a third instruction decoder 35, and a fourth instruction decoder 36. Each of the decoders 33 to 36 basically decodes one instruction per cycle and supplies control signals to the execution unit 40. Constant operands placed in an instruction are transferred from the respective instruction decoder to the data bus 48 of the execution unit 40.

The instruction invalidation method selection unit 38 selects whether an instruction whose condition flag is false, and which therefore does not need to be executed, is invalidated in the decoding unit 30 or in the execution unit 40. Specifically, when the condition flag validity information 312 (FIG. 5) of the instruction issuance control unit 31, described later, indicates that the condition flag of the instruction is valid, that is, determined, the invalid instruction is deleted in the decoding unit 30. Otherwise, the write control unit 46 of the execution unit 40 invalidates the writing of the execution result of the instruction.

The execution unit 40 is a circuit unit that executes up to two operations in parallel based on the decoding results of the decoding unit 30. It consists of an execution control unit 41, a PC unit 42, a register file 43, a first arithmetic unit 44, a second arithmetic unit 45, a write control unit 46, an operand access unit 47, and data buses 48 and 49.

The execution control unit 41 is a collective term for the control circuits and wiring that control the components 42 to 49 of the execution unit 40 based on the decoding results of the decoding unit 30. It includes circuits for timing control, operation enable/disable control, status management, interrupt control, and the like.

The PC unit 42 outputs to the instruction fetch unit 21 in the instruction supply unit 20 the address in the external memory (not shown) at which the next instruction to be decoded and executed is located.

The register file 43 consists of 64 32-bit registers (R0 to R63). Based on the decoding results of the instruction decoder 32, the values stored in these registers are transferred via the data bus 48 to the first arithmetic unit 44 and the second arithmetic unit 45, where they are operated on or simply passed through, and are then sent via the data bus 49 to the register file 43 or the operand access unit 47.

The first arithmetic unit 44 and the second arithmetic unit 45 each contain an ALU and a multiplier, which perform arithmetic and logic operations on two 32-bit data items, and a barrel shifter, which performs shift operations. They execute operations under the control of the execution control unit 41.

Only when the instruction invalidation method selection unit 38 has selected invalidation of an instruction in the execution unit 40 does the write control unit 46 act: if the condition flag of that instruction is false, it prevents the execution result of the instruction from being written to the register file 43. For that instruction, the outcome is then equivalent to executing a no-operation instruction (nop instruction).

The operand access unit 47 is a circuit that transfers operands between the register file 43 and the external memory (not shown). For example, when “ld” (load) is placed in an instruction as the opcode, one word (32 bits) of data in the external memory is loaded through the operand access unit 47 into the designated register of the register file 43. When “st” (store) is placed as the opcode, the value held in the designated register of the register file 43 is stored in the external memory.

As illustrated, the PC unit 42, register file 43, first arithmetic unit 44, second arithmetic unit 45, write control unit 46, and operand access unit 47 are connected by the data bus 48 (L1 bus, R1 bus, L2 bus, R2 bus) and the data bus 49 (D1 bus, D2 bus). The L1 and R1 buses are connected to the two input ports of the first arithmetic unit 44, the L2 and R2 buses to the two input ports of the second arithmetic unit 45, and the D1 and D2 buses to the output ports of the first arithmetic unit 44 and the second arithmetic unit 45, respectively.

(Configuration around the instruction register 23 and operation of the instruction issuance control unit 31)
FIG. 4 is a block diagram showing the configuration around the instruction register 23. In the figure, broken-line arrows represent control signals.

The instruction register 23 consists of four 32-bit registers: an A register 231, a B register 232, a C register 233, and a D register 234. Instructions are supplied to the instruction register 23 from the instruction buffer 22.

The first to fourth instruction decoders 33, 34, 35, and 36 each take a 32-bit instruction as input, decode it, and output control signals for the operation of that instruction together with any constant operand placed in the instruction. Reference numerals 50 and 51 in FIG. 4 denote the constant operands of the instructions whose execution has been determined.

In addition, a 1-bit no-operation instruction flag is input as a control signal to each of the second to fourth instruction decoders 34, 35, and 36. When this flag is set to “1”, the decoder outputs control signals equivalent to a no-operation instruction. In other words, setting the no-operation instruction flag invalidates the decoding of that instruction decoder.

The instruction issue control unit 31 refers to information in the instructions stored in the instruction register 23 and performs three tasks: it generates the no-operation instruction flags that invalidate the decoding of instructions beyond the parallel execution boundary; it controls the execution instruction selectors 371 and 372, which select the valid instructions whose execution condition is true and whose operations should be executed by the execution unit 40; and it controls the execution instruction selectors 373 and 374, which select the corresponding control signals.

FIG. 5 shows the configuration of the instruction issue control unit 31 of this processor and its peripheral circuits. The instruction issue control unit 31 first refers to the parallel execution boundary information E in each instruction and determines up to which instruction to issue in this cycle. It then sets to “1” the no-operation instruction flag of each instruction decoder that corresponds to an instruction not issued in this cycle, so that the decoder outputs control signals equivalent to a no-operation instruction. These flags can be generated by simple logic circuits (OR gates) 314 and 315, as shown in the right half of the instruction issue control unit 31 in FIG. 5. At the same time, the unit informs the instruction buffer 22 of how many instructions remain unissued.

More specifically, when the parallel execution boundary information E of the instruction in the A register 231 is “1”, the decoding of the second, third, and fourth instruction decoders 34, 35, and 36 is invalidated. When the parallel execution boundary information E of the instruction in the B register 232 is “1”, the decoding of the third and fourth instruction decoders 35 and 36 is invalidated. When the parallel execution boundary information E of the instruction in the C register 233 is “1”, the decoding of the fourth instruction decoder 36 is invalidated.
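The boundary rule above can be sketched in a few lines of Python. This is only an illustration of the OR-gate logic described in the text; the function names `noop_flags` and `unissued_count` are hypothetical, and the sketch assumes that all four registers A to D hold instructions in the cycle.

```python
def noop_flags(e_a, e_b, e_c):
    """Return (nop2, nop3, nop4), the no-operation flags for decoders 2 to 4.

    e_a, e_b, e_c are the boundary bits E (0 or 1) of the instructions
    held in the A, B and C registers.
    """
    nop2 = e_a                # A closes the group: B, C, D are not issued
    nop3 = e_a | e_b          # B closes the group: C, D are not issued
    nop4 = e_a | e_b | e_c    # C closes the group: D is not issued
    return nop2, nop3, nop4


def unissued_count(e_a, e_b, e_c):
    """Number of decoded instructions left over for the next cycle
    (assumes all four registers were filled)."""
    return sum(noop_flags(e_a, e_b, e_c))
```

For example, with E = 1 only on the instruction in the C register, just the fourth decoder is disabled and one instruction is reported back to the instruction buffer.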

Furthermore, the instruction issue control unit 31 refers to the execution condition information P in each instruction and controls the execution instruction selectors 371 to 374 in FIG. 4 so that any instruction whose condition flag is false, that is, any instruction that need not be executed, is in effect deleted. This processor decodes up to four instructions per cycle, but operations are actually executed for at most two of them. This solves the problem that, when an execution condition is false, the execution unit 40 executes a no-operation instruction and the computing units 44 and 45 are used inefficiently.

To achieve this, the instruction issue control unit 31 includes an execution instruction selection control unit 313. The execution instruction selection control unit 313 refers to whichever of the eight condition flags (C0 to C7) 311 corresponds to the execution condition information P specified in an instruction, thereby detects instructions whose operations need not be executed, and controls the execution instruction selectors 371 to 374 so that such an instruction is not selected and a subsequent valid instruction is selected instead. The unselected instruction is thus in effect deleted. The condition flags 311 consist of eight 1-bit registers C0 to C7, one of which is designated by decoding the 3-bit execution condition information P in each instruction. The value of condition flag C7, however, is always “1”, and an instruction that is always executed specifies C7 as its execution condition. In program text, the C7 specification may be omitted.

However, for an instruction that updates a condition flag, the flag is determined only in the execution stage, that is, in the execution unit 40. Consequently, if an instruction that updates some condition flag was issued in the preceding cycle and is now executing, that flag is not yet determined in the decoding stage (decoding unit 30) of the next cycle, and the decision on whether an instruction can be deleted cannot be made. Condition flag validity information 312 is provided to detect this state.

The condition flag validity information 312 holds, for each condition flag, a 1-bit value indicating whether the flag's value is valid. When the decoding unit 30 finds that an instruction updating a condition flag is to be executed, the validity bit of that flag is set to “0”; when the execution unit 40 completes the update of the flag's value, the validity bit is set back to “1”.

After referring to the execution condition information P of each instruction, the instruction issue control unit 31 refers to the condition flag validity information 312 to detect whether the value of the condition flag corresponding to each execution condition is valid. If it is not valid, that is, if the corresponding bit of the condition flag validity information 312 is “0”, the instruction is not deleted. The instruction is issued to the execution unit 40 as is, and once the condition flag is determined, the write-back of the instruction's result is invalidated if necessary.

If the value of the condition flag is valid, that is, if the corresponding bit of the condition flag validity information 312 is “1”, the unit refers to the one bit in the condition flags 311 designated by the execution condition information P of the instruction. If that value is “1”, the instruction is issued to the execution unit 40 as is; if it is “0”, the execution instruction selectors 371 to 374 are controlled so that the instruction itself is in effect deleted.
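The three-way decision made in the decoding stage can be summarized as a small Python sketch. The names `Action` and `decode_action` are hypothetical, and the flag and validity registers are modeled simply as eight-element lists; the real hardware realizes this with selectors, not with a function call.

```python
from enum import Enum


class Action(Enum):
    ISSUE = "issue to the execution unit"
    DELETE = "delete in the decoding stage"
    ISSUE_UNRESOLVED = "issue; nullify in the execution stage if false"


def decode_action(p, flags, valid):
    """Decide what to do with an instruction guarded by condition flag C<p>.

    p      3-bit execution condition information P (0 to 7)
    flags  list of eight 0/1 values for C0 to C7
    valid  list of eight 0/1 validity bits, one per flag
    """
    if not 0 <= p <= 7:
        raise ValueError("P is a 3-bit field")
    if p == 7:                       # C7 always reads as 1: always executed
        return Action.ISSUE
    if not valid[p]:                 # flag still being updated in EX
        return Action.ISSUE_UNRESOLVED
    return Action.ISSUE if flags[p] else Action.DELETE
```

Only the `DELETE` outcome frees a slot for a subsequent instruction; an unresolved flag forces the conservative path in which the instruction still occupies a computing unit.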

In short, when the condition flag designated by the execution condition information P of an instruction is “0”: if that flag is updated by the immediately preceding instruction, the execution unit 40 invalidates the instruction's result; otherwise, the decoding unit 30 in effect deletes the instruction itself.

FIG. 6 is a diagram showing the timing of pipeline processing when a specific instruction sequence is executed. Here it is assumed that three instructions are executed one at a time, in order from the top. The first is a comparison instruction that compares the contents of register R0 with the contents of register R1 and sets condition flag C0 to “1” if they match and to “0” otherwise. The second is a subtraction instruction that, only when C0 is “1”, subtracts the contents of register R2 from the contents of register R3 and writes the result to register R3. The last is an addition instruction that, only when C0 is “1”, adds the contents of register R4 and the contents of register R5 and writes the result to register R5.

In FIG. 6, the timing of the instruction fetch stage (IF), decoding stage (DEC), and execution stage (EX) of each instruction is shown to the right of that instruction. Here it is assumed that the result of the first comparison instruction is false, that is, C0 becomes “0”.

As FIG. 6 shows, the decoding stage (DEC) of the first comparison instruction detects that the instruction updates C0 and sets the validity bit of C0 to “0”; after the comparison result is determined in the execution stage (EX), the validity bit of C0 is set to “1”.

The subsequent subtraction and addition instructions are both executed conditionally on C0. For the subtraction instruction immediately after the comparison instruction, the value of C0 is not yet valid at the decoding stage (DEC), so the instruction is not deleted; it is issued to the execution stage (EX), where its result is invalidated. For the addition instruction, on the other hand, the value of C0 is already determined at the decoding stage (DEC), so the instruction is in effect deleted there and is not issued to the execution stage (EX). In this case, the freed computing unit can be used by an instruction that follows the addition instruction.
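The FIG. 6 scenario can be reproduced with a toy single-issue model in Python. This is a simplified sketch, not the patent's circuit: the function `simulate` and the dictionary layout of an instruction are hypothetical, one instruction enters DEC per cycle, and EX follows exactly one cycle later.

```python
def simulate(program, regs):
    """Run a toy DEC/EX pipeline and return {name: outcome}.

    Each instruction is a dict with keys
      name  label used in the result
      cond  index of the guarding condition flag, or None
      sets  index of the flag the instruction updates, or None
      test  for flag-updating instructions, a function regs -> bool
    """
    flags = [0] * 8
    valid = [1] * 8
    outcome = {}
    in_ex = None                      # instruction issued in the previous cycle
    for instr in program + [None]:    # one extra cycle drains EX
        issued = None
        # DEC: decide using the state visible at the start of the cycle.
        if instr is not None:
            c = instr["cond"]
            if c is not None and valid[c] and not flags[c]:
                outcome[instr["name"]] = "deleted in DEC"
            else:
                issued = instr
        # EX: the instruction issued in the previous cycle completes.
        if in_ex is not None:
            c = in_ex["cond"]
            if c is not None and not flags[c]:
                outcome[in_ex["name"]] = "nullified in EX"
            else:
                outcome[in_ex["name"]] = "executed"
                if in_ex["sets"] is not None:
                    flags[in_ex["sets"]] = int(in_ex["test"](regs))
            if in_ex["sets"] is not None:
                valid[in_ex["sets"]] = 1
        # DEC marks the flag the issued instruction will update as not valid.
        if issued is not None and issued["sets"] is not None:
            valid[issued["sets"]] = 0
        in_ex = issued
    return outcome


FIG6 = [
    {"name": "cmp", "cond": None, "sets": 0,
     "test": lambda r: r["R0"] == r["R1"]},
    {"name": "sub", "cond": 0, "sets": None, "test": None},
    {"name": "add", "cond": 0, "sets": None, "test": None},
]
```

With register values for which the comparison fails, the subtraction reaches EX and is nullified there, while the addition is removed in DEC, which matches the behavior described for FIG. 6.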

If instructions remain unissued even after instructions have been invalidated by the control described above, the instruction issue control unit 31 informs the instruction buffer 22 of the number of remaining instructions and controls the buffer so that those instructions are not invalidated there and are transferred to the instruction register 23 again in the next cycle.

Thus, by adopting the instruction format shown in FIG. 1 and the configuration shown in FIGS. 4 and 5, instruction issue control that makes effective use of the computing units can be achieved.

(Processor operation)
Next, the operation of the processor of this embodiment when specific instructions are decoded and executed will be described.

FIG. 7 is a diagram showing part of a program that includes conditional execution. The program consists of five instructions, and the processing of each instruction is expressed as a mnemonic. Specifically, the mnemonic “add” denotes addition of a constant or a register value to a register value; “sub” denotes subtraction of a constant or a register value from a register value; “st” denotes transfer of a register value to memory; and “mov” denotes transfer of a constant or a register value to a register.

“Rn (n = 0 to 63)” denotes one register in the register file 43. The parallel execution boundary information E of each instruction is shown as “0” or “1”. The condition flag specified by the execution condition information P is written at the head of each instruction, enclosed in “[ ]”. An instruction without such a prefix is always executed.

The operation of this processor is described below for each execution unit. It is assumed that, at the outset, condition flag C0 is determined to be “1” and C1 is determined to be “0”.

(Execution unit 1)
A packet containing instructions 1, 2, 3, and 4 is supplied from the external memory, and the instructions are each transferred to the instruction register 23. The instruction issue control unit 31 then refers to the parallel execution boundary information E of each instruction. Here the parallel execution boundary information of instruction 3 is “1”, so the decoding result of the fourth instruction decoder 36 is invalidated, that is, turned into a no-operation instruction.

Next, the instruction issue control unit 31 refers to the execution condition information P of each instruction. The execution condition flag of instruction 1 is C0, and C0 is determined to be “1”, so the unit controls the execution instruction selector 371 to select the operand and the execution instruction selector 373 to select the decoding result such that instruction 1 is executed as the first instruction. The execution condition flag of instruction 2 is C1, and C1 is determined to be “0”, so instruction 2 is in effect deleted and its operation is not executed. The following instruction 3 is always executed, so the unit controls the execution instruction selector 372 to select the operand and the execution instruction selector 374 to select the decoding result such that instruction 3 is executed as the second instruction. As a result, instructions 1 and 3 are sent to the execution unit 40 as the instructions to execute, and instruction 4, which was not issued, is left in the instruction buffer 22.

In the execution unit 40, the value of register R0 plus 1 is stored in register R0, and the sum of the values of registers R1 and R2 is stored in register R2.

(Execution unit 2)
Instruction 4, which was left in the instruction buffer 22, and instruction 5, newly supplied from the external memory, are transferred in order to the instruction register 23. The instruction issue control unit 31 then refers to the parallel execution boundary information E of each instruction. Here the parallel execution boundary information of instruction 5 is “1”, so the decoding results of the third instruction decoder 35 and the fourth instruction decoder 36 are invalidated, that is, turned into no-operation instructions.

Instructions 4 and 5 are both always executed, so the execution instruction selectors 371 to 374 are controlled to send instruction 4 to the execution unit 40 as the first instruction and instruction 5 as the second. All supplied instructions have now been issued.

In the execution unit 40, the value of register R0 is transferred to the external memory address indicated by register R3, and the value of register R2 is transferred to register R4.

As described above, the program shown in FIG. 7 is executed by this processor in two execution units. The processor decodes more instructions than there are computing units 44 and 45 and deletes unnecessary instructions as appropriate, thereby using the computing units 44 and 45 efficiently. In this example too, the execution unit 40 executes two operations in every cycle, so the installed computing units 44 and 45 are fully used.
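The walkthrough above can be replayed with a short Python sketch of the issue logic. The function `issue_cycles`, the tuple layout, and the labels `inst1` to `inst5` are hypothetical; the E bits of instructions 1, 2, and 4 are taken to be 0, as implied by the walkthrough, and all condition flags are assumed to be valid when decoded.

```python
def issue_cycles(program, flags, decode_width=4, max_exec=2):
    """Return, per execution unit, the names of the instructions issued.

    program  list of (name, cond, e) in program order; cond is a condition
             flag name or None (always executed), e is the boundary bit E
    flags    mapping from flag name to 0/1, assumed valid at decode time
    """
    cycles = []
    pos = 0
    while pos < len(program):
        window = program[pos:pos + decode_width]
        # The group ends at the first instruction whose E bit is 1.
        end = next((i for i, (_, _, e) in enumerate(window) if e == 1),
                   len(window) - 1)
        group = window[:end + 1]
        # Instructions whose condition flag is false are dropped at decode.
        live = [name for name, cond, _ in group
                if cond is None or flags[cond]]
        if len(live) > max_exec:
            raise ValueError("group violates the two-operation limit")
        cycles.append(live)
        pos += len(group)      # the rest stays in the instruction buffer
    return cycles


FIG7 = [
    ("inst1", "C0", 0),
    ("inst2", "C1", 0),
    ("inst3", None, 1),
    ("inst4", None, 0),
    ("inst5", None, 1),
]
```

Calling `issue_cycles(FIG7, {"C0": 1, "C1": 0})` groups instructions 1 and 3 in the first execution unit and instructions 4 and 5 in the second, so both computing units are busy in each cycle.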

(Comparison with a processor having a conventional instruction issue control unit)
Next, the processor according to the present invention is compared with the conventional approach cited as prior art, in which every conditional execution instruction is issued to the execution unit and invalidated there as needed, assuming such a processor is made to perform the processing shown in FIG. 7.

FIG. 8 is a block diagram showing the configuration around the instruction register of a conventional processor. The conventional processor is assumed to have two computing units, like the processor of the present invention, and to use the same instruction format as that of the present processor shown in FIG. 1. Because it is a two-way parallel processor, the instruction register 23a comprises an A register 231a and a B register 232a, and the instruction decoder 32a comprises a first instruction decoder 33a and a second instruction decoder 34a. Reference numerals 50a and 51a denote constant operands. The instruction issue control unit 31a invalidates the decoding result of the second instruction decoder 34a according to the parallel execution boundary information E of the instruction stored in the A register 231a.

FIG. 9 is a diagram showing a program that makes a processor having the conventional instruction issue control unit 31a perform the processing of the program shown in FIG. 7. The program of FIG. 9 is identical to that of FIG. 7 except for the parallel execution boundary information E, which is set so that at most two instructions are issued simultaneously.

The operation of the conventional processor is described below for each execution unit. It is again assumed that, at the outset, condition flag C0 is determined to be “1” and C1 is determined to be “0”.

(Execution unit 1)
A packet containing instructions 1, 2, 3, and 4 is supplied from the external memory, and instructions 1 and 2 are transferred in order to the instruction register 23a. The instruction issue control unit 31a then refers to the parallel execution boundary information E of instruction 1 stored in the A register 231a. Because the parallel execution boundary information E of instruction 1 is “0”, the decoding result of the second instruction decoder 34a is not invalidated. Both instruction 1 and instruction 2 are therefore sent to the execution unit. Instructions 3 and 4, which were not issued, are left in the instruction buffer.

In the execution unit, C0, the execution condition flag of instruction 1, is “1”, so the value of register R0 plus 1 is stored in register R0. C1, the execution condition flag of instruction 2, is “0”, so the operation corresponding to instruction 2 is either not executed or has its result invalidated after execution, which amounts to executing a no-operation instruction.

(Execution unit 2)
Instructions 3 and 4, which were left in the instruction buffer, are transferred in order to the instruction register 23a, and instruction 5 is newly supplied from the external memory. The instruction issue control unit 31a then refers to the parallel execution boundary information E of instruction 3 stored in the A register 231a. Because the parallel execution boundary information E of instruction 3 is “0”, the decoding result of the second instruction decoder 34a is not invalidated. Both instruction 3 and instruction 4 are therefore sent to the execution unit. Instruction 5, which was not issued, is left in the instruction buffer 22.

In the execution unit, instructions 3 and 4 are both always executed, so the operations corresponding to these two instructions are executed. Specifically, the sum of the values of registers R1 and R2 is stored in register R2, and the value of register R0 is transferred to the external memory address indicated by register R3.

(Execution unit 3)
Instruction 5, which was left in the instruction buffer, is transferred to the instruction register 23a. The instruction issue control unit 31a then refers to the parallel execution boundary information E of instruction 5 stored in the A register 231a. Because the parallel execution boundary information E of instruction 5 is “1”, the decoding result of the second instruction decoder 34a is invalidated. Only instruction 5 is therefore issued. All supplied instructions have now been issued.

In the execution unit, instruction 5 is always executed, so the operation corresponding to instruction 5 is executed. Specifically, the value of register R2 is transferred to register R4.

As described above, the program shown in FIG. 9 is executed in three execution units on the processor having the conventional instruction issue control unit 31a, one more than on the processor of the present invention. This is because, on the processor having the conventional instruction issue control unit 31a, a conditional execution instruction whose condition is false is executed as a no-operation instruction and thus wastes one of the installed computing units.
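For comparison, the conventional issue policy can be sketched in Python in the same style. The function `conventional_cycles` and the labels are hypothetical, and the E bits of instructions 2 and 4 in FIG. 9 are assumed to be 1, since the text states only the E bits that the A register actually presents (instructions 1, 3, and 5).

```python
def conventional_cycles(program, flags):
    """Return, per execution unit, a list of (name, useful) pairs.

    Two instructions are decoded per cycle and every decoded instruction
    is sent to the execution unit; a false condition is discovered only
    there, so the slot it occupies does no useful work.
    """
    cycles, pos = [], 0
    while pos < len(program):
        first = program[pos]
        group = [first]
        # The second decoder is used only when the first E bit is 0.
        if first[2] == 0 and pos + 1 < len(program):
            group.append(program[pos + 1])
        cycles.append([(name, cond is None or bool(flags[cond]))
                       for name, cond, _ in group])
        pos += len(group)
    return cycles


FIG9 = [
    ("inst1", "C0", 0),
    ("inst2", "C1", 1),   # E assumed to be 1
    ("inst3", None, 0),
    ("inst4", None, 1),   # E assumed to be 1
    ("inst5", None, 1),
]
```

Under the same flag values (C0 = 1, C1 = 0) this model needs three execution units, and the slot taken by instruction 2 in the first one is marked as not useful, which is the waste the text attributes to the conventional design.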

[Embodiment 2: Compiling apparatus]
Next, an embodiment of a compiling apparatus that generates code to be executed by the processor of Embodiment 1 above, and of its compiling method, will be described.

(Glossary)
First, the terms used here are explained.
- Object code
A machine language program for the target processor that includes relocatable information. It can be converted into executable code by link editing, which resolves the undetermined addresses.
- Predecessor
An instruction that must be executed before a given instruction can be executed.
- Execution group
A group of instructions that the compiling apparatus has grouped together as executable in parallel in the same cycle.
- Basic block
A sequence of instructions whose execution starts at the beginning and always runs to the end, with no exit from the middle of the block and no entry into the middle of the block.

(Target processor)
The processor targeted by this compiling apparatus is the processor described in Embodiment 1 above. This processor forms execution groups by referring to the parallel execution boundary information E assigned by the compiling apparatus; the hardware does not determine whether instructions can be executed in parallel. It is therefore the compiling apparatus that guarantees that the instructions placed between parallel execution boundaries, that is, within an execution group, can indeed be executed simultaneously. The constraints on the instructions that may be placed between parallel execution boundaries are:
(1) the total number of instructions in a parallel execution group does not exceed 4 (instruction decoder constraint);
(2) the number of instructions in a parallel execution group whose operations are actually executed by the execution unit does not exceed 2 (executed instruction count constraint);
(3) the total of target processor resources actually used in the execution unit by the instructions in a parallel execution group does not exceed 2 ALU units, 1 memory access unit, and 1 branch unit (computing unit constraint).
Instructions can be executed in parallel only when all three constraints are satisfied.
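One way to check the three constraints for a candidate execution group is sketched below in Python. This is an illustrative brute-force check, not the scheduling method of the embodiment: the function `group_is_legal`, the tuple layout, and the representation of exclusivity as a set of flag pairs are all assumptions. The worst case is found by trying every combination of condition flag values that the exclusivity information allows.

```python
from itertools import product

MAX_DECODED = 4                               # constraint (1)
MAX_EXECUTED = 2                              # constraint (2)
LIMITS = {"alu": 2, "mem": 1, "branch": 1}    # constraint (3)


def group_is_legal(group, exclusive_pairs=frozenset()):
    """Check one execution group against constraints (1) to (3).

    group            list of (cond, unit); cond is a flag name such as "C0"
                     or None for an instruction that is always executed,
                     unit is "alu", "mem" or "branch"
    exclusive_pairs  set of frozenset({flag_a, flag_b}) pairs that are
                     never true at the same time
    """
    if len(group) > MAX_DECODED:
        return False
    flag_names = sorted({cond for cond, _ in group if cond is not None})
    for values in product((False, True), repeat=len(flag_names)):
        truth = dict(zip(flag_names, values))
        true_flags = [f for f in flag_names if truth[f]]
        if any(frozenset((a, b)) in exclusive_pairs
               for i, a in enumerate(true_flags)
               for b in true_flags[i + 1:]):
            continue                  # this combination cannot occur
        executed = [unit for cond, unit in group
                    if cond is None or truth[cond]]
        if len(executed) > MAX_EXECUTED:
            return False
        if any(executed.count(u) > limit for u, limit in LIMITS.items()):
            return False
    return True
```

For a group of three ALU instructions guarded by C0, guarded by C1, and unguarded, the check fails when nothing is known about C0 and C1 but passes once they are known to be mutually exclusive, which illustrates why the scheduler analyzes exclusivity between conditions.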

(Configuration of the compiling apparatus)
FIG. 12 is a block diagram showing the configuration of the compiling apparatus according to Embodiment 2 of the present invention, together with related data. The compiling apparatus is a program processing apparatus that generates object code 130 from source code 120 written in a high-level language, and comprises a compiler upstream unit 100, an assembler code generation unit 101, an instruction scheduling unit 102, and an object code generation unit 103.

The compiler upstream unit 100 reads the high-level language source code 120 stored as a file and performs syntax analysis and semantic analysis to generate internal-format code. Where necessary, it also optimizes the internal-format code so that the size and execution time of the executable code finally generated are reduced.

The assembler code generation unit 101 generates assembler code from the internal-format code generated and optimized by the compiler upstream unit 100.

The processing in the compiler upstream unit 100 and the assembler code generation unit 101 is not the focus of the present invention and is equivalent to processing performed in conventional compiling apparatuses, so a detailed description is omitted.

(Instruction scheduling unit 102)
The instruction scheduling unit 102 takes the assembler code generated by the assembler code generation unit 101 and analyzes the exclusivity between the conditions attached to instructions, analyzes the dependencies between instructions, rearranges the instructions (reorders them), and adds parallel execution boundaries, thereby parallelizing the assembler code for the target processor. The instruction scheduling unit 102 comprises a condition exclusivity analysis unit 110, a dependency analysis unit 111, an instruction rearrangement unit 112, and an execution boundary addition unit 113.

Within the instruction scheduling unit 102, the condition exclusivity analysis unit 110 operates first. The dependency analysis unit 111, the instruction rearrangement unit 112, and the execution boundary addition unit 113 then operate on each basic block in turn. The detailed operation of each unit is as follows.

The condition exclusivity analysis unit 110 analyzes the exclusivity of the condition flags and generates a condition exclusion information table for the head of each basic block and for each condition flag update instruction. The condition exclusion information table is an array that records, for every combination of condition flags, whether the conditions are exclusive. A specific example of the table is shown later (FIG. 16). In the following, a table in which no combination of condition flags is exclusive is called a non-exclusive table.

FIG. 13 is a flowchart showing the processing procedure of the condition exclusivity analysis unit 110. The unit searches downward through the intermediate code inside the compiling apparatus that corresponds to each instruction, and sets a condition exclusion information table for the head of each basic block and for each condition flag update instruction.

First, the valid table Tv, which holds the exclusion information valid at the current point, is initialized with the non-exclusive table (step S11). Each basic block is then searched in downward order (step S12).

A determination is made for each basic block (step S13). If the basic block has exactly one preceding basic block, the valid table Tv is set as the head table of the basic block (step S14). Otherwise, the exclusion relationships at that point cannot be identified, so the non-exclusive table is set as the head table of the basic block (step S15).

Next, each instruction in the basic block is searched (step S16). When an instruction that updates condition flags, such as a comparison instruction, is found (step S17), it is determined whether that instruction sets mutually exclusive conditions at the same time (step S18). An example of such an instruction is a comparison instruction that updates the condition flags C0 and C1, like instruction 1 in FIG. 11.

If the instruction sets mutually exclusive conditions at the same time, every entry in the valid table Tv that involves a condition flag updated by the instruction is first set to false, and then only the pairs of condition flags that the instruction sets exclusively are set to true. The valid table Tv is then set as the exclusion information table for the instruction (step S19).

If the instruction does not set mutually exclusive conditions at the same time, the exclusivity involving the condition flags it updates no longer holds, so every entry in the valid table Tv that involves a condition flag updated by the instruction is set to false. The valid table Tv is then set as the exclusion information table for the instruction (step S20).

The above is repeated for each basic block (steps S21 and S22). As a result, information on the exclusivity of the condition flags at each point is held for the head of every basic block and for every instruction that sets a condition flag.
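The table update rules of steps S16 to S20 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the compiling apparatus: the table is modelled as a set of exclusive flag pairs rather than the array of FIG. 16, and the instruction encoding (the `updates` and `exclusive` keys) and the name `analyze_block` are hypothetical.

```python
def analyze_block(instructions, head_table=frozenset()):
    """Walk one basic block and return {index: exclusion table}.

    Hypothetical encoding: each instruction is a dict with optional keys
      'updates'   : set of condition flags the instruction writes
      'exclusive' : list of (flag, flag) pairs it sets as mutually exclusive
    A table is a frozenset of frozenset({flag_a, flag_b}) pairs; the empty
    set plays the role of the non-exclusive table.
    """
    valid = set(head_table)                      # valid table Tv
    tables = {}
    for index, ins in enumerate(instructions):
        updated = ins.get("updates", set())
        if not updated:
            continue                             # not a flag update (S17: no)
        # S19 and S20: clear every entry involving an updated flag.
        valid = {pair for pair in valid if not (pair & updated)}
        # S19 only: mark the pairs this instruction sets exclusively.
        for a, b in ins.get("exclusive", []):
            valid.add(frozenset((a, b)))
        tables[index] = frozenset(valid)
    return tables


block = [
    {"updates": {"C0", "C1"}, "exclusive": [("C0", "C1")]},  # compare
    {},                                                       # ordinary add
    {"updates": {"C1"}},                                      # overwrites C1 only
]
tables = analyze_block(block)
```

In this sketch the compare records C0 and C1 as exclusive, and the later instruction that overwrites only C1 clears that entry again, which corresponds to the case of step S20.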

The dependency analysis unit 111 analyzes the dependencies between the instructions to be processed and expresses them as a dependency graph. There are the following three types of dependency between instructions. For instructions in any of these relationships, changing the original instruction order changes the meaning of the program, so the dependencies must be preserved when instructions are rearranged.
• Data dependency: a dependency between an instruction that defines a resource and an instruction that references the same resource.
• Anti-dependency (reverse dependency): a dependency between an instruction that references a resource and an instruction that defines the same resource.
• Output dependency: a dependency between an instruction that defines a resource and an instruction that defines the same resource.

For each instruction to be processed, the dependency analysis unit 111 generates a corresponding node, and for each dependency it generates a corresponding edge (arrow), thereby building the dependency graph. Even when two instructions depend on each other through the resources they reference and define, if their execution conditions are guaranteed to be exclusive, that is, never satisfied at the same time, the two instructions can never both reference or define the resource, so no dependency exists between them. Accordingly, no edge is generated between the nodes corresponding to those two instructions.
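The edge rule described above might be sketched as follows. The dictionary layout (`cond`, `defs`, `uses`) and the function name `dependency_kinds` are assumptions made for illustration; the two sample instructions are hypothetical and only mirror the pattern of one conditional instruction writing a register that another, exclusively conditioned, instruction reads.

```python
def dependency_kinds(first, second, exclusive_pairs):
    """Return the kinds of dependency from `first` to a later `second`.

    Each instruction is a dict with 'cond' (flag name or None), 'defs' and
    'uses' (sets of resource names). `exclusive_pairs` is a set of
    frozenset({flag, flag}) pairs known to be mutually exclusive.
    """
    if frozenset((first["cond"], second["cond"])) in exclusive_pairs:
        return set()          # never both executed, so no edge is generated
    kinds = set()
    if first["defs"] & second["uses"]:
        kinds.add("data")     # define, then reference
    if first["uses"] & second["defs"]:
        kinds.add("anti")     # reference, then define
    if first["defs"] & second["defs"]:
        kinds.add("output")   # define, then define
    return kinds


writer = {"cond": "C0", "defs": {"R2"}, "uses": {"R1", "R2"}}
reader = {"cond": "C1", "defs": {"R3"}, "uses": {"R2", "R3"}}
exclusive = {frozenset(("C0", "C1"))}

with_exclusivity = dependency_kinds(writer, reader, exclusive)
without_exclusivity = dependency_kinds(writer, reader, set())
```

Without exclusivity information the pair is linked by a data dependency on R2; with C0 and C1 known to be exclusive, no edge is produced.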

To realize this, it is necessary to detect, for a preceding instruction A and an instruction B that follows it, whether the execution conditions of the two instructions are exclusive, using the exclusion information tables set by the condition exclusivity analysis unit 110. FIG. 14 shows the algorithm for detecting this exclusivity.

First, let Cn be the execution condition flag of instruction A (step S31). To obtain the exclusion information valid at the point where instruction A executes, the search proceeds upward from instruction A; when an instruction that updates a condition flag is found, or the head of the basic block is reached, the corresponding exclusion information table is taken as the valid table Tv (step S32).

Next, to follow the path to instruction B, the search proceeds downward from instruction A (step S33). When instruction B is found (step S34), the valid table Tv at that point is consulted to obtain the exclusion relationship between the condition flag Cn and the execution condition of instruction B, and the process ends (step S35). When an instruction that updates a condition flag other than Cn is found (step S36), the valid table Tv is replaced with the exclusion information table for that instruction and the search continues (step S37). When an instruction that updates the condition flag Cn is found (step S38), exclusivity can no longer be guaranteed, so false is returned (step S39). These steps are repeated (step S40).
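The search of FIG. 14 can be sketched as below for two instructions in the same basic block. The data layout is an assumption: `instrs[i]` carries a `cond` flag (or `None`) and an optional `updates` set, `tables[i]` is the exclusion table recorded for a flag-updating instruction, and tables are sets of flag pairs. An instruction that updates Cn together with other flags is treated as the Cn case (step S38).

```python
def conditions_exclusive(instrs, tables, head_table, a, b):
    """Return True if instructions a and b (a < b) can never both execute."""
    cn = instrs[a]["cond"]                       # S31
    # S32: search upward from A for the nearest recorded table.
    valid = head_table
    for i in range(a - 1, -1, -1):
        if i in tables:
            valid = tables[i]
            break
    # S33 to S40: search downward from A toward B.
    for i in range(a + 1, b):
        updated = instrs[i].get("updates", set())
        if cn in updated:
            return False                         # S38, S39: Cn overwritten
        if updated:
            valid = tables[i]                    # S36, S37: switch tables
    # S34, S35: B reached; look the pair up in the valid table.
    return frozenset((cn, instrs[b]["cond"])) in valid


pair = frozenset(("C0", "C1"))
code = [
    {"cond": None, "updates": {"C0", "C1"}},     # 0: compare, C0/C1 exclusive
    {"cond": "C0"},                               # 1
    {"cond": "C1"},                               # 2
    {"cond": None, "updates": {"C0"}},            # 3: overwrites C0
    {"cond": "C1"},                               # 4
]
tables = {0: {pair}, 3: set()}

near = conditions_exclusive(code, tables, set(), 1, 2)
far = conditions_exclusive(code, tables, set(), 1, 4)
same = conditions_exclusive(code, tables, set(), 2, 4)
```

Here instructions 1 and 2 are reported as exclusive, while instructions 1 and 4 are not, because C0 is overwritten between them.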

In this way, both the definition and reference relationships of resources and the exclusivity of execution conditions are analyzed to build the dependencies between instructions.

As a specific example, the results of applying the condition exclusivity analysis unit 110 and the dependency analysis unit 111 to the assembler code shown in FIG. 15 are described below.

FIG. 16 shows the condition exclusion information table corresponding to instruction 2 (a comparison instruction) of the assembler code in FIG. 15. The table is an array indicating the exclusivity of every combination of the condition flags C0 to C7. In this case, instruction 2 sets the condition flags C0 and C1 to be exclusive.
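As a rough picture of such a table, the sketch below builds an 8 × 8 Boolean array indexed by the flags C0 to C7 and marks C0 and C1 as exclusive. The symmetric layout and the helper names are assumptions; the actual layout is the one shown in FIG. 16.

```python
FLAGS = [f"C{i}" for i in range(8)]              # condition flags C0 to C7


def non_exclusive_table():
    """Table in which no combination of flags is exclusive."""
    return [[False] * len(FLAGS) for _ in FLAGS]


def set_exclusive(table, flag_a, flag_b):
    """Mark two flags as mutually exclusive (stored symmetrically)."""
    i, j = FLAGS.index(flag_a), FLAGS.index(flag_b)
    table[i][j] = table[j][i] = True


table = non_exclusive_table()
set_exclusive(table, "C0", "C1")
true_entries = sum(cell for row in table for cell in row)
```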

FIG. 17 shows the dependency graph output by the dependency analysis unit 111. In FIG. 17, solid lines indicate data dependencies and broken lines indicate anti-dependencies. Instruction 2 (the comparison instruction) references the register R0 updated by instruction 1, so it has a data dependency on instruction 1; instructions 3 and 4 reference the condition flags C0 and C1 updated by instruction 2, so they have data dependencies on instruction 2. Instruction 3 updates the register R2 and instruction 4 references the register R2, so at first glance there appears to be a data dependency from instruction 3 to instruction 4. However, the execution conditions of these instructions, C0 and C1, are set as exclusive conditions by instruction 2. Consulting the condition exclusion information table shown in FIG. 16 therefore reveals that the two instructions can never both be executed, and so no dependency is deemed to exist between them.

Returning to FIG. 12, the instruction rearrangement unit 112 uses the dependency graph generated by the dependency analysis unit 111 to reorder the instructions to be processed and generates parallelized assembler code for the target processor. The details of its processing are as follows.

FIG. 18 is a flowchart showing the processing procedure of the instruction rearrangement unit 112. The instruction rearrangement unit 112 repeats the processing of loop 1 below (steps S52 to S60) for all nodes of the dependency graph generated by the dependency analysis unit 111 (steps S51 and S61).

First, the instruction rearrangement unit 112 extracts from the dependency graph the nodes that can currently be placement candidates and forms the placement candidate node set (step S52). A node can be a placement candidate when all of its predecessors have already been placed.

Next, the instruction rearrangement unit 112 repeats the processing of loop 2 below (steps S54 to S58) for all candidate nodes in the placement candidate node set (steps S53 and S59).

First, the node that currently appears best to place (hereinafter simply the “best node”) is taken from the placement candidate node set (step S54). The method of determining the best node is described later. It is then determined whether the best node can actually be placed (step S55), and if so it is provisionally placed (step S56). To take full advantage of the deletion of instructions themselves at the decode stage of the processor described above, this determination takes into account the exclusivity of the execution conditions between the best node and the nodes already provisionally placed, and checks whether the arithmetic unit constraint, the constraint on the number of executed instructions, and the instruction decoder constraint described above are satisfied. The results of the condition exclusivity analysis unit 110 are used when considering condition exclusivity. It is also taken into account, however, that in the cycle immediately following an instruction that updates an execution condition flag, instructions executed under that condition are not deleted. In that case the exclusivity of execution conditions is not considered, and placeability is determined purely from the constraints on the arithmetic units and the number of executed instructions.

Next, the set of nodes provisionally placed so far is examined to determine whether further instructions can be placed (step S57). If it is determined that no more can be placed, loop 2 ends and processing moves to step S60.

If it is determined that further placement is possible, it is determined whether placing the best node has made any new nodes eligible as placement candidates, and any such nodes are added to the placement candidate node set (step S58). The nodes that can newly become candidates in step S58 are those that have only the best node (the one currently being placed) as a predecessor and whose dependency on the best node is an anti-dependency or an output dependency. In other words, a node that can newly become a candidate here is one that can execute in the same cycle as the best node but cannot execute in an earlier cycle.

After loop 2 ends, the nodes in the provisionally placed node set are finalized (step S60). Specifically, the instructions corresponding to those nodes are taken from the original instruction sequence and rearranged into a new instruction sequence to be passed to the execution boundary addition unit 113. At this stage, some of the placement candidate nodes have been finalized as a group of instructions to be executed simultaneously.

Next, the method of determining the best node in step S54 is described. With reference to the dependency graph and the provisional placement area, the best node is chosen heuristically as the instruction expected to allow the whole set of instructions being processed to execute in the shortest time. Here, the instruction whose total instruction execution time up to the end of the current dependency graph is the shortest is selected. If multiple instructions satisfy this condition, the one that comes earliest in the original instruction order is taken as the best node.

Returning again to FIG. 12, the execution boundary addition unit 113 sets the parallel execution boundary information E at the end of each instruction group whose placement was finalized in step S60 of the instruction rearrangement unit 112.

The object code generation unit 103 converts the assembler code output by the instruction scheduling unit 102 into object code 130 and outputs it as a file.

(Operation of the compiling apparatus)
Next, the operation of the characteristic components of the compiling apparatus is described using specific instructions.

FIG. 19 shows the assembler code produced when source code is input to the compiler upstream unit 100 and passed through the assembler code generation unit 101. The instruction scheduling unit 102 receives the code of FIG. 19 as input. The meaning of each instruction in FIG. 19 is as follows.
• Instruction 1: Compares the value stored in register R0 with the constant 0, sets the condition flag C0 to the result (true if they are equal), and sets the condition flag C1 to the opposite condition.
• Instruction 2: Only when the condition flag C0 is true, adds the values stored in registers R1 and R2 and stores the result in register R2.
• Instruction 3: Only when the condition flag C1 is true, adds the values stored in registers R2 and R3 and stores the result in register R3.
• Instruction 4: Only when the condition flag C0 is true, adds the values stored in registers R1 and R3 and stores the result in register R3.
• Instruction 5: Only when the condition flag C1 is true, adds the values stored in registers R3 and R4 and stores the result in register R4.
• Instruction 6: Only when the condition flag C0 is true, adds the values stored in registers R2 and R4 and stores the result in register R4.
• Instruction 7: Only when the condition flag C1 is true, adds the values stored in registers R3 and R5 and stores the result in register R5.

The operation of the instruction scheduling unit 102 is described below. First, the condition exclusivity analysis unit 110 and the dependency analysis unit 111 are activated and a dependency graph is generated. For the code example of FIG. 19, the definition and reference relationships of resources are analyzed while taking into account that the condition flags C0 and C1 generated by instruction 1 are exclusive from instruction 2 onward. FIG. 20 shows the generated dependency graph.

Next, the instruction rearrangement unit 112 is activated. Following the flowchart of FIG. 18, in the first cycle the placement candidate node set is generated (step S52). From the dependency graph of FIG. 20, only instruction 1 is a placement candidate node at this point. The best node is then taken out (step S54); here, instruction 1 is selected automatically. In the placeability determination step (S55), it is determined that instruction 1 can be placed. In the placement state determination step (S57), it is determined that further placement is still possible, but there is no instruction to add in the placement candidate node addition step (S58), so in the placement node finalization step (S60) the first cycle is finalized as issuing instruction 1 only.

In the next cycle, instructions 2, 3, and 4 are the placement candidate nodes. Instructions 2 and 3 are selected in turn as the best node and provisionally placed. Instruction 4 is then selected as the best node and the placeability determination step (S55) is entered. The determination here takes condition exclusivity into account, but because the values of the execution conditions C0 and C1 were updated in the immediately preceding cycle, instructions whose execution condition is C0 or C1 are not deleted at the decode stage in this cycle. The provisionally placed instructions 2 and 3 will therefore not be deleted, so, given the limit on the number of arithmetic units in the hardware, instruction 4 is determined to be not issuable simultaneously, that is, not placeable. The second cycle is thus finalized as issuing instructions 2 and 3.

In the next cycle, instructions 4, 5, 6, and 7 are the placement candidate nodes. Instructions 4 and 5 are selected in turn as the best node and provisionally placed. Instruction 6 is then selected as the best node and the placeability determination step (S55) is entered, where the determination takes condition exclusivity into account. When instruction 6 actually executes its operation, that is, when its execution condition flag C0 is true, the condition flag C1 is false, so instruction 5, whose execution condition is C1, does not execute its operation and does not use an arithmetic unit. The combination of instructions 4 and 6 satisfies the arithmetic unit constraint, so instruction 6 is determined to be placeable. Instruction 7 is selected next as the best node. Similarly, when instruction 7 executes its operation, instructions 4 and 6 are deleted, so the arithmetic unit constraint is checked for the combination of instructions 5 and 7 only, and instruction 7 is determined to be placeable. The third cycle is thus finalized as issuing instructions 4, 5, 6, and 7. No unplaced nodes remain, so the processing of the instruction rearrangement unit 112 is complete.
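The three cycles walked through above can be reproduced with a simplified scheduler. The sketch below rests on several assumptions that are not taken from the figures: the predecessor sets are derived from the instruction meanings listed for FIG. 19 rather than copied from FIG. 20, candidates are tried in original instruction order in place of the best-node heuristic, steps S57 and S58 are omitted, and the decoder limit `max_decode=4` is a hypothetical value. All function and variable names are illustrative.

```python
from collections import Counter
from itertools import combinations


def units_needed(group, conds, exclusive, fresh):
    """Worst-case number of arithmetic units a group can occupy."""
    counts, always = Counter(), 0
    for i in group:
        cond = conds[i]
        # Unconditional instructions, and instructions whose flag was written
        # in the previous cycle (no decode-stage deletion), always count.
        if cond is None or cond in fresh:
            always += 1
        else:
            counts[cond] += 1
    worst, flags = 0, list(counts)
    for r in range(len(flags) + 1):
        for subset in combinations(flags, r):
            # Only flags that are not pairwise exclusive can be true together.
            if all(frozenset(p) not in exclusive
                   for p in combinations(subset, 2)):
                worst = max(worst, sum(counts[f] for f in subset))
    return always + worst


def schedule(conds, updates, preds, exclusive, units=2, max_decode=4):
    """Greedy cycle-by-cycle placement, trying candidates in original order."""
    placed, groups, fresh = set(), [], set()
    while len(placed) < len(conds):
        group = []
        ready = sorted(i for i in conds
                       if i not in placed and preds[i] <= placed)
        for i in ready:
            trial = group + [i]
            if (len(trial) <= max_decode
                    and units_needed(trial, conds, exclusive, fresh) <= units):
                group.append(i)
        groups.append(group)
        placed.update(group)
        # Flags written in this cycle block deletion in the next cycle.
        fresh = set().union(*(updates.get(i, set()) for i in group))
    return groups


conds = {1: None, 2: "C0", 3: "C1", 4: "C0", 5: "C1", 6: "C0", 7: "C1"}
updates = {1: {"C0", "C1"}}
preds = {1: set(), 2: {1}, 3: {1}, 4: {1}, 5: {1, 3}, 6: {1, 2}, 7: {1, 3}}

groups = schedule(conds, updates, preds, {frozenset(("C0", "C1"))})
baseline = schedule(conds, updates, preds, set())
```

With the exclusive pair supplied, the sketch yields the groups [1], [2, 3], and [4, 5, 6, 7]. Passing an empty exclusion set models a scheduler that treats every instruction as occupying an arithmetic unit, and yields four groups.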

Finally, the execution boundary addition unit 113 is activated. It sets the parallel execution boundary information E on the last instruction of each instruction group placed by the instruction rearrangement unit 112. Specifically, it sets the parallel execution boundary information E of instructions 1, 3, and 7 to “1” and that of the remaining instructions to “0”.
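Setting the boundary information amounts to marking the last instruction of each group. A minimal sketch, assuming the groups are given as lists of instruction numbers in placement order (the function name `boundary_bits` is hypothetical):

```python
def boundary_bits(groups):
    """Map each instruction to its parallel execution boundary bit E."""
    bits = {}
    for group in groups:
        for ins in group:
            bits[ins] = 0
        bits[group[-1]] = 1      # last instruction of the group ends it
    return bits


bits = boundary_bits([[1], [2, 3], [4, 5, 6, 7]])
```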

This completes the processing of the instruction scheduling unit 102. The object code generation unit 103 is then activated and outputs the object code.

FIG. 21 shows the final executable code. The actual executable code is a bit string organized in 128-bit units. The executable code shown in FIG. 21 is executed in three execution groups on the processor according to the present invention, which has two arithmetic units.

(Comparison with a conventional compiling apparatus)
Next, the case in which the assembler code shown in FIG. 19 is compiled by a conventional compiling apparatus, one that lacks the configuration of the compiling apparatus of the present invention, is compared with the case of the compiling apparatus according to the present invention. The target processor is assumed to have two arithmetic units, like the processor of the present invention.

The conventional compiling apparatus differs in its instruction rearrangement unit. In the first cycle, only instruction 1 is issued because of the dependencies. In the next cycle, instructions 2, 3, and 4 are candidates, but because of the arithmetic unit constraint of two per cycle, only instructions 2 and 3 are issued. In the following cycle, instructions 4, 5, 6, and 7 are candidates, but because of the arithmetic unit constraint, only instructions 4 and 5 are issued. In the cycle after that, instructions 6 and 7 are candidates and satisfy the arithmetic unit constraint, so both are issued. This completes the instruction rearrangement. The execution boundary addition unit then sets the parallel execution boundary information E of instructions 1, 3, 5, and 7 to “1” and that of the remaining instructions to “0”. This completes the instruction scheduling process.

FIG. 22 shows the resulting executable code. The executable code shown in FIG. 22 is executed in four execution groups on a processor having two arithmetic units.

Comparing FIG. 21 with FIG. 22, the code generated by the conventional compiling apparatus (FIG. 22) has one more execution group than the code generated by the compiling apparatus of the present invention (FIG. 21); that is, it takes one more execution cycle. The number of execution groups increases because, without a configuration such as the instruction scheduling unit 102 of the present invention, every instruction is treated as being issued to the execution stage, so instructions can only be placed up to the number of arithmetic units in the hardware. The compiling apparatus of the present invention, by contrast, takes the invalidation of instructions themselves into account and can place in one cycle a number of instructions equal to or greater than the number of arithmetic units in the hardware, making effective use of the arithmetic units.

The compiling apparatus described in this embodiment can be realized on a computer by storing its processing procedure on a recording medium such as a flexible disk, hard disk, CD-ROM, MO, or DVD.

The executable code generated by the compiling apparatus described in this embodiment can also be stored on a recording medium such as a flexible disk, hard disk, CD-ROM, MO, DVD, or semiconductor memory.

[Embodiment 3: Processor]
Next, an embodiment of a processor that extends the processor of Embodiment 1 is described.

Most of the hardware configuration of this processor is the same as that of the processor of Embodiment 1, but a restriction is added on how the execution condition information of the instructions placed in an execution group is arranged. Specifically, within one execution group, instructions having the same execution condition are always placed consecutively. The compiling apparatus of Embodiment 4, described later, generates code in accordance with this restriction. As a result, the configuration of the instruction issue control unit of the processor differs.

(Configuration and operation of the instruction issue control unit)
FIG. 23 shows the configuration of the instruction issue control unit 140 of the processor of this embodiment and its peripheral circuits. Most of the instruction issue control unit 140 in FIG. 23 is the same as that of the processor of Embodiment 1 shown in FIG. 5. There are two differences: the control method of the execution instruction selection control unit 141, and the addition of an instruction combining unit 142 downstream of the execution instruction selection control unit 141.

First, as in Embodiment 1, the execution instruction selection control unit 141 performs control that effectively deletes any instruction whose execution condition is false. Unlike Embodiment 1, however, the order in which instructions are placed is restricted, as described above, and the unit actively exploits this restriction. Specifically, because instructions having the same execution condition information are placed consecutively, the unit first classifies the decoded instructions by execution condition. The ordering restriction makes this classification easy.

Next, for each class of execution condition, the unit checks whether the value of the execution condition flag is fixed at “0”. Instructions whose execution condition is a condition flag fixed at “0” are deleted together, and the set of instructions actually to be issued to the execution unit 40 is thereby determined. This keeps the number of condition flag checks to a minimum and detects at once whether several instructions can be deleted, so the instructions to be issued to the execution unit 40 can be identified quickly and simply.
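The classify-then-check step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuit of FIG. 23: the function name `select_issuable`, the `(name, cond)` tuple encoding, and the flag dictionary are hypothetical, and every condition flag is assumed to be already fixed when the decode stage runs.

```python
from itertools import groupby

def select_issuable(instructions, flags):
    """Return (instructions to issue, number of condition-flag lookups).

    instructions: list of (name, cond); cond is None for "always execute"
                  or the name of a condition flag (hypothetical encoding).
    flags:        dict mapping flag name to 0 or 1; every value is assumed
                  to be already fixed when the decode stage runs.
    """
    issued, lookups = [], 0
    # Same-condition instructions are adjacent, so groupby yields one run per condition.
    for cond, run in groupby(instructions, key=lambda ins: ins[1]):
        run = list(run)
        if cond is None:
            issued.extend(run)      # unconditional instructions are always issued
            continue
        lookups += 1                # one flag lookup covers the whole run
        if flags[cond] == 1:
            issued.extend(run)
        # flag fixed at 0: the whole run is dropped together
    return issued, lookups

# Four instructions with conditions (none, C0, C1, C1), with C0 = 0 and C1 = 1.
program = [("inst1", None), ("inst2", "C0"), ("inst3", "C1"), ("inst4", "C1")]
issued, lookups = select_issuable(program, {"C0": 0, "C1": 1})
print([name for name, _ in issued], lookups)
```

With these inputs the sketch keeps inst1, inst3, and inst4 and looks up only two flags for three conditional instructions, which is the saving the consecutive-placement restriction is meant to give.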

After the execution instruction selection control unit 141 has deleted instructions, the remaining instructions are input to the instruction combining unit 142. For the instructions whose operations are now certain to be executed by the execution unit 40, this unit detects whether a plurality of instructions can be combined into one compound instruction. If they can, it changes the control signal of the instruction concerned to that of the new compound instruction, merges the operands, and deletes the instruction on the trailing side in the same way as the execution instruction selection control unit 141 does. The instruction combining unit 142 thus outputs control signals and operand data for two instructions, matching the number of arithmetic units mounted in hardware, and transfers them to the execution unit 40. Each of these instructions may itself be a compound of several instructions.

(Processor operation)
Next, the specific operation of this processor will be described with reference to FIG. 24. FIG. 24 shows an example of a program that includes conditional execution instructions. The program consists of four instructions and uses the same notation as the program of FIG. 9. The mnemonic “lsr” denotes a logical right shift of the value stored in a register.

The operation of this processor is described below for each unit of execution. It is assumed that, at the start, the value of condition flag C0 is fixed at “0” and the value of C1 is fixed at “1”.

(Unit of execution 1)
A packet containing instruction 1, instruction 2, instruction 3, and instruction 4 is supplied from external memory, and each instruction is transferred to the instruction register 23. The instruction issue control unit 140 then refers to the parallel execution boundary information E of each instruction. Because the parallel execution boundary information E of instructions 1, 2, and 3 is “0” in every case, none of the decoding results of the instruction decoders is invalidated.

Next, the instruction issue control unit 140 refers to the execution condition information P of each instruction, and the execution instruction selection control unit 141 selects the instructions whose operations are to be executed. Instruction 1 is always executed. The execution condition flag of instruction 2 is C0, whose value is fixed at “0”, so instruction 2 itself is effectively deleted and its operation is not executed. The execution condition flag of both instruction 3 and instruction 4 is C1, so condition flag C1 is referred to only once; because its value is fixed at “1”, both instruction 3 and instruction 4 are selected for execution. Instructions 1, 3, and 4 are thus sent on to the instruction combining unit 142.

The instruction combining unit 142 determines, for every combination of the input instructions, whether a compound instruction can be generated. Here it detects that instruction 1 (a shift instruction) and instruction 4 (an add instruction) can be combined into a shift-add instruction. The control signal and operands for the shift-add are sent to the execution unit 40 as the first instruction, and the control signal and operands for instruction 3 are sent as the second instruction. All of the supplied instructions have now been issued.

In the execution unit 40, the value stored in register R3 is logically shifted right by the value stored in register R1, the value stored in register R2 is added to the result, and the sum is stored in register R2; in addition, 1 is added to the value stored in register R0 and the result is stored in register R0.
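The combining and execution steps of this unit of execution can be sketched in Python as follows. This is only an illustration: the dictionary encoding of decoded instructions, the opcode names `addi` and `lsradd`, the `FUSABLE` table, and the functions `combine` and `execute` are all hypothetical, chosen so that the result matches the register updates stated above. Real hardware would also have to confirm that the operands of the two instructions allow the fusion; the sketch omits that check.

```python
# Hypothetical table of opcode pairs that the hardware can fuse into one compound opcode.
FUSABLE = {("lsr", "add"): "lsradd"}

def combine(instrs):
    """Fuse the first fusable (earlier, later) pair; return the new instruction list."""
    for i, first in enumerate(instrs):
        for j in range(i + 1, len(instrs)):
            second = instrs[j]
            fused_op = FUSABLE.get((first["op"], second["op"]))
            if fused_op:
                fused = {"op": fused_op, "value": first["value"],
                         "amount": first["amount"],
                         "addend": second["addend"], "dst": second["dst"]}
                # The fused instruction takes the earlier slot; the later one is deleted.
                return instrs[:i] + [fused] + instrs[i + 1:j] + instrs[j + 1:]
    return list(instrs)

def execute(instrs, regs):
    """Execute one issue group in parallel: read old values, then write results."""
    writes = {}
    for ins in instrs:
        if ins["op"] == "lsradd":
            # For non-negative Python ints, >> behaves as a logical right shift.
            writes[ins["dst"]] = (regs[ins["value"]] >> regs[ins["amount"]]) + regs[ins["addend"]]
        elif ins["op"] == "addi":
            writes[ins["dst"]] = regs[ins["src"]] + ins["imm"]
        else:
            raise ValueError("opcode not modelled in this sketch: " + ins["op"])
    return {**regs, **writes}

# Instructions 1, 3 and 4 as they leave the selection step (instruction 2 was deleted).
selected = [
    {"op": "lsr", "value": "R3", "amount": "R1"},          # instruction 1
    {"op": "addi", "dst": "R0", "src": "R0", "imm": 1},    # instruction 3
    {"op": "add", "addend": "R2", "dst": "R2"},            # instruction 4
]
issued = combine(selected)
result = execute(issued, {"R0": 5, "R1": 2, "R2": 10, "R3": 64})
print([ins["op"] for ins in issued], result)
```

Three selected instructions become two issued instructions, so they fit the two arithmetic units in one cycle. With the sample register values, R2 becomes (64 >> 2) + 10 = 26 and R0 becomes 6.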

As described above, the program shown in FIG. 24 is executed by this processor in one unit of execution. After deleting instructions according to the fixed execution conditions, this processor attempts to combine the remaining instructions into compound instructions, which raises the effective operation efficiency. In addition, by exploiting the restriction that instructions having the same execution condition are placed consecutively, it speeds up the selection, in the decode stage, of the instructions whose operations are actually executed.

[Embodiment 4: Compiling device]
Next, an embodiment of a compiling device that generates code to be executed by the processor of Embodiment 3, and of its compiling method, will be described.

The configuration of this compiling device is largely the same as that of the compiling device of Embodiment 2. It differs in two respects: the placement of instructions within one execution group is restricted according to their execution conditions, and the combining of instructions in the decode stage of the processor is taken into account. Concretely, the configuration of the instruction scheduling unit differs.

(Instruction scheduling unit)
Like the instruction scheduling unit 102 of Embodiment 2, the instruction scheduling unit of the compiling device of this embodiment consists of a condition exclusivity analysis unit, a dependency analysis unit, an instruction relocation unit, and an execution boundary adding unit. The only difference is the relocation method used by the instruction relocation unit.

FIG. 25 shows a flowchart of the instruction relocation unit of the compiling device of this embodiment. Its processing procedure is largely the same as that of the instruction relocation unit 112 of the compiling device of Embodiment 2, but differs in the part that determines whether a node can be placed and in that the placement order is adjusted after the placed nodes are fixed. Specifically, of steps S71 to S82 in FIG. 25, the placement feasibility determination (step S75) and the placement order adjustment (step S81) differ from the flow shown in FIG. 18.

As in the compiling device of Embodiment 2, a dependency graph is generated by the condition exclusivity analysis unit and the dependency analysis unit and is passed to the instruction relocation unit, which relocates instructions on the basis of this graph with condition exclusivity taken into account. After the best node is selected in step S74, the placement feasibility determination in step S75 considers, for the tentatively placed nodes together with the best node, not only the exclusivity of their execution conditions but also the possibility of combining instructions, over all combinations. That is, if two nodes can be combined, they are treated together as one instruction when the determination is made.

Further, after the nodes that can be placed in the current cycle are fixed in step S80, the placement order is adjusted in step S81. Specifically, the nodes that can be placed in the cycle are classified by execution condition, and their order is adjusted so that nodes having the same execution condition are always placed consecutively. This simplifies control in the hardware.

(Operation of the compiling device)
The operation of the characteristic components of this compiling device will be described with reference to FIG. 26, using specific instructions. FIG. 26 is an example of assembler code generated by the compiler upstream unit and the assembler code generation unit. The instruction scheduling unit receives the code of FIG. 26 as input. The meaning of each instruction in FIG. 26 is as follows. It is assumed that condition flags C0 and C1 have been made mutually exclusive by instructions preceding instruction 1.
Instruction 1: Logically shifts the value stored in register R3 right by the value stored in register R1.
Instruction 2: Only when the value of condition flag C1 is true, adds 1 to the value stored in register R0 and stores the result in register R0.
Instruction 3: Only when the value of condition flag C0 is true, subtracts 1 from the value stored in register R0 and stores the result in register R0.
Instruction 4: Only when the value of condition flag C1 is true, adds the value stored in register R1 and the value stored in register R2 and stores the result in register R2.

The operation of the instruction scheduling unit is described below. First, the condition exclusivity analysis unit and the dependency analysis unit are started, and a dependency graph is generated. In this example, the definition and reference relationships of resources are analyzed with the exclusivity of condition flags C0 and C1 taken into account.

Next, the instruction relocation unit is started. Following the flowchart of FIG. 25, a set of placement candidate nodes is first generated (step S72); here only instruction 1 is a placement candidate node. The best node is then taken out (step S74); here instruction 1 is selected automatically. The placement feasibility determination (step S75) finds that it can be placed, and the placement state determination (step S77) finds that further placement is still possible. In the placement candidate node addition (step S78), instruction 2, instruction 3, and instruction 4 are then added to the placement candidate nodes.

Processing then returns to take out the best node again (step S74). Here instruction 2 is selected first and is determined to be placeable (step S75).

Processing returns once more to take out the best node (step S74). Here instruction 3 is selected. Because the execution conditions of instruction 2 and instruction 3 are exclusive, the constraint of two arithmetic units is satisfied, and instruction 3 is determined to be placeable (step S75).

Processing returns again to take out the best node (step S74). Here the remaining instruction 4 is selected automatically, and the placement feasibility determination is made (step S75). If execution condition C0 is assumed to be true, only instruction 1 and instruction 3 are valid, so the arithmetic unit constraint is satisfied. If execution condition C1 is assumed to be true, however, three instructions (instruction 1, instruction 2, and instruction 4) become valid. The possibility of combining instructions is therefore examined for all combinations of these. Here it is determined that instruction 1 and instruction 4 can be combined into the shift-add instruction provided in the hardware, so that only two instructions are effectively valid, and instruction 4 is determined to be placeable.
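The placement feasibility determination of step S75 can be sketched in Python as follows, under simplifying assumptions. The names `placeable`, `issued_count`, and `FUSABLE`, the opcode names `addi` and `subi`, and the `(name, opcode, cond)` tuples are hypothetical. The sketch assumes that all condition flags involved belong to one mutually exclusive group, so it checks each flag being true on its own, and it decides fusability from the opcode pair alone.

```python
# Hypothetical set of (earlier opcode, later opcode) pairs that the hardware can fuse.
FUSABLE = {("lsr", "add")}

def issued_count(active):
    """Number of issue slots needed for the active instructions after greedy fusion."""
    ops = [op for _, op, _ in active]
    count, used = len(ops), set()
    for i, first in enumerate(ops):
        if i in used:
            continue
        for j in range(i + 1, len(ops)):
            if j not in used and (first, ops[j]) in FUSABLE:
                used.update((i, j))
                count -= 1          # two instructions share one slot
                break
    return count

def placeable(instrs, exclusive_flags, units=2, allow_fusion=True):
    """True if, whichever exclusive flag is true, the group fits in `units` slots."""
    for true_flag in exclusive_flags:
        active = [ins for ins in instrs if ins[2] is None or ins[2] == true_flag]
        needed = issued_count(active) if allow_fusion else len(active)
        if needed > units:
            return False
    return True

# The four instructions of FIG. 26 as (name, opcode, condition).
group = [("inst1", "lsr", None), ("inst2", "addi", "C1"),
         ("inst3", "subi", "C0"), ("inst4", "add", "C1")]
print(placeable(group, ["C0", "C1"]))                      # fusion considered
print(placeable(group, ["C0", "C1"], allow_fusion=False))  # fusion ignored
```

With fusion considered, all four instructions fit in one cycle on two arithmetic units. With fusion ignored, the case where C1 is true needs three slots and the group is rejected, which is why instruction 4 would otherwise be pushed to a second cycle.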

All instructions have now been placed in the first cycle, and the placed nodes are fixed (step S80). Next, the nodes are classified by execution condition and the placement order is adjusted (step S81). Specifically, because instruction 2 and instruction 4 have the same execution condition C1, the order is changed to instruction 1, instruction 2, instruction 4, instruction 3 so that instruction 2 and instruction 4 are placed consecutively. This completes the processing of the instruction relocation unit.
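The order adjustment of step S81 can be sketched in Python as follows. The function name `group_by_condition` and the `(name, cond)` encoding are hypothetical. The sketch assumes that nodes placed in the same cycle may be reordered freely, and it keeps conditions in the order in which they first appear.

```python
def group_by_condition(nodes):
    """Reorder nodes so that nodes with the same execution condition are adjacent.

    nodes: list of (name, cond); cond is None for unconditional instructions.
    Conditions keep their first-appearance order (dicts preserve insertion order).
    """
    buckets = {}
    for name, cond in nodes:
        buckets.setdefault(cond, []).append(name)
    return [name for names in buckets.values() for name in names]

# Placement order before adjustment, with the conditions given for FIG. 26.
placed = [("inst1", None), ("inst2", "C1"), ("inst3", "C0"), ("inst4", "C1")]
print(group_by_condition(placed))
```

For the instructions of FIG. 26 this yields the order instruction 1, instruction 2, instruction 4, instruction 3, the same order the text arrives at.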

Finally, the execution boundary adding unit is started. It sets the parallel execution boundary information on the last instruction of each group of instructions placed by the instruction relocation unit. Specifically, the parallel execution boundary information of instruction 3 is set to “1”, and that of the remaining instructions is set to “0”. This completes the processing of the instruction scheduling unit.
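The boundary-setting step can be sketched in Python as follows. The function name `add_boundaries` and the list-of-groups input are hypothetical; the sketch only shows the rule that the last instruction of each execution group receives E = 1 and every other instruction receives E = 0.

```python
def add_boundaries(groups):
    """Attach parallel execution boundary information E to each instruction.

    groups: list of execution groups, each a list of instruction names in
            their final placement order (hypothetical representation).
    Returns a flat list of (name, E) with E = 1 only on the last instruction
    of each group.
    """
    out = []
    for group in groups:
        for k, name in enumerate(group):
            out.append((name, 1 if k == len(group) - 1 else 0))
    return out

# The single execution group produced for FIG. 26 after order adjustment.
print(add_boundaries([["inst1", "inst2", "inst4", "inst3"]]))
```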

As described above, the compiling device of this embodiment compiles the instruction sequence shown in FIG. 26 so that it is executed as one execution group. This is the effect of taking the combining of instructions in the processor's decode stage into account in the placement feasibility determination (step S75). In addition, adjusting the order so that instructions having the same execution condition are placed consecutively simplifies the control with which the processor selects valid instructions in the decode stage.

The processor and the compiling device according to the present invention have been described above on the basis of embodiments, but the present invention is of course not limited to these embodiments. Modifications are listed below.

(1) The processors and compiling devices of the above embodiments assume fixed-length instructions, but the present invention is not limited to this instruction format. It retains its significance even if a variable-length instruction format is adopted.

(2) The processors and compiling devices of the above embodiments assume two arithmetic units, but the present invention is not limited to this number. It retains its significance for a processor having one arithmetic unit or three or more arithmetic units.

(3) The processors and compiling devices of the above embodiments assume that the compiling device extracts instruction-level parallelism statically, but the present invention is not limited to this form of parallel instruction processing. For example, it retains its significance under a superscalar scheme in which hardware extracts instruction-level parallelism dynamically. In that case, the parallel execution boundary information E is removed from the instruction format of the present invention, and all processing that depends on this information is performed by the instruction issue control unit, which detects the boundaries dynamically.

(4) The instruction relocation unit of the compiling device of the above embodiments determines the best node in step S54 of FIG. 18 using the total execution time to the end of the dependency graph, but the present invention is not limited to this selection criterion. For example, a specific path among a plurality of execution flows may be selected preferentially. In that case, when the best node is taken out (step S54), instructions having a particular execution condition are given higher priority. This allows scheduling specialized for a particular execution path, such as a frequently executed path.
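The two selection criteria in modification (4) can be sketched in Python as follows. This is a hedged illustration only: "total execution time to the end of the dependency graph" is interpreted here as the longest-latency path from a node to the end of the graph, and the names `make_priority`, `best_node`, `favored_cond`, and `boost`, the sample graph, and the latencies are all hypothetical.

```python
from functools import lru_cache

def make_priority(succ, latency, favored_cond=None, cond=None, boost=1000):
    """Build a priority function for best-node selection.

    succ:         dict node -> list of successor nodes in the dependency graph
    latency:      dict node -> execution time of the node
    favored_cond: execution condition whose instructions are preferred (optional)
    cond:         dict node -> execution condition (needed only with favored_cond)
    boost:        hypothetical bonus large enough to outrank any path length
    """
    cond = cond or {}

    @lru_cache(maxsize=None)
    def path_len(node):
        # Longest-latency path from this node to the end of the graph.
        return latency[node] + max((path_len(s) for s in succ.get(node, ())), default=0)

    def priority(node):
        p = path_len(node)
        if favored_cond is not None and cond.get(node) == favored_cond:
            p += boost
        return p

    return priority

def best_node(candidates, priority):
    """Pick the candidate with the highest priority (first one wins on ties)."""
    return max(candidates, key=priority)

# Hypothetical graph: a -> c, and b stands alone; b executes only under C1.
succ = {"a": ["c"]}
latency = {"a": 1, "b": 1, "c": 1}
default_choice = best_node(["a", "b"], make_priority(succ, latency))
favored_choice = best_node(["a", "b"], make_priority(succ, latency, "C1", {"b": "C1"}))
print(default_choice, favored_choice)
```

Under the default criterion node a is chosen because its path to the end of the graph is longer. When condition C1 is favored, node b is chosen instead, which models scheduling specialized for the path on which C1 is true.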

(5) The instruction issue control unit of the processors of the above embodiments always invalidates the decoding results of the instructions that follow the first instruction whose parallel execution boundary information E is “1”, but this is not necessarily required. If the execution instruction selection control unit in the instruction issue control unit finds that no instruction up to and including that first instruction whose parallel execution boundary information E is “1” is to be transferred to the execution unit, the whole cycle may be deleted and the instructions up to the next instruction whose parallel execution boundary information E is “1” may be issued in this cycle instead. In other words, an instruction whose parallel execution boundary information E is “1” is regarded as a parallel execution boundary, and the decoding results of the following instructions are invalidated, only when at least one instruction up to that point has been determined to execute a valid operation. Otherwise the parallel execution boundary information E of that instruction is ignored, and a new parallel execution boundary is detected by referring to the parallel execution boundary information E of the subsequent instructions. This further reduces the number of execution cycles.
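The boundary handling of modification (5) can be sketched in Python as follows. The function name `next_issue_group` and the `(name, cond, E)` tuples are hypothetical. The sketch assumes that all condition flags are already fixed, and it does not model the limit on how many instructions the instruction registers can hold at once.

```python
def next_issue_group(stream, flags):
    """Return (names issued this cycle, number of instructions consumed).

    stream: list of (name, cond, E); cond is None or a flag name, E is the
            parallel execution boundary bit (hypothetical encoding).
    flags:  dict flag name -> 0 or 1, all assumed fixed.
    A boundary (E == 1) ends the cycle only if at least one valid instruction
    has been found; otherwise it is ignored and scanning continues.
    """
    issued, consumed = [], 0
    for name, cond, e in stream:
        consumed += 1
        if cond is None or flags[cond] == 1:
            issued.append(name)
        if e == 1 and issued:
            break
    return issued, consumed

# The first group (i1, i2) is entirely conditional on C0, so it vanishes when C0 = 0.
stream = [("i1", "C0", 0), ("i2", "C0", 1),
          ("i3", None, 0), ("i4", "C1", 1),
          ("i5", None, 1)]
print(next_issue_group(stream, {"C0": 0, "C1": 1}))
print(next_issue_group(stream, {"C0": 1, "C1": 1}))
```

When C0 is 0, the first boundary is ignored because nothing before it is valid, so i3 and i4 are issued in the same cycle and no cycle is spent on an empty group. When C0 is 1, the first boundary is honoured and only i1 and i2 are issued.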

As described above, the present invention makes it possible to provide a processor that uses hardware effectively and achieves improved performance, and it is particularly useful as a technique for improving performance in parallel processing through efficient use of arithmetic units.

FIG. 1: (a) to (c) are diagrams showing the structure of instructions executed by the processor according to Embodiment 1 of the present invention.
FIG. 2: (a) and (b) are diagrams showing the concept of instruction supply and issue in the processor.
FIG. 3: A block diagram showing the hardware configuration of the processor.
FIG. 4: A block diagram showing the configuration around the instruction register of the processor.
FIG. 5: A diagram showing the instruction issue control unit of the processor and its peripheral circuit configuration.
FIG. 6: A diagram showing pipeline timing when an instruction sequence is executed by the processor.
FIG. 7: A diagram showing part of a program that includes conditional execution instructions.
FIG. 8: A block diagram showing the configuration around the instruction register of a processor having a conventional instruction issue control unit.
FIG. 9: A diagram showing a program that causes a processor having a conventional instruction issue control unit to perform the processing of the program of FIG. 7.
FIG. 10: A diagram showing the flow of processing that includes a conditional branch.
FIG. 11: A diagram showing a program that describes the processing of the flow of FIG. 10 using conditional execution.
FIG. 12: A block diagram showing the configuration of the compiling device in Embodiment 2 of the present invention and related data.
FIG. 13: A flowchart showing the processing procedure of the condition exclusivity analysis unit in the compiling device.
FIG. 14: A flowchart showing the processing procedure for detecting execution condition exclusivity between two instructions in the compiling device.
FIG. 15: A diagram showing an example of assembler code.
FIG. 16: A diagram showing the condition exclusion information table corresponding to instruction 2 of the assembler code of FIG. 15.
FIG. 17: A dependency graph corresponding to FIG. 15.
FIG. 18: A flowchart showing the processing procedure of the instruction relocation unit in the compiling device.
FIG. 19: A diagram showing an example of assembler code.
FIG. 20: A dependency graph corresponding to FIG. 19.
FIG. 21: A diagram showing the executable code corresponding to FIG. 19.
FIG. 22: A diagram showing an example of the executable code obtained when the code of FIG. 19 is scheduled by a conventional compiling device.
FIG. 23: A diagram showing the instruction issue control unit of the processor according to Embodiment 3 of the present invention and its peripheral circuit configuration.
FIG. 24: A diagram showing part of a program that includes conditional execution instructions.
FIG. 25: A flowchart showing the processing procedure of the instruction relocation unit in the compiling device according to Embodiment 4 of the present invention.
FIG. 26: A diagram showing an example of assembler code.

Explanation of symbols

10 Parallel execution boundary information (E)
11 Execution condition information (P)
20 Instruction supply unit (instruction supply means)
21 Instruction fetch unit
22 Instruction buffer
23 Instruction register
231 A register
232 B register
233 C register
234 D register
30 Decoding unit
31 Instruction issue control unit (instruction issue control means)
311 Condition flag
312 Condition flag valid information
313 Execution instruction selection control unit
314, 315 Logic circuit
32 Instruction decoder (decoding means)
33 First instruction decoder
34 Second instruction decoder
35 Third instruction decoder
36 Fourth instruction decoder
371 to 374 Execution instruction selector
38 Instruction invalidation method selection unit (instruction invalidation method selection means)
40 Execution unit (execution means)
41 Execution control unit
42 PC (program counter) unit
43 Register file
44 First arithmetic unit
45 Second arithmetic unit
46 Write control unit (execution result invalidation means)
47 Operand access unit
48, 49 Data bus
100 Compiler upstream unit
101 Assembler code generation unit
102 Instruction scheduling unit (instruction scheduling means)
103 Object code generation unit
110 Condition exclusivity analysis unit (condition exclusivity analysis means)
111 Dependency analysis unit (dependency analysis means)
112 Instruction relocation unit (instruction relocation means)
113 Execution boundary adding unit (execution boundary adding means)
120 Source code
130 Object code
140 Instruction issue control unit (instruction issue control means)
141 Execution instruction selection control unit
142 Instruction combining unit

Claims

Command supply means for supplying a plurality of commands;
Decoding means for decoding each of the plurality of instructions;
An instruction or instruction for executing a valid operation by specifying execution condition information for specifying a condition indicating whether or not to execute each instruction in the plurality of instructions and referring to the condition specified by the execution condition information Instruction issue control means for determining a set of
A processor comprising an execution means for specifying an operation of each instruction in the plurality of instructions and executing one or a plurality of operations based on the specification;
The instruction issue control means determines whether the instruction is a valid instruction that needs to be executed or an invalid instruction that does not need to be executed by referring to the condition specified by the execution condition information. With respect to an instruction determined to be a correct instruction, the instruction is controlled to be deleted before the instruction is issued to the execution means, and a valid instruction subsequent to the instruction is executed instead of the instruction. A processor having a function of controlling to be issued to a means.

The processor of claim 1, wherein
the execution means includes execution result invalidation means for invalidating an execution result after the operation corresponding to an instruction has been executed, and
the processor further comprises instruction invalidation method selection means for selecting, for each instruction, whether the instruction itself is deleted before being issued to the execution means or its execution result is invalidated by the execution result invalidation means.

The processor of claim 2, wherein
the instruction invalidation method selection means determines which instruction invalidation method to select by referring to condition flag valid information indicating whether or not the value of each condition flag has been determined, and
the condition flag valid information is set to false when an instruction that updates the condition flag is decoded by the decoding means, and is set to true when that instruction has been executed by the execution means and the value of the condition flag has been determined.
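
The selection between the two invalidation methods can be sketched as follows in Python. This is an illustration under stated assumptions, not the claimed circuit: the class `FlagValidity`, its method names, and the two result strings are hypothetical, and the initial state (all flags determined) is assumed.

```python
class FlagValidity:
    """Tracks, per condition flag, whether its value has been determined."""

    def __init__(self, flag_names):
        # Assumption: every flag starts out with a determined value.
        self.valid = {name: True for name in flag_names}

    def on_decode_flag_update(self, name):
        # A flag-updating instruction was decoded: the value is now pending.
        self.valid[name] = False

    def on_execute_flag_update(self, name):
        # The flag-updating instruction has executed: the value is determined.
        self.valid[name] = True

    def select_method(self, cond):
        # Determined flag: the instruction can be judged and deleted before issue.
        # Pending flag: issue anyway and invalidate the result afterwards if needed.
        return "delete_before_issue" if self.valid[cond] else "invalidate_result"


tracker = FlagValidity(["c0"])
before = tracker.select_method("c0")    # flag determined
tracker.on_decode_flag_update("c0")     # e.g. a compare instruction is decoded
pending = tracker.select_method("c0")   # flag value not yet known
tracker.on_execute_flag_update("c0")    # the compare has executed
after = tracker.select_method("c0")
```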

The processor of claim 1, wherein
the instruction issue control means further has a function of detecting a combination of a plurality of instructions whose functions can be realized by a single instruction, and combining the plurality of instructions so that they are handled as a single instruction.

The processor of claim 4, wherein
the combining of the plurality of instructions is applied after the deletion of instructions that precedes issue to the execution means.

The processor of claim 1, wherein
when instructions having the same execution condition information are arranged consecutively in each cycle, the instruction issue control means classifies in advance the plurality of instructions decoded by the decoding means by execution condition, and refers to the condition flag for each class to determine whether the instructions are valid instructions that need to be executed or invalid instructions that do not need to be executed.

The processor of claim 1, wherein
parallel execution boundary information indicating whether or not each instruction is a boundary of parallel execution is specified in the plurality of instructions, and
the instruction issue control means further has a function of referring to the parallel execution boundary information of each instruction and detecting the group of instructions to be executed in the current cycle.
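
Group detection from the boundary information can be sketched in Python as below. The function name `current_group` and the tuple layout are hypothetical, and the sketch assumes that the boundary bit marks the last instruction of a parallel group; the claim itself only states that the bit indicates a boundary.

```python
from typing import List, Tuple

# Each decoded instruction is modelled as (mnemonic, boundary_bit).
Decoded = Tuple[str, bool]


def current_group(window: List[Decoded]) -> Tuple[List[Decoded], List[Decoded]]:
    """Split the decode window at the first parallel execution boundary.

    Returns (group_for_this_cycle, remainder). If no boundary bit is set,
    the whole window is treated as the current group.
    """
    for index, (_, boundary) in enumerate(window):
        if boundary:
            return window[:index + 1], window[index + 1:]
    return window, []


window = [("add", False), ("sub", True), ("mul", False), ("ld", True)]
group, remainder = current_group(window)
```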

The processor of claim 7, wherein
when all instructions preceding the boundary instruction detected from the parallel execution boundary information have been deleted as invalid instructions that do not need to be executed, the instruction issue control means invalidates the parallel execution boundary information of the boundary instruction and detects a new parallel execution boundary for the current cycle by referring to the parallel execution boundary information of the instructions following the boundary instruction.

A processor comprising:
instruction supply means for supplying a plurality of instructions;
decoding means for decoding each of the plurality of instructions;
instruction issue control means for determining an instruction or a set of instructions that executes a valid operation; and
execution means for executing one or a plurality of operations based on the operation specified in each of the plurality of instructions,
wherein the instruction issue control means has a function of detecting, from the group of instructions decoded by the decoding means, a combination of a plurality of instructions whose functions can be realized by a single instruction, and combining the plurality of instructions so that they are handled as a single instruction.
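
Instruction combining of this kind can be sketched in Python as follows. The specific pattern is an assumption chosen for illustration and does not come from the claims: a register move followed by a two-operand add on the same destination is merged into a hypothetical three-operand `add3`. The names `FUSABLE` and `combine` are likewise hypothetical.

```python
# Instructions are modelled as tuples: (mnemonic, destination, source).
# Hypothetical pattern: "mov d, a" then "add d, b" (d = d + b)
# has the same effect as one "add3 d, a, b" (d = a + b).
FUSABLE = {("mov", "add"): "add3"}


def combine(instrs):
    """Merge adjacent instruction pairs that one instruction can replace."""
    out = []
    i = 0
    while i < len(instrs):
        if i + 1 < len(instrs):
            first, second = instrs[i], instrs[i + 1]
            fused = FUSABLE.get((first[0], second[0]))
            same_dest = first[1] == second[1]
            # If the second source is the destination itself, it must read the
            # moved value, so the pair is left alone in this simple sketch.
            independent_src = second[2] != first[1]
            if fused and same_dest and independent_src:
                out.append((fused, first[1], first[2], second[2]))
                i += 2
                continue
        out.append(instrs[i])
        i += 1
    return out


combined = combine([("mov", "r1", "r2"), ("add", "r1", "r3"), ("sub", "r4", "r5")])
```

Here three decoded instructions shrink to two, so one fewer arithmetic unit is needed in that cycle.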

The processor of claim 9, wherein
the instruction issue control means further has a function of detecting the remaining instructions that are neither executed, deleted, nor combined in the current cycle, and controlling these instructions so that they are issued in the next cycle or later.