JP2000284968A

JP2000284968A - Method and device for compiling

Info

Publication number: JP2000284968A
Application number: JP11093896A
Authority: JP
Inventors: Hidenori Matsuzaki (松崎 秀則); Toru Imai (今井 徹)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-03-31
Filing date: 1999-03-31
Publication date: 2000-10-13
Anticipated expiration: 2019-03-31
Also published as: JP3648402B2

Abstract

PROBLEM TO BE SOLVED: To realize register allocation that avoids false dependencies by preferentially selecting, as the candidate to assign to a virtual register, a real register that generates no new inter-instruction dependency, when such a register exists, based on the result of interval analysis and on information indicating inter-instruction dependencies. SOLUTION: Given a source program 11 written in a high-level language, an analysis part 1 performs lexical analysis, syntax analysis, and the like on the source program 11 to generate a first intermediate code 12. Next, an optimization part 2 optimizes the code 12 to speed up processing and generates a second intermediate code 13. The part 2 performs flow analysis, data dependence analysis, instruction scheduling (instruction assignment), register allocation, and the like. An output part 3 generates a machine language (object program) 14 executable by the target processor based on the optimized code 13.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD The present invention relates to a compiling method and a compiler apparatus for generating an object program to be executed on a processor that supports out-of-order execution and has a plurality of arithmetic units operating in parallel.

[0002]

DESCRIPTION OF THE RELATED ART As CPU architectures for increasing instruction execution speed, architectures having a plurality of arithmetic units that operate in parallel while sharing registers and caches are known. Typical examples are VLIW (Very Long Instruction Word), which allocates resources statically at compile time, and superscalar, which allocates resources dynamically at execution time (see J. L. Hennessy & D. A. Patterson, "Computer Architecture: A Quantitative Approach", Chapter 4). Hereinafter, such CPU architectures are collectively referred to as ILP architectures (ILP: Instruction-Level Parallelism).

[0003] In an ILP architecture, the hardware has the resources to execute a program at high speed by executing multiple instructions in parallel. To actually achieve that speed, however, the degree of parallelism at instruction execution time (hereinafter referred to as ILP) must be high, and measures to achieve this are key.

[0004] Out-of-order execution is known as a method of raising ILP. In in-order execution, the usual method, the execution start order is preserved between an instruction placed to start in a given cycle and an instruction placed to start in a later cycle. However, if an instruction placed later has no dependency on an instruction placed earlier, speed can be increased by allowing the later instruction to start executing without waiting for the earlier one. A method that realizes this is called out-of-order execution. Many existing superscalar processors actually employ out-of-order execution (conventional VLIW, however, assumes in-order execution, and no VLIW employing out-of-order execution is known).

[0005] In a CPU with an ILP architecture that performs out-of-order execution, avoiding false dependencies is important. A false dependency is a dependency that arises when a subsequent instruction redefines a register used by a preceding instruction. For example, as shown in FIG. 24(a), it arises when instruction B, executed after instruction A, defines register R2, which instruction A uses. Here, using a register means referencing (reading) its value, and defining a register means changing (writing) its value. When the instruction sequence of FIG. 24(a) is executed, execution of instruction B must wait until it is guaranteed that the result of instruction A remains correct even if instruction B redefines register R2. Instructions A and B therefore cannot be executed simultaneously, which lowers ILP.
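
The definition above can be sketched as a simple check in Python. This is a minimal illustration, not the method of the publication: the `Instr` type, the function name, and the operands of instructions A and B (other than R2) are hypothetical, since FIG. 24 is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    defs: frozenset  # registers the instruction defines (writes)
    uses: frozenset  # registers the instruction uses (reads)


def false_dependency(earlier, later):
    """Return the registers that `earlier` reads and `later` then redefines."""
    return earlier.uses & later.defs


# Hypothetical operands: A reads R2, and B, placed after A, writes R2.
A = Instr("A", defs=frozenset({"R1"}), uses=frozenset({"R2", "R3"}))
B = Instr("B", defs=frozenset({"R2"}), uses=frozenset({"R4", "R5"}))
```

A non-empty result for `false_dependency(A, B)` means B cannot start before A has read its operands, even though B does not need any value that A produces.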

[0006] Many superscalar processors use a method called register renaming to avoid false dependencies. Taking FIG. 24(a) as an example, the false dependency between instruction A and instruction B is avoided by replacing register R2, on which the dependency arises in instruction B, with a register that causes no dependency (for example, register R7). In doing so, every other instruction that uses the value of register R2 defined by instruction B in the original program must also have its use of R2 replaced with R7. In this example, the instruction sequence after register renaming is as shown in FIG. 24(b), and instructions A and B can be executed simultaneously.
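
The renaming rule described here (rename the defined register, then rename every later read of that same value) can be sketched as follows. This is a software illustration under simplifying assumptions: instructions are hypothetical `(dest, sources)` tuples in straight-line code, and the function name is invented for this example. Hardware renaming works on in-flight instructions and is considerably more involved.

```python
def rename_definition(instrs, index, new_reg):
    """Rename the register defined at `index`, and later reads of that value.

    Each instruction is a (dest, sources) tuple. Renaming stops once the old
    register is redefined, because later reads then see a different value.
    """
    old_reg, srcs = instrs[index]
    out = list(instrs[:index]) + [(new_reg, srcs)]
    renaming = True
    for dest, srcs in instrs[index + 1:]:
        if renaming:
            srcs = tuple(new_reg if s == old_reg else s for s in srcs)
            renaming = dest != old_reg
        out.append((dest, srcs))
    return out
```

For example, with a hypothetical sequence in which the first instruction reads R2, the second defines R2, and the third reads R2, calling `rename_definition(prog, 1, "R7")` rewrites the second and third instructions to use R7 and leaves the first untouched.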

[0007] False dependencies have conventionally been avoided by register renaming in this way. However, because this processing requires complex control at execution time, it prevents the CPU clock frequency from being raised, and it cannot be called an optimal method.

[0008] To avoid the slowdown caused by false dependencies without register renaming, the compiler must generate the instruction sequence to be executed by the CPU in such a way that false dependencies are unlikely to occur.

[0009] However, the order in which instructions will be executed under out-of-order execution cannot be estimated statically at compile time. Consequently, when real registers are assigned to virtual registers based on virtual-register live ranges estimated statically at compile time, dynamic instruction issue at run time shifts the live ranges, and false dependencies arise between instructions that were assigned the same real register.

[0010]

PROBLEMS TO BE SOLVED BY THE INVENTION As described above, in a processor with an ILP architecture that performs out-of-order execution, avoiding false dependencies is important for achieving high speed. Register renaming as used in superscalar processors can avoid false dependencies, but because it requires complex control at execution time, it ultimately contributes little to speed. In addition, no conventional compiling method takes the avoidance of false dependencies into account. Furthermore, no conventional VLIW capable of out-of-order execution has been known.

[0011] The present invention has been made in view of the above circumstances. Its object is to provide a compiling method and a compiler apparatus for a processor that supports out-of-order execution, which enable register allocation that avoids the false dependencies peculiar to out-of-order execution without a hardware register renaming mechanism.

[0012]

MEANS FOR SOLVING THE PROBLEMS The present invention (claim 1) is a compiling method that targets a processor having a plurality of arithmetic units capable of executing instructions in parallel and having a function that allows execution of an instruction that is later in the instruction placement order to start before execution of an instruction that precedes it, and that generates, from a given source program, an object program executable by the processor. The method comprises: an analysis step of analyzing the source program to generate a first intermediate code; an instruction scheduling step of performing instruction scheduling based on the first intermediate code to generate a second intermediate code written with virtual registers assigned as the registers that hold temporary results of operations; a register allocation step of determining, based on the second intermediate code and on information about the real registers of the processor, the real register to be assigned to each virtual register; and an output step of outputting an object program in which each virtual register is replaced with the real register assigned to it. The register allocation step includes: a step of analyzing the intervals in which each real register is used after being assigned to a virtual register and the interval in which the virtual register that is the current allocation target is used; and a step of determining, based on the result of this interval analysis and on information indicating the inter-instruction dependencies that already exist, that if there is a real register that would cause no new inter-instruction dependency when assigned to the target virtual register, that real register is a candidate to be assigned preferentially to the virtual register.

[0013] The information indicating inter-instruction dependencies (for example, a dependence graph) is initially created, for example, by analysis of the source program (for example, data dependence analysis). If assigning a real register to a virtual register in the determining step gives rise to a new inter-instruction dependency, the information is updated to reflect that new dependency.

[0014] Preferably, among the real registers that cause no new inter-instruction dependency when assigned to the target virtual register, two kinds may be distinguished. The first kind actually causes a redundant inter-instruction dependency, but can be regarded as causing no new dependency because that redundant dependency is hidden by dependencies that already exist. The second kind causes no new dependency because it actually causes no redundant dependency at all. The allocation priority of the first kind may be made higher than that of the second kind.
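
One way to read "hidden by dependencies that already exist" is that the dependency a candidate register would add is already implied by a path in the dependence graph. The following Python sketch classifies a candidate under that reading. The interpretation, the adjacency-list graph representation, and the function names are assumptions made for illustration; the publication does not define them in this passage.

```python
def reachable(graph, src, dst):
    """True if `dst` can be reached from `src` along existing dependence edges."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


def classify(graph, added_edges):
    """Classify a candidate register by the dependence edges it would add.

    'none'   : the assignment adds no dependence edge at all
    'hidden' : every added edge is already implied by an existing path
    'new'    : at least one added edge is a genuinely new constraint
    """
    if not added_edges:
        return "none"
    if all(reachable(graph, u, v) for u, v in added_edges):
        return "hidden"
    return "new"
```

Under the preference stated in the paragraph above, candidates classified as `'hidden'` would be ranked ahead of those classified as `'none'`, and both ahead of `'new'`.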

[0015] Preferably, the register allocation step may further include a step of determining, among the real registers other than those that cause no new dependency, a real register for which the instruction placement order and the instruction execution start order can be interchanged as a candidate to be assigned with the priority next after the real registers that cause no new dependency.

[0016] Preferably, among the real registers for which the instruction placement order and the instruction execution start order can be interchanged, a real register may be determined as a candidate with higher priority the greater the distance between the live range of the target virtual register and the live range of the real register whose allocation has already been decided at that point.
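
The distance-based preference can be sketched as a ranking function. This is only an illustration under stated assumptions: live ranges are modeled as hypothetical `(start, end)` pairs of instruction positions, distance is measured as the gap between the target range and the nearest range already held by each real register, and the function name is invented here.

```python
def rank_by_distance(target, assigned):
    """Order candidate real registers so the most distant one comes first.

    `target` is the (start, end) live range of the virtual register being
    allocated. `assigned` maps each real register to the list of (start, end)
    live ranges already allocated to it.
    """
    start, end = target

    def gap(reg):
        # Gap to the nearest already-assigned range; 0 if the ranges overlap.
        # A register with no assigned ranges is treated as infinitely far.
        return min(
            (max(start - e, s - end, 0) for s, e in assigned[reg]),
            default=float("inf"),
        )

    return sorted(assigned, key=lambda reg: (-gap(reg), reg))
```

Ties are broken by register name only to keep the result deterministic; the publication does not specify a tie-breaking rule in this passage.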

[0017] Preferably, a real register may be determined as a candidate with higher priority the shorter it makes the critical path length of the inter-instruction dependencies.

[0018] Preferably, the register allocation step may further include a step of determining the order in which virtual registers are taken as targets for real-register assignment, based on the number of other virtual registers whose live ranges overlap the live range of each virtual register and on the number of real registers of the processor.
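
A common way to derive such an order in graph coloring allocators is the simplify phase: repeatedly remove a node whose number of overlapping neighbors is smaller than the number of real registers, then allocate in reverse order of removal. The sketch below assumes that approach; the publication states only the inputs to the ordering in this passage, so the procedure, the function name, and the tie-breaking rules are assumptions.

```python
def allocation_order(interference, k):
    """Return virtual registers in the order they should be allocated.

    `interference` maps each virtual register to the set of virtual registers
    whose live ranges overlap it (assumed symmetric). `k` is the number of
    real registers.
    """
    graph = {v: set(ns) for v, ns in interference.items()}
    removed = []
    while graph:
        easy = [v for v in graph if len(graph[v]) < k]
        if easy:
            # Fewer than k neighbors: a register is always left over for it.
            v = min(easy)
        else:
            # No such node: remove the most constrained one.
            v = max(graph, key=lambda n: (len(graph[n]), n))
        removed.append(v)
        for n in graph.pop(v):
            graph[n].discard(v)
    # Registers removed last are allocated first.
    return removed[::-1]
```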

[0019] Preferably, the register allocation step may further include a step of selecting, from the real registers determined for the target virtual register, the one with the highest priority among those that can actually be assigned at that point, and storing the correspondence between the virtual register and the selected real register.

[0020] The present invention (claim 8) is a compiler apparatus that targets a processor having a plurality of arithmetic units capable of executing instructions in parallel and having a function that allows execution of an instruction that is later in the instruction placement order to start before execution of an instruction that precedes it, and that generates, from a given source program, an object program executable by the processor. The apparatus comprises: analysis means for analyzing the source program to generate a first intermediate code; instruction scheduling means for performing instruction scheduling based on the first intermediate code to generate a second intermediate code written with virtual registers assigned as the registers that hold temporary results of operations; register allocation means for determining, based on the second intermediate code and on information about the real registers of the processor, the real register to be assigned to each virtual register; and output means for outputting an object program in which each virtual register is replaced with the real register assigned to it. The register allocation means includes: means for analyzing the intervals in which each real register is used after being assigned to a virtual register and the interval in which the virtual register that is the current allocation target is used; and means for determining, based on the result of this interval analysis and on information indicating the inter-instruction dependencies that already exist, that if there is a real register that would cause no new inter-instruction dependency when assigned to the target virtual register, that real register is a candidate to be assigned preferentially to the virtual register.

[0021] The present invention (claim 9) is a computer-readable recording medium storing a program that targets a processor having a plurality of arithmetic units capable of executing instructions in parallel and having a function that allows execution of an instruction that is later in the instruction placement order to start before execution of an instruction that precedes it. In order to generate, from a given source program, an object program executable by the processor, the program causes a computer to execute: an analysis step of analyzing the source program to generate a first intermediate code; an instruction scheduling step of performing instruction scheduling based on the first intermediate code to generate a second intermediate code written with virtual registers assigned as the registers that hold temporary results of operations; a register allocation step of determining, based on the second intermediate code and on information about the real registers of the processor, the real register to be assigned to each virtual register; and an output step of outputting an object program in which each virtual register is replaced with the real register assigned to it. In the register allocation step, the program causes the computer to analyze the intervals in which each real register is used after being assigned to a virtual register and the interval in which the virtual register that is the current allocation target is used, and to determine, based on the result of this interval analysis and on information indicating the inter-instruction dependencies that already exist, that if there is a real register that would cause no new inter-instruction dependency when assigned to the target virtual register, that real register is a candidate to be assigned preferentially to the virtual register.

[0022] In the present invention, when real registers are assigned to the virtual registers in the second intermediate code, if there is a real register that would cause no new inter-instruction dependency when assigned to the target virtual register, that real register is determined as a candidate to be assigned preferentially to that virtual register (or is determined to be assigned). Here, the "live range of a register" is the interval from the point at which the value of the register is defined to the point at which that value is last referenced. Depending on how a register is used, it may have more than one live range.
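
The live range definition above (from the point of definition to the last reference, with possibly several ranges per register) can be sketched for straight-line code as follows. The `(dest, sources)` instruction format and the function name are hypothetical, and control flow is ignored for simplicity; a real compiler would compute live ranges from flow analysis.

```python
def live_ranges(instrs):
    """Map each register to a list of (definition_index, last_use_index) pairs.

    Each instruction is a (dest, sources) tuple; `dest` may be None. A register
    that is redefined starts a new live range, so one register can have several.
    Registers read before any definition in the sequence are ignored.
    """
    ranges = {}
    for i, (dest, srcs) in enumerate(instrs):
        # Handle reads first so that "x = x + 1" closes the old range of x
        # before the write opens a new one.
        for s in srcs:
            if s in ranges:
                ranges[s][-1][1] = i
        if dest is not None:
            ranges.setdefault(dest, []).append([i, i])
    return {reg: [tuple(r) for r in rs] for reg, rs in ranges.items()}
```

For a hypothetical sequence in which `v1` is defined at positions 0 and 2 and read at positions 1 and 3, the function reports two live ranges for `v1`: `(0, 1)` and `(2, 3)`.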

[0023] More specifically, for example, flow analysis and data dependence analysis are performed on the second intermediate code, a dependence graph is generated, and the live ranges of the virtual registers that hold temporary results of operations are computed. Based on these, the real register to be assigned to each virtual register is determined. The ideal real register to assign to a virtual register is one whose assignment creates no new dependency in the dependence graph. A real register satisfying this condition is determined based on, for example, the dependence graph.

[0024] Assigning such real registers prevents a drop in ILP during out-of-order execution.

[0025] If a real register that causes no new dependency can be assigned, the register assignment causes no slowdown at instruction execution time. However, real registers are finite and such a real register does not always exist, so in some cases the real register to assign to a virtual register must be selected from among real registers that do cause a new dependency.

[0026] In such a case, the newly created dependency can cause a slowdown at execution time. It is therefore advisable to assign preferentially a real register for which, even though the assignment creates a new dependency, that dependency causes no slowdown during out-of-order execution or keeps any slowdown to a minimum (for example, a real register for which, even if assigning it to the virtual register creates a new dependency in the dependence graph, the difference between the execution timings of the instructions involved is expected to be large).

[0027] According to the present invention, register allocation can be prevented from creating new dependencies between instructions, and even when one is created, the execution timings of the instructions that become newly dependent can be kept as far apart as possible. Consequently, false dependencies, which cause ILP to drop during out-of-order execution, can be minimized without using a hardware register renaming mechanism. Moreover, whereas hardware register renaming can cover only a limited number of instructions, a compiler can analyze registers over a wide range and thus use registers more effectively.

[0028] Note that the present invention relating to an apparatus also holds as an invention relating to a method, and the present invention relating to a method also holds as an invention relating to an apparatus.

[0029] The present invention relating to a compiler apparatus or method also holds as a computer-readable recording medium storing a program for causing a computer to execute procedures corresponding to the invention (or for causing a computer to function as means corresponding to the invention, or for causing a computer to realize functions corresponding to the invention).

[0030]

EMBODIMENTS OF THE INVENTION Embodiments of the invention are described below with reference to the drawings.

[0031] An optimizing compiler according to one embodiment of the present invention will be described. The optimizing compiler of this embodiment assumes, as its compilation target, a processor (CPU) capable of out-of-order execution. The present invention is applicable both to superscalar processors capable of out-of-order execution and to VLIW processors capable of out-of-order execution. Since out-of-order superscalar processors are well known, a detailed description of them is omitted here. A VLIW capable of out-of-order execution is described after the description of this compiler.

[0032] FIG. 1 shows a configuration example of the optimizing compiler according to this embodiment.

[0033] This compiler takes as input a source program (11) written in a high-level language. The analysis unit 1 performs lexical analysis, syntax analysis, and the like on the input source program (11) to generate a first intermediate code (12). In lexical analysis, the character string forming the input source program (11) is analyzed and divided into tokens. In syntax analysis, for example, the tokens obtained by that analysis are checked against the grammar of the high-level language to determine whether they are correct. If there is an error, it is reported and execution stops. If they are correct, the result of the syntax analysis is generated as the intermediate code (12). The generated intermediate code (12) is stored in a storage device such as main memory or a disk. The intermediate code (12) is normally managed inside the compiler and cannot be accessed from outside.

[0034] Next, the optimization unit 2 optimizes the intermediate code (12) to speed up processing (that is, to increase the execution speed of the generated object program when it runs on the target processor) and generates an optimized second intermediate code (13). In this embodiment, the optimization unit 2 performs register allocation by the register allocation unit 22 after instruction scheduling by the instruction scheduling unit 21. More specifically, for example, the optimization unit 2 performs flow analysis, data dependence analysis, instruction scheduling (instruction assignment), register allocation, and so on. In flow analysis, once the intermediate code (12) has been generated, the flow of the program is analyzed based on it. In data dependence analysis, once the program flow has been analyzed, the data dependencies of the instructions making up the intermediate code (12) are analyzed to create a dependence graph, which clarifies constraints such as the order in which instructions must be assigned. In instruction scheduling, the intermediate code (13), which is the stage immediately before the object program and has virtual registers assigned, is generated from the intermediate code. The generated intermediate code (13) is stored in a storage device such as main memory or a disk. The intermediate code (13) is normally managed inside the compiler and cannot be accessed from outside. In register allocation, the virtual registers provisionally assigned by instruction scheduling in the intermediate code (13) are reassigned to the real registers of the target processor. The correspondence between virtual registers and real registers is registered in a register correspondence table.

[0035] The output unit 3 then generates and outputs a machine language (object program) (14) executable by the target processor, based on the optimized intermediate code (13). That is, the output unit 3 replaces the virtual registers in the optimized intermediate code (13) with real registers according to the register correspondence table and outputs the result as the machine language (14).
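
The replacement performed by the output unit can be sketched as a table lookup over register tokens. This is an illustration only: the textual form of the intermediate code, the `$N` register syntax (borrowed from the MIPS-style example the description uses later), the `add` instruction, and the function name are assumptions.

```python
import re


def apply_register_table(code, table):
    """Replace each virtual register token in `code` using `table`.

    Tokens not present in the table (for example, registers that are
    already real) are left unchanged.
    """
    def substitute(match):
        token = match.group(0)
        return table.get(token, token)

    # Match a whole register token such as "$100", so that a table entry
    # for "$1" can never rewrite part of "$100".
    return re.sub(r"\$\d+", substitute, code)
```

For example, with a table that maps `$100`, `$101`, and `$102` to `$1`, `$2`, and `$3`, the hypothetical line `add $100, $101, $102` becomes `add $1, $2, $3`.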

【００３６】機械語（１４）を実行するプロセッサ（す
なわち本コンパイラが対象とするプロセッサ）は、複数
の並行に動作する演算器を有し、同時に複数の命令の実
行が可能であり、また命令のｏｕｔ−ｏｆ−ｏｒｄｅｒ
実行機能を有し、さらに本コンパイラにより仮想レジス
タに割り当てられる実レジスタを有するものとする。The processor that executes the machine language (14) (that is, the processor targeted by this compiler) is assumed to have a plurality of arithmetic units that operate in parallel, to be able to execute a plurality of instructions simultaneously, to support out-of-order instruction execution, and to have real registers that this compiler assigns to virtual registers.

【００３７】本実施形態では、グラフカラーリング技法
によるレジスタ割り当て方式を例にして説明する。グラ
フカラーリング技法とは、仮想レジスタに対して実レジ
スタを割り当てるための手法として最も広く用いられて
いる方式の一つである。In the present embodiment, a description will be given of a register allocation method using a graph coloring technique as an example. The graph coloring technique is one of the most widely used methods for allocating real registers to virtual registers.

【００３８】図２に、グラフカラーリング技法によるレ
ジスタ割り当て方式の処理手順の一例を示す。FIG. 2 shows an example of the processing procedure of the register allocation method using the graph coloring technique.

【００３９】図２に示されるように、このレジスタ割り
当て方式は、レジスタ干渉グラフを生成するフェーズ
（ステップＳ１１）から始まる。このグラフにおける
「ノード」は「仮想レジスタ」であり、詳しくは後述す
るように、仮想レジスタの値が定義されている点が別の
仮想レジスタの生存区間内であれば、それらの仮想レジ
スタに対応するノードを「エッジ」で結ぶ。As shown in FIG. 2, this register allocation method starts with a phase that generates a register interference graph (step S11). Each "node" in this graph is a "virtual register"; as described in detail later, if the point at which the value of one virtual register is defined lies within the live range of another virtual register, the nodes corresponding to those virtual registers are connected by an "edge".

【００４０】ここで、図３に、これから実レジスタを割
り当てようとしているプログラムの一例（ＭＩＰＳアセ
ンブリ言語により記述されたもの）を示す。図３の例に
おいて、＄１００，＄１０１，＄１０２，＄１０３，＄
１０４は仮想レジスタであるとする。また、ここでは、
割り当てに使用できる実レジスタは＄１，＄２，＄３の
３つのレジスタであるものとする。FIG. 3 shows an example of a program (written in MIPS assembly language) to which real registers are about to be allocated. In the example of FIG. 3, $100, $101, $102, $103, and $104 are virtual registers. It is also assumed here that the real registers available for allocation are the three registers $1, $2, and $3.

【００４１】図４は、図３の例の場合における命令間の
依存関係を示す依存グラフである。命令（２）“ｌｕｉ
＄１０１，０ｘ２０００”の実行結果（この場合、実
行結果は＄１０１に書かれる）をもとに命令（４）“ｓ
ｗ＄１０１，ｔｍｐ１（＄０）”は実行されるため、
命令（２）と命令（４）との間には依存関係を示すエッ
ジ（図４では、単方向の矢線で示す）が張られている。
同様に、命令（３）“ｌｕｉ＄１０２，０ｘ３００
０”と命令（５）“ｏｒｉ＄１０３，＄１０２，０ｘ
１１１１”との間、そして命令（５）と命令（６）“ｓ
ｗ＄１０３，ｔｍｐ２（＄０）”との間にも依存関係
が存在することが示されている。FIG. 4 is a dependency graph showing the dependencies between instructions in the example of FIG. 3. Instruction (4) "sw $101, tmp1($0)" is executed based on the result of instruction (2) "lui $101, 0x2000" (in this case, the result is written to $101), so an edge indicating a dependency (shown by a unidirectional arrow in FIG. 4) is drawn between instruction (2) and instruction (4). Similarly, the graph shows that dependencies also exist between instruction (3) "lui $102, 0x3000" and instruction (5) "ori $103, $102, 0x1111", and between instruction (5) and instruction (6) "sw $103, tmp2($0)".
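
As a hedged illustration of how a dependency graph like the one in FIG. 4 could be derived, the following Python sketch records which registers each of the instructions (2) to (6) quoted above writes and reads, and links every read to the most recent write of the same register. The names `INSTRUCTIONS` and `true_dependencies` are hypothetical and introduced only for this sketch. Only read-after-write dependencies are modeled, and instructions (1) and (7) are left out because their text is not quoted in this passage.

```python
# Instructions (2)-(6) of FIG. 3 as quoted in the text.
# Each entry: instruction number -> (registers written, registers read).
INSTRUCTIONS = {
    2: ({"$101"}, set()),          # lui $101, 0x2000
    3: ({"$102"}, set()),          # lui $102, 0x3000
    4: (set(), {"$101", "$0"}),    # sw  $101, tmp1($0)
    5: ({"$103"}, {"$102"}),       # ori $103, $102, 0x1111
    6: (set(), {"$103", "$0"}),    # sw  $103, tmp2($0)
}


def true_dependencies(instructions):
    """Return (producer, consumer) edges for read-after-write dependencies."""
    last_writer = {}  # register -> number of the latest instruction that wrote it
    edges = set()
    for number in sorted(instructions):
        written, read = instructions[number]
        for reg in read:
            if reg in last_writer:  # $0 is never written here, so it adds no edge
                edges.add((last_writer[reg], number))
        for reg in written:
            last_writer[reg] = number
    return edges


edges = true_dependencies(INSTRUCTIONS)
print(sorted(edges))
```

Under these assumptions the sketch yields the three edges described in the text: (2) to (4), (3) to (5), and (5) to (6).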

【００４２】図５に、図３の例の場合における各仮想レ
ジスタの生存区間を示す。図５に示されるように、＄１
００は命令（１）の開始時から命令（７）の終了時ま
で、＄１０１は命令（２）の開始時から命令（４）の開
始時まで、＄１０２は命令（３）の開始時から命令
（５）の開始時まで、＄１０３は命令（５）の開始時か
ら命令（６）の開始時まで、＄１０４は命令（７）の開
始時から命令（７）の終了時までをそれぞれ生存区間と
する。FIG. 5 shows the live range of each virtual register in the example of FIG. 3. As shown in FIG. 5, the live range of $100 is from the start of instruction (1) to the end of instruction (7), that of $101 from the start of instruction (2) to the start of instruction (4), that of $102 from the start of instruction (3) to the start of instruction (5), that of $103 from the start of instruction (5) to the start of instruction (6), and that of $104 from the start of instruction (7) to the end of instruction (7).

【００４３】図６は、この場合に生成されるレジスタ干
渉グラフである。＄１００は他の全ての仮想レジスタと
生存区間が重複しているため、全ての仮想レジスタに対
してエッジが張られる。＄１０１は＄１００以外に＄１
０２と生存区間が重複しているため＄１０２との間にも
エッジが張られる。＄１０３，＄１０４については＄１
００と生存区間が重複しているのみである。FIG. 6 is the register interference graph generated in this case. Since the live range of $100 overlaps with those of all the other virtual registers, edges are drawn from $100 to all of them. Since the live range of $101 overlaps with that of $102 in addition to $100, an edge is also drawn between $101 and $102. The live ranges of $103 and $104 overlap only with that of $100.
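
The construction of the interference graph from the live ranges can be sketched as follows. This is a minimal Python sketch, not the patent's implementation: it uses the simpler "overlapping live ranges" criterion, and it encodes "start of instruction n" as n and "end of instruction n" as n + 0.5, which is an assumption made only so that the ranges of FIG. 5 can be written as half-open intervals. `LIVE_RANGES`, `overlaps`, and `build_interference_graph` are hypothetical names.

```python
from itertools import combinations

# Live ranges of FIG. 5 as half-open intervals [start, end).
# Assumed encoding: start of instruction n -> n, end of instruction n -> n + 0.5.
LIVE_RANGES = {
    "$100": (1, 7.5),
    "$101": (2, 4),
    "$102": (3, 5),
    "$103": (5, 6),
    "$104": (7, 7.5),
}


def overlaps(a, b):
    """True if the half-open intervals a and b share at least one point."""
    return b[1] > a[0] and a[1] > b[0]


def build_interference_graph(live_ranges):
    """Map each virtual register to the set of registers it interferes with."""
    graph = {reg: set() for reg in live_ranges}
    for u, v in combinations(live_ranges, 2):
        if overlaps(live_ranges[u], live_ranges[v]):
            graph[u].add(v)
            graph[v].add(u)
    return graph


graph = build_interference_graph(LIVE_RANGES)
for reg in sorted(graph):
    print(reg, sorted(graph[reg]))
```

With this encoding $102 and $103 do not interfere, because one range ends exactly where the other begins, which is what later allows both to share the real register $3.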

【００４４】生存区間が重複する仮想レジスタに対して
は異なる実レジスタを割り当てる必要がある。そこで、
このようにしてつくられたレジスタ干渉グラフにおい
て、エッジで結ばれているノードどうしがどれも別の色
（実レジスタ）で塗られているように色を塗る（カラー
リングする）ことにより（すなわち、エッジで結ばれて
いる両ノードに同じ実レジスタが割り当てられないよう
に）、実レジスタ割り当てを行う。図６の例で考える
と、＄１００に対して実レジスタ＄１が割り当てられた
場合、＄１００とエッジで結ばれた＄１０１，＄１０
２，＄１０３，＄１０４には＄１以外の実レジスタ（本
例の場合、＄２または＄３）を割り当てなければならな
い。Virtual registers whose live ranges overlap must be assigned different real registers. Therefore, real registers are allocated by coloring the register interference graph created in this way so that any two nodes connected by an edge have different colors (real registers), that is, so that the same real register is never assigned to both nodes of an edge. In the example of FIG. 6, if the real register $1 is assigned to $100, then $101, $102, $103, and $104, which are connected to $100 by edges, must be assigned real registers other than $1 ($2 or $3 in this example).

【００４５】さて、ステップＳ１１でレジスタ干渉グラ
フが生成されたならば、次に、レジスタ干渉グラフ中の
どのノードから実レジスタを割り当てていくかを決定す
る。この処理は図２に示すステップＳ１２〜Ｓ１５によ
り行われる。When the register interference graph has been generated in step S11, it is next determined from which node in the register interference graph a real register is to be allocated. This process is performed by steps S12 to S15 shown in FIG.

【００４６】ステップＳ１２において、レジスタ干渉グ
ラフ内のノードのうち、そのノードから出ているエッジ
の数（すなわち、そのノードに隣接している他のノード
の数）が、割り当て可能な実レジスタの数よりも少ない
ものを検出し、そのようなノードが存在すれば、ステッ
プＳ１４でそのノードをレジスタ干渉グラフから取り除
いてレジスタ干渉グラフを再構築する。ここで、レジス
タ干渉グラフの再構築とは、検出されたノードとそれに
接しているエッジをレジスタ干渉グラフから削除するこ
とを意味している。なお、取り除くノードを検出する順
番については任意である（すなわち、上記条件を満たす
ノードが複数存在する場合には、そのエッジの数の大小
にかかわらず、いずれのノードを先に選択しても構わな
い）。In step S12, the nodes in the register interference graph are searched for a node whose number of edges (that is, the number of other nodes adjacent to it) is smaller than the number of allocatable real registers. If such a node exists, it is removed from the register interference graph in step S14 and the register interference graph is reconstructed. Here, reconstructing the register interference graph means deleting the detected node and the edges incident to it from the register interference graph. The order in which nodes to be removed are detected is arbitrary (that is, when several nodes satisfy the above condition, any of them may be selected first, regardless of how many edges each has).

【００４７】図６の例においてこの処理を行う場合につ
いて説明する。The case of performing this processing in the example of FIG. 6 will be described.

【００４８】まず、図６のレジスタ干渉グラフにおい
て、ここでは例えば＄１０４について考えてみるものと
すると、＄１０４に隣接するノードは＄１００のノード
のみであるので、隣接するノードの数は割り当て可能な
実レジスタ数“３”より小さい。そこで、まず＄１０４
を図６のレジスタ干渉グラフから取り除いて、レジスタ
干渉グラフを再構築する。この結果、再構築後のレジス
タ干渉グラフは、図７（ａ）のようになる。取り除いた
ノードに関しては、その取り除いた順に記録をしてお
く。First, consider, for example, $104 in the register interference graph of FIG. 6. The only node adjacent to $104 is $100, so the number of adjacent nodes is smaller than the number of allocatable real registers, "3". Therefore, $104 is first removed from the register interference graph of FIG. 6, and the register interference graph is reconstructed. The reconstructed register interference graph is as shown in FIG. 7(a). The removed nodes are recorded in the order in which they were removed.

【００４９】なお、ステップＳ１２において隣接ノード
数が割り当て可能実レジスタ数未満であるノードが存在
しない場合には、ステップＳ１３においてノードをレジ
スタスピル（ｓｐｉｌｌ）処理の候補として選択し、そ
のノードをステップＳ１４でレジスタ干渉グラフから取
り除いてレジスタ干渉グラフを再構築する。なお、レジ
スタｓｐｉｌｌの候補を選択する方法については既に種
々の方法が提案されている。If no node whose number of adjacent nodes is less than the number of allocatable real registers exists in step S12, a node is selected as a candidate for register spill processing in step S13, and that node is removed from the register interference graph in step S14 to reconstruct the register interference graph. Various methods have already been proposed for selecting register spill candidates.

【００５０】以上の処理をステップＳ１５によってレジ
スタ干渉グラフが空になるまで繰り返す。The above processing is repeated until the register interference graph becomes empty in step S15.

【００５１】なお、ここまでのフェーズにおける処理は
従来の技術と同様でよく、周知技術であるのでここでの
詳しい説明は省略する（ＡｎｄｒｅｗＷ．Ａｐｐｅ
ｌ″ｍｏｄｅｒｎｃｏｍｐｉｌｅｒｉｍｐｌｅｍｅ
ｎｔａｔｉｏｎｉｎＣ″ Ｃｈａｐｔｅｒ１１参
照）。The processing in the phases up to this point may be the same as in the conventional technique and is well known, so a detailed description is omitted here (see Andrew W. Appel, "Modern Compiler Implementation in C", Chapter 11).

【００５２】図６の例の場合、上記の＄１０４と同様に
して、例えば以降は＄１０３，＄１０２，＄１０１，＄
１００の順にノードを取り除いていく。その際に再構築
されるレジスタ干渉グラフは、図６の状態から順に、図
７（ａ）→図７（ｂ）→図７（ｃ）→図７（ｄ）のよう
になり、最終的にレジスタ干渉グラフは図７（ｅ）のよ
うに空になる。なお、図６の例では、上記の順でレジス
タ干渉グラフからノードを取り除いていったとき、レジ
スタｓｐｉｌｌの候補は空集合である。In the example of FIG. 6, after $104 is removed as described above, the nodes are then removed in the same way, for example in the order $103, $102, $101, $100. The register interference graph reconstructed at each step changes from the state of FIG. 6 to FIG. 7(a), FIG. 7(b), FIG. 7(c), and FIG. 7(d) in turn, and finally becomes empty as shown in FIG. 7(e). In the example of FIG. 6, when the nodes are removed from the register interference graph in this order, the set of register spill candidates is empty.
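
The node-removal loop of steps S12 to S15 can be sketched in Python as below. This is only an illustrative sketch. The text leaves the choice among eligible nodes open, so the `pick` parameter (defaulting to `max`) is an assumption chosen here because it happens to reproduce the removal order of the example. The spill heuristic (highest degree) is likewise a placeholder, since the text only says that various selection methods exist. `FIG6` and `simplify` are hypothetical names.

```python
# Register interference graph of FIG. 6 as adjacency sets.
FIG6 = {
    "$100": {"$101", "$102", "$103", "$104"},
    "$101": {"$100", "$102"},
    "$102": {"$100", "$101"},
    "$103": {"$100"},
    "$104": {"$100"},
}


def simplify(graph, k, pick=max):
    """Remove nodes until the graph is empty; return (removal order, spill candidates)."""
    graph = {node: set(adj) for node, adj in graph.items()}  # work on a copy
    removal_order, spill_candidates = [], []
    while graph:                                              # S15: loop until empty
        low_degree = [n for n in graph if k > len(graph[n])]  # S12: degree below k
        if low_degree:
            node = pick(low_degree)
        else:
            # S13: placeholder heuristic, spill the node with the most neighbours
            node = max(graph, key=lambda n: len(graph[n]))
            spill_candidates.append(node)
        for neighbour in graph.pop(node):                     # S14: drop node and edges
            graph[neighbour].discard(node)
        removal_order.append(node)
    return removal_order, spill_candidates


order, spills = simplify(FIG6, k=3)
print(order, spills)
```

With three allocatable real registers, the sketch removes $104, $103, $102, $101, and $100 in that order and reports no spill candidates, matching the example.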

【００５３】ここまでのフェーズによって、仮想レジス
タの識別情報と、その仮想レジスタがレジスタ干渉グラ
フから取り除かれた順番との対応が記録されたことにな
る。本具体例の場合、仮想レジスタは図８に示すような
順でレジスタ干渉グラフから取り除かれたことが記録さ
れている。By the phases so far, the correspondence between the identification information of the virtual register and the order in which the virtual register is removed from the register interference graph is recorded. In the case of this specific example, it is recorded that the virtual registers have been removed from the register interference graph in the order shown in FIG.

【００５４】なお、上記では、図３のプログラムを処理
対象とした場合において、レジスタ干渉グラフから＄１
０４，＄１０３，＄１０２，＄１０１，＄１００の順に
ノードを取り除いた例を示したが、もちろん前述したよ
うにこの順番に限らず、例えば、＄１０２，＄１０３，
＄１０１，＄１０４，＄１００の順でも、＄１０４，＄
１０３，＄１０２，＄１００，＄１０１の順など、他の
順でも構わない。The above shows an example in which, with the program of FIG. 3 as the processing target, the nodes are removed from the register interference graph in the order $104, $103, $102, $101, $100. As described above, however, the order is of course not limited to this one; other orders, such as $102, $103, $101, $104, $100 or $104, $103, $102, $100, $101, may also be used.

【００５５】さて、干渉グラフから全てのノードを選択
し終ったならば、ステップＳ１６において、ノードを選
択したのとは逆の順序で各ノードに実レジスタを割り当
て、そのノードを再びレジスタ干渉グラフに戻していく
処理を行う。以下、このステップＳ１６の処理について
詳しく説明する。Once all the nodes have been selected from the interference graph, step S16 assigns a real register to each node in the reverse of the order in which the nodes were selected and returns each node to the register interference graph. The process of step S16 is described in detail below.

【００５６】図９に、このフェーズ（ステップＳ１６）
においてノードに割り当てる実レジスタを決定する方式
の処理手順の一例を示す。FIG. 9 shows an example of the processing procedure for determining the real register to be assigned to a node in this phase (step S16).

【００５７】まず、干渉グラフから取り除いたのとは逆
の順に実レジスタを割り当てる仮想レジスタを選択する
（ステップＳ２１）。本具体例の場合、＄１００が選択
される。First, a virtual register to which a real register is to be assigned is selected in the reverse order of the order removed from the interference graph (step S21). In the case of this specific example, $ 100 is selected.

【００５８】次に、ステップＳ２２〜Ｓ２４において、
上記選択されたレジスタに割り当てる実レジスタを決定
する。最初に、新たな依存関係が発生しないような実レ
ジスタが存在するかどうかを検査し、もしそのような実
レジスタが存在する場合には、それを優先順序付けされ
た実レジスタの列に登録する（ステップＳ２２）。続い
て、新たな依存関係を発生させないような実レジスタ以
外の実レジスタについては、ｏｕｔ−ｏｆ−ｏｒｄｅｒ
実行時に新たな依存関係がなるべく影響しないような実
レジスタを優先して優先順序付けされた実レジスタの列
に登録する（ステップＳ２３）。Next, in steps S22 to S24, the real register to be assigned to the selected virtual register is determined. First, it is checked whether there is a real register that causes no new dependency; if such a real register exists, it is registered in the prioritized list of real registers (step S22). Then, the remaining real registers, which do cause a new dependency, are registered in the prioritized list of real registers, giving priority to those whose new dependency affects out-of-order execution as little as possible (step S23).

【００５９】図１０は、優先順序付けされた実レジスタ
の列である。図１０に示すように、新たな依存関係が発
生しないような実レジスタ、新たな依存関係を生じる実
レジスタの順に優先順序が高い。詳しくは後述するが、
新たな依存関係が発生しないような実レジスタの中で
も、冗長な依存関係を生じるが既存の依存関係によって
隠蔽可能なものと、冗長な依存関係を生じないものとが
あり、前者の方が優先順位が高い。また、新たな依存関
係を生じる実レジスタどうしでは、新たな依存関係とな
る命令間の距離が大きくなるものほど優先順序が高い。FIG. 10 shows the prioritized list of real registers. As shown in FIG. 10, real registers that cause no new dependency have higher priority than real registers that cause a new dependency. As described in detail later, the real registers that cause no new dependency are further divided into those that create a redundant dependency that can be hidden by an existing dependency and those that create no redundant dependency, and the former have higher priority. Among real registers that cause a new dependency, the greater the distance between the instructions that become newly dependent, the higher the priority.

【００６０】続いて、優先順序付けされた実レジスタの
列が生成されたならば、その列からノードに割り当て可
能な実レジスタを優先順序に従って検索し、割り当て可
能な実レジスタを見つけたならば、それをレジスタ対応
表（図１１参照）に登録する（ステップＳ２４）。な
お、例えば実レジスタの割り当てに何らかの制約がある
ような場合などに、割り当て可能でない実レジスタが発
生し得る。Once the prioritized list of real registers has been generated, the list is searched in priority order for a real register that can be assigned to the node; when an assignable real register is found, it is registered in the register correspondence table (see FIG. 11) (step S24). A real register may be unassignable when, for example, there is some constraint on real register allocation.
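
The ordering of FIG. 10 and the search of step S24 might be sketched as follows. This is a minimal Python sketch under stated assumptions: the order inside the first and second priority sets is not specified in the text, so alphabetical order is used; the distances passed for the third set are illustrative values only; and "unassignable" is modeled as "already held by an interfering neighbour". `prioritized_registers` and `choose_register` are hypothetical names.

```python
def prioritized_registers(first, second, third_distances):
    """Build the prioritized list: first set, second set, then the third set
    ordered by decreasing distance between the newly dependent instructions."""
    third = sorted(third_distances, key=lambda reg: -third_distances[reg])
    return sorted(first) + sorted(second) + third


def choose_register(candidates, unavailable):
    """S24: return the first candidate that can be assigned, or None if none can."""
    for reg in candidates:
        if reg not in unavailable:
            return reg
    return None


# Selecting a register for $103: $3 is in the first priority set,
# and $1 is held by the interfering register $100.
for_103 = prioritized_registers({"$3"}, set(), {"$2": 1})
print(for_103, choose_register(for_103, unavailable={"$1"}))

# Selecting a register for $104: only third-set candidates remain.
# The distances below are illustrative assumptions, not values from the patent.
for_104 = prioritized_registers(set(), set(), {"$1": 0, "$2": 3, "$3": 1})
print(for_104, choose_register(for_104, unavailable={"$1"}))
```

Under these assumptions the sketch selects $3 for $103 and $2 for $104, which agrees with the assignments listed for FIG. 11.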

【００６１】そして、割り当ての行われたノードを再び
レジスタ干渉グラフに戻し、既に配置されているノード
の生存区間と該ノードの生存区間が重複する場合には、
そのノード間にエッジを張ることによりレジスタ干渉グ
ラフを再構築する（ステップＳ２５）。The node to which a real register has been assigned is then returned to the register interference graph; if its live range overlaps with the live range of a node already placed in the graph, an edge is drawn between the two nodes, thereby reconstructing the register interference graph (step S25).

【００６２】以上の処理をレジスタ干渉グラフから削除
されたレジスタが全てレジスタ干渉グラフに再配置され
るまで繰り返し行う（ステップＳ２６）。The above processing is repeated until all the registers deleted from the register interference graph are rearranged in the register interference graph (step S26).

【００６３】例えば、図３の例において、＄１００，＄
１０１，＄１０２，＄１０３，＄１０４の順にノードに
実レジスタを割り当てていくと、レジスタ干渉グラフは
図７（ｅ）の空の状態から順に、図７（ｄ）→図７
（ｃ）→図７（ｂ）→図７（ａ）→図６のように再構築
されていく。For example, in the example of FIG. 3, when real registers are assigned to the nodes in the order $100, $101, $102, $103, $104, the register interference graph is rebuilt from the empty state of FIG. 7(e) through FIG. 7(d), FIG. 7(c), FIG. 7(b), and FIG. 7(a) to FIG. 6.

【００６４】ここで、レジスタ対応表とは、図１１に示
すような各仮想レジスタにどの実レジスタを割り当てる
かを示すものであり、仮想レジスタの個数分のエントリ
を持っている。図１１は、仮想レジスタ＄１００，＄１
０１，＄１０２，＄１０３，＄１０４に順に実レジスタ
＄１，＄２，＄３，＄３，＄２が割り当てられた例を示
している。なお、出力部３においてコンパイラが最終的
に機械語（１４）を出力する際には、この表をもとに最
適化された中間コード（１３）の仮想レジスタを実レジ
スタに置き換えた上で、機械語（１４）を出力する。Here, the register correspondence table indicates which real register is assigned to each virtual register, as shown in FIG. 11, and has one entry per virtual register. FIG. 11 shows an example in which the real registers $1, $2, $3, $3, and $2 are assigned to the virtual registers $100, $101, $102, $103, and $104, respectively. When the compiler finally outputs the machine language (14) in the output unit 3, it replaces the virtual registers of the optimized intermediate code (13) with real registers based on this table and then outputs the machine language (14).
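
The substitution performed with the register correspondence table can be illustrated by the following Python sketch. It operates on assembly text for simplicity, whereas an actual compiler would rewrite its internal intermediate representation; `REGISTER_TABLE`, `rewrite`, and `program` are hypothetical names, and only the five instructions quoted in the text are used.

```python
import re

# Register correspondence table of FIG. 11 (virtual register -> real register).
REGISTER_TABLE = {"$100": "$1", "$101": "$2", "$102": "$3", "$103": "$3", "$104": "$2"}


def rewrite(line, table):
    """Replace every virtual register in an assembly line with its real register.

    Register names that are not in the table (such as $0) are left unchanged.
    """
    return re.sub(r"\$\d+", lambda m: table.get(m.group(0), m.group(0)), line)


# Instructions (2)-(6) of FIG. 3 as quoted in the text.
program = [
    "lui $101, 0x2000",
    "lui $102, 0x3000",
    "sw $101, tmp1($0)",
    "ori $103, $102, 0x1111",
    "sw $103, tmp2($0)",
]
rewritten = [rewrite(line, REGISTER_TABLE) for line in program]
print("\n".join(rewritten))
```

Matching the whole token `$` followed by digits avoids rewriting `$101` as a prefix match of a shorter name, and leaves the zero register `$0` untouched.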

【００６５】次に、図２に示すステップＳ１６の処理の
うち図９に示すステップＳ２２での処理について詳細に
説明する。Next, of the processing in step S16 shown in FIG. 2, the processing in step S22 shown in FIG. 9 will be described in detail.

【００６６】なお、説明の便宜上、図１０の「冗長な依
存関係を生じるが既存の依存関係によって隠蔽可能な実
レジスタの集合」を第１優先レジスタ集合、「冗長な依
存関係を生じない実レジスタの集合」を第２優先レジス
タ集合、「新たな依存関係を生じる実レジスタの集合」
を第３優先レジスタ集合と呼ぶものとする。For convenience of explanation, in FIG. 10 the "set of real registers that create a redundant dependency that can be hidden by an existing dependency" is called the first priority register set, the "set of real registers that create no redundant dependency" is called the second priority register set, and the "set of real registers that create a new dependency" is called the third priority register set.

【００６７】図１２に、ステップＳ２２の処理を詳細化
した手順の一例を示す。FIG. 12 shows an example of a detailed procedure of step S22.

【００６８】ここでは、選択されたあるノード（すなわ
ち仮想レジスタ）にある実レジスタを割り当てることに
よって同一の実レジスタを参照している他の命令との間
に新たな依存関係が発生しないような実レジスタか、ま
たはそのような新たな依存関係が発生したとしてもそれ
らの命令間に既存の依存関係があり当該新たな依存関係
は無視できるような実レジスタの検出、およびそれらの
実レジスタの優先順序付けされた実レジスタの列への登
録を行う。具体的には、ステップＳ３１〜Ｓ３８の一連
の処理で図１０の第１優先レジスタ集合に相当する実レ
ジスタの登録を行い、次にステップＳ３９において第２
最優先レジスタ集合に相当する実レジスタの登録を行
う。This step detects real registers for which assigning the real register to the selected node (that is, the virtual register) creates no new dependency with other instructions that reference the same real register, as well as real registers for which such a new dependency does arise but can be ignored because a dependency already exists between those instructions, and registers them in the prioritized list of real registers. Specifically, the series of steps S31 to S38 registers the real registers corresponding to the first priority register set of FIG. 10, and step S39 then registers the real registers corresponding to the second priority register set.

【００６９】まず、「第１優先レジスタ集合」に相当す
る実レジスタの登録について説明する。First, registration of a real register corresponding to the “first priority register set” will be described.

【００７０】実レジスタを割り当てるノード（図９に示
すステップＳ２１で選択された仮想レジスタ）の生存区
間ごとにステップＳ３１〜Ｓ３５の処理を行い、それぞ
れの生存区間について該ノードに割り当てても新たな依
存関係が発生しない実レジスタを検出する。Steps S31 to S35 are performed for each live range of the node to which a real register is to be assigned (the virtual register selected in step S21 of FIG. 9), and for each live range the real registers that cause no new dependency when assigned to the node are detected.

【００７１】ステップＳ３１〜Ｓ３３では、既に他の仮
想レジスタに割り当てられている実レジスタのうち、該
ノードに割り当てても新たな依存関係を生じない可能性
のある実レジスタを検出する。ステップＳ３１において
該ノードを定義している命令（該仮想レジスタの値を変
更する命令）に関して新たな依存関係を生じないような
実レジスタの検出を行い、ステップＳ３２において該ノ
ードを使用している命令（該仮想レジスタの値を参照す
る命令）に関して新たな依存関係を生じないような実レ
ジスタの検出を行う。In steps S31 to S33, among the real registers already assigned to other virtual registers, those that may cause no new dependency when assigned to the node are detected. Step S31 detects such real registers with respect to the instructions that define the node (instructions that change the value of the virtual register), and step S32 detects them with respect to the instructions that use the node (instructions that read the value of the virtual register).

【００７２】該ノードを定義している命令に関しては、
選択されたノードの生存区間において該ノードを定義す
る命令を検索し、生存区間内で該ノードを定義する命令
全てに共通な先行依存命令を見つけ、それらの使用レジ
スタ（実レジスタ）を全て検出する。このとき、先行依
存命令として該ノードを定義している命令自身も含め
る。検出されたそれら実レジスタを、該ノードを定義し
ている命令に関して、該ノードに割り当てても新たな依
存関係を生じない可能性のある実レジスタ集合とする。For the instructions that define the node, the instructions defining the node within the live range of the selected node are found, the preceding dependent instructions common to all of those defining instructions are identified, and all the registers (real registers) they use are collected. Here, the instructions that define the node are themselves also counted as preceding dependent instructions. The real registers collected in this way form, with respect to the instructions defining the node, the set of real registers that may cause no new dependency when assigned to the node.

【００７３】図３の例において、既に仮想レジスタには
＄１００＝＄１，＄１０１＝＄２，＄１０２＝＄３が割
り当てられているとして、次に仮想レジスタ＄１０３に
割り当てる実レジスタを選択する場合を考える。仮想レ
ジスタ＄１０３の生存区間は、図５に示すように一つだ
けである。そこで、この生存区間（命令（５）開始時か
ら命令（６）開始時まで）に関して解析を行う。生存区
間において仮想レジスタ＄１０３を定義する命令は、命
令（５）だけである。図４の依存グラフをみると、命令
（５）の先行依存命令は、命令（３）であることが分か
る。そこで、定義する命令に共通な先行依存命令は、命
令（３）および命令（５）となる。これら２つの命令で
使用されるレジスタのうち実レジスタが割り当てられて
いるのは＄１０２だけであるので、フェーズ（ステップ
Ｓ３１）で検出される実レジスタ集合は｛＄３｝とな
る。In the example of FIG. 3, suppose that $100 = $1, $101 = $2, and $102 = $3 have already been assigned, and consider selecting the real register to assign to the virtual register $103 next. As shown in FIG. 5, the virtual register $103 has only one live range, so this live range (from the start of instruction (5) to the start of instruction (6)) is analyzed. The only instruction that defines the virtual register $103 within the live range is instruction (5). The dependency graph of FIG. 4 shows that the preceding dependent instruction of instruction (5) is instruction (3). The preceding dependent instructions common to the defining instructions are therefore instruction (3) and instruction (5). Of the registers used by these two instructions, only $102 has a real register assigned, so the real register set detected in this phase (step S31) is {$3}.

【００７４】同様に、該ノードを使用している命令に関
しては、選択された該ノードの生存区間において該ノー
ドを使用する命令を検索し、生存区間内で該ノードを使
用する命令全てに共通な後続依存命令を見つけ、それら
の定義レジスタ（実レジスタ）を全て検出する。このと
き、後続依存命令として該ノードを使用している命令自
身も含める。検出されたそれら実レジスタを、該ノード
を使用している命令に関して、該ノードに割り当てても
新たな依存関係を生じない可能性のある実レジスタ集合
とする。Similarly, for the instructions that use the node, the instructions using the node within the live range of the selected node are found, the subsequent dependent instructions common to all of those using instructions are identified, and all the registers (real registers) they define are collected. Here, the instructions that use the node are themselves also counted as subsequent dependent instructions. The real registers collected in this way form, with respect to the instructions using the node, the set of real registers that may cause no new dependency when assigned to the node.

【００７５】図３の例について引続き考えてみると、＄
１０３の生存区間において仮想レジスタ＄１０３を使用
する命令は、命令（６）だけである。図４の依存グラフ
をみると、命令（６）の後続依存命令は存在しない。そ
こで、使用する命令に共通な後続依存命令は、命令
（６）だけとなる。この命令で定義されるレジスタのう
ち実レジスタが割り当てられているものは存在しないの
で、フェーズ（ステップＳ３２）で検出される実レジス
タ集合は空集合となる。Continuing with the example of FIG. 3, the only instruction that uses the virtual register $103 within the live range of $103 is instruction (6). The dependency graph of FIG. 4 shows that instruction (6) has no subsequent dependent instruction. The only subsequent dependent instruction common to the using instructions is therefore instruction (6) itself. None of the registers defined by this instruction has a real register assigned, so the real register set detected in this phase (step S32) is the empty set.

【００７６】最後に、ステップＳ３３で、それらの和集
合を計算し、既に他の仮想レジスタに割り当てられてい
る実レジスタのうち、該ノードに割り当てても新たな依
存関係を生じない可能性のある実レジスタ集合とする。Finally, in step S33, the union of these two sets is computed; the result is the set of real registers, among those already assigned to other virtual registers, that may cause no new dependency when assigned to the node.

【００７７】図３の例に関して｛＄３｝と空集合の和集
合を計算してフェーズ（ステップＳ３３）の段階で求め
られる実レジスタ集合は｛＄３｝となる。In the example of FIG. 3, the union of {$3} and the empty set is computed, so the real register set obtained at this phase (step S33) is {$3}.
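
Steps S31 to S33 for the virtual register $103 can be traced with the following Python sketch. It is one possible reading of the description, not the patent's implementation: dependent instructions are followed transitively through the edges of FIG. 4, the common set is taken as the intersection over all defining (or using) instructions, and those instructions themselves are then added. All names (`PREDECESSORS`, `SUCCESSORS`, `REFERENCED`, `DEFINED`, `ASSIGNED`, `closure`, `common_dependents`, `real_registers`) are hypothetical, and the tables list virtual registers only.

```python
# Dependency edges of FIG. 4, stored as the direct predecessors of each instruction.
PREDECESSORS = {2: set(), 3: set(), 4: {2}, 5: {3}, 6: {5}}
SUCCESSORS = {n: {m for m, preds in PREDECESSORS.items() if n in preds}
              for n in PREDECESSORS}
# Virtual registers referenced (read or written) and written by instructions (2)-(6).
REFERENCED = {2: {"$101"}, 3: {"$102"}, 4: {"$101"}, 5: {"$102", "$103"}, 6: {"$103"}}
DEFINED = {2: {"$101"}, 3: {"$102"}, 4: set(), 5: {"$103"}, 6: set()}
# Assignments made before $103 is processed.
ASSIGNED = {"$100": "$1", "$101": "$2", "$102": "$3"}


def closure(start, neighbours):
    """Instructions reachable from start by following neighbours transitively."""
    seen, stack = set(), [start]
    while stack:
        for nxt in neighbours.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def common_dependents(instrs, neighbours):
    """Dependent instructions shared by all of instrs, plus instrs themselves."""
    shared = set.intersection(*(closure(i, neighbours) for i in instrs))
    return shared | set(instrs)


def real_registers(instrs, registers_of, assigned):
    """Real registers already assigned to the virtual registers of instrs."""
    return {assigned[v] for i in instrs for v in registers_of[i] if v in assigned}


defining, using = [5], [6]  # $103 is defined by (5) and used by (6)
s31 = real_registers(common_dependents(defining, PREDECESSORS), REFERENCED, ASSIGNED)
s32 = real_registers(common_dependents(using, SUCCESSORS), DEFINED, ASSIGNED)
s33 = s31 | s32
print(s31, s32, s33)
```

Under this reading the sketch yields {$3} for step S31, the empty set for step S32, and {$3} for their union in step S33, in line with the worked example.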

【００７８】次に、ステップＳ３４，Ｓ３５において上
記のレジスタ集合から該ノードに割り当てると新たな依
存関係が生じてしまう実レジスタを削除する。ステップ
Ｓ３４において該ノードを定義している命令に関して新
たな依存関係が生じる実レジスタの検出を行い、ステッ
プＳ３５において該ノードを使用している命令に関して
新たな依存関係が生じる実レジスタの検出を行う。Next, in steps S34 and S35, the real registers that would cause a new dependency when assigned to the node are deleted from the above register set. Step S34 detects the real registers that cause a new dependency with respect to the instructions defining the node, and step S35 detects those that cause a new dependency with respect to the instructions using the node.

【００７９】該ノードを定義している命令に関しては、
生存区間内で該ノードを定義する命令全てに共通な先行
依存命令以外の先行命令を全て見つけ、それらの使用実
レジスタを全て検出する。このとき先行命令として該ノ
ードを定義している命令自身も含める。これらを、該ノ
ードを定義している命令に関して、該ノードに割り当る
と新たな依存関係を生じてしまう実レジスタ集合とし、
ステップＳ３３で求めた実レジスタの集合から削除す
る。For the instructions that define the node, all preceding instructions other than the preceding dependent instructions common to all the instructions defining the node within the live range are found, and all the real registers they use are collected. Here, the instructions that define the node are themselves also included as preceding instructions. These registers form, with respect to the instructions defining the node, the set of real registers that would cause a new dependency when assigned to the node, and they are deleted from the set of real registers obtained in step S33.

【００８０】引続き、図３の例について考えてみる。該
ノードを定義する命令全てに共通な先行依存命令以外の
先行命令は命令（１），（２），（４）であり、それら
の使用する仮想レジスタのうち実レジスタが割り当てら
れているものは＄１０１であり、＄１０１に割り当てら
れた実レジスタは＄２であるので、使用実レジスタ集合
は｛＄２｝となる。よって、ステップＳ３３で求めた集
合｛＄３｝から｛＄２｝を削除して｛＄３｝となる。Continuing with the example of FIG. 3, the preceding instructions other than the preceding dependent instructions common to all the instructions defining the node are instructions (1), (2), and (4). Of the virtual registers they use, the one that has a real register assigned is $101, and the real register assigned to $101 is $2, so the set of real registers used is {$2}. Deleting {$2} from the set {$3} obtained in step S33 therefore leaves {$3}.

【００８１】同様に、該ノードを使用している命令に関
しては、生存区間内で該ノードを使用する命令全てに共
通な後続依存命令以外の後続命令を全て見つけ、それら
の定義実レジスタを全て検出する。このとき後続命令と
して該ノードを使用している命令自身も含める。これら
を、該ノードを使用している命令に関して、該ノードに
割り当ると新たな依存関係を生じてしまう実レジスタ集
合とし、ステップＳ３３で求めた実レジスタの集合から
削除する。Similarly, for the instructions that use the node, all subsequent instructions other than the subsequent dependent instructions common to all the instructions using the node within the live range are found, and all the real registers they define are collected. Here, the instructions that use the node are themselves also included as subsequent instructions. These registers form, with respect to the instructions using the node, the set of real registers that would cause a new dependency when assigned to the node, and they are deleted from the set of real registers obtained in step S33.

【００８２】引続き、図３の例について考えてみる。該
ノードを使用する命令全てに共通な後続依存命令以外の
後続命令は命令（７）であり、命令（７）の定義する仮
想レジスタは＄１０４であるが、＄１０４にはまだ実レ
ジスタが割り当てられていないので、それらの定義する
実レジスタ集合は空集合となる。よって、ステップＳ３
４で求めた集合｛＄３｝から空集合を削除して｛＄３｝
となる。Continuing with the example of FIG. 3, the subsequent instruction other than the subsequent dependent instructions common to all the instructions using the node is instruction (7). The virtual register defined by instruction (7) is $104, but no real register has been assigned to $104 yet, so the set of real registers defined is the empty set. Deleting the empty set from the set {$3} obtained in step S34 therefore leaves {$3}.
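
The pruning of steps S34 and S35 for $103 can be traced with the short Python sketch below. It takes the instruction sets directly from the worked example instead of deriving them, and every name in it (`ASSIGNED`, `USED`, `DEFINED`, `prune`) is hypothetical. The operands of instruction (1) are not quoted in this passage, so it is modeled as using no register that already has a real register assigned, which is what the worked example implies.

```python
# Assignments made before $103 is processed.
ASSIGNED = {"$100": "$1", "$101": "$2", "$102": "$3"}

# Instructions named in the worked example for $103.
OTHER_PRECEDING = [1, 2, 4]   # preceding instructions outside the common dependent set
OTHER_SUBSEQUENT = [7]        # subsequent instructions outside the common dependent set

# Virtual registers used by the preceding instructions. Instruction (1) is an
# assumption: its operands are not given in the text, so it is left empty.
USED = {1: set(), 2: {"$101"}, 4: {"$101"}}
# Virtual registers defined by the subsequent instructions.
DEFINED = {7: {"$104"}}


def prune(candidates, other_preceding, other_subsequent, used, defined, assigned):
    """Remove from candidates the real registers that would add a new dependency."""
    s34 = {assigned[v] for i in other_preceding for v in used[i] if v in assigned}
    s35 = {assigned[v] for i in other_subsequent for v in defined[i] if v in assigned}
    return candidates - s34 - s35, s34, s35


remaining, s34, s35 = prune({"$3"}, OTHER_PRECEDING, OTHER_SUBSEQUENT,
                            USED, DEFINED, ASSIGNED)
print(remaining, s34, s35)
```

Here step S34 removes {$2} and step S35 removes nothing, because $104 has no real register yet, so the candidate set stays {$3}.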

【００８３】以上のステップＳ３１〜Ｓ３５の処理を該
ノードの全ての生存区間について行う（ステップＳ３
６）。The processing of steps S31 to S35 described above is performed for every live range of the node (step S36).

【００８４】次に、ステップＳ３７において全ての生存
区間で新たな依存関係を発生させない実レジスタ集合を
検出することによって、既に割り当てがされている実レ
ジスタのうち該ノードに割り当てたときに新たな依存関
係を発生されない実レジスタ集合を抽出する。Next, in step S37, the set of real registers that cause no new dependency in any of the live ranges is determined, which extracts, from the real registers that have already been assigned, the set of real registers that cause no new dependency when assigned to the node.

【００８５】図３の例に関しては、仮想レジスタ＄１０
３の生存区間は一つだけである。よって、ステップＳ３
７の時点で求められる実レジスタ集合は｛＄３｝とな
る。In the example of FIG. 3, the virtual register $103 has only one live range. The real register set obtained at step S37 is therefore {$3}.

【００８６】そして、ステップＳ３８において、図１０
の第１優先レジスタ集合に相当する実レジスタの登録を
行う。これに当てはまる実レジスタ集合がステップＳ３
７までで求めた実レジスタ集合であるので、そこに含ま
れる実レジスタを最も優先順序が高い実レジスタとして
優先順序付けされた実レジスタの列に登録する。Then, in step S38, the real registers corresponding to the first priority register set of FIG. 10 are registered. The real register set that qualifies is the one obtained through step S37, so the real registers it contains are registered in the prioritized list of real registers as the real registers with the highest priority.

【００８７】図３の例では、ステップＳ３７の時点で求
められる実レジスタ集合が｛＄３｝であるので、実レジ
スタ＄３を優先順序付けされた実レジスタの列に登録す
る。In the example of FIG. 3, the real register set obtained at step S37 is {$3}, so the real register $3 is registered in the prioritized list of real registers.

【００８８】次に、第２優先レジスタ集合に相当する実
レジスタの登録について説明する。Next, registration of a real register corresponding to the second priority register set will be described.

【００８９】ステップＳ３９において、レジスタ干渉グ
ラフ中のノードに一度も割り当てられていない実レジス
タを検出する。一度も割り当てされていない実レジスタ
を該ノードに割り当てても他の命令との間に冗長な依存
関係が発生することはない。そこで、このようなレジス
タを冗長な依存関係を生じない実レジスタとして検出
し、２番目に優先順序が高い実レジスタとして優先順序
付けされた実レジスタの列に登録する。このような実レ
ジスタを既に割り当てられた実レジスタよりも優先順序
が下であるとしたのは、可能な限り既に割り当てられた
実レジスタを再利用し、使用する実レジスタ数を最小限
にとどめるようにするためである。In step S39, real registers that have never been assigned to a node in the register interference graph are detected. Assigning a never-assigned real register to the node creates no redundant dependency with any other instruction. Such registers are therefore detected as real registers that create no redundant dependency and are registered in the prioritized list of real registers with the second-highest priority. They are ranked below the already-assigned real registers in order to reuse already-assigned real registers wherever possible and keep the number of real registers in use to a minimum.

【００９０】図３の例では、実レジスタ＄１，＄２のう
ち一度も割り当てられていない実レジスタは存在しない
ので（既に全ての実レジスタが干渉グラフ中のノードに
割り当てられているので）、どのレジスタも優先順序付
けされた実レジスタの列に登録されない。In the example of FIG. 3, neither of the real registers $1 and $2 remains unassigned (every real register has already been assigned to a node in the interference graph), so no register is registered in the prioritized list of real registers at this step.

【００９１】以上がステップＳ２２における処理の詳細
な説明である。The above is the detailed description of the processing in step S22.

【００９２】次に、上記のステップＳ１６の処理のうち
ステップＳ２３での処理について詳細に説明する。Next, the processing in step S23 of the processing in step S16 will be described in detail.

【００９３】ステップＳ２３では、上記のステップＳ２
２にて当てはまらなかった実レジスタについて優先順序
付け（すなわち、「第３優先レジスタ集合」に相当する
実レジスタの登録）を行う。このフェーズでは、上記の
選択されたノード（すなわち仮想レジスタ）に実レジス
タを割り当てることによって同一の実レジスタを参照し
ている他の命令との間に新たな依存関係が発生するが、
依存関係を持つ命令どうしの実行されるタイミングが大
きく異なるために、実際のｏｕｔ−ｏｆ−ｏｒｄｅｒ実
行時にはその依存関係の影響を受けないかもしくは受け
たとしてもそれが小さなものであるような実レジスタを
優先させる。In step S23, the real registers that did not qualify in step S22 above are prioritized (that is, the real registers corresponding to the "third priority register set" are registered). In this phase, assigning a real register to the selected node (that is, the virtual register) does create a new dependency with other instructions that reference the same real register; priority is given to real registers for which the dependent instructions execute at widely separated times, so that in actual out-of-order execution the dependency has no effect or, at most, a small one.

【００９４】図１３に、ステップＳ２３の処理を詳細化
した手順の一例を示す。FIG. 13 shows an example of a procedure in which the process of step S23 is detailed.

【００９５】ここでは、説明の便宜上、ステップＳ２２
の説明とは異なる例を用いる。すなわち、図３のプログ
ラムにおいて、仮想レジスタ＄１０４に割り当てる実レ
ジスタを選択する場合を考える。なお、既に仮想レジス
タには＄１００＝＄１，＄１０１＝＄２，＄１０２＝＄
３，＄１０３＝＄３が割り当てられているとする。ま
た、仮想レジスタ＄１０４に関してステップＳ２２にお
いて検出される実レジスタは存在しない。Here, for convenience of explanation, an example different from the one used for step S22 is used. That is, consider selecting the real register to assign to the virtual register $104 in the program of FIG. 3. Suppose that $100 = $1, $101 = $2, $102 = $3, and $103 = $3 have already been assigned. For the virtual register $104, no real register is detected in step S22.

[0096] First, in step S41, if the interference graph contains nodes to which real registers have already been assigned, the live ranges of the real registers assigned to all such nodes are analyzed.

[0097] FIG. 14 shows the live ranges of the already-assigned real registers and of virtual register $104 at the time $104 is to be assigned. The live range of real register $3 is the union of the live ranges of virtual registers $102 and $103, to which $3 is assigned.

[0098] Next, in step S42, the distance between the live range of each already-assigned real register and the live range of the node to be assigned is calculated. The distance between two live ranges is the number of cycles from the end of one live range to the start of the other. If a register has several live ranges, the distance is calculated for each of them and the minimum is taken as the distance. This embodiment assumes that instructions are executed one at a time in the order of the intermediate code.

[0099] In this embodiment the calculated distance is then corrected: if it is equal to or greater than a constant X, it is set to X. In general, X is chosen as a value beyond which the dependency is considered to have no effect.

[0100] For live ranges that overlap (interfere), the distance is set to a negative number.

[0101] In this way, the distance to the live range of the node is calculated for every real register.

[0102] Consider the concrete example. With the constant X set to 10, the distances between virtual register $104 and the real registers are as follows, where dist(reg1, reg2) denotes the distance between the live ranges of two registers:
dist($1, $104) = -1
dist($2, $104) = 3
dist($3, $104) = 1
A distance of -1 means that the live ranges overlap.
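
The distance calculation of steps S41 and S42, including the correction by the constant X, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the live-range intervals are hypothetical (FIG. 14 is not reproduced here), and the intervals were chosen only so that the results match the values dist($1, $104) = -1, dist($2, $104) = 3 and dist($3, $104) = 1 stated in the text.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (first cycle, last cycle), both inclusive


def pair_distance(a: Range, b: Range) -> int:
    """Cycles from the end of the earlier range to the start of the later one.

    Overlapping (interfering) ranges get the negative distance -1.
    """
    if a[0] <= b[1] and b[0] <= a[1]:
        return -1
    earlier, later = (a, b) if a[1] < b[0] else (b, a)
    return later[0] - earlier[1]


def dist(real_ranges: List[Range], node_range: Range, x: int = 10) -> int:
    """Minimum distance over all live ranges of a real register, capped at x."""
    smallest = min(pair_distance(r, node_range) for r in real_ranges)
    return min(smallest, x)


# Hypothetical live ranges (assumption): $3 carries the ranges of $102 and $103.
REAL_LIVE: Dict[str, List[Range]] = {
    "$1": [(1, 8)],
    "$2": [(2, 4)],
    "$3": [(3, 5), (5, 6)],
}
NODE_104: Range = (7, 8)

distances = {reg: dist(ranges, NODE_104) for reg, ranges in REAL_LIVE.items()}
```

With these assumed intervals, `distances` holds -1 for $1, 3 for $2 and 1 for $3; a pair of ranges separated by more than X cycles would be reported as X.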

[0103] Next, in step S43, the real registers to be assigned to the node are prioritized so that those with a larger distance between live ranges come first.

[0104] However, if several real registers have distance X, those that have already been assigned to another node at least once take priority among them. This promotes reuse of real registers and keeps the number of real registers used as small as possible. Real registers with a negative distance cannot be assigned to the node and are excluded from prioritization. This processing is performed in step S44.

[0105] In the concrete example, the priority order of the real registers for virtual register $104 is $2, $3, from highest priority.
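
The ordering rule of steps S43 and S44 can be sketched as below. The function name `prioritize` and its parameters are hypothetical; the sketch assumes the distances have already been capped at X, and it applies the reuse preference only among registers whose distance equals X, as the text describes. The order among other ties is left as given, which is an assumption.

```python
from typing import Dict, List, Set


def prioritize(distances: Dict[str, int], already_used: Set[str], x: int = 10) -> List[str]:
    """Order candidate real registers for one node.

    - registers with a negative distance interfere and are dropped (step S44);
    - larger distances come first (step S43);
    - among registers at distance x, previously used ones come first.
    """
    candidates = [reg for reg, d in distances.items() if d >= 0]

    def key(reg: str):
        d = distances[reg]
        reuse_rank = 0 if (d == x and reg in already_used) else 1
        return (-d, reuse_rank)

    return sorted(candidates, key=key)


# Distances from the example in the text for virtual register $104.
order_104 = prioritize({"$1": -1, "$2": 3, "$3": 1}, {"$1", "$2", "$3"})

# Hypothetical case with two registers at distance X; only "$5" was used before.
order_tie = prioritize({"$4": 10, "$5": 10, "$6": 2}, {"$5"})
```

For the example in the text this yields the order $2, $3, with $1 excluded.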

[0106] Finally, in step S45, the real registers are registered in the sequence of prioritized real registers in the priority order determined above.

[0107] In this example for step S23, where the virtual registers are assigned as $100 = $1, $101 = $2, $102 = $3, $103 = $3, the priority order of the real registers (those corresponding to the third priority register set) for virtual register $104 is $2, $3. In the example for step S22, that is, selecting the real register for virtual register $103 when $100 = $1, $101 = $2, $102 = $3 are assigned, $3 is registered in the first priority register set of FIG. 10 in step S22 (the second priority register set is empty). Calculating the distances for the remaining $1 and $2 then gives
dist($1, $103) = -1
dist($2, $103) = 1
so the priority order between them is $2, $1.

[0108] This concludes the detailed description of the processing in step S23.

[0109] In this way, steps S22 and S23 generate the sequence of prioritized real registers, and in step S24 the sequence is searched in descending order of priority for a real register that can be assigned to the node, which is then assigned.

[0110] For instance, in the example used for step S22 (selecting the real register for virtual register $103 when $100 = $1, $101 = $2, $102 = $3 are assigned), the final priority order of the real registers for $103 is [$3, $2]. In step S24 an assignable register is sought in the order $3, $2; in this example $3 is selected and registered in the register correspondence table.

[0111] Likewise, in the example used for step S23 (selecting the real register for virtual register $104 when $100 = $1, $101 = $2, $102 = $3, $103 = $3 are assigned), the final priority order of the real registers for $104 is [$2, $3]. In step S24 an assignable register is sought in the order $2, $3; in this example $2 is selected and registered in the register correspondence table.

[0112] As a result, a register correspondence table with the contents shown in FIG. 11 is obtained.

[0113] An example of the result of processing according to this embodiment is now compared with an example of the result of processing according to the prior art.

[0114] FIG. 15(a) is an example of the code output when registers are assigned to the virtual registers of FIG. 3 by a conventional method (one that does not prioritize the real registers as in this embodiment). Here the real registers are assigned as $100 = $1, $101 = $2, $102 = $3, $103 = $2, $104 = $2. In this code, new false dependencies arise between instructions (4) and (5) and between instructions (6) and (7), so the dependencies between instructions become as shown in FIG. 15(b).

[0115] FIG. 16(a), by contrast, is the code output when real registers prioritized according to this embodiment are assigned to the virtual registers of FIG. 3. Here the real registers are assigned as $100 = $1, $101 = $2, $102 = $3, $103 = $3, $104 = $2. In this code a new false dependency arises between instructions (4) and (7), so the dependencies between instructions become as shown in FIG. 16(b). Compared with FIG. 15(b), there are fewer dependencies between instructions and more freedom for scheduling.
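
The comparison between the two assignments can be reproduced with a small sketch that lists the new false dependencies created when one real register is reused by several virtual registers. The (defining instruction, last-use instruction) pairs below are hypothetical stand-ins for the program of FIG. 3, which is not reproduced here; they were chosen so that the output matches the dependencies stated in the text. The function name is also an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical (defining instruction, last-use instruction) per virtual register.
LIVE: Dict[str, Tuple[int, int]] = {
    "$100": (1, 8),
    "$101": (2, 4),
    "$102": (3, 5),
    "$103": (5, 6),
    "$104": (7, 8),
}


def new_false_dependencies(assignment: Dict[str, str]) -> List[Tuple[int, int]]:
    """Return (earlier instruction, later instruction) pairs that become
    dependent only because they share a real register."""
    by_real: Dict[str, List[Tuple[int, int]]] = {}
    for virt, real in assignment.items():
        by_real.setdefault(real, []).append(LIVE[virt])

    deps = []
    for ranges in by_real.values():
        ranges.sort()
        for (_, last_use), (next_def, _) in zip(ranges, ranges[1:]):
            # A value read and a new value written by the same instruction
            # create no dependency between two distinct instructions.
            if last_use < next_def:
                deps.append((last_use, next_def))
    return sorted(deps)


conventional = {"$100": "$1", "$101": "$2", "$102": "$3", "$103": "$2", "$104": "$2"}
prioritized = {"$100": "$1", "$101": "$2", "$102": "$3", "$103": "$3", "$104": "$2"}

deps_conventional = new_false_dependencies(conventional)
deps_prioritized = new_false_dependencies(prioritized)
```

Under these assumptions the conventional assignment produces the pairs (4, 5) and (6, 7), while the prioritized assignment produces only (4, 7).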

[0116] Assigning registers with inter-instruction dependencies taken into account in this way prevents the loss of ILP that register allocation would otherwise cause. Moreover, the false dependency between instructions (4) and (7) in FIG. 16 has little effect on out-of-order execution, because the two instructions execute at widely separated times.

[0117] In the above, the distance between register live ranges was used as the criterion for ordering the third priority register set of FIG. 10, but the order may be determined by other criteria. As one example, the following describes determining the priority order from the critical path length. The processing is the same as described above except for step S23 of FIG. 9 and the processing of FIG. 13, so the description focuses on the differences.

[0118] The critical path is the longest path in a dependency graph such as that of FIG. 4. The path length is the number of cycles needed to process the path. In other words, the critical path length is the minimum number of cycles needed to complete an entire instruction sequence when it is executed in parallel on the assumption of an unlimited number of arithmetic units. Executing one instruction may take several cycles, and the present invention covers that case, but for clarity the following assumes that each instruction is processed in one cycle.

[0119] FIG. 17 shows the sequence of prioritized real registers when the critical path length is the criterion. As shown in FIG. 17, real registers that cause no new dependency rank above real registers that do (as before). Among the real registers that cause a new dependency, those that yield a smaller critical path length for the whole instruction sequence once the dependency is added rank higher.

[0120] To build such a sequence of prioritized real registers, the form described earlier ordered the real registers by the distance dist(reg1, reg2) between register live ranges. This form instead orders them by the critical path length cp_length(reg_r, reg_p) obtained when real register reg_r is assigned to virtual register reg_p. Assigning a real register to a virtual register can create a new dependency and thereby change the critical path length, and the resulting value depends on which real register is assigned. The smaller the critical path length, the more likely it is that the processing time of the whole instruction sequence can be shortened, so real registers that yield a smaller critical path length after assignment are given priority. Real registers whose live ranges overlap (interfere with) that of the node cannot be assigned to it and are excluded from prioritization.

[0121] Suppose real registers are assigned in the order $100, $101, $102, $103, $104, and that there are three real registers, $1, $2, and $3. As in the earlier example, real registers can be assigned to $100, $101, $102, and $103 without creating any new dependency. That is, assume the assignment $100 = $1, $101 = $2, $102 = $3, $103 = $3.

[0122] Now consider assigning a real register to $104.

[0123] First, as described above, the first and second priority register sets are empty.

[0124] Next, the real registers corresponding to the third priority register set, and their priority order, are obtained from the critical path length.

[0125] $1 is excluded from prioritization because the live ranges of $104 and $1 overlap (interfere).

[0126] For $2, assigning $2 to $104 creates a new false dependency between instructions (4) and (7), so the dependency graph becomes that of FIG. 18(a), and the critical path length is 3.

[0127] For $3, assigning $3 to $104 creates a new false dependency between instructions (6) and (7), so the dependency graph becomes that of FIG. 18(b), and the critical path length is 4.

[0128] Therefore
cp_length($2, $104) = 3
cp_length($3, $104) = 4
and, giving priority to the smaller critical path length, the priority order of the real registers for virtual register $104 is $2, $3, from highest priority.
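
The critical-path criterion can be sketched as follows, assuming one cycle per instruction as in the text. The dependency edges in `BASE_EDGES` are hypothetical (the graphs of FIG. 4 and FIG. 18 are not reproduced here) and were chosen so that adding the false dependency stated in the text for each candidate gives the critical path lengths 3 and 4 quoted above. The function and variable names are assumptions.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (earlier instruction, later instruction)

# Hypothetical true dependencies among instructions (1)..(7).
BASE_EDGES: List[Edge] = [(1, 7), (2, 4), (3, 5), (5, 6)]
NUM_INSTRUCTIONS = 7


def critical_path_length(edges: List[Edge], n: int = NUM_INSTRUCTIONS) -> int:
    """Longest path, counted in instructions (one cycle each).

    Edges always point from a lower to a higher instruction number, so
    visiting them in sorted order is a valid topological order.
    """
    depth: Dict[int, int] = {i: 1 for i in range(1, n + 1)}
    for src, dst in sorted(edges):
        depth[dst] = max(depth[dst], depth[src] + 1)
    return max(depth.values())


# False dependency created by assigning each candidate to $104 (from the text).
FALSE_DEP: Dict[str, Edge] = {"$2": (4, 7), "$3": (6, 7)}

cp_length = {
    reg: critical_path_length(BASE_EDGES + [edge]) for reg, edge in FALSE_DEP.items()
}
ranking = sorted(cp_length, key=cp_length.get)
```

With these assumed edges, `cp_length` is 3 for $2 and 4 for $3, so `ranking` places $2 ahead of $3.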

[0129] The final priority order of the real registers for virtual register $104 is thus [$2, $3]. In step S24 an assignable register is sought in the order $2, $3; in this example $2 is selected and registered in the register correspondence table. In this example the resulting real register assignment is the same as in FIG. 11.

[0130] In this embodiment, the registers corresponding to each of the first to third priority register sets in the sequence of prioritized real registers of FIG. 10 or FIG. 17 are obtained first. Then, in step S24 of FIG. 9, the sequence is searched in priority order for a real register that can be assigned to the target node; each real register is checked for assignability, and an assignable one is selected and registered in the register correspondence table (see FIG. 11). Alternatively, assignability may be checked as each set is obtained, and the real register assignment for the virtual register may end as soon as an assignable register is found. That is, once the real registers of the first priority register set are obtained, they are checked for assignability; if one is assignable, it is selected and the remaining processing is skipped. If none is assignable, or the first priority register set is empty, the second priority register set is processed; once its real registers are obtained, they are checked for assignability, and if one is assignable it is selected and the remaining processing is skipped. If none is assignable, or the second priority register set is empty, the third priority register set is processed. If there is no need to check assignability, a real register obtained for the first priority register set may simply be selected and the remaining processing skipped; otherwise the second priority register set is processed, and only if that still yields no real register is the third priority register set processed.
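
The alternative described in this paragraph, in which each priority register set is built only when the previous one yields no assignable register, can be sketched as below. The names `select_register`, `first_tier`, `second_tier` and `third_tier` are hypothetical, and the tier contents are illustrative values for the $103 example; the point of the sketch is only that later sets are never computed once a register has been chosen.

```python
from typing import Callable, Iterable, List, Optional


def select_register(
    tiers: Iterable[Callable[[], List[str]]],
    assignable: Callable[[str], bool],
) -> Optional[str]:
    """Build each priority register set lazily and stop at the first
    assignable real register."""
    for build_tier in tiers:
        for reg in build_tier():  # the set is computed only when reached
            if assignable(reg):
                return reg
    return None


evaluated: List[str] = []


def first_tier() -> List[str]:
    evaluated.append("first")
    return ["$3"]  # illustrative: a register causing no new dependency


def second_tier() -> List[str]:
    evaluated.append("second")
    return []


def third_tier() -> List[str]:
    evaluated.append("third")
    return ["$2"]


chosen = select_register([first_tier, second_tier, third_tier], lambda reg: True)
fallback = select_register([second_tier, third_tier], lambda reg: True)
```

In the first call only the first set is built before $3 is chosen; in the second call the empty second set forces the third set to be built.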

[0131] The embodiments above describe register allocation by graph coloring, but the present invention is not limited to graph coloring; it is applicable whenever a register allocation method selects, from several candidates, the real register to assign to a virtual register.

[0132] As noted earlier, the present invention is also applicable to a VLIW capable of out-of-order execution. Because no such VLIW previously existed, the following describes an embodiment of an invention relating to an out-of-order instruction issue VLIW (hereinafter, dynamic VLIW) processor.

[0133] In the following, the individual instructions that make up one VLIW instruction are sometimes called atoms. FIG. 19 shows an example of one VLIW instruction, composed here of three atoms. The position that each atom of a VLIW instruction occupies is called a slot.

[0134] Instruction-level parallelism can be raised either by VLIW, which allocates resources statically at compile time, or by superscalar, which allocates resources dynamically at run time. In VLIW the compiler detects the instructions that can execute simultaneously, so no run-time detection mechanism is needed; the run-time hardware is simpler and a higher clock frequency may be achievable. However, compiler-based detection of simultaneously executable instructions depends on parameters that the compiler cannot predict completely or cannot realistically predict.

[0135] The dynamic VLIW scheme lies between the superscalar and VLIW schemes. It is basically VLIW, but by executing part of the work dynamically it can respond to some extent to matters that are hard to predict at compile time, and can continue processing without stopping the whole processor. In other words, the dynamic VLIW scheme seeks a new optimum between hardware and software (the compiler) in order to optimize performance.

[0136] The basic configuration of a dynamic VLIW processor has a pending queue that temporarily holds atoms that were fetched but cannot yet execute, so that subsequent atoms can execute ahead of them. The processor stores and manages information on the usage status of each register and uses it to decide whether a fetched atom can execute. If it can, the fetched atom is executed; if not, it is placed in the pending queue. The processor also decides whether atoms held in the pending queue can execute, and executes them when they can. In this way, a preceding atom that cannot execute immediately is set aside so that subsequent atoms can execute first.

[0137] The dynamic VLIW scheme fetches atoms one VLIW instruction at a time, as does a conventional in-order instruction issue VLIW. However, when some of the atoms of a fetched VLIW instruction cannot execute, a conventional in-order instruction issue VLIW always has to suspend fetching, whereas the dynamic VLIW scheme may be able to continue fetching.

[0138] FIG. 20 is a conceptual diagram of the basic configuration of such a dynamic VLIW processor, for the case of two pipeline units (1006-1, 1006-2). This dynamic VLIW processor realizes out-of-order execution using two structures: queues 1002-1 and 1002-2, called pending queues and provided independently for each slot, which hold atoms fetched from the instruction sequence that cannot execute immediately; and a table called scoreboard 1004, which manages usage information for each register.

[0139] Among the atoms of a fetched VLIW instruction, those that are not executed are held in the corresponding pending queue until they become executable.

[0140] The pending queue is preferably a FIFO (first-in, first-out buffer). With a FIFO, atoms in the pending queue execute in order from the head, which differs from the reorder buffer of a conventional superscalar. That is, the hardware is greatly simplified and made faster in exchange for a performance restriction: an atom in the pending queue that is itself executable sometimes cannot execute because it is not at the head.

[0141] Furthermore, a pending queue is preferably provided for each slot of the VLIW instruction. For example, with the VLIW instruction format illustrated in FIG. 20, there are two slots, so two pending queues are provided. An atom of a fetched VLIW instruction that is not executed is placed in the pending queue for its slot. Having one pending queue per slot, with no crossing between slots, is another restriction that simplifies the hardware and increases speed.

[0142] In each cycle and each slot, the atom given a chance to execute may be an atom fetched from the normal instruction sequence or, when the pending queue is not empty, an atom from the pending queue. Execution priority is given in the order (1) the fetched atom, (2) the atom in the pending queue.

[0143] Whether an atom given a chance to execute (the fetched atom or the atom at the head of the pending queue) can execute is decided from the contents of the scoreboard (the usage status of the registers related to the atom). Basically, the atom is judged not executable when a register it uses is not available to it.

[0144] As described above, the dynamic VLIW scheme realizes out-of-order execution by temporarily holding atoms that cannot execute immediately in the pending queue and executing them once they become executable.
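
The interplay of the scoreboard and a per-slot FIFO pending queue can be sketched as below. This is a simplified software model under stated assumptions, not the hardware of FIG. 20: the class names, the `reserve`/`release` interface and the choice to reserve the destination register of a queued atom are hypothetical, latencies are not modeled, and the atoms are illustrative.

```python
from collections import deque
from typing import Deque, Dict, NamedTuple, Optional, Tuple


class Atom(NamedTuple):
    mnemonic: str
    dest: str
    srcs: Tuple[str, ...] = ()


class Scoreboard:
    """Records, per register, the unfinished atom that will write it."""

    def __init__(self) -> None:
        self.writer: Dict[str, Atom] = {}

    def reserve(self, atom: Atom) -> None:
        self.writer[atom.dest] = atom

    def release(self, atom: Atom) -> None:
        if self.writer.get(atom.dest) == atom:
            del self.writer[atom.dest]

    def can_execute(self, atom: Atom) -> bool:
        # Every register the atom touches must be free or reserved by itself.
        regs = (atom.dest,) + atom.srcs
        return all(self.writer.get(reg) in (None, atom) for reg in regs)


class Slot:
    """One VLIW slot with its own FIFO pending queue."""

    def __init__(self, board: Scoreboard) -> None:
        self.board = board
        self.pending: Deque[Atom] = deque()

    def offer_fetched(self, atom: Atom) -> bool:
        """Return True if the fetched atom may execute now; otherwise queue it."""
        if self.board.can_execute(atom):
            return True
        self.pending.append(atom)
        self.board.reserve(atom)  # assumption: later users of dest must wait
        return False

    def offer_pending(self) -> Optional[Atom]:
        """Only the head of the FIFO is ever examined."""
        if self.pending and self.board.can_execute(self.pending[0]):
            return self.pending.popleft()
        return None


board = Scoreboard()
slot2 = Slot(board)
load = Atom("LD", "R5", ("R3",))
board.reserve(load)  # a load whose result is not yet available

ran_addi = slot2.offer_fetched(Atom("ADDI", "R13", ("R9",)))
ran_sub = slot2.offer_fetched(Atom("SUB", "R11", ("R5", "R8")))
head_before = slot2.offer_pending()
board.release(load)  # the load completes
head_after = slot2.offer_pending()
```

In this model the atom that does not touch R5 executes immediately, the atom that reads R5 waits in the queue of its slot, and it leaves the queue only after the load releases R5.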

[0145] In the dynamic VLIW scheme the processor has no register renaming mechanism; registers are assigned by the compiler. Omitting register renaming simplifies the hardware. For this reason, the compiler that generates the VLIW instruction sequence is one that allocates registers so that false dependencies do not occur (a known compiler may be used).

[0146] Next, to show the operation and effect of the dynamic VLIW, an overview is given using a simple example.

[0147] FIG. 21 shows an example of an instruction sequence to be executed, in which each VLIW instruction contains two atoms.

[0148] In FIG. 21, each atom is written in the order mnemonic, destination (dest) register, first source (src1) register, second source (src2) register.

[0149] As shown in FIG. 21, the following pairs of atoms are fetched one pair at a time, in this order:
ADD R8, R9, R10 and LD R5, (R3)
LDI R18, 1000 and ADDI R13, R9, 4
ADD R21, R18, R9 and SUB R11, R5, R8
LSR R22, R21, 5 and ORI R24, R21, 0xFF
SUBI R25, R24, 5 and NOP
BRZ R11, R0, ROOP_EXT and NOP

[0150] A NOP atom may be an instruction that actually causes no operation, or an instruction such as ADD that executes but changes nothing as a result.

[0151] The following compares execution of the instruction sequence of FIG. 21 under the conventional in-order instruction issue VLIW scheme and under the dynamic VLIW scheme.

[0152] FIG. 22 shows the instruction sequence executed under the conventional in-order instruction issue VLIW scheme, and FIG. 23 shows it executed under the dynamic VLIW scheme.

[0153] In the examples of FIGS. 22 and 23, the LD (load instruction) atom in the second slot of the first VLIW instruction misses in the primary cache; the data is found in the secondary cache, so loading it takes four cycles.

[0154] As shown in FIG. 22, when this instruction sequence is executed by the conventional in-order instruction issue VLIW method, ADD R8, R9, R10 in the first slot and LD R5, (R3) in the second slot are executed in cycle 1. Because the LD in the second slot causes a cache miss, both the first and second slots stall for the four cycles from cycle 2 to cycle 5 (fetch is interrupted during this time). After that, the instructions are executed in order, and the processing completes after a total of 10 cycles.
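The cycle count in this paragraph can be reproduced with a very small model. The following is a minimal sketch, assuming that every VLIW instruction occupies one cycle and that a cache miss simply adds stall cycles to the whole pipeline; the names `BUNDLES`, `MISS_STALL`, and `in_order_cycles` are illustrative and do not appear in the specification.

```python
# Illustrative model only: one cycle per VLIW instruction, plus stall cycles
# for any instruction whose atom misses in the cache (assumption).
BUNDLES = [
    ("ADD R8,R9,R10",       "LD R5,(R3)"),
    ("LDI R18,1000",        "ADDI R13,R9,4"),
    ("ADD R21,R18,R9",      "SUB R11,R5,R8"),
    ("LSR R22,R21,5",       "ORI R24,R21,0xFF"),
    ("SUBI R25,R24,5",      "NOP"),
    ("BRZ R11,R0,ROOP_EXT", "NOP"),
]

# VLIW instruction index -> stall cycles; the LD in instruction 0 stalls 4 cycles.
MISS_STALL = {0: 4}


def in_order_cycles(bundles, miss_stall):
    """Count cycles when every slot stalls together on a cache miss."""
    cycle = 0
    for index, _bundle in enumerate(bundles):
        cycle += 1                          # the VLIW instruction itself
        cycle += miss_stall.get(index, 0)   # all slots wait for the miss
    return cycle


print(in_order_cycles(BUNDLES, MISS_STALL))  # 6 instructions + 4 stall cycles
```

With six VLIW instructions and a four-cycle stall, the model gives the 10 cycles stated above.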

[0155] Next, as shown in FIG. 23, when this instruction sequence is executed by the dynamic VLIW method, ADD R8, R9, R10 in the first slot and LD R5, (R3) in the second slot are executed in cycle 1, and the LD causes a cache miss. From the next cycle onward, any atom that uses R5, the destination register of this LD, cannot be executed until the LD completes (this status of register R5 is reflected on the scoreboard).

[0156] In cycle 2, neither atom of the VLIW instruction uses R5, the destination register of the LD, so LDI R18, 1000 and ADDI R13, R9, 4 are executed.

[0157] In cycle 3, ADD R21, R18, R9 in the first slot does not use R5 and is therefore executed. SUB R11, R5, R8 in the second slot, however, refers to R5 as its first source register, so it cannot be executed and is put into the pending queue (the scoreboard shows that R5 is unavailable, which is how the atom is known to be non-executable). In addition, from the next cycle onward, any atom that uses R11, the destination register of the SUB (other than this SUB itself), cannot be executed until this SUB completes (this status of register R11 is also reflected on the scoreboard).

[0158] In cycle 4, neither R5 nor R11 is used, so LSR R22, R21, 5 and ORI R24, R21, 0xFF are executed.

[0159] In cycle 5, neither R5 nor R11 is used, so SUBI R25, R24, 5 and NOP are executed.

[0160] At this point the LD completes, and R5 becomes available from the next cycle (this status of register R5 is also reflected on the scoreboard).

[0161] In cycle 6, BRZ R11, R0, ROOP_EXT in the first slot has R11 as its destination, so it is found to be non-executable. As will be described in detail later, when the destination register is unavailable, the atom is not put into the pending queue; instead it waits until it becomes executable (fetch is interrupted). This cycle therefore becomes an empty slot. Because fetch is interrupted, execution of the fetched second-slot instruction is also held.

[0162] Because fetch has been interrupted, the atom in the pending queue of the second slot is given an opportunity to execute. SUB R11, R5, R8, which is in the pending queue, is executable because the earlier LD has completed and R5 is available (this can be determined by referring to the scoreboard). SUB R11, R5, R8 is therefore taken out of the pending queue and executed.

[0163] At this point the SUB completes, and R11 becomes available from the next cycle (this status of register R11 is also reflected on the scoreboard).

[0164] In cycle 7, BRZ R11, R0, ROOP_EXT, which was waiting in the first slot, becomes executable and is executed, and the NOP that was waiting in the second slot is also executed.

[0165] As a result, the processing completes after 7 cycles.
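The cycle-by-cycle behavior described in paragraphs [0155] to [0165] can be traced with a small simulation. The following is a simplified sketch, not the hardware of the embodiment: it assumes a one-cycle latency for every atom except the missing LD (four extra cycles), treats the first operand of BRZ as its destination as the text does, and models the scoreboard as a table of the first cycle in which each register is usable. All function and variable names are illustrative.

```python
from collections import deque
import math

# Atoms as (mnemonic, destination, sources); operands follow paragraph [0149].
NOP = ("NOP", None, ())
BUNDLES = [
    (("ADD", "R8", ("R9", "R10")),  ("LD", "R5", ("R3",))),
    (("LDI", "R18", ()),            ("ADDI", "R13", ("R9",))),
    (("ADD", "R21", ("R18", "R9")), ("SUB", "R11", ("R5", "R8"))),
    (("LSR", "R22", ("R21",)),      ("ORI", "R24", ("R21",))),
    (("SUBI", "R25", ("R24",)),     NOP),
    (("BRZ", "R11", ("R0",)),       NOP),
]
EXTRA_LATENCY = {(0, 1): 4}  # (instruction index, slot) -> extra cycles (LD miss)


def dynamic_vliw_cycles(bundles, extra_latency, max_cycles=1000):
    """Return the cycle in which the last atom starts executing."""
    free_at = {}                               # scoreboard: register -> first usable cycle
    pending = [deque() for _ in bundles[0]]    # one pending queue per slot

    def busy(reg, cycle):
        return free_at.get(reg, 0) > cycle

    pc, cycle = 0, 0
    while pc < len(bundles) or any(pending):
        cycle += 1
        if cycle > max_cycles:
            raise RuntimeError("model made no progress")
        bundle = bundles[pc] if pc < len(bundles) else ()
        # A busy destination register interrupts fetch for the whole instruction.
        stalled = pc >= len(bundles) or any(
            dest is not None and busy(dest, cycle) for _, dest, _ in bundle)
        if stalled:
            # Each slot may instead run the atom at the head of its pending queue.
            for queue in pending:
                if queue:
                    (_, dest, srcs), latency = queue[0]
                    if not any(busy(s, cycle) for s in srcs):
                        queue.popleft()
                        if dest is not None:
                            free_at[dest] = cycle + latency
            continue
        for slot, atom in enumerate(bundle):
            _, dest, srcs = atom
            latency = 1 + extra_latency.get((pc, slot), 0)
            if any(busy(s, cycle) for s in srcs):
                # A source is not ready: queue the atom and reserve its destination.
                pending[slot].append((atom, latency))
                if dest is not None:
                    free_at[dest] = math.inf
            elif dest is not None:
                free_at[dest] = cycle + latency
        pc += 1
    return cycle


print(dynamic_vliw_cycles(BUNDLES, EXTRA_LATENCY))
```

Under these assumptions the SUB waits in the pending queue from cycle 3, runs in cycle 6 while fetch is interrupted by the BRZ, and the BRZ issues in cycle 7, which matches the 7 cycles stated above.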

[0166] As described above, processing that takes 10 cycles with the conventional in-order instruction issue VLIW method completes in 7 cycles with the dynamic VLIW method, because its out-of-order capability allows other atoms to execute while the LD atom's miss is outstanding. Execution is thus faster.

[0167] The present invention is also applicable to a compiler that targets such a VLIW processor capable of out-of-order execution.

[0168] The compiler of the present embodiment can also be realized as software. The compiler of the present embodiment can likewise be implemented as a computer-readable recording medium storing a program for causing a computer to execute predetermined means (or for causing a computer to function as predetermined means, or for causing a computer to realize predetermined functions).

[0169] The present invention is not limited to the embodiments described above and can be implemented with various modifications within its technical scope.

[0170]

[Effects of the Invention]

According to the present invention, register allocation can be prevented from creating new dependencies between instructions, and even when a new dependency does arise, the execution timings of the instructions involved can be separated as far as possible. Consequently, false dependencies, which reduce ILP during out-of-order execution, can be minimized without using a hardware register renaming mechanism. Moreover, whereas hardware register renaming can cover only a limited number of instructions, a compiler can analyze register usage over a wide range, so registers can be used more effectively.
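The idea of preferring a real register that creates no new inter-instruction dependency can be illustrated as follows. This is a minimal sketch under simplifying assumptions, not the procedure of the embodiment: instructions are given as (destination, sources) pairs over virtual registers, existing dependencies are given as (earlier, later) index pairs, and a candidate dependency counts as "new" only if the later instruction is not already reachable from the earlier one through existing dependencies. The pairwise check is conservative and ignores intervening writes. The names `closure` and `new_false_deps` are hypothetical.

```python
def closure(deps, n):
    """Successors reachable through existing dependencies (edges go forward)."""
    succ = {i: {j for a, j in deps if a == i} for i in range(n)}
    for i in reversed(range(n)):          # later instructions are complete first
        for j in list(succ[i]):
            succ[i] |= succ[j]
    return succ


def new_false_deps(instrs, assigned, vreg, real, deps):
    """Pairs (i, j) that would gain a new WAR/WAW dependency if vreg -> real."""
    succ = closure(deps, len(instrs))
    mapping = {**assigned, vreg: real}
    found = set()
    for j, (dest_j, _) in enumerate(instrs):
        if mapping.get(dest_j) != real:
            continue                      # a false dependency needs a later write to `real`
        for i in range(j):
            dest_i, srcs_i = instrs[i]
            users = {v for v in (dest_i, *srcs_i) if mapping.get(v) == real}
            others = users - {dest_j}     # same virtual register: not a false dependency
            if not others or vreg not in others | {dest_j}:
                continue                  # pair is not caused by this assignment
            if j not in succ[i]:          # not hidden by an existing dependency
                found.add((i, j))
    return found


# v1 = ...; v2 = f(v1); v3 = g(v2); v4 = ...   (true dependencies 0->1->2)
instrs = [("v1", ()), ("v2", ("v1",)), ("v3", ("v2",)), ("v4", ())]
deps = {(0, 1), (1, 2)}

# Reusing R1 for v3 adds nothing new: 0->2 and 1->2 already hold.
print(new_false_deps(instrs, {"v1": "R1", "v2": "R2"}, "v3", "R1", deps))
# Reusing R1 for v4 would tie the independent instruction 3 to instructions 0 and 1.
print(new_false_deps(instrs, {"v1": "R1", "v2": "R2", "v3": "R3"}, "v4", "R1", deps))
```

In this toy example an allocator following the stated principle would accept R1 for v3 but look for another register for v4.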

[Brief description of the drawings]

FIG. 1 is a diagram showing a configuration example of a compiler according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the procedure of graph coloring.

FIG. 3 is a diagram showing an example of a program.

FIG. 4 is a diagram showing a dependency graph.

FIG. 5 is a diagram showing the live range of each virtual register.

FIG. 6 is a diagram showing a register interference graph.

FIG. 7 is a diagram for explaining reconstruction of the register interference graph.

FIG. 8 is a diagram showing a record of the order in which virtual registers were removed from the register interference graph.

FIG. 9 is a flowchart showing an example of the procedure for allocating real registers to virtual registers.

FIG. 10 is a diagram showing an example of a sequence of real registers arranged in priority order.

FIG. 11 is a diagram showing an example of a register correspondence table.

FIG. 12 is a flowchart showing an example of the processing procedure for real registers that cause no new dependency even when allocated to a virtual register.

FIG. 13 is a flowchart showing an example of the processing procedure for real registers that cause no new dependency when allocated to a virtual register.

FIG. 14 is a diagram showing the live ranges of the target virtual register and the real registers.

FIG. 15 is a diagram showing an example of a processing result obtained by the conventional method.

FIG. 16 is a diagram showing an example of a processing result obtained in the present embodiment.

FIG. 17 is a diagram showing another example of a sequence of real registers arranged in priority order.

FIG. 18 is a diagram showing an example of critical paths when real registers are allocated using the critical path length.

FIG. 19 is a diagram showing an example of a VLIW instruction.

FIG. 20 is a diagram for explaining the dynamic VLIW method.

FIG. 21 is a diagram showing an example of an instruction sequence of VLIW instructions.

FIG. 22 is a diagram for explaining the case where the instruction sequence of FIG. 21 is executed by the conventional VLIW method.

FIG. 23 is a diagram for explaining the case where the instruction sequence of FIG. 21 is executed by the dynamic VLIW method.

FIG. 24 is a diagram for explaining dependencies between instructions.

[Explanation of symbols]

1 ... Analysis part
2 ... Optimization part
3 ... Output part
21 ... Instruction scheduling part
22 ... Register allocation part
1002-1, 1002-2 ... Pending queue
1004 ... Scoreboard
1006-1, 1006-2 ... Pipeline unit

Claims

[Claims]

1. A compiling method for generating, based on a given source program, an object program executable by a target processor that has a plurality of arithmetic units capable of executing instructions in parallel and has a function that allows execution of an instruction that follows in the instruction arrangement order to start before execution of an instruction that precedes it, the method comprising: an analysis step of analyzing the source program to generate a first intermediate code; an instruction scheduling step of performing instruction scheduling based on the first intermediate code and generating a second intermediate code described with virtual registers assigned as registers for storing temporary results of operations; a register allocation step of determining, based on the second intermediate code and information on the real registers of the processor, the real register to be allocated to each of the virtual registers; and an output step of outputting an object program in which each virtual register is replaced with the real register allocated to it, wherein the register allocation step includes: a step of analyzing the section in which a real register is used by being allocated to a virtual register and the section in which the virtual register to which a real register is to be allocated is used; and a step of, based on the result of the section analysis and information indicating the inter-instruction dependencies that have already arisen, determining, when there is a real register that causes no new inter-instruction dependency even if it is allocated to the allocation-target virtual register, that real register as a candidate to be preferentially allocated to the virtual register.

2. The compiling method according to claim 1, wherein, among the real registers that cause no new inter-instruction dependency even if allocated to the allocation-target virtual register, the allocation priority of a real register that actually causes a redundant inter-instruction dependency, but can be regarded as causing no new inter-instruction dependency because that dependency is hidden by an already existing inter-instruction dependency, is set higher than the allocation priority of a real register that causes no new inter-instruction dependency because it actually causes no redundant inter-instruction dependency.

3. The compiling method according to claim 1, wherein the register allocation step further includes a step of determining, among the real registers other than those that cause no new dependency, real registers for which the instruction arrangement order and the instruction execution start order can be interchanged as candidates to be allocated with the priority next to that of the real registers that cause no new dependency.

4. The compiling method according to claim 3, wherein, among the real registers for which the instruction arrangement order and the instruction execution start order can be interchanged, a real register whose live range, as fixed by the allocations made up to that time, is at a larger distance from the live range of the allocation-target virtual register is determined as a candidate to be allocated with higher priority.

5. The compiling method according to claim 3, wherein a real register having a smaller critical path length in the inter-instruction dependencies is determined as a candidate to be allocated with higher priority.

6. The compiling method according to claim 1, further comprising a step of determining the order of the virtual registers to which real registers are to be allocated, based on the number of other virtual registers whose live ranges overlap the live range of each virtual register and on the number of real registers of the processor.

7. The compiling method according to claim 1, wherein the register allocation step further includes a step of selecting, from among the real registers determined as candidates for the allocation-target virtual register, the real register having the highest priority among those that can actually be allocated at that time, and storing the correspondence between the virtual register and the selected real register.

8. A compiling device for generating, based on a given source program, an object program executable by a target processor that has a plurality of arithmetic units capable of executing instructions in parallel and has a function that allows execution of an instruction that follows in the instruction arrangement order to start before execution of an instruction that precedes it, the device comprising: analysis means for analyzing the source program to generate a first intermediate code; instruction scheduling means for performing instruction scheduling based on the first intermediate code and generating a second intermediate code described with virtual registers assigned as registers for storing temporary results of operations; register allocation means for determining, based on the second intermediate code and information on the real registers of the processor, the real register to be allocated to each of the virtual registers; and output means for outputting an object program in which each virtual register is replaced with the real register allocated to it, wherein the register allocation means includes: means for analyzing the section in which a real register is used by being allocated to a virtual register and the section in which the virtual register to which a real register is to be allocated is used; and means for, based on the result of the section analysis and information indicating the inter-instruction dependencies that have already arisen, determining, when there is a real register that causes no new inter-instruction dependency even if it is allocated to the allocation-target virtual register, that real register as a candidate to be preferentially allocated to the virtual register.

9. A computer-readable recording medium storing a program for generating, based on a given source program, an object program executable by a target processor that has a plurality of arithmetic units capable of executing instructions in parallel and has a function that allows execution of an instruction that follows in the instruction arrangement order to start before execution of an instruction that precedes it, the program causing a computer to: analyze the source program to generate a first intermediate code; perform instruction scheduling based on the first intermediate code and generate a second intermediate code described with virtual registers assigned as registers for storing temporary results of operations; perform a register allocation step of determining, based on the second intermediate code and information on the real registers of the processor, the real register to be allocated to each of the virtual registers; and output an object program in which each virtual register is replaced with the real register allocated to it, wherein, in the register allocation step, the program causes the computer to analyze the section in which a real register is used by being allocated to a virtual register and the section in which the virtual register to which a real register is to be allocated is used, and, based on the result of the section analysis and information indicating the inter-instruction dependencies that have already arisen, to determine, when there is a real register that causes no new inter-instruction dependency even if it is allocated to the allocation-target virtual register, that real register as a candidate to be preferentially allocated to the virtual register.