JP2001075814A

JP2001075814A - Device and method for compilation

Info

Publication number: JP2001075814A
Application number: JP23253399A
Authority: JP
Inventors: Satoshi Koseki; 聰古関; Tatsushi Inagaki; 達氏稲垣; Toshiaki Yasue; 俊明安江; Hideaki Komatsu; 秀昭小松; Mikio Takeuchi; 幹雄竹内
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1999-08-19
Filing date: 1999-08-19
Publication date: 2001-03-23
Anticipated expiration: 2019-08-19
Also published as: JP3405696B2

Abstract

PROBLEM TO BE SOLVED: To effectively optimize object code within the limits imposed by the number of physical registers of a processor. SOLUTION: The compiler device generates code from a program represented as a DAG (directed acyclic graph) while evaluating the number of registers used and the number of execution cycles, and optimizes the generated code. Specifically, the compiler calculates, on the DAG, the cycle count at which each operation can be executed and the number of currently available registers. Where registers are plentiful, it generates code giving priority to operators on the most time-consuming execution path in the DAG; where registers are scarce, it generates code giving priority to operators that reduce the number of registers in use.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a compiling apparatus and method for compiling source code and generating optimized object code.

[0002]

2. Description of the Related Art: As disclosed in references such as "Compilers: Principles, Techniques, and Tools" (A. V. Aho, R. Sethi, and J. D. Ullman), "Principles of Compiler Design" (A. V. Aho and J. D. Ullman), and "Advanced Compiler Design and Implementation" (S. S. Muchnick), compilers that compile source code written in a high-level programming language and generate object code executable by a processor have long been in general use. Various optimizations of the object code generation process have also long been applied to such compilers so that object code that executes faster can be obtained.

[0003] In a compiler, optimizing the object code generation process often requires a trade-off between the scope over which an optimization is applied and the number of physical registers available for executing the object code.

[0004] For example, to optimize by code scheduling, instructions that have a data dependence between them must be moved further apart so as to raise the fill rate of the processor pipeline and hide the latency of the functional units in the processor. In such a case, temporary registers are needed to hold the multiple values being processed in the pipeline. Likewise, when an optimization eliminates common subexpressions to reduce the number of operations, a register is needed to hold the computed value of each common subexpression.

[0005] In conventional compiler optimization, common subexpression elimination, register allocation (which optimizes the assignment of registers), and code scheduling were each implemented as a separate pass. Each of these optimization passes takes as input the result of the pass placed before it. Optimizing the result of a previous stage in this way has the problem that, for example, the previous stage may actually hinder the optimization performed in the next stage. For example, if common subexpression elimination has increased the number of registers in use, the register allocation that follows may generate unnecessary register spill code that saves and restores register values.

[0006] The next stage may also be constrained by the result of the previous stage. For example, even when there are registers to spare, if code scheduling is performed after register allocation, it becomes difficult for code scheduling to rename registers in order to hide the latency of the functional units.

[0007]

SUMMARY OF THE INVENTION: The present invention has been made in view of the above problems of the prior art. An object of the invention is to provide a compiling apparatus and method that can effectively optimize object code by code scheduling and common subexpression elimination within the constraints arising from the number of physical registers of a processor. A further object is to provide a compiling apparatus and method that generate object code from a source program represented in DAG (directed acyclic graph) form while evaluating the number of registers used and the number of hardware execution cycles, and that can optimally trade off the scope of code-scheduling optimization and common subexpression elimination against the constraint on the number of physical registers.

[0008]

[Means for Solving the Problems] [Compiling Apparatus] To achieve the above object, a compiling apparatus according to the present invention comprises: converting means for converting the source code of a program into DAG form; parameter calculating means for calculating parameters of a processor that executes, using a predetermined number of registers, the object code corresponding to each instruction contained in the converted source code; and compiling means for scanning the converted source code and compiling the source code into object code on the basis of the calculated processor parameters. The parameter calculating means comprises timing calculating means for calculating the timing at which the object code corresponding to each instruction is executed by the processor, and register increase/decrease calculating means for calculating a value indicating the increase or decrease in the number of processor registers occupied when the object code corresponding to each instruction is executed.

[0009] Preferably, the compiling means comprises: object code generating means for compiling the instructions one after another, following the execution order of the instructions in the converted source code and the priorities assigned to the instructions, to generate object code for each instruction; occupied-register-count calculating means for calculating, on the basis of the values indicating the increase or decrease in the number of registers, an occupied register count indicating the number of processor registers occupied when each instruction that may be compiled next is executed; and priority assigning means for assigning priorities to the instructions that may be compiled next on the basis of the calculated number of free registers and the timing at which the object code of each instruction is executed.

[0010] Preferably, the priority assigning means compares the calculated occupied register count with a predetermined threshold. When the occupied register count exceeds the threshold, it gives high priority to instructions that increase the occupied register count by a smaller amount or that decrease it; otherwise, it gives high priority to instructions on the critical path.
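The selection rule in this paragraph can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `pick_next`, the per-instruction tables `delta_reg` (change in occupied registers) and `slack` (0 meaning "on the critical path"), and the use of the second quantity as a tie-breaker are all assumptions made for the example.

```python
def pick_next(ready, occupied, threshold, delta_reg, slack):
    """Choose the next instruction to compile from the ready candidates.

    delta_reg[v]: assumed estimate of how much v changes the occupied register count.
    slack[v]:     assumed scheduling leeway of v; 0 means v is on the critical path.
    """
    if occupied > threshold:
        # Registers are scarce: prefer the smallest increase (or largest decrease).
        return min(ready, key=lambda v: (delta_reg[v], slack[v]))
    # Registers are plentiful: prefer instructions on the critical path.
    return min(ready, key=lambda v: (slack[v], delta_reg[v]))


ready = ["x", "y"]
delta_reg = {"x": 1, "y": -1}   # x needs one more register, y frees one
slack = {"x": 0, "y": 3}        # x is on the critical path, y is not
crowded = pick_next(ready, occupied=7, threshold=6, delta_reg=delta_reg, slack=slack)
roomy = pick_next(ready, occupied=2, threshold=6, delta_reg=delta_reg, slack=slack)
```

With the same two candidates, the choice flips from the critical-path instruction to the register-freeing instruction once the occupied count passes the threshold.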

[0011] Preferably, the compiling means is configured to generate object code after eliminating common subexpressions in the instructions, and further comprises inverse transformation means for performing, when the calculated occupied register count exceeds the predetermined threshold, an inverse transformation that restores an eliminated common subexpression.

[0012] Preferably, the apparatus further comprises data allocating means for allocating the registers on the basis of predetermined constraints that restrict the uses of the registers of the processor.

[0013] [Operation of the Compiling Apparatus] The compiler apparatus according to the present invention generates code from a program represented as a dependence DAG (directed acyclic graph) of operators and instructions while evaluating the number of registers used and the number of hardware execution cycles. It is configured to optimize the trade-off between the scope of code-scheduling optimization and common subexpression elimination on the one hand and the constraint on the number of physical registers on the other.

[0014] The compiler apparatus according to the present invention calculates, on the DAG, the cycle count at which each operation can be executed, and calculates the number of currently available registers as it generates code from the DAG. Where registers are plentiful, it generates code giving priority to operators on the most time-consuming execution path in the DAG, using more registers so that the hardware pipeline fill rate rises. Where few registers are available, it generates code giving priority to operators that reduce the number of registers in use, so that no superfluous spill code is produced. In addition, by applying the inverse transformation of common subexpression elimination on the DAG during code generation, it avoids needless spill code caused by that optimization.

[0015] [Compiling Method] A compiling method according to the present invention includes: a converting step of converting the source code of a program into DAG form; a parameter calculating step of calculating parameters of a processor that executes, using a predetermined number of registers, the object code corresponding to each instruction contained in the converted source code; and a compiling step of scanning the converted source code and compiling the source code into object code on the basis of the calculated processor parameters. In the parameter calculating step, the timing at which the object code corresponding to each instruction is executed by the processor is calculated, and a value indicating the increase or decrease in the number of processor registers occupied when the object code corresponding to each instruction is executed is calculated.

[0016] [Medium] A medium according to the present invention carries a program that causes a computer to execute: a converting step of converting the source code of a program into DAG form; a parameter calculating step of calculating parameters of a processor that executes, using a predetermined number of registers, the object code corresponding to each instruction contained in the converted source code; and a compiling step of scanning the converted source code and compiling the source code into object code on the basis of the calculated processor parameters. In the parameter calculating step, the program causes the computer to execute a process of calculating the timing at which the object code corresponding to each instruction is executed by the processor, and a process of calculating a value indicating the increase or decrease in the number of processor registers occupied when the object code corresponding to each instruction is executed.

[0017]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS [First Embodiment] A first embodiment of the present invention is described below.

[0018] [Overview of the Compiler Shown as the First Embodiment] The compiler according to the present invention first calculates the number of processor execution cycles needed to execute each operation contained in a DAG (directed acyclic graph). Then, while generating object code from each operation contained in the DAG, it calculates the number of registers used for each operation.

[0019] [Code Generation] When the number of registers needed to execute the object code obtained as described above is small and registers are plentiful, the compiler according to the present invention generates object code for the operations in the DAG giving higher processing priority to operations with longer execution times. It thereby optimizes so as to increase the number of registers used by the processor executing the object code and to raise the pipeline fill rate of the processor.

[0020] Conversely, when registers are scarce, the compiler according to the present invention generates object code for the operations in the DAG giving higher processing priority to operations that need fewer registers, which reduces the amount of code that saves register contents to memory (spill code). Furthermore, when generating object code, it cancels common subexpression elimination that has already been applied, so that needless spill code is not generated.
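One way to picture "cancelling" common subexpression elimination on a DAG is to split a shared vertex into one copy per consumer, so that each consumer recomputes the value instead of keeping it alive in a register. The sketch below only illustrates that idea under assumptions: the function name `undo_cse`, the edge-list representation, and the `name#i` naming of copies are hypothetical and are not taken from the patent text.

```python
def undo_cse(edges, shared):
    """Return a new edge list in which vertex `shared` is duplicated per consumer.

    edges is a list of (producer, consumer) pairs; `shared` is a vertex whose
    result is used by several consumers as a result of CSE.
    """
    consumers = [w for v, w in edges if v == shared]
    operands = [v for v, w in edges if w == shared]
    # Keep every edge that does not touch the shared vertex.
    new_edges = [(v, w) for v, w in edges if shared not in (v, w)]
    for i, consumer in enumerate(consumers):
        copy = f"{shared}#{i}"                      # hypothetical naming scheme
        new_edges += [(op, copy) for op in operands]  # each copy re-reads the operands
        new_edges.append((copy, consumer))
    return new_edges


# t = a * b was shared by x and y after CSE.
edges = [("a", "t"), ("b", "t"), ("t", "x"), ("t", "y")]
restored = undo_cse(edges, "t")
```

After the split, `x` and `y` each depend on their own copy of the multiplication, so the product no longer has to stay in a register between the two uses; the operands `a` and `b` are read twice instead.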

[0021] To optimize object code in this way, when generating object code from the DAG the compiler according to the present invention determines the order of code generation by evaluating each operator's register usage and the cycle at which its execution can start. During code generation it dynamically changes the code generation policy on the basis of the number of registers in use, which changes from moment to moment: it switches from the policy of using more registers to raise the pipeline fill rate to the policy of using fewer registers to minimize spill code, and it cancels common subexpression elimination. In this way it trades off the hardware constraints against the efficiency of the object code.

[0022] In this way, the compiler according to the present invention restricts the scope of object code optimization to those operations in the DAG whose object code execution is not subject to the hardware constraint, namely the number of physical registers of the processor. Furthermore, when the processing of the compiling method according to the present invention is placed ahead of register allocation and code scheduling, it relaxes the constraints on those portions where these two processes would otherwise have to cooperate in order to optimize the object code. Object code can therefore be optimized even if register allocation and code scheduling are implemented as separate passes.

[0023] [Method of Implementing the Compiler Shown as the First Embodiment] To perform the object code optimization described so far, the compiler according to the present invention includes the following three steps.

[0024] [First Step] In the first step, the compiler according to the present invention represents the source code as a DAG that expresses the data dependences and the side-effect dependences of the operators.

[0025] [Second Step] In the second step, for each vertex of the DAG representing the source code, the compiler according to the present invention calculates the hardware parameters used in code generation, such as cycle counts and the increase or decrease in the number of registers used.

[0026] [Third Step] In the third step, as described under "Code Generation" above, the compiler according to the present invention scans the DAG, generates code taking the number of registers used and the execution cycles into account, and obtains an optimized object code sequence. In this third step the compiler generates object code while calculating the number of registers in use during code generation. When that number falls to or below a predetermined value (threshold), determined for example by experiment or experience, the compiler changes its policy for subsequent object code generation, switching from the policy of using more registers to raise the pipeline fill rate to the policy of using fewer registers to minimize spill code, and cancels common subexpression elimination where necessary.
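The third step can be pictured as a list-scheduling loop that switches its selection policy as registers run short. The sketch below is one possible reading under assumptions, not the patented procedure: it tracks the number of free registers and compares that count with the threshold, it estimates register usage simply by subtracting a per-vertex `delta_ref` value, and the names `generate_order`, `delta_ref`, and `slack` as well as the sample values are hypothetical. Undoing common subexpression elimination is left out.

```python
def generate_order(succs, delta_ref, slack, num_regs, threshold):
    """Return a code-generation order for the vertices of a DAG.

    succs[v]     : successors of v (every vertex must appear as a key)
    delta_ref[v] : assumed estimate of the change in registers in use caused by v
    slack[v]     : scheduling leeway of v (0 = on the critical path)
    """
    pending = {v: 0 for v in succs}            # number of unscheduled predecessors
    for ss in succs.values():
        for s in ss:
            pending[s] += 1
    ready = [v for v, n in pending.items() if n == 0]
    free = num_regs                            # registers not currently occupied
    order = []
    while ready:
        if free <= threshold:
            # Few registers left: prefer vertices that free registers.
            v = min(ready, key=lambda u: (delta_ref[u], slack[u]))
        else:
            # Plenty of registers: prefer vertices on the critical path.
            v = min(ready, key=lambda u: (slack[u], delta_ref[u]))
        ready.remove(v)
        order.append(v)
        free -= delta_ref[v]
        for s in succs[v]:
            pending[s] -= 1
            if pending[s] == 0:
                ready.append(s)
    return order


succs = {"r": ["u", "w"], "u": ["u1", "u2"], "w": [],
         "u1": ["t"], "u2": ["t"], "t": []}
delta_ref = {"r": 2, "u": 1, "w": -1, "u1": 0, "u2": 0, "t": -2}
slack = {"r": 0, "u": 0, "w": 5, "u1": 0, "u2": 0, "t": 0}
plentiful = generate_order(succs, delta_ref, slack, num_regs=8, threshold=2)
scarce = generate_order(succs, delta_ref, slack, num_regs=4, threshold=2)
```

With eight registers the off-critical vertex `w` is emitted last; with four registers it is pulled forward right after `r`, because at that point the free-register count has reached the threshold and `w` is the candidate that releases a register.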

[0027] As described later, the compiler according to the present invention actually calculates the number of unused registers, that is, the number of registers not occupied for executing object code. Tracking the number of registers in use and tracking the number of unoccupied registers are equivalent and differ only in how they are expressed.

[0028] [Details of the First to Third Steps] The processing in the first to third steps of the compiler according to the present invention is described in detail below.

[0029] [Details of the First Step] FIG. 1 illustrates a DAG. From a source program containing a loop, such as the one shown in Table 1 below, the dependence DAG shown in FIG. 1 is obtained (the DAG is defined as the set of vertices and arrows in FIG. 1).

[0030]

[Table 1]

[0031] First, as described above, the compiler according to the present invention represents the source code (the program in Table 1) in the form of a DAG (FIG. 1), which is the input to object code generation, as follows. Each operator in the program (+, *, LOAD) is represented as a vertex v ∈ V (each open circle in FIG. 1). Each define-use dependence of data between operators, and each dependence needed to guarantee that the side effects caused by operations (for example, raising an exception or writing to main memory) occur as written in the program, is represented as a directed edge e = (v, w) ∈ E connecting two vertices (each arrow in FIG. 1). Ignoring dependences carried by the loop, the program is represented as a directed acyclic graph (DAG) G = (V, E). The direction of each directed edge e indicates the data dependence between the vertices (instructions or operators) and also the execution order of the object code generated from the vertices of the DAG.

[0032] Furthermore, the compiler according to the present invention adds two virtual vertices to the directed graph (DAG) G = (V, E): a top ┬ that precedes every vertex having no predecessor (no vertex at the source of a directed edge pointing to it), and a bottom ┴ that succeeds every vertex having no successor (no vertex at the destination of a directed edge leaving it). A directed edge representing a data dependence (a solid arrow in FIG. 1) indicates that the operator uses a processor register when it refers to the result of the preceding vertex. Each vertex v is assigned the number of machine cycles cy(v) required to execute its operation. The compiler according to the present invention uses this information to generate code that takes execution cycles into account.
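Adding the virtual top and bottom vertices can be sketched as below. The representation (a vertex list plus a list of `(v, w)` edge pairs), the function name `add_top_and_bottom`, and the vertex names in the sample graph are assumptions made for illustration; the sample graph is not the one in FIG. 1.

```python
TOP, BOTTOM = "TOP", "BOTTOM"   # hypothetical names for the virtual vertices


def add_top_and_bottom(vertices, edges):
    """Return (vertices, edges) extended with the virtual TOP and BOTTOM vertices."""
    has_pred = {w for _, w in edges}    # vertices with at least one predecessor
    has_succ = {v for v, _ in edges}    # vertices with at least one successor
    new_edges = list(edges)
    for v in vertices:
        if v not in has_pred:
            new_edges.append((TOP, v))      # TOP precedes every source vertex
        if v not in has_succ:
            new_edges.append((v, BOTTOM))   # BOTTOM succeeds every sink vertex
    return [TOP, *vertices, BOTTOM], new_edges


vertices = ["load_a", "load_b", "mul", "add"]
edges = [("load_a", "mul"), ("load_b", "mul"), ("mul", "add"), ("load_a", "add")]
all_vertices, all_edges = add_top_and_bottom(vertices, edges)
```

Giving the graph a single entry and a single exit means the depth calculations that follow never need a special case for multiple sources or sinks.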

[0033] [Details of the Second Step] Next, for each vertex of the DAG shown in FIG. 1, the compiler according to the present invention calculates the parameters used in code generation. The parameters that can be quantified from the graph structure and that inform optimization by code scheduling are the depth from the top, the depth from the bottom, and the slackness. Information about the number of registers used is reflected in the reference count of a vertex. The meanings of the depth from the top lt(v), the depth from the bottom lb(v), the slackness sl(v), and the reference count (increase in reference count Δref(v)) of a vertex v are as follows.

[0034] [Depth from the Top] In the directed graph (DAG) G = (V, E) shown in FIG. 1, the depth of a vertex v from the top is defined as the maximum, over the paths that follow directed edges from the top ┬ to v, of the accumulated machine cycles of the vertices passed through before v is reached. In other words, the depth from the top indicates at most how many machine cycles of the target architecture elapse, after the processor (target architecture) that executes the object code generated from the program represented by the DAG starts executing the first object code, before it executes the object code corresponding to the operator at vertex v.

[0035] When the DAG (FIG. 1) is scanned from the top ┬ by depth-first search (reverse post-order), all predecessors of a vertex v are always scanned before v itself. The depth from the top lt(v) of a vertex v can therefore be calculated, starting from the vertices nearest the top ┬, by Procedures 1-1 and 1-2 below.

[0036] Procedure 1-1: If a vertex v has no predecessor, its depth from the top is lt(v) = 0.
Procedure 1-2: If a vertex v has predecessors, its depth from the top is lt(v) = max(lt(p) + cy(p)), where p ranges over the vertices preceding v.
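Procedures 1-1 and 1-2 can be sketched as a single pass over the vertices in topological order. This is an illustration under assumptions: the DAG is given as a dictionary from each vertex to its predecessors, the cycle counts in `cy` are made-up values, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) stands in for the reverse post-order scan.

```python
from graphlib import TopologicalSorter


def depth_from_top(preds, cy):
    """preds maps each vertex to its predecessors; cy maps each vertex to its cycles."""
    lt = {}
    # static_order() yields every vertex only after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        # Procedure 1-1 (no predecessor -> 0) is the default of the max;
        # Procedure 1-2 is the max over predecessors p of lt(p) + cy(p).
        lt[v] = max((lt[p] + cy[p] for p in preds.get(v, [])), default=0)
    return lt


preds = {
    "TOP": [],
    "load_a": ["TOP"],
    "load_b": ["TOP"],
    "mul": ["load_a", "load_b"],
    "add": ["mul", "load_a"],
    "BOTTOM": ["add"],
}
cy = {"TOP": 0, "load_a": 3, "load_b": 3, "mul": 2, "add": 1, "BOTTOM": 0}
lt = depth_from_top(preds, cy)
```

In this sample, `mul` cannot start before cycle 3 because both loads take three cycles, and the value at `BOTTOM` is the length of the longest path through the graph.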

[0037] [Depth from the Bottom] The depth lb(v) of a vertex v from the bottom is defined as the maximum sum of the machine cycles of the vertices passed through when the directed edges of the DAG (FIG. 1) are followed backwards from the bottom ┴ until v is reached. That is, when the object code generated from the program represented by the DAG is executed on the target architecture, lb(v) indicates at most how many cycles before the end of execution of all the object code the execution of the object code corresponding to the operator at vertex v has started.

[0038] Like the depth from the top lt(v), the depth from the bottom lb(v) of a vertex v can be calculated by scanning the DAG, with its directed edges reversed, from the bottom by depth-first search (reverse post-order). It can be calculated, starting from the vertices nearest the bottom ┴, by Procedures 2-1 and 2-2 below.

[0039] Procedure 2-1: If a vertex v has no successor, lb(v) = 0.
Procedure 2-2: If a vertex v has successors s, its depth from the bottom is lb(v) = max(lb(s)) + cy(s).
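The bottom-up calculation can be sketched in the same way as the top-down one, scanning each vertex after all of its successors. One point is an assumption of this sketch: the formula as printed leaves it unclear which vertex the cycle term belongs to, and the code takes it to be cy(v), the cycles of v itself, which matches the description of lb(v) as the number of cycles before the end at which v starts. The dictionary layout and the sample cycle counts are likewise made up for illustration.

```python
from graphlib import TopologicalSorter


def depth_from_bottom(succs, cy):
    """succs maps each vertex to its successors; cy maps each vertex to its cycles."""
    lb = {}
    # Passing successors in the predecessor role makes static_order()
    # yield every vertex only after all of its successors (bottom first).
    for v in TopologicalSorter(succs).static_order():
        ss = succs.get(v, [])
        if not ss:
            lb[v] = 0                                  # Procedure 2-1
        else:
            lb[v] = max(lb[s] for s in ss) + cy[v]     # Procedure 2-2 (assumed reading)
    return lb


succs = {
    "TOP": ["load_a", "load_b"],
    "load_a": ["mul", "add"],
    "load_b": ["mul"],
    "mul": ["add"],
    "add": ["BOTTOM"],
    "BOTTOM": [],
}
cy = {"TOP": 0, "load_a": 3, "load_b": 3, "mul": 2, "add": 1, "BOTTOM": 0}
lb = depth_from_bottom(succs, cy)
```

Under this reading the value at `TOP` equals the longest path length through the graph, the same number the top-down pass produces at the bottom vertex.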

[0040] [Slackness] The depth from the top ┬ of the DAG (FIG. 1) to the bottom is called the critical path length of the DAG. That is, the critical path length indicates the minimum number of cycles needed to finish executing the whole program represented by the DAG. With the critical path length written cp(G), the slackness sl(v) of each vertex v is defined as sl(v) = cp(G) - lt(v) - lb(v). Slackness thus represents the leeway available when generating object code. A vertex v with slackness sl(v) = 0 lies on the path from the top ┬ through v to the bottom that passes through the most directed edges (the critical path).
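The slackness definition can be checked on a small graph. The sketch below recomputes both depths and then applies sl(v) = cp(G) - lt(v) - lb(v). It rests on assumptions: lb(v) is taken to include the cycles of v itself (the reading under which critical-path vertices come out with slackness 0), cp(G) is taken as the depth of the bottom vertex from the top, and the graph and cycle counts are invented for the example.

```python
from graphlib import TopologicalSorter


def slackness(succs, cy, bottom="BOTTOM"):
    """Return (sl, cp) for a DAG given as vertex -> successors (all vertices as keys)."""
    preds = {v: [] for v in succs}
    for v, ss in succs.items():
        for s in ss:
            preds[s].append(v)
    lt, lb = {}, {}
    for v in TopologicalSorter(preds).static_order():      # predecessors first
        lt[v] = max((lt[p] + cy[p] for p in preds[v]), default=0)
    for v in TopologicalSorter(succs).static_order():      # successors first
        lb[v] = cy[v] + max((lb[s] for s in succs[v]), default=0)
    cp = lt[bottom]                                        # critical path length
    return {v: cp - lt[v] - lb[v] for v in succs}, cp


succs = {
    "TOP": ["load_a", "load_b", "load_c"],
    "load_a": ["mul"], "load_b": ["mul"], "load_c": ["add"],
    "mul": ["add"], "add": ["BOTTOM"], "BOTTOM": [],
}
cy = {"TOP": 0, "load_a": 3, "load_b": 3, "load_c": 1,
      "mul": 2, "add": 1, "BOTTOM": 0}
sl, cp = slackness(succs, cy)
```

Here the chain of a three-cycle load, the multiply, and the add is the critical path with slackness 0, while the one-cycle `load_c` can be delayed by up to four cycles without lengthening the program.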

[0041] A vertex v with large slackness sl(v) can be given lower priority than a vertex v' whose slackness sl(v') is smaller than sl(v), and can have its object code generation postponed, with little risk that the lowered priority delays object code generation for the other vertices that depend on v.

[0042] [Increase in the number of references] The increase in the number of references, Δref(v), is used to estimate the number of registers whose use ends when execution of the object code corresponding to vertex v starts, and which can therefore be released for new use, as well as the number of registers that become newly required. Δref(v) of a vertex v is the number of directed edges leaving v toward its successors minus the number of directed edges entering v from its predecessors. A negative Δref(v) indicates that registers used for executing the object code of the preceding vertices are likely to go out of use. Conversely, a positive Δref(v) indicates that registers not used for executing the object code of the preceding vertices are likely to be used in addition.
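As a small illustration, Δref(v) can be computed from an edge list as out-degree minus in-degree. The representation of the DAG as a list of (predecessor, successor) pairs and the function name are assumptions made for this sketch; whether side-effect edges should be included is not decided here.

```python
from collections import Counter


def delta_ref(edges):
    """Return Δref(v) = (outgoing edges of v) - (incoming edges of v)."""
    out_deg, in_deg = Counter(), Counter()
    for p, s in edges:
        out_deg[p] += 1
        in_deg[s] += 1
    # Counter yields 0 for vertices that never appear on one side.
    return {v: out_deg[v] - in_deg[v] for v in set(out_deg) | set(in_deg)}
```

A vertex that consumes two values and feeds three successors gets Δref = +1, while a vertex that only consumes a value gets a negative Δref.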

[0043] [Details of the third step] In the third step, the compiler according to the present invention uses the parameters computed in the second step described above (lt(v), lb(v), sl(v), Δref(v)) and scans the vertices of the DAG (FIG. 1) in order from the top (┬) to generate object code.

[0044] [Data structure] During this object code generation, the compiler according to the present invention calculates the actual number of registers in use and switches its object code generation policy accordingly. To execute the third step, the compiler uses the data and lists a(G), r(G), u(G), reg, ap(v) and au(v) below; these data structures represent the current state of object code generation at each point in the process.

[0045] a(G): the sequence of vertices for which object code has already been generated;
r(G): the sequence of vertices for which object code can be generated next;
u(G): the set of vertices for which object code has not yet been generated;
reg: the number of physical registers of the target architecture that are currently available.

[0046] In the initial state, before object code generation starts, a(G) is an empty list, r(G) is a list containing only the top (┬), u(G) is the set of all vertices other than the top (┬), and reg is the number of registers of the target architecture that can be used to execute the object code generated from the DAG (FIG. 1).

[0047] The following data structures represent the state of each vertex for which object code generation has not yet finished.

[0048] ap(v): the number of vertices that precede vertex v in the DAG and for which object code generation has finished;
au(v): the number of vertices that succeed vertex v via a directed edge and for which object code generation has finished.

[0049] The initial value of ap(v) is the number of directed edges entering vertex v in the DAG, and the initial value of au(v) is the number of vertices that succeed vertex v via a directed edge.
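The initial state described above might be represented as follows. The class and field names are hypothetical, and the DAG is assumed to be given as two dictionaries `pred` and `succ` mapping each vertex to lists of its predecessors and successors.

```python
from dataclasses import dataclass


@dataclass
class CodegenState:
    a: list    # a(G): vertices already generated, in order
    r: list    # r(G): vertices that can be generated next
    u: set     # u(G): remaining vertices
    reg: int   # currently available physical registers
    ap: dict   # ap(v), initialised to the number of incoming edges
    au: dict   # au(v), initialised to the number of successors


def initial_state(pred, succ, top, usable_registers):
    """Build the state that holds before any object code is generated."""
    return CodegenState(
        a=[],
        r=[top],
        u=set(succ) - {top},
        reg=usable_registers,
        ap={v: len(pred[v]) for v in succ},
        au={v: len(succ[v]) for v in succ},
    )
```

With these initial values, ap(v) and au(v) act as counters that are decremented as neighbouring vertices are generated.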

[0050] [Object code generation algorithm] The compiler according to the present invention generates object code by the algorithm given by procedures 3-1 to 3-8 below.

[0051] Procedure 3-1: Remove the first vertex v of r(G) and append it to the end of a(G).
Procedure 3-2: For the directed edge to each vertex s ∈ u(G) that succeeds vertex v, update ap(s) to ap(s) := ap(s) - 1.
Procedure 3-3: If procedure 3-2 yields a vertex w with ap(w) = 0, add w to r(G).
Procedure 3-4: For each vertex p ∈ a(G) that precedes vertex v, update au(p) to au(p) := au(p) - 1.
Procedure 3-5: Update reg to reg := reg + Δreg(v).
Procedure 3-6: For each vertex v in r(G), calculate the increase Δreg(v) in the number of available registers.
Procedure 3-7: If r(G) is not empty, reorder the vertices of r(G) by priority and return to procedure 3-1. That is, when the number of currently available registers is greater than the threshold, the vertices of r(G) are reordered by the priority given under "When the number of available registers reg is greater than the threshold thr" below; otherwise they are reordered by the priority given under "When the number of available registers reg is less than or equal to the threshold thr" below. Processing then returns to procedure 3-1.
Procedure 3-8: If r(G) is empty, end the processing.
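The loop formed by procedures 3-1 to 3-8 can be sketched as below. This is an illustration under assumptions rather than the patented implementation: `delta_reg` and `priority_key` are hypothetical callbacks standing in for the Δreg(v) computation and the priority rules, the DAG is given as `pred`/`succ` dictionaries of lists, and a vertex is assumed to leave u(G) when it enters r(G).

```python
def generate_order(pred, succ, top, reg, delta_reg, priority_key):
    """Return (a(G), final reg) after running procedures 3-1 to 3-8.

    delta_reg(v, pred, au)     -> assumed callback giving Δreg(v)
    priority_key(v, reg, dreg) -> assumed callback; smaller keys sort first
    """
    a, r = [], [top]
    u = set(succ) - {top}
    ap = {v: len(pred[v]) for v in succ}  # counts down as predecessors are generated
    au = {v: len(succ[v]) for v in succ}  # counts down as successors are generated
    while r:                               # 3-8: stop once r(G) is empty
        v = r.pop(0)                       # 3-1: take the first ready vertex
        a.append(v)
        # Same value that 3-6 computed for v while it was still in r(G).
        gain = delta_reg(v, pred, au)
        for s in succ[v]:                  # 3-2: one decrement per edge
            if s in u:
                ap[s] -= 1
                if ap[s] == 0:             # 3-3: s becomes ready
                    u.discard(s)
                    r.append(s)
        for p in pred[v]:                  # 3-4
            au[p] -= 1
        reg += gain                        # 3-5
        dreg = {w: delta_reg(w, pred, au) for w in r}      # 3-6
        r.sort(key=lambda w: priority_key(w, reg, dreg))   # 3-7
    return a, reg
```

Passing the priority rules in as a callback keeps the loop independent of which of the two policies is active at a given moment.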

[0052] The value Δreg(v) used in procedures 3-5 and 3-6 above is the number of registers of the target architecture that become available at the point when the target architecture executes the object code corresponding to vertex v of the DAG (FIG. 1). In other words, Δreg(v) is the increase in the number of available registers at the time the object code of vertex v is executed.

[0053] Δreg(v) is defined as Δreg(v) = (number of registers released by executing the object code of vertex v) - (number of registers reserved for executing the object code of vertex v). The number of registers released by executing the object code of vertex v is defined as the number of vertices p preceding v for which au(p) = 1. The number of registers reserved for executing the object code of vertex v (the registers needed to hold the value produced by the operator of v) is 1 if v produces a value in a register and 0 otherwise.
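The definition of Δreg(v) can be written as a short function. The sketch assumes `pred` maps each vertex to a list of its predecessors, `au` holds the current au(p) counters (here treated as the number of successors of p still awaiting code generation), and `produces_value` is a hypothetical dictionary saying whether a vertex writes its result to a register.

```python
def delta_reg(v, pred, au, produces_value):
    """Return Δreg(v) = released registers - reserved registers."""
    # A predecessor with au(p) == 1 has v as its last remaining user,
    # so its register is released once v is generated.
    released = sum(1 for p in pred[v] if au[p] == 1)
    reserved = 1 if produces_value[v] else 0
    return released - reserved
```

For example, an addition whose two operands are both used for the last time releases two registers and reserves one, giving Δreg = +1.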

[0054] In procedure 3-7 above, the policy for ordering the vertices in r(G) is changed depending on whether the number of currently available registers reg is greater than a fixed threshold thr. This allows the object code generation policy to be switched dynamically according to the constraint imposed by the number of physical registers of the target architecture.

[0055] [When the number of available registers reg is greater than the threshold thr] So that object code is generated in descending order of priority, a priority is assigned to each vertex in the DAG (FIG. 1). When the number of available registers is greater than the threshold (reg > thr), the compiler according to the present invention assigns priorities by rules a and b below, according to the relation between the number of currently available registers and the threshold.

[0056] Rule a: A vertex with smaller slackness is given higher priority.
Rule b: When slackness is equal, the vertex with the smaller depth from the top is given higher priority.

[0057] That is, when the number of available registers is greater than the threshold (reg > thr), there are enough registers to spare. Vertices on the critical path of the DAG are therefore given high priority, and object code is generated so that many registers can be used and the latency of the arithmetic units can be hidden through code scheduling.

[0058] [When the number of available registers reg is less than or equal to the threshold thr] When the number of available registers is less than or equal to the threshold (reg ≤ thr), the compiler according to the present invention assigns priorities by rules c to e below.

[0059] Rule c: A vertex with larger Δreg(v) is given higher priority.
Rule d: Among vertices with equal Δreg(v), a vertex with smaller Δref(v) is given higher priority.
Rule e: When rules c and d do not determine the priority, it is determined from slackness and depth from the top, as in the case where the number of registers is greater than the threshold.
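Rules a to e can be expressed as a sort key, since Python compares tuples element by element. The function names and the dictionary arguments are hypothetical; Δreg(v) is taken in the sense of its definition (registers released minus registers reserved).

```python
def priority_key(v, reg, thr, sl, lt, dreg, dref):
    """Sort key for r(G): a smaller key means a higher priority."""
    timing = (sl[v], lt[v])                # rule a, then rule b
    if reg > thr:
        return timing                      # enough registers: favour the critical path
    # Rule c (larger Δreg first), rule d (smaller Δref first), then rule e.
    return (-dreg[v], dref[v]) + timing


def order_ready(r, reg, thr, sl, lt, dreg, dref):
    """Return the vertices of r(G) in descending order of priority."""
    return sorted(r, key=lambda v: priority_key(v, reg, thr, sl, lt, dreg, dref))
```

With two ready vertices whose Δreg values are equal, the vertex with the smaller Δref comes first when reg ≤ thr, while the vertex with the smaller slackness comes first when reg > thr.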

[0060] When the number of available registers is less than or equal to the threshold (reg ≤ thr), few registers are available and there is no margin. Higher priority is therefore given to vertices that release as many registers as possible and reserve no new ones, so that their object code is generated earlier. That is, object code is generated with higher priority for vertices that increase the number of available registers reg and for vertices whose values are not referenced by many vertices.

[0061] To assign these priorities, it is advisable to look ahead by just one vertex, evaluating the change in the number of registers caused by the vertex of r(G) to be generated next. The lookahead may be extended to sequences of several vertices, but this increases the number of combinations to be examined, so compile time must be traded off against the benefit of the optimization. That is, the compiler according to the present invention determines priorities by computing the change in the number of registers when the next vertex to be generated is chosen from r(G). Extending the algorithm to compute the change in the number of registers when a sequence of several vertices is chosen in order may yield a better solution, but the trade-off between compile time and optimization must be considered.

[0062] When priorities are determined by rule e, the number of available registers becomes even smaller. In such a case, applying the inverse transformation of common subexpression elimination can increase the number of available registers reg. Because the vertices for which object code has not yet been generated are represented as a DAG, the compiler according to the present invention performs the inverse transformation by transforming the DAG.

[0063] [Procedure for the inverse transformation of common subexpression elimination] The compiler according to the present invention performs the inverse transformation of common subexpression elimination by procedures 4-1 to 4-3 below.

[0064] Procedure 4-1: Find a directed edge e ∈ E that runs from a vertex p in a(G) to a vertex v in r(G) or u(G).
Procedure 4-2: Among the predecessor vertices p of the directed edges e = (p, v) found in procedure 4-1, find a vertex p whose operation can be computed using only values held in registers at that point.
Procedure 4-3: Duplicate the predecessor vertex p to create a vertex p' (that is, generate a vertex p' with the same operator, the same predecessors and the same successors as p) and add it to u(G), changing the predecessor of the directed edge e = (p, v) so that e = (p', v).
On a DAG, common subexpression elimination is the transformation that represents several vertices producing the same value by a single vertex and makes the successor vertices depend on that vertex. Procedures 4-1 to 4-3 reverse it: a value that, as a result of common subexpression elimination, was produced by a single vertex is produced by several vertices again, by creating another vertex that produces the same value.
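The graph surgery of procedure 4-3 might look as follows. This is a sketch under assumptions: the function and parameter names are hypothetical, the DAG is held in `pred`/`succ` dictionaries of lists with an `op` dictionary of operators, and the duplicate is given only the single successor v (one possible reading of the procedure). Updating ap(v) and au(p) is left to the caller.

```python
def undo_cse_edge(pred, succ, op, u, p, v, p_copy):
    """Redirect the edge (p, v) to a fresh duplicate of p named p_copy."""
    op[p_copy] = op[p]                 # same operator as p
    pred[p_copy] = list(pred[p])       # same predecessors as p
    for q in pred[p]:
        succ[q].append(p_copy)         # predecessors now also feed the copy
    succ[p_copy] = [v]                 # assumption: the copy only feeds v
    succ[p].remove(v)                  # edge (p, v) is removed ...
    pred[v][pred[v].index(p)] = p_copy # ... and replaced by (p_copy, v)
    u.add(p_copy)                      # the copy still needs object code
```

After the call, p has one fewer pending user, so its register can be released earlier at the cost of recomputing the value in the copy.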

[0065] When au(p) is then updated to au(p) := au(p) - 1 and au(p) reaches 0, the number of available registers reg can be increased by one. The reason is that object code has been generated for every vertex that uses the value of the preceding vertex, so that value is no longer used and one register can be released. When common subexpression elimination is performed on a DAG, the value of a preceding vertex comes to be shared by several successor vertices. Because a reference to a value is represented by an edge of the DAG, the elimination introduces new data-dependence edges into the DAG. When the number of available registers runs short, choosing an edge introduced by common subexpression elimination and applying the inverse transformation by procedures 4-1 to 4-3 above cancels the common subexpression elimination in the parts where registers are insufficient, so that the optimization is applied only where the number of available registers leaves room.

[0066] [Specific example of optimization by the compiler according to the present invention] As a specific example, the following describes the results obtained when the compiler according to the present invention generates, from the source program (Table 1 above), the DAG (FIG. 1) representing its loop body (the part of the program that is executed repeatedly), and then generates from the DAG object code executable by processors manufactured by Intel and others (x86 architecture) and processors manufactured by Motorola (PowerPC architecture) as target architectures.

[0067] In the source program (Table 1), i and n are integer variables, t1 to t8 and x1 to x4 are floating-point variables, and a[i] denotes an array access, which corresponds to a load instruction of the target architecture. To simplify the explanation, the operation of the compiler according to the present invention is described below only for the code generation of instructions that use the floating-point registers and arithmetic units.

[0068] As described above, the source program shown in Table 1 can be represented by the DAG shown in FIG. 1. In this DAG, solid directed edges between vertices represent data dependences between operations, and dotted directed edges represent dependences due to the ordering of side effects. Here, the programming language model is assumed to require that the order of the side effects of memory accesses be preserved. The variables x1 to x4 refer to values from outside the loop, and their values are referenced in the next iteration, so they hold their values across loop iterations. Each of the variables x1 to x4 therefore occupies one floating-point register, and four registers are in use when generation of the object code for the loop body of this source program starts.

[0069] [x86 architecture] The x86 architecture, for which code generation proceeds under the condition that the number of registers is smaller than the threshold, is described first. On the x86 architecture, floating-point load, multiply and add instructions each take one cycle when their operands are registers. The compiler according to the present invention calculates the depth from the top lt(v), the depth from the bottom lb(v), the slackness sl(v) and the increase in the number of references Δref(v) of each vertex as shown in Table 2 below.

[0070]

[Table 2]

[0071] The x86 architecture has eight floating-point registers. Since four registers are known to be occupied throughout this loop, the number of registers available in the initial state is 4 (= 8 - 4). If the threshold thr for switching the code generation policy is set to 2, for example, the initial state is as shown in Table 3 below.

[0072]

[Table 3]

[0073] When the compiler according to the present invention takes the top (┬) from r(G), the sequence of vertices for which code can be generated next in the data structure described above, and generates its object code, ap(3) = 0 by procedure 3-2 above, so vertex 3 is added to r(G). When the object code of vertex 3 is generated, ap(2) = 0 by procedure 3-2, so vertex 2 is added to r(G). When the compiler generates object code from vertex 2, vertices 1 and 7 are added to r(G).

[0074] Up to this point only one vertex at a time has been available for object code generation, so both policies (the policy for "when the number of available registers reg is greater than the threshold thr" and the policy for "when the number of available registers reg is less than or equal to the threshold thr") give the same order. Once object code has been generated from vertex 2, the number of currently available registers is reg = 2, so the code generation policy that minimizes the number of registers is selected.

[0075] That is, Δreg(1) = +1 and Δreg(7) = +1, so whichever of vertices 1 and 7 is selected, the number of occupied registers increases by one. Since Δref(1) = +2 and Δref(7) = -1, however, vertex 7 is given the higher priority under rule d, and the data structure becomes as shown in Table 4 below (rules c, d, e).

[0076]

[Table 4]

[0077] When the compiler according to the present invention generates object code from vertex 7, vertex 11 is added to r(G). Since Δreg(11) = -1 and Δreg(1) = +1, vertex 11 is given the higher priority, and the data structure becomes as shown in Table 5 below (rule c).

[0078]

[Table 5]

[0079] When the compiler according to the present invention generates object code from vertex 11, reg = 2 and only vertex 1 remains in r(G) (procedure 3-2). When the compiler continues the optimization in the same way, the data structure finally becomes as shown in Table 6 below.

[0080]

[Table 6]

[0081] [PowerPC architecture] Next, the PowerPC architecture, for which code is generated preferentially for vertices on the critical path, is described. On the PowerPC architecture, floating-point load, multiply and add take 1, 2 and 1 cycles respectively. The compiler according to the present invention therefore calculates the depth from the top lt(v), the depth from the bottom lb(v), the slackness sl(v) and the increase in the number of references Δref(v) of each vertex v as shown in Table 7 below.

[0082]

[Table 7]

[0083] As with the x86 architecture, the threshold thr for switching the object code generation policy is set to 2. The PowerPC architecture has 32 floating-point registers, so the value of reg in the data structure is 28 (= 32 - 4) in the initial state. As in the x86 case, the compiler according to the present invention starts from the initial state and generates object code from vertices 3 and 2 (procedures 3-1 to 3-8).

[0084] Once object code has been generated from vertex 2, the value of reg is 30, so the compiler according to the present invention selects the object code generation policy that gives priority to slackness sl(v) and depth from the top lt(v) (see the description from "When the number of available registers reg is greater than the threshold thr" through rules c to e). Since the slackness values of vertices 1 and 7 are sl(1) = 0 and sl(7) = 3, vertex 1 is given the higher priority, and the data structure becomes as shown in Table 8 below (see the processing described above in which the actual number of registers in use is calculated and the object code generation policy is switched).

[0085]

[Table 8]

[0086] The compiler according to the present invention generates the object code of the four load instructions in the same way. Once the object code of the load instruction of vertex 0 has been generated, the object code of vertices 4, 5, 6 and 7 becomes available for generation. Since the slackness values of vertices 4, 5, 6 and 7 are sl(4) = 0, sl(5) = 0, sl(6) = 1 and sl(7) = 3, vertices 4 and 5 are given high priority, object code is generated from vertices 4 and 5 before vertices 6 and 7, and the data structure becomes as shown in Table 9 below (procedures 3-1 to 3-8).

[0087]

[Table 9]

[0088] Similarly, since the slackness values of vertices 8, 9, 10 and 11 are sl(8) = 0, sl(9) = 0, sl(10) = 1 and sl(11) = 3, the data structure finally becomes as shown in Table 10 below.

[0089]

[Table 10]

[0090] [Hiding the latency of operators with long delays] As described above, when the PowerPC architecture is the target architecture, the latency of a multiplication is two machine cycles. If an operator with a very long latency on the target architecture's arithmetic unit lies on the critical path, giving priority to vertices with small slackness can waste the time spent waiting to execute the instructions on the critical path.

[0091] To improve on this, the compiler according to the present invention may be configured so that, when generating code from the DAG, it calculates the time at which each vertex can start executing and, during the waiting time of an instruction on the critical path, gives higher priority to any vertex that can start executing early, even one with large slackness, and generates its object code first.

[0092] The processing of the compiler according to the present invention in this configuration is as follows. During object code generation, the compiler calculates the time ct at which the most recent code was issued (the maximum issue time of the code generated so far). In addition, for each vertex v for which object code has not yet been generated at that point (each vertex v belonging to r(G) while code is generated from the DAG), it calculates the time rt(v) at which code for v can be generated. The initial values of ct and rt(v) at the start of object code generation are 0.

[0093] When the object code of a vertex w is generated, the time ct is set to ct := rt(w) in procedure 3-1 above. In procedure 3-3, the time rt(w) is set to rt(w) := ct + cv(v). To hide operator latency, under the code generation policy that gives priority to slackness sl(v) and depth from the top lt(v) according to rules a and b above, code is generated with rt(v) taking precedence over slackness. When rt(v) is equal, slackness and depth from the top decide the priority.

[0094] [Effect] For generating object code from a source program that has no common subexpressions and can be represented as a tree, an algorithm that minimizes the number of registers used for intermediate variables has been developed, as disclosed in Sethi, R. and J. D. Ullman, "The generation of optimal code for arithmetic expressions", J. ACM 17:4, 715-728. In addition, an algorithm using dynamic programming has been developed to handle architectures with complex instruction sets, as disclosed in Aho, A. V. and S. C. Johnson, "Optimal code generation for expression trees", J. ACM 23:3, 488-501. Both algorithms are designed to minimize only the number of registers used.
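
As background for the register-minimizing approach cited above, the register need of an expression tree can be computed with the classic labeling rule. The sketch below is a simplified textbook form of that labeling, assuming every leaf needs one register and every operator is binary; the tuple encoding of trees and the name `su_label` are assumptions made for illustration.

```python
def su_label(tree):
    """Registers needed to evaluate a binary expression tree without spills.

    A tree is either a leaf (any non-tuple value) or a tuple (op, left, right).
    Simplified rule: a leaf needs 1 register; an inner node needs
    max(l, r) if l != r, and l + 1 if l == r.
    """
    if not isinstance(tree, tuple):
        return 1
    _, left, right = tree
    l, r = su_label(left), su_label(right)
    return max(l, r) if l != r else l + 1


expr = ("+", ("*", "a", "b"), ("-", "c", "d"))   # (a*b) + (c-d)
chain = ("+", ("+", ("+", "a", "b"), "c"), "d")  # ((a+b)+c)+d
```

The labeling considers register count only, which is the limitation the following paragraph contrasts with the present invention.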

[0095] However, as described above, there is a trade-off between minimizing the number of registers used and object code optimizations such as hiding operation latency and eliminating common subexpressions. The compiler according to the present invention evaluates this trade-off while generating object code from the DAG and can change its optimization policy dynamically. It can therefore confine each optimization to the portions where the constraint on the number of available registers permits it, without hindering other code optimizations. In this respect it is superior to the algorithms developed by Sethi et al.

[0096] [Second Embodiment] As a second embodiment of the present invention, a method of optimizing object code by devising register allocation is described below. Register allocation is one of the most important techniques for improving the performance of the object code obtained by compilation, but most register allocation methods treat all registers symmetrically. In practice, registers are not all handled equivalently: register usage is asymmetric because of machine specifications and object code generation conventions. If registers are allocated without taking these constraints into account, unnecessary register-to-register move instructions are generated during object code generation, and the performance of the resulting object code may be degraded significantly.

[0097] In the optimization processing shown as the second embodiment of the present invention, information about specific uses of registers is propagated from the back of the intermediate code toward the front by solving data flow equations, so that usage constraints on specific registers that will arise later are satisfied in advance. In addition, when a register is assigned to a variable that forms part of a Web (the connected set formed by the definition-use chains and use-definition chains of a variable), that register is propagated to the elements of the Web, so that later uses of the specific register are likewise decided in advance. This reduces the register moves needed to satisfy register constraints and makes it possible to generate high-quality object code without waste.

[0098] The compiler according to the present invention shown as the second embodiment provides, in the intermediate code used for compilation, a field in which a desirable register can be described (hereinafter, a constraint description field) for each element of each instruction that requires register allocation, and fills in the constraint description fields according to the properties of the target machine. Two constraint description fields are provided: one for strong constraints and one for weak constraints.

[0099] Next, weak constraints are described for each instruction of the intermediate code based on the machine specifications and the like. Constraints include dedicated registers used by specific operations and function arguments passed in registers. In particular, taking into account that variables that remain live across a function call should be placed in non-volatile registers, the present invention describes a constraint corresponding to the non-volatile registers in the instructions that reference the variables referenced earliest after the function call, up to the number of non-volatile registers.

[0100] After the weak constraints have been described for each instruction, they are propagated backward from the uses of data toward their definitions. Here, propagation specifically means copying the information (in this case, a weak constraint) attached to the object code that uses the data to the object code that defines the data. This makes clear under what register constraints the variable defined by an instruction will be referenced later. At the same time, for a variable whose live range overlaps that of a constrained variable, a constraint that is the complement of that constraint is described and propagated in the same way.
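
The backward copy from uses to definitions can be sketched for a single basic block as follows. This is an illustrative sketch only: the instruction encoding, the function name `propagate_block`, and the choice to combine several later uses with bitwise OR are assumptions, and the complement constraint for overlapping live ranges is left out.

```python
def propagate_block(instrs, exit_constraints):
    """Scan a basic block bottom-up, copying weak constraints from uses to defs.

    instrs: list of dicts {"dst": name or None, "srcs": {var: mask}}, where
            mask is the weak-constraint bit set written on that use (0 = none).
    exit_constraints: {var: mask} already known at the block exit.
    Returns ({instruction index: mask for the defined variable},
             {var: mask} pending at the block entry).
    """
    pending = dict(exit_constraints)  # constraints requested by later uses
    def_masks = {}
    for i in range(len(instrs) - 1, -1, -1):
        ins = instrs[i]
        dst = ins["dst"]
        if dst is not None:
            # The definition receives what the later uses asked for.
            def_masks[i] = pending.pop(dst, 0)
        for var, mask in ins["srcs"].items():
            if mask:
                # Assumption: several uses are merged with bitwise OR.
                pending[var] = pending.get(var, 0) | mask
    return def_masks, pending


R1, R2 = 0b01, 0b10
block = [
    {"dst": "a", "srcs": {}},                    # a = ...
    {"dst": "b", "srcs": {"a": 0}},              # b = f(a)
    {"dst": None, "srcs": {"a": R1, "b": R2}},   # call g(a, b)
]
def_masks, entry = propagate_block(block, {})
```

Handling the definition before the sources of the same instruction keeps an instruction such as `x = x + 1` correct: the new value of `x` takes the later constraint, and the old value gets the constraint written on the use.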

[0101] This avoids the situation in which a constrained register has already been allocated to another variable at the point where it is to be acquired. A function that performs the propagation described above within a basic block is prepared, and the data flow equations are then solved. Through these operations, the weak constraints are propagated beyond basic block boundaries across the entire intermediate code.

[0102] After that, the intermediate code is scanned from front to back, and registers are allocated based on the described constraints. During this scan, while following the definition-use chains and use-definition chains of variables, the register allocated to the variable being defined is propagated as a strong constraint to the entire Web to which the definition belongs. Here, a Web is the connected set formed by the DU-chains and UD-chains of the same variable.
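
A Web as defined above is a connected component of the definition-use links of one variable, so it can be found with a union-find structure. The sketch below assumes the links are given as (definition, use) pairs; the identifiers and the function name `find_webs` are hypothetical.

```python
def find_webs(du_pairs):
    """Group definitions and uses linked by DU/UD chains into Webs.

    du_pairs: iterable of (definition_id, use_id) pairs for one variable,
              meaning that the definition reaches the use.
    Returns a list of sets; each set is one Web.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for d, u in du_pairs:
        parent[find(d)] = find(u)  # merge the two components

    webs = {}
    for node in list(parent):
        webs.setdefault(find(node), set()).add(node)
    return list(webs.values())


pairs = [("d1", "u1"), ("d2", "u1"), ("d3", "u2")]
webs = find_webs(pairs)
```

Here `d1` and `d2` both reach `u1`, so they fall into one Web with two definitions, which is the case the strong-constraint propagation targets.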

[0103] This increases the likelihood that the same register is allocated to the variables belonging to each Web and, in particular, avoids generating compensating register-to-register move instructions when definitions of a variable merge at a join point of basic blocks. This operation is performed only once per Web; when a register is allocated to a variable on a Web to which a strong constraint has already been propagated, no propagation is performed.

[0104] In addition, when the initial value of the register mapping (the initial cache set) is determined, the weak register constraints obtained by solving the data flow equations are reflected, so that no unnecessary register-to-register move instructions are generated when variables that are in registers from the beginning are referenced.

[0105] Methods of reflecting register constraints have also been developed for the interference graph coloring algorithms generally adopted in static compilers. However, these methods have the following problems. In such a method, nodes representing physical registers are added to the interference graph, and register constraints can be expressed passively by making registers and variables interfere. For example, to assign variable a to register r1, interference is added between the node representing r1 and the nodes representing the variables other than a. However, because only such passive descriptions are possible, the description of the constraint becomes weak in cases such as assigning several variables to a single register.

[0106] Furthermore, because the temporal context is compressed into each node, a constraint cannot be described when a variable has different constraints in different contexts. There is also the problem that the number of interferences needed to express register constraints becomes enormous.

[0107] [Compiler Shown as Second Embodiment] The compiler shown as the second embodiment of the present invention (a Java Just-in-Time compiler) is described below.

[0108] The compiler according to the present invention operates on intermediate code at the register transfer level. That is, each instruction is described as a set consisting of variables (virtual registers) representing the destination and the source or sources, and an operation code. Each variable in an instruction is assigned a register by the register allocator. For these variables, the compiler according to the present invention provides, as attached information, a bit field representing the physical registers available on the target machine, and sets the bits of the registers that are desirable to acquire by the method described below.

[0109] Two such fields are provided: one in which strong constraints are described and one in which weak constraints are described. The register allocator visits the basic blocks in depth-first order and assigns registers while scanning the instructions in each basic block from front to back. During register allocation, if a bit representing a strong constraint is set for the variable to which a register is being assigned, that register is assigned preferentially. If that register is already in use, a register is assigned based on the weak constraint. If that register is also in use, another register is assigned.
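
The fallback order described above (strong constraint, then weak constraint, then any other register) can be sketched with bit masks. This is a minimal sketch under assumptions: the function name `pick_register`, the choice of the lowest-numbered candidate, and returning `None` when nothing is free are all illustrative, and spilling is not modeled.

```python
def pick_register(strong, weak, in_use, all_regs):
    """Choose a register: strong constraint, then weak, then any free one.

    All arguments are bit masks over physical registers (bit i = register i).
    Returns the chosen register index, or None if every register is in use.
    """
    free = all_regs & ~in_use
    for wanted in (strong, weak, all_regs):
        candidates = wanted & free
        if candidates:
            # Isolate the lowest set bit and convert it to an index.
            return (candidates & -candidates).bit_length() - 1
    return None
```

For example, with four registers, a strong constraint on register 2, and a weak constraint on register 1, the call `pick_register(0b0100, 0b0010, 0b0100, 0b1111)` falls back to register 1 because register 2 is already in use.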

[0110] In this method, the register cache set is determined for each loop, and registers are swapped at the loop entry. Here, the variables that should already be in registers are taken to be the variables referenced earliest from the loop entry, up to the number of physical registers. Registers are initially assigned to these variables, and in doing so the register constraints propagated to the loop entry are reflected.

[0111] [Description of Constraints] First, the constraints relating to function calls are described.

[0112] [Constraints on Function Arguments] In the compiler according to the present invention, some of the arguments are placed in registers at a function call to speed up processing. Which argument goes in which register is determined in advance by the calling convention, and this becomes a constraint when registers are allocated to variables. In this example, among the source variables of the instruction that performs the function call, the field of each variable whose value is passed in a register has the bit corresponding to the register specified by the calling convention set.
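
Setting the argument bits can be sketched as follows. The calling convention used here (the first three arguments in registers 0, 1, and 2) is purely hypothetical, as are the names `ARG_REGS` and `argument_masks`; an actual target defines its own convention.

```python
# Hypothetical convention: the first three arguments go in registers 0, 1, 2.
ARG_REGS = [0, 1, 2]


def argument_masks(call_args):
    """Return {variable: mask} for the arguments that are passed in registers.

    call_args: source variables of the call instruction, in argument order.
    Arguments beyond len(ARG_REGS) get no constraint in this sketch
    (they are assumed to be passed in memory).
    """
    masks = {}
    for position, var in enumerate(call_args):
        if position < len(ARG_REGS):
            masks[var] = masks.get(var, 0) | (1 << ARG_REGS[position])
    return masks
```

A variable passed in two argument positions ends up with both bits set, which shows why the field is a bit set rather than a single register number.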

[0113] [Constraints on Function Return Values] In the compiler according to the present invention, the return value is placed in a register on return from a function to speed up processing. Which register holds the return value is determined by convention, so a constraint arises when a register is allocated to the variable that becomes the return value. In this example, the bit corresponding to the register specified by the return convention is set in the field of the source variable of the instruction that performs the return.

[0114] [Constraints on Variables Live Across Function Calls] In the compiler according to the present invention, registers that are live at a function call are saved as necessary on the callee side. There is therefore a kind of constraint that variables live across a function call are better placed in non-volatile registers. In this example, the bits corresponding to the non-volatile registers are set in the source fields of the instructions that use the variables referenced earliest after the function call, up to the number of non-volatile registers. The variables referenced earliest after a function call can be found by data flow analysis.

[0115] [Constraints of Specific Operations] Multiplication, division, shift instructions and the like are subject to the constraint that specific registers must be used. In this embodiment, the bit corresponding to the register specified by the machine specifications is set in the field of the source variable of each instruction that requires a specific register.

[0116] [Propagation of Constraints] The propagation of constraints in the compiler according to the present invention is described below.

[0117] [Static Constraint Propagation (Propagation of Weak Constraints)] For each basic block, arrays representing the register constraints of each variable are prepared at its entry and at its exit. In each basic block, the exit array is used as the initial register constraint of each variable, the instructions are scanned from bottom to top, the described constraints are propagated from the uses of variables to their definitions, and the weak constraint description fields of the variables are filled in. In addition, for a variable whose live range overlaps that of a constrained variable, a constraint that is the complement of that constraint is described and propagated in the same way. The propagated result is stored in the entry array of the basic block. Then, from the way the basic blocks are connected, the exit array of each basic block is calculated as the logical AND of the entry arrays of all basic blocks reachable from that block, and the constraints are scanned again with this exit array as the new initial value. This is repeated until the exit arrays no longer change.
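
The iteration described above can be sketched as a small fixed-point loop. The interface is an assumption made for illustration: `solve`, the `transfer` callback (standing in for the bottom-up scan of one block), and the convention that a missing variable means "no constraint" (all registers allowed) are not taken from the source. The sketch reads "reachable" as the direct successors in the block graph and assumes `transfer` is monotone so that the loop terminates.

```python
def solve(blocks, succs, transfer, all_regs):
    """Iterate exit = AND(entry of successors) until the exit maps stabilize.

    blocks:   list of block names.
    succs:    {block: [successor blocks]}.
    transfer: function(block, exit_map) -> entry_map (hypothetical interface).
    Maps are {variable: mask}; a missing variable is treated as all_regs.
    """
    entry = {b: {} for b in blocks}
    exit_ = {b: {} for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            entry[b] = transfer(b, exit_[b])
        for b in blocks:
            new_exit = {}
            for s in succs.get(b, []):
                for var, mask in entry[s].items():
                    new_exit[var] = new_exit.get(var, all_regs) & mask
            if new_exit != exit_[b]:
                exit_[b] = new_exit
                changed = True
    return entry, exit_


# Toy example: block A branches to B and C, which constrain x differently.
gen = {"A": {}, "B": {"x": 0b011}, "C": {"x": 0b110}}


def transfer(block, exit_map):
    entry_map = dict(exit_map)
    entry_map.update(gen[block])  # block-local constraints win in this toy
    return entry_map


entry, exit_ = solve(["A", "B", "C"], {"A": ["B", "C"]}, transfer, 0b111)
```

In the toy example, only register 1 satisfies both branches, so the AND at the exit of `A` leaves the single bit `0b010` for `x`.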

[0118] [Constraint Propagation During Register Allocation (Propagation of Strong Constraints)] During register allocation, when a register is acquired for a variable that forms part of a Web having two or more definitions, the mutual links attached to the instructions are followed to find the other variables that are constituents of the Web. The acquired register is written into the strong constraint description field of each such variable. If no constraint is attached to the variable for which a register is being acquired, the Web is scanned to find a constraint attached to another definition, and register acquisition is attempted based on that constraint. This operation is performed only once per Web; when a register is allocated to a variable on a Web that has already been processed, no propagation is performed.

[0119] [Third Embodiment] As a third embodiment of the present invention, a method of implementing a compiler that performs the object code optimizations shown in the first and second embodiments is described below.

[0120] [Computer System] FIG. 2 is a diagram showing the configuration of a computer system that implements the compiler according to the present invention shown in the first and second embodiments. As shown in FIG. 2, the computer system comprises a Java development environment 1 including a Java compiler, a medium 5 such as a network or a recording medium, and a client device 6.

[0121] The client device 6 includes hardware 13 comprising a microprocessor made up of a predetermined number of registers, arithmetic circuits, a pipeline mechanism and the like; memory such as RAM and ROM; peripheral circuits; and storage devices such as a hard disk device and a removable disk device (none of which are shown). The hardware 13 loads a Java VM (Java Virtual Machine) 7 and an operating system (OS) 12, such as Windows (a trade name of Microsoft Corporation), OS/2 (a trade name of IBM Corporation) or MacOS (a trade name of Apple), from a recording medium into memory and executes them. The Java VM 7 includes a Java bytecode verification unit 8, a Java interpreter 9, and a Java JIT compiler 10.

[0122] [Java Development Environment 1] In the Java development environment 1, the Java compiler 3 compiles Java source code 2 prepared in advance and generates Java bytecode 4. The generated Java bytecode 4 is sent to the client device 6 via the medium 5, that is, over a network or recorded on a recording medium. The Java bytecode 4 is a representation of a Java program that can be executed regardless of the computer hardware and execution environment.

[0123] [Client Device 6] The client device 6 executes the Java bytecode 4 supplied from the Java development environment 1 via the medium 5 on the Java VM 7 of a WWW browser, a WWW server or the like.

[0124] [Java Bytecode Verification Unit 8] In the client device 6, the Java bytecode verification unit 8 verifies the Java bytecode 4 sent from the Java development environment 1 to the client device 6 via the medium 5, determines whether it satisfies the Java bytecode specification, and, if it does, outputs it to the Java interpreter 9 and the JIT compiler 10.

[0125] [Java Interpreter 9] The Java interpreter 9 executes the Java bytecode input from the Java bytecode verification unit 8 by interpretation.

[0126] [JIT Compiler 10] To execute the Java bytecode input from the Java bytecode verification unit 8 at high speed, the JIT compiler 10 converts it into machine code (object code) 11 executable by the hardware 13 and has it executed.

[0127] [Operation of Computer System] The operation of the computer system shown in FIG. 2 is described below with reference to FIGS. 3 to 8. FIG. 3 is a flowchart showing the operation of the JIT compiler 10 shown in FIG. 2. FIG. 4 is a first flowchart showing the intermediate code optimization processing (S17) shown in FIG. 3. FIG. 5 is a flowchart showing the intermediate code creation processing (S26) shown in FIG. 4. FIG. 6 is a flowchart showing the processing for creating intermediate code from the DAG (S30) shown in FIG. 5. FIG. 7 is a flowchart showing the vertex sequence sorting processing (S37) shown in FIG. 6. FIG. 8 is a second flowchart showing the intermediate code optimization processing (S17) shown in FIG. 3 and shows the processing that follows the processing shown in FIG. 4. FIG. 9 is a flowchart showing the code generation and register allocation processing (S18) shown in FIG. 3.

[0128] In the Java development environment 1, the Java compiler 3 generates the Java bytecode 4 from the Java source code 2 and sends it to the client device 6 via the medium 5. In the client device 6, the Java bytecode verification unit 8 of the Java VM 7 receives the Java bytecode 4, verifies it, and outputs the Java bytecode 4 that conforms to the specification to the JIT compiler 10.

[0129] As shown in FIG. 3, in step 14 (S14), the JIT compiler 10 (FIG. 2) receives the Java bytecode 4 from the Java bytecode verification unit 8. In step 15 (S15), the JIT compiler 10 converts the Java bytecode 4 into intermediate code form and generates the intermediate code (S16). The intermediate code (S16) is in a representation that the Java JIT compiler can easily analyze and transform, namely a sequence of instructions that read the values of variables, perform operations, and write the results to variables.

[0130] In step 17 (S17), the JIT compiler 10 performs optimization processing (FIGS. 4 to 8) on the intermediate code (S16) generated in S15.

[0131] As shown in FIG. 4, in step 23 (S23) of S17 (FIG. 3), the JIT compiler 10 (FIG. 2) converts the intermediate code into DAG form (FIG. 1) and generates the intermediate code DAG (S24). As described above, the intermediate code DAG (S25) represents the intermediate code as a directed graph whose vertices are operators and whose edges are the dependencies of the data used in the operations. The JIT compiler 10 applies optimizations such as common subexpression elimination to the intermediate code DAG (S25).

[0132] In step 26 (S26), the JIT compiler 10 arranges the operators of the intermediate code in an order that satisfies the constraints expressed by the intermediate code DAG (S25) and generates intermediate code again. The constraints expressed by the intermediate code DAG are the ordering constraints of the original intermediate code, which the directed edges of the DAG represent.

[0133] As shown in FIG. 5, in step 29 (S29) of S26 (FIG. 4), the JIT compiler 10 (FIG. 2) calculates, for each vertex of the DAG (S26), the parameters (lt, lb, sl, Δref) used for object code generation (see "Details of the second step" above).

[0134] In step 30 (S30), the JIT compiler 10 calculates the number of registers used by the generated intermediate code and generates object code while dynamically changing the policy for generating the intermediate code (procedures 3-1 to 3-8).

[0135] As shown in FIG. 6, in step 33 (S33) of S30 (FIG. 5), the JIT compiler 10 (FIG. 2) initializes the sequence r(G) of vertices for which object code can be generated (procedures 3-1 to 3-8 above). That is, the JIT compiler 10 adds the top vertex ⊤ to the sequence of vertices for which object code can be generated.

[0136] In step 34 (S34), the JIT compiler 10 removes the first vertex from the sequence r(G) of vertices for which object code can be generated and appends it to the end of the sequence a(G) of vertices for which object code has been generated (procedures 3-1 and 3-2).

[0137] In step 35 (S35), the JIT compiler 10 adds the vertices for which object code generation has newly become possible as a result of S34 to the sequence of vertices for which object code can be generated (procedure 3-3).

[0138] In step 36 (S36), the JIT compiler 10 ends the processing if the sequence of vertices for which object code can be generated has become empty, and otherwise proceeds to S37 (procedure 3-8).

[0139] In step 37 (S37), the sequence of vertices for which object code can be generated is sorted according to the object code generation policy in effect at that time, and processing returns to S34 (procedure 3-7).
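
Steps S33 to S37 form a worklist loop, which can be sketched as follows. This is an illustrative sketch, not the procedure of the invention: the adjacency-map encoding, the edge direction (from the top vertex toward its dependents), the absence of duplicate edges, and the `sort_ready` callback standing in for the current policy are all assumptions.

```python
def generate_order(succs, preds, top, sort_ready):
    """Emit the vertices of a DAG in an order that respects its edges.

    succs / preds: {vertex: [vertices]} adjacency maps of the DAG.
    top:           the single top vertex from which generation starts.
    sort_ready:    function(list) -> list implementing the current policy.
    """
    ready = [top]   # r(G): vertices whose code can be generated (S33)
    done = []       # a(G): vertices whose code has been generated
    emitted = set()
    while ready:            # S36: stop when r(G) is empty
        v = ready.pop(0)    # S34: take the first ready vertex
        done.append(v)
        emitted.add(v)
        for w in succs.get(v, []):  # S35: vertices that just became ready
            if all(p in emitted for p in preds.get(w, [])):
                ready.append(w)
        ready = sort_ready(ready)   # S37: re-sort under the current policy
    return done


succs = {"top": ["a", "b"], "a": ["c"], "b": ["c"]}
preds = {"a": ["top"], "b": ["top"], "c": ["a", "b"]}
forward = generate_order(succs, preds, "top", sorted)
backward = generate_order(succs, preds, "top", lambda r: sorted(r, reverse=True))
```

Changing only `sort_ready` changes the emitted order while every dependency is still respected, which is what allows the policy to be switched during generation.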

[0140] As shown in FIG. 7, in step 40 (S40) of S37 (FIG. 6), the JIT compiler 10 (FIG. 2) calculates the number of registers as changed by the object code generation in S34 (procedure 3-5).

[0141] In step 41 (S41), processing proceeds to S42 if the number of available registers is 0, and to S43 otherwise. That is, S41 evaluates the condition for deciding whether to apply the inverse transformation of common subexpressions.

[0142] In step 42 (S42), if there is a vertex of the DAG for which object code has not yet been generated and at which the number of available registers of the hardware 13 would increase by removing the common subexpression, the JIT compiler 10 applies the inverse transformation to that vertex (procedures 4-1 to 4-3).

[0143] In step 43 (S43), the JIT compiler 10 determines whether the number of registers of the hardware 13 available at that time is greater than a predetermined threshold, and proceeds to S44 if it is and to S45 otherwise (procedure 3-7).

[0144] In step 44 (S44), the JIT compiler 10 judges that sufficiently many registers are available and generates object code giving priority to the vertices on the critical path of the DAG, so as to increase the parallelism of the object code.

[0145] In step 45 (S45), the JIT compiler 10 judges that few registers remain available, calculates for each vertex in the sequence of vertices for which object code can be generated a parameter (Δreg) such as the increase or decrease in the number of available registers, and sorts the sequence giving priority to vertices that increase the number of available registers.
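
The policy switch of S43 to S45 can be sketched as a sort whose key depends on register pressure. The record layout, the function name `sort_by_policy`, and the ordering assumptions (slackness 0 marks the critical path, so smaller slackness is preferred when registers are plentiful; larger Δreg is preferred when they are scarce) are illustrative and not taken from the source.

```python
def sort_by_policy(ready, free_regs, threshold):
    """Sort ready vertices under the policy chosen from register pressure.

    ready: list of dicts with "name", "sl" (slackness) and "dreg" (change in
           the number of available registers if the vertex is generated).
    """
    if free_regs > threshold:                        # S43 -> S44
        return sorted(ready, key=lambda v: v["sl"])
    return sorted(ready, key=lambda v: -v["dreg"])   # S43 -> S45


ready = [
    {"name": "p", "sl": 0, "dreg": -1},  # on the critical path, uses a register
    {"name": "q", "sl": 3, "dreg": 1},   # off the critical path, frees a register
]
plenty = [v["name"] for v in sort_by_policy(ready, free_regs=8, threshold=2)]
scarce = [v["name"] for v in sort_by_policy(ready, free_regs=1, threshold=2)]
```

With many free registers the critical-path vertex `p` comes first; with few, the register-freeing vertex `q` is moved ahead of it.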

[0146] Following the processing shown in FIG. 7, as the continuation of S17 shown in FIG. 3, the JIT compiler 10 performs the propagation of weak register constraints and the detection of Webs having two or more definitions shown in FIG. 8. As shown in FIG. 8, in step 48 (S48), the JIT compiler 10 (FIG. 2) describes, for each instruction in the intermediate code (S16 in FIG. 3), the weak constraints to be propagated in step 49. That is, the JIT compiler 10 describes weak constraints for each instruction of the intermediate code based on the machine specifications and the like.

[0147] In step 49 (S49), the JIT compiler 10 propagates the weak constraints described in S48 from the uses of data toward their definitions. That is, the JIT compiler 10 propagates the constraints backward, from use to definition.
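A minimal sketch of such a backward pass over straight-line intermediate code is given below. The instruction representation (dictionaries with the keys `dst`, `uses` and `weak`) and the register name `r3` are assumptions made for illustration; branches and conflicting preferences are not handled.

```python
def propagate_weak_constraints(instructions):
    """Propagate weak register preferences from uses back to definitions.

    Each instruction is a dict (hypothetical layout):
      'dst'  : variable defined by the instruction, or None
      'uses' : list of variables read by the instruction
      'weak' : dict mapping a variable to the register it would prefer
    """
    wanted = {}  # variable -> register preferred by a later use
    for ins in reversed(instructions):
        dst = ins["dst"]
        # A definition inherits the preference recorded by a later use.
        if dst is not None and dst in wanted:
            ins["weak"].setdefault(dst, wanted.pop(dst))
        # Uses that carry a preference pass it on toward earlier definitions.
        for var in ins["uses"]:
            if var in ins["weak"]:
                wanted[var] = ins["weak"][var]
    return instructions


code = [
    {"dst": "t1", "uses": ["a", "b"], "weak": {}},
    {"dst": "t2", "uses": ["t1"], "weak": {}},
    # Assumed machine rule: the operand of this instruction should be in r3.
    {"dst": None, "uses": ["t2"], "weak": {"t2": "r3"}},
]
propagate_weak_constraints(code)
```

After the pass, the instruction that defines `t2` also carries the preference for `r3`, so a later allocator can place the value there directly and avoid a register copy.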

[0148] In step 50 (S50), the JIT compiler 10 uses the data definition and use chain information to detect the connected set of definitions and uses (Web) for a variable. In particular, this step detects Webs having two or more definitions. That is, the JIT compiler 10 scans the intermediate code from front to back and allocates registers based on the described constraints.
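One common way to compute such connected sets is a union-find structure over the definition-use chains. The sketch below uses that technique as an assumption; the text does not state which data structure is used, and the function names and the input format (pairs of definition and use identifiers for one variable) are hypothetical.

```python
def find_webs(du_chains):
    """Group definitions and uses of one variable into Webs.

    du_chains: iterable of (definition_id, use_id) pairs, meaning that the
    definition may reach the use.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for d, u in du_chains:
        root_d, root_u = find(("def", d)), find(("use", u))
        if root_d != root_u:
            parent[root_d] = root_u

    webs = {}
    for node in list(parent):
        webs.setdefault(find(node), set()).add(node)
    return list(webs.values())


def multi_definition_webs(du_chains):
    """Return only the Webs that contain two or more definitions."""
    return [w for w in find_webs(du_chains)
            if sum(1 for kind, _ in w if kind == "def") >= 2]


# Definitions 1 and 2 both reach use 10; definition 3 reaches only use 11.
chains = [(1, 10), (2, 10), (3, 11)]
multi = multi_definition_webs(chains)
```

Here definitions 1 and 2 end up in the same Web because they share use 10, which is exactly the case where one register must be agreed on for several definitions.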

[0149] Refer again to FIG. 3. In step 18 (S18), the JIT compiler 10 performs register allocation and object code generation for each instruction in the intermediate code.

[0150] As shown in FIG. 9, in step 53 (S53) of S18 (FIG. 3), the JIT compiler 10 (FIG. 2) determines whether register allocation and object code creation have been completed for every element of the intermediate code (every instruction in the intermediate code). If they have been completed, it generates the object code (S20; FIG. 3) and proceeds to S21; otherwise, it proceeds to S54. That is, the termination condition is that code generation and register allocation have been repeated from the first instruction to the last instruction in the intermediate code.

[0151] In step 54 (S54), the JIT compiler 10 performs register allocation and object code generation for each element of the intermediate code.

[0152] In step 55 (S55), the JIT compiler 10 propagates the register allocated in S54, as a strong constraint, to Webs having two or more definitions. That is, for a register allocated to a variable used in an instruction, the JIT compiler 10 follows the Web that contains that variable. Since a Web is a set of definitions and uses, the JIT compiler 10 describes a strong constraint on every instruction that performs a definition or use belonging to that Web.

[0153] In step 56 (S56), the JIT compiler 10 extracts the next element (instruction) of the intermediate code to be processed. S19 is obtained in S57.
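The loop formed by S53 to S56 might be sketched as below. This is a simplified illustration under explicit assumptions, not the disclosed allocator: liveness and spilling are ignored, so every Web receives its own register, object code emission is omitted, and the names `allocate_registers`, `webs`, `weak` and the occurrence format `(instruction_index, variable)` are hypothetical.

```python
def allocate_registers(instructions, webs, weak, registers):
    """Front-to-back allocation with strong-constraint propagation over Webs.

    instructions: list of lists of variable names referenced by each instruction
    webs        : list of sets of occurrences (instruction_index, variable)
    weak        : dict mapping an occurrence to a preferred register
    registers   : list of register names; assumed large enough (no spilling)
    """
    web_of = {occ: web for web in webs for occ in web}
    strong = {}    # occurrence -> register fixed as a strong constraint
    taken = set()  # registers already bound (never released in this sketch)
    for idx, variables in enumerate(instructions):  # S53 / S56: walk the code
        for var in variables:
            occ = (idx, var)
            if occ in strong:
                continue  # already fixed by an earlier propagation
            preferred = weak.get(occ)
            if preferred is not None and preferred not in taken:
                reg = preferred  # S54: honor the weak constraint when possible
            else:
                reg = next(r for r in registers if r not in taken)
            taken.add(reg)
            for member in web_of.get(occ, {occ}):  # S55: propagate over the Web
                strong[member] = reg
    return strong


instrs = [["x"], ["x"], ["x", "y"]]
webs = [{(0, "x"), (1, "x"), (2, "x")}]
result = allocate_registers(instrs, webs, {(2, "y"): "r1"}, ["r0", "r1", "r2"])
```

In the example, the register chosen for `x` at the first instruction is imposed on every other occurrence in its Web, while `y` receives the register named by its weak constraint.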

[0154] Refer to FIG. 3 a third time. In step 21 (S21; FIG. 3), the JIT compiler 10 (FIG. 2) performs code scheduling and generates object code (machine code 11; FIG. 2) that can run at high speed.

【０１５５】[0155]

[Effects of the Invention] As described above, according to the compiling apparatus and method of the present invention, object code can be effectively optimized by code scheduling and common subexpression elimination within the range that satisfies the constraints arising from the number of physical registers of the processor.

[0156] Further, according to the compiling apparatus and method of the present invention, object code is generated from a source program expressed in DAG (directed acyclic graph) form while the number of registers used and the number of hardware execution cycles are evaluated, so that the scope of optimization by code scheduling and common subexpression elimination can be optimally traded off against the constraint on the number of physical registers.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating a DAG.

FIG. 2 is a diagram showing the configuration of a computer system that implements the compiler according to the present invention described in the first and second embodiments.

FIG. 3 is a flowchart showing the operation of the JIT compiler shown in FIG. 2.

FIG. 4 is a first flowchart showing the intermediate code optimization process (S17) shown in FIG. 3.

FIG. 5 is a flowchart showing the intermediate code creation process (S26) shown in FIG. 4.

FIG. 6 is a flowchart showing the process of creating intermediate code from a DAG (S30) shown in FIG. 5.

FIG. 7 is a flowchart showing the vertex sequence sorting process (S37) shown in FIG. 6.

FIG. 8 is a second flowchart showing the intermediate code optimization process (S17) shown in FIG. 3, and shows the processing that follows the processing shown in FIG. 4.

FIG. 9 is a flowchart showing the code generation and register allocation process (S18) shown in FIG. 3.

[Explanation of symbols]

1 ... Java development environment
2 ... Java source code
3 ... Java compiler
4 ... Java bytecode
5 ... Medium
6 ... Client device
7 ... Java VM
8 ... Java bytecode verification unit
9 ... Java interpreter
10 ... JIT compiler
11 ... Machine code
12 ... OS
13 ... Hardware

Continuation of the front page

(72) Inventor: Satoshi Koseki, c/o Tokyo Research Laboratory, IBM Japan, Ltd., 1623-14 Shimotsuruma, Yamato-shi, Kanagawa
(72) Inventor: Tatsushi Inagaki, c/o Tokyo Research Laboratory, IBM Japan, Ltd., 1623-14 Shimotsuruma, Yamato-shi, Kanagawa
(72) Inventor: Toshiaki Yasue, c/o Tokyo Research Laboratory, IBM Japan, Ltd., 1623-14 Shimotsuruma, Yamato-shi, Kanagawa
(72) Inventor: Hideaki Komatsu, c/o Tokyo Research Laboratory, IBM Japan, Ltd., 1623-14 Shimotsuruma, Yamato-shi, Kanagawa
(72) Inventor: Mikio Takeuchi, c/o Tokyo Research Laboratory, IBM Japan, Ltd., 1623-14 Shimotsuruma, Yamato-shi, Kanagawa
F-term (reference): 5B081 CC25 CC41

Claims

[Claims]

1. A compiling device comprising: conversion means for converting the source code of a program into DAG form; parameter calculating means for calculating parameters of a processor that executes, using a predetermined number of registers, the object code corresponding to each instruction included in the converted source code; and compiling means for scanning the converted source code and compiling the source code based on the calculated parameters of the processor to generate object code, wherein the parameter calculating means comprises: timing calculating means for calculating the timing at which the object code corresponding to each of the instructions is executed by the processor; and register increase/decrease calculating means for calculating a numerical value indicating the increase or decrease in the number of registers of the processor that are occupied when the object code corresponding to each of the instructions is executed.

2. The compiling device according to claim 1, wherein the compiling means comprises: object code generation means for sequentially compiling the instructions in the converted source code, in accordance with the execution order of the instructions and the priorities assigned to the instructions, to generate object code for each of the instructions; occupied register number calculating means for calculating, based on the numerical value indicating the increase or decrease in the number of registers, the number of occupied registers, which indicates the number of registers of the processor that are occupied when each of the instructions that may be compiled next is executed; and priority assigning means for assigning a priority to each instruction that may be compiled next, based on the calculated number of occupied registers and the timing at which the object code of the instruction is executed.

3. The compiling device according to claim 2, wherein the priority assigning means compares the calculated number of occupied registers with a predetermined threshold, assigns a high priority to an instruction that increases or decreases the number of occupied registers when the calculated number of occupied registers is larger than the predetermined threshold, and otherwise assigns a high priority to an instruction on a critical path.

4. The compiling device according to claim 3, wherein the compiling means generates object code after eliminating common subexpressions in the instructions, the compiling device further comprising inverse transformation means for performing, when the calculated number of occupied registers is larger than the predetermined threshold, an inverse transformation that restores the eliminated common subexpressions.

5. The compiling device according to claim 1, further comprising data allocating means for allocating said register based on a predetermined constraint condition for restricting a use of a register of said processor.

6. A compiling method comprising: a conversion step of converting the source code of a program into DAG form; a parameter calculating step of calculating parameters of a processor that executes, using a predetermined number of registers, the object code corresponding to each instruction included in the converted source code; and a compiling step of scanning the converted source code and compiling the source code based on the calculated parameters of the processor to generate object code, wherein, in the parameter calculating step, the timing at which the object code corresponding to each of the instructions is executed by the processor is calculated, and a numerical value indicating the increase or decrease in the number of registers of the processor that are occupied when the object code corresponding to each of the instructions is executed is calculated.

7. A medium carrying a program that causes a computer to execute: a conversion step of converting the source code of a program into DAG form; a parameter calculating step of calculating parameters of a processor that executes, using a predetermined number of registers, the object code corresponding to each instruction included in the converted source code; and a compiling step of scanning the converted source code and compiling the source code based on the calculated parameters of the processor to generate object code, wherein, in the parameter calculating step, the program causes the computer to execute: a process of calculating the timing at which the object code corresponding to each of the instructions is executed by the processor; and a process of calculating a numerical value indicating the increase or decrease in the number of registers of the processor that are occupied when the object code corresponding to each of the instructions is executed.