JP5402752B2 - Compiling device and compiling program

Info

Publication number: JP5402752B2
Application number: JP2010063509A
Authority: JP
Inventors: 真駒形
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-03-19
Filing date: 2010-03-19
Publication date: 2014-01-29
Anticipated expiration: 2030-03-19
Also published as: JP2011197982A

Description

The present disclosure relates to a compiling device that acquires source code and creates object code executable by a processor, and to a compiling program that is executed in an arithmetic processing device and causes that arithmetic processing device to operate as a compiling device.

Source code, a program written in the programming language a programmer uses to create programs, cannot be executed as-is by a computer's processor. To have the processor execute the program, the source code must be compiled, that is, converted into object code that the processor can execute and that takes the processor's structure into account. Compilation includes register allocation. Register allocation maps the unlimited number of virtual registers, to which the variables appearing in the compiler's intermediate language are assigned, onto a small number of real registers, thereby assigning the variables to real registers.

In some processors, however, the real registers are divided into multiple groups. Real registers belonging to different groups can still be used together, but when a single instruction uses real registers from different groups, its execution is delayed compared with when it uses real registers from the same group.

In addition, depending on the register allocation result, even a processor with multiple arithmetic units may incur dependencies through register reads and writes, which can severely restrict parallel processing.
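The effect of register reuse on parallelism can be illustrated with a minimal Python sketch. It is not taken from the patent: the instruction encoding `(destination, sources)`, the register names `r1` to `r11`, and the function `conflicts` are all hypothetical. The function reports the registers that prevent two instruction groups from being reordered or run side by side.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]

def conflicts(group_a: List[Instr], group_b: List[Instr]) -> Set[str]:
    """Registers written by one group and read or written by the other."""
    writes_a = {dst for dst, _ in group_a}
    writes_b = {dst for dst, _ in group_b}
    reads_a = {src for _, srcs in group_a for src in srcs}
    reads_b = {src for _, srcs in group_b for src in srcs}
    return (writes_a & (reads_b | writes_b)) | (writes_b & reads_a)

# Two computations that are independent at the source level.
first = [("r1", ("r8", "r9")), ("r2", ("r1", "r8"))]
# The allocator reused r1 as a temporary: the groups now depend on each other.
second_reusing_r1 = [("r1", ("r10", "r11")), ("r3", ("r1", "r10"))]
# A different temporary (r4) keeps the groups independent.
second_distinct = [("r4", ("r10", "r11")), ("r3", ("r4", "r10"))]

shared = conflicts(first, second_reusing_r1)
independent = conflicts(first, second_distinct)
```

In this sketch only the choice of temporary register differs between the two variants, yet the first one forces the two groups to be serialized on `r1`.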

Techniques for parallel processing have been proposed for systems in which the correspondence between arithmetic units and registers is completely fixed, that is, where a register usable by one arithmetic unit cannot be used by any other arithmetic unit. The present disclosure, in contrast, targets a type of processor that has multiple arithmetic units and multiple registers, and in which any arithmetic unit can use any of the registers. No compilation technique for speeding up arithmetic processing on this type of processor has been found.

Japanese Unexamined Patent Application Publication No. H9-62636 (JP-A-9-62636)

An object of the compiling device and compiling program of the present disclosure is to speed up arithmetic processing on the type of processor described above.

The compiling device of the present disclosure acquires source code and creates object code executable by a processor that has a plurality of registers and a plurality of arithmetic units that execute arithmetic processing using those registers.

The compiling device of the present disclosure includes a source code acquisition unit, an unrolling unit, a register allocation unit, an object code generation unit, and an object code output unit.

The source code acquisition unit acquires source code.

The unrolling unit unrolls a loop described in the source code into a plurality of instruction groups.

The register allocation unit allocates registers to the variables appearing in the plurality of instruction groups generated by the unrolling unit.

The object code generation unit generates object code based on the register allocation result of the register allocation unit.

The object code output unit outputs the object code generated by the object code generation unit.

Here, the register allocation unit includes a first allocation unit. With the plurality of registers divided into a plurality of register groups, the first allocation unit assigns one register group to each of the plurality of instruction groups. Then, for each instruction group, the first allocation unit allocates all of the individual variables, that is, the variables appearing in that instruction group other than the common variables used in common by the plurality of instruction groups, only to registers belonging to the one register group assigned to that instruction group.

The compiling program of the present disclosure is a program that is executed in an arithmetic processing device that executes programs and causes that arithmetic processing device to operate as the compiling device of the present disclosure.

According to the compiling device and compiling program of the present disclosure, arithmetic processing is sped up on a processor that has a plurality of arithmetic units and a plurality of registers that can be shared by those arithmetic units.

The drawings are, in order:
Basic configuration diagram of the compiling device of the first embodiment.
Conceptual diagram of the processor on which the object code created by the compiling device of FIG. 1 is to be executed.
Diagram showing the procedure of the processing executed by the compiling device of the second embodiment.
Diagram showing an example of source code stored in the source file.
Diagram showing the code written in the intermediate language that is generated by the syntax analysis processing.
Flowchart of the unrolling processing.
Diagram showing the code written in the intermediate language of FIG. 5 after unrolling.
Flowchart of the processing, within the register allocation processing, that classifies the variables in the loop into register groups.
Diagram showing the array S to which the variables (virtual registers) are distributed in the example of FIG. 7.
Diagram showing the real registers classified into N register groups (here, N = 2).
Diagram showing the contents of the array B(n).
Flowchart of the register allocation processing.
Flowchart of the register allocation step.
Flowchart of the reference operand allocation step.
Flowchart of the definition operand allocation step.
Flowchart of the save/restore instruction generation step.
Diagram showing the result of allocation to real registers.
Example of register groups in place of FIG. 10.
Diagram showing the contents of the array B(n) in place of FIG. 11.
Diagram showing the result of allocation to real registers under the premises of FIG. 18 and FIG. 19.
Diagram showing the result of step S34 of FIG. 12 when the state of FIG. 20 arises.
Diagram showing the registers combined into one group.
Example in which registers are used without grouping for the unrolled instructions.
Diagram showing an example in which registers are used without grouping for the unrolled instructions.
Explanatory diagram of a processor.
Diagram showing an example that causes no delay in instruction execution.
Diagram showing an example that causes a delay in instruction execution.
Diagram showing an example with low parallelism.
Diagram showing an example with high parallelism.

Embodiments of the present disclosure are described below.

FIG. 1 is a basic configuration diagram of the compiling device according to the first embodiment. The compiling device 10 acquires source code and creates object code executable by a processor. Note, however, that the object code created by the compiling device 10 is to be executed by a processor that has a plurality of registers and a plurality of arithmetic units that execute arithmetic processing using those registers. This object code is therefore not intended to be executed by the processor 21 shown in FIG. 1.

FIG. 2 is a conceptual diagram of the processor on which the object code created by the compiling device of FIG. 1 is to be executed.

The processor 21 shown in FIG. 1 executes the OS 30 and the compiling program 40 shown in FIG. 1. The processor 50 shown in FIG. 2, in contrast, is the processor that executes the object code created by the compiling device 10 of FIG. 1.

The object code created by the compiling device 10 is written to, for example, a CD (Compact Disc) or similar medium and installed on an arithmetic processing device or computer that is separate from the arithmetic processing device of FIG. 1 and has a processor configured as shown in FIG. 2. The object code is then executed by the processor of that arithmetic processing device or computer.

The processor 50 shown in FIG. 2 has a plurality of arithmetic units 51a and 51b (two in the example of FIG. 2) that execute arithmetic processing, and a plurality of registers 52a, 52b, ..., 52n (n registers here) that can be shared by the arithmetic units 51a and 51b.

Returning to FIG. 1, the description continues.

The compiling device 10 has hardware 20 of an arithmetic processing device, an operating system (OS) 30 that runs on the hardware 20, and a compiling program 40 that runs under the management of the OS 30.

The hardware 20 has a processor 21, a memory 22, an input device 23, and an output device 24.

The processor 21 executes the OS 30 and the compiling program 40.

The memory 22 stores data.

The input device 23 of the hardware 20 includes a keyboard and a mouse operated by the user of the compiling device 10, a CD/DVD drive that accepts and drives a portable recording medium such as a CD or DVD, and a receiving device that receives information from outside via a communication line (not shown).

The output device 24 includes an image display device that displays images, a speaker that outputs sound, and a transmitting device that transmits information to the outside.

The OS 30 controls the execution of programs running on the hardware 20 (here, the compiling program 40).

The compiling program 40 has a source code acquisition unit 41, an unrolling unit 42, a register allocation unit 43, an object code generation unit 44, and an object code output unit 45. These units 41 to 45 are names of program components here, but they also name the components of the compiling device 10, that is, the functions realized when those program components are executed by the processor 21.

The source code acquisition unit 41 acquires source code. The source code may be entered by the user through the input device 23, or may be installed from source code stored on a CD or DVD. It may also be input from outside via a communication line.

The unrolling unit 42 unrolls a loop described in the source code into a plurality of instruction groups. A specific example is given later.

The register allocation unit 43 allocates registers to the variables appearing in the plurality of instruction groups generated by the unrolling unit 42. The register allocation unit 43 has a first allocation unit 431, a comparison unit 432, a memory save/restore code insertion unit 433, and a second allocation unit 434.

With the plurality of registers 52a, 52b, ..., 52n of the processor 50 shown in FIG. 2 divided into a plurality of register groups, the first allocation unit 431 assigns one register group to each of the plurality of instruction groups. Then, for each instruction group, the first allocation unit 431 allocates all of the individual variables, that is, the variables appearing in that instruction group other than the common variables used in common by the plurality of instruction groups, only to registers belonging to the one register group assigned to that instruction group.
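The constraint imposed by the first allocation unit can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the patent's algorithm: every variable gets its own register with no reuse after its last use, the common variables draw from a separate pool `common_registers`, and all names (`allocate_by_group`, `r0` to `r4`, `PR11`, `PR13`) are hypothetical.

```python
from typing import Dict, List, Optional

def allocate_by_group(
    individual_vars: Dict[int, List[str]],  # instruction-group id -> individual variables
    common_vars: List[str],                 # variables shared by all instruction groups
    register_groups: Dict[int, List[str]],  # instruction-group id -> its register group
    common_registers: List[str],            # assumed separate pool for common variables
) -> Optional[Dict[str, str]]:
    """Map each variable to a real register, or return None if any pool runs out."""
    if len(common_vars) > len(common_registers):
        return None
    assignment = dict(zip(common_vars, common_registers))
    for group_id, variables in individual_vars.items():
        registers = register_groups[group_id]
        if len(variables) > len(registers):
            return None  # grouped allocation is not possible for this instruction group
        # Individual variables never leave the register group of their instruction group.
        assignment.update(zip(variables, registers))
    return assignment

result = allocate_by_group(
    individual_vars={1: ["PR1", "PR3"], 2: ["PR11", "PR13"]},
    common_vars=["PR2"],
    register_groups={1: ["r0", "r1"], 2: ["r2", "r3"]},
    common_registers=["r4"],
)
too_many = allocate_by_group({1: ["PR1", "PR3", "PR4"]}, [], {1: ["r0", "r1"]}, [])
```

A `None` result in this sketch stands for the case in which grouped allocation fails and a fallback is needed.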

The comparison unit 432 operates when the first allocation unit 431 could not allocate all of the individual variables to registers. Among all the variables appearing in the plurality of instruction groups, it takes the number of references to the variables that the first allocation unit 431 could not allocate to registers, computes the ratio of that number to the length of the entire program, and compares the ratio with a threshold.

The memory save/restore code insertion unit 433 operates when the comparison unit 432 determines that the ratio is less than the threshold. Instead of allocating registers to the variables that the first allocation unit 431 could not allocate, it inserts code that saves those variables to the memory 22 and restores them from the memory 22.

The second allocation unit 434 operates when the comparison unit 432 determines that the ratio is equal to or greater than the threshold. Treating every one of the registers 52a, 52b, ..., 52n (see FIG. 2), as they were before grouping, as usable by every instruction group, it allocates all the variables appearing in the plurality of instruction groups to registers.
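The choice between the two fallbacks can be sketched in a few lines of Python. The function name, the string labels, and the unit of program length (for example, an instruction count) are assumptions for illustration; the text specifies only the ratio and the direction of the comparison.

```python
def choose_fallback(unallocated_refs: int, program_length: int, threshold: float) -> str:
    """Pick the fallback used when grouped allocation fails.

    unallocated_refs: number of references to variables left without a register.
    program_length:   length of the entire program (unit assumed, e.g. instructions).
    """
    if program_length <= 0:
        raise ValueError("program_length must be positive")
    ratio = unallocated_refs / program_length
    if ratio < threshold:
        return "spill"      # insert memory save/restore code (unit 433)
    return "ungrouped"      # reallocate using all registers without grouping (unit 434)

few_refs = choose_fallback(unallocated_refs=2, program_length=100, threshold=0.25)
many_refs = choose_fallback(unallocated_refs=25, program_length=100, threshold=0.25)
```

The intuition suggested by the text is that spilling is acceptable when the affected variables are referenced rarely, while frequent references make it preferable to give up the grouping.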

The object code generation unit 44 generates object code based on the register allocation result of the register allocation unit 43.

The object code output unit 45 outputs the object code generated by the object code generation unit 44. The output may take any form, for example writing to a portable recording medium such as a CD, or transmission to the outside via a communication line.

This concludes the general description of the first embodiment. A more specific second embodiment is described below.

FIG. 3 shows the procedure of the processing executed by the compiling device of the second embodiment. The processing shown here is realized by executing the compiling program of the second embodiment on the hardware 20 and under the OS 30 shown in FIG. 1. The hardware and OS of the arithmetic processing device are no different from those of FIG. 1, so FIG. 1 is referred to here as needed. The object code created in the second embodiment is likewise object code to be executed by the processor 50 configured as shown in FIG. 2.

FIG. 3 shows a source file 60 and an object file 80.

The source file 60 stores source code written in a programming language for source code.

The object file 80 stores object code written in a programming language for object code (such as assembler) that is executable by the processor 50 shown in FIG. 2.

FIG. 3 also shows, as the compile processing 70 executed by the compiling device of the second embodiment, syntax analysis processing 71, unrolling processing 72, register allocation processing 73, instruction scheduling processing 74, and code generation processing 75.

The correspondence with the first embodiment described above is as follows.

The processing of receiving the source file 60, indicated by arrow A in FIG. 3, corresponds to the source code acquisition unit 41 of FIG. 1. The syntax analysis processing 71 serves as preprocessing for the unrolling processing 72 that follows it, and the syntax analysis processing 71 and the unrolling processing 72 together correspond to the unrolling unit 42 of FIG. 1. The register allocation processing 73 corresponds to the register allocation unit 43. The instruction scheduling processing 74 serves as preprocessing for the code generation processing 75 that follows it, and the instruction scheduling processing 74 and the code generation processing 75 together correspond to the object code generation unit 44 of FIG. 1. Finally, the processing indicated by arrow B, which outputs the object file 80 containing the object code generated by the code generation processing 75, corresponds to the object code output unit 45 of FIG. 1.

An outline of the processing 70 shown in FIG. 3 is given first, followed by a detailed specific example.

The syntax analysis processing 71 parses the received source file 60 and converts it into an intermediate language. The intermediate language uses an unlimited number of virtual registers, and a virtual register is assigned to each variable that appears. There is therefore no need to distinguish between virtual registers and variables here, and the two terms are used interchangeably below unless otherwise noted.

The unrolling processing 72 is performed after the syntax analysis processing 71. In the unrolling processing 72, a loop in the program converted into the intermediate language is unrolled into a plurality of instruction groups.

In the register allocation processing 73, the real registers that can actually be used (the registers 52a, 52b, ..., 52n shown in FIG. 2) are allocated to the variables (virtual registers) in the code written in the intermediate language, which includes the plurality of instruction groups generated by the unrolling processing 72.

In the instruction scheduling processing 74, instructions are scheduled so as to increase the degree of parallelism of the processing in the plurality of arithmetic units 51a and 51b (see FIG. 2).

In the code generation processing 75, object code is generated in accordance with the scheduling of the instruction scheduling processing 74.

The processing 71 to 75 shown in FIG. 3 is described concretely below.

FIG. 4 shows an example of source code stored in the source file. This embodiment is concerned with the loop portion of a program, so only the portion related to the loop is illustrated.

Line (1) declares the variables I and N as integers.

Line (2) declares array A(10), array B(10), variable C, array D(10), and variables t1, t2, and t3 as real numbers.

Between line (2) and line (n) (the portion marked (a)), concrete values are assumed to be defined for N, A(10), B(10), and C.

Line (n) is the start of the loop. It specifies that lines (n+1) to (n+4) are to be executed repeatedly, starting from I = 1 and incrementing I by 1, until I = N.

Line (n+1) adds A(I) and C and stores the result in t1.

Line (n+2) subtracts B(I) from t1 and stores the result in t2.

Line (n+3) multiplies A(I) by t2 and stores the result in t3.

Line (n+4) divides B(I) by t3 and stores the result in D(I).

Line (n+5) is the end of the loop.

A specific example is described below based on the source code shown in FIG. 4.
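For readers without access to FIG. 4, the loop described line by line above can be transcribed into Python as follows. This is a transcription of the prose description, not the figure itself; the function name `run_loop` and the use of 0-based lists are choices made for this sketch.

```python
from typing import List

def run_loop(a: List[float], b: List[float], c: float, n: int) -> List[float]:
    """Compute D(I) = B(I) / (A(I) * (A(I) + C - B(I))) for I = 1..N."""
    d = [0.0] * len(a)
    for i in range(n):       # I = 1..N in the source; indices are 0-based here
        t1 = a[i] + c        # line (n+1)
        t2 = t1 - b[i]       # line (n+2)
        t3 = a[i] * t2       # line (n+3)
        d[i] = b[i] / t3     # line (n+4)
    return d

example = run_loop(a=[2.0], b=[1.0], c=3.0, n=1)
```

Each iteration reads only `A(I)`, `B(I)`, and `C` and writes only `D(I)`, so iterations do not depend on one another apart from the shared value `C`.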

FIG. 5 shows the code written in the intermediate language that is generated by the syntax analysis processing 71.

PRn (n = 1, 2, ...) appearing in FIG. 5 and in the figures described below denotes a virtual register.

In FIG. 5, A(I), B(I), and D(I) denote arrays in memory, and numerical values are assumed to have been stored in the in-memory arrays A(I) and B(I) (I = 1, 2, ..., N) before the portion shown in FIG. 5. The value of N is also assumed to have been defined before the portion shown in FIG. 5, and the value of C is assumed to have been defined and associated with the virtual register PR2.

Line (1) of the code in FIG. 5 assigns 1 to I.

Line (2) is the start of the loop and the point to which control returns from line (11) when the condition described on line (11) is satisfied.

Line (3) loads the contents of A(I) in memory into the virtual register PR1.

Line (4) adds the contents of PR1 (corresponding to A(I)) and the contents of PR2 (corresponding to C) and stores the result in PR3 (corresponding to t1).

Line (5) loads the contents of B(I) in memory into the virtual register PR4.

Line (6) subtracts the contents of PR4 (corresponding to B(I)) from the contents of PR3 (corresponding to t1) and stores the result in PR5 (corresponding to t2).

Line (7) multiplies the contents of PR1 (corresponding to A(I)) by the contents of PR5 (corresponding to t2) and stores the result in PR6 (corresponding to t3).

Line (8) divides the contents of PR4 (corresponding to B(I)) by the contents of PR6 (corresponding to t3) and stores the result in PR7 (corresponding to D(I)).

Line (9) stores the contents of PR7 into D(I) in memory.

Line (10) adds 1 to I and assigns the result back to I.

Line (11) returns to LABEL (line (2)) if I < N + 1.

In the syntax analysis processing 71 shown in FIG. 3, as one example, the source code shown in FIG. 4 is converted into the code expressed in the intermediate language shown in FIG. 5.
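The intermediate code described above can be modeled with a small Python sketch. The tuple encoding, the opcode names (`load`, `add`, and so on), and the interpreter `run` are assumptions made for illustration; only the register names PR1 to PR7 and the line-by-line semantics come from the description.

```python
import operator
from typing import Dict, List

# Lines (3) to (9) of the described code, in a hypothetical tuple encoding.
BODY = [
    ("load", "PR1", "A"),           # (3) PR1 <- A(I)
    ("add", "PR3", "PR1", "PR2"),   # (4) PR3 <- PR1 + PR2
    ("load", "PR4", "B"),           # (5) PR4 <- B(I)
    ("sub", "PR5", "PR3", "PR4"),   # (6) PR5 <- PR3 - PR4
    ("mul", "PR6", "PR1", "PR5"),   # (7) PR6 <- PR1 * PR5
    ("div", "PR7", "PR4", "PR6"),   # (8) PR7 <- PR4 / PR6
    ("store", "PR7", "D"),          # (9) D(I) <- PR7
]
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": operator.truediv}

def run(memory: Dict[str, List[float]], c: float, n: int) -> List[float]:
    regs = {"PR2": c}                  # C is held in PR2 before the loop
    i = 1                              # line (1)
    while True:                        # line (2): LABEL
        for instr in BODY:
            op = instr[0]
            if op == "load":
                regs[instr[1]] = memory[instr[2]][i - 1]
            elif op == "store":
                memory[instr[2]][i - 1] = regs[instr[1]]
            else:
                regs[instr[1]] = OPS[op](regs[instr[2]], regs[instr[3]])
        i += 1                         # line (10)
        if not i < n + 1:              # line (11): branch back while I < N + 1
            break
    return memory["D"]

mem = {"A": [2.0, 4.0], "B": [1.0, 2.0], "D": [0.0, 0.0]}
result = run(mem, c=3.0, n=2)
```

Because the branch test sits at the bottom, the body in this sketch always runs at least once, which mirrors the label-then-conditional-jump layout of lines (2) and (11).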

FIG. 6 is a flowchart of the unrolling processing. FIG. 7 shows the code written in the intermediate language of FIG. 5 after unrolling.

Only the unrolled portion is of interest here, so FIG. 7 shows only the portion corresponding to lines (3) to (9) of FIG. 5. In practice, some further processing is added; for example, processing corresponding to lines (10) and (11) is executed after the portion shown in FIG. 7 (except that the processing corresponding to line (10) adds 4 to I).

In FIG. 7, the loop shown in FIG. 5 is unrolled four times. When the unrolling processing 72 (see FIG. 3) is executed, an expansion number is embedded in each instruction. In FIG. 7, however, the expansion number is shown beside each instruction, separately from the instruction, for readability.

Comparing lines (3) to (9) of FIG. 5 with FIG. 7, the instruction group consisting of the instructions on lines (3) to (9) of FIG. 5 is repeated four times in FIG. 7, and the instruction groups are given expansion numbers 1 to 4. The virtual register numbers (the n of PRn) are changed so that the variables appearing in the instruction groups are assigned to different virtual registers for each of the expansion numbers 1 to 4. The exception is PR2 (corresponding to C in FIG. 4), a common variable (a commonly used virtual register) used by the instruction groups of all expansion numbers 1 to 4; it is used in common under every expansion number.

図６のフローチャートに基づいてアンローリング処理７２を説明する。 The unrolling process 72 is demonstrated based on the flowchart of FIG.

ここでは先ず、カウンタに１が設定され（ステップＳ１１）、元の命令（ここでは図５の（３）行目）がコピーされ（ステップＳ１２）、カウンタ（ここでは１）が展開番号としてその命令に設定される（ステップＳ１３）。ステップＳ１４ではループ内にまだ命令が残っているか否かが判定される。ここでは、図５の（３）行目がコピーされただけであって、（４）〜（９）行目が残っているからステップＳ１２に戻り、今度は図５の（４）行目について、ステップＳ１２，Ｓ１３の処理が行なわれる。同様にして（５）行目〜（９）行目についてステップＳ１２，Ｓ１３の処理が順次繰り返される。ステップＳ１２，Ｓ１３の処理が（９）行目まで終了すると、ステップＳ１４を経由してステップＳ１５に進み、カウンタがカウントアップされる。ステップＳ１６ではカウンタが展開数（ここでは４）を超えたか否かが判定され、展開数４を超えるまで、ステップＳ１２〜Ｓ１５の処理が繰り返される。ステップＳ１６で展開数４を超えたことが判定されると、ステップＳ１７に進み、コピーした命令が参照する変数（仮想レジスタ）が展開ごとにリネームされる。ただし、各展開番号１〜４で共通に使用される変数（仮想レジスタ）（図５に示す例ではＰＲ２）はリネームされずにそのまま残される。 Here, first, 1 is set in the counter (step S11), the original instruction (here, line (3) in FIG. 5) is copied (step S12), and the counter (here, 1) is used as the expansion number. (Step S13). In step S14, it is determined whether or not there are instructions remaining in the loop. Here, since the (3) line in FIG. 5 has only been copied and the (4) to (9) lines remain, the process returns to step S12, and this time for the (4) line in FIG. Steps S12 and S13 are performed. Similarly, the processes in steps S12 and S13 are sequentially repeated for the lines (5) to (9). When the processes of steps S12 and S13 are completed up to the (9) th line, the process proceeds to step S15 via step S14, and the counter is counted up. In step S16, it is determined whether or not the counter exceeds the number of expansions (here, 4), and steps S12 to S15 are repeated until the number of expansions exceeds four. If it is determined in step S16 that the number of expansions exceeds 4, the process proceeds to step S17, and a variable (virtual register) referred to by the copied instruction is renamed for each expansion. However, a variable (virtual register) (PR2 in the example shown in FIG. 5) that is commonly used in each of the expansion numbers 1 to 4 is left without being renamed.

The unrolling process described above unrolls the code of FIG. 5 into the code of FIG. 7.
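The flow of FIG. 6 can be sketched roughly as follows. This is a minimal illustration, not the code of the embodiment: the `Instr` record, the `unroll` function, the `common` parameter (the set of common variables, which the text does not say how to compute), and the `PRn_k` renaming scheme are all assumptions made for the example. The embodiment renumbers the registers instead (the n in PRn).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instr:
    op: str
    refs: tuple          # virtual registers read by the instruction
    defs: tuple          # virtual registers written by the instruction
    expansion: int = 0   # expansion number; 0 means "not yet unrolled"


def unroll(body, factor, common):
    """Copy the loop body `factor` times and rename per-copy registers."""
    copies = []
    counter = 1                                    # step S11
    while counter <= factor:                       # step S16
        for ins in body:                           # steps S12 to S14
            copies.append(replace(ins, expansion=counter))  # S12, S13
        counter += 1                               # step S15

    def rename(reg, k):
        # Step S17: common variables keep their name (hypothetical scheme).
        return reg if reg in common else f"{reg}_{k}"

    return [
        replace(
            ins,
            refs=tuple(rename(r, ins.expansion) for r in ins.refs),
            defs=tuple(rename(r, ins.expansion) for r in ins.defs),
        )
        for ins in copies
    ]
```

Calling `unroll(body, 4, {"PR2"})` on a body of seven instructions would yield 28 instructions tagged with expansion numbers 1 to 4, with every register except PR2 renamed per copy.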

FIG. 8 is a flowchart of the processing, within the register allocation process 73, that classifies the variables in the loop into register groups. The processing of FIG. 8 is preparation for the processing that allocates variables to registers (see FIG. 12).

Here, the registers available in the processor 50 (see FIG. 2), on which the object code generated by this compilation is expected to run, are divided into a plurality of register groups. The number N of register groups is determined from the number of available registers and the unrolling expansion count L (here, L = 4) (step S21). N is at least 2 and at most L. Here, N = 2.

Next, storage is allocated for the arrays S(N+1) and B(N+1) (step S22). The array S(N+1) classifies variables (virtual registers) by register group, and the array B(N+1) classifies registers (real registers) by group. The arrays have N+1 elements rather than N (the number of register groups) so that common variables used across multiple expansion numbers can be classified separately from the register groups. The set of all variables (virtual registers) appearing in the loop is denoted V.

In step S23, an arbitrary variable (virtual register) v is taken from the set V. It is then determined whether all instructions in which v appears belong to the same expansion number, that is, whether v appears only in instructions of a single expansion number (step S24). If so, v is distributed to the array S according to the expansion number in which it appears. Here the expansion count is 4 and the number of register groups N is 2, so the variables (virtual registers) of expansion numbers 1 and 3 are distributed to S(1) and those of expansion numbers 2 and 4 to S(2).

If step S24 determines that v appears in instructions of more than one expansion number, the process proceeds to step S26 and v is added to S(N+1).
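The classification of steps S23 to S26 might look like the sketch below. The function name `classify` and the input shape (a mapping from each variable to the set of expansion numbers in which it appears) are assumptions for illustration. The round-robin rule `(k - 1) % n_groups` is also an assumption: it reproduces the example in the text (expansion numbers 1 and 3 to S(1), 2 and 4 to S(2)), but the text does not state a general rule. Indices are zero-based, so `S[0]` plays the role of S(1) and `S[n_groups]` the role of S(N+1).

```python
def classify(occurrences, n_groups):
    """Distribute variables into n_groups + 1 sets.

    occurrences: dict mapping a variable name to the set of expansion
    numbers (1-based) of the instructions in which it appears.
    """
    S = [set() for _ in range(n_groups + 1)]
    for v, expansions in occurrences.items():   # step S23
        if len(expansions) == 1:                # step S24
            (k,) = expansions
            # Assumed rule: expansion numbers are dealt to groups in turn.
            S[(k - 1) % n_groups].add(v)
        else:
            S[n_groups].add(v)                  # step S26: common variable
    return S
```

With four expansions and two groups, a variable seen only in expansion 3 lands in `S[0]`, while a variable such as PR2 that is seen in every expansion lands in `S[2]`.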

After this processing has been performed for every variable v in the set V, the array B(n) corresponding to the array S(n) is created. B(n) denotes the set of registers belonging to the n-th register group. B(N+1) is a special set: it contains all the registers of every register group.

Further, in step S28, the entire instruction group is searched to determine how many registers are needed for saving to and restoring from memory, and that number of registers is reserved.

FIG. 9 shows the array S with the variables (virtual registers) of the example in FIG. 7 distributed into it.

S(1) holds the variables (virtual registers) that appear only in the instruction group of expansion number 1 and those that appear only in the instruction group of expansion number 3. S(2) holds the variables (virtual registers) that appear only in the instruction group of expansion number 2 and those that appear only in the instruction group of expansion number 4. S(3) holds the variables (virtual registers) that appear in common in the instruction groups of more than one expansion number.

FIG. 10 shows the real registers classified into N register groups (here, N = 2).

Here there are twelve available registers, R0 to R11 (n = 12 in FIG. 2); R0 to R5 form group 1 and R6 to R11 form group 2.

FIG. 11 shows the contents of the array B(n).

B(1) holds R0 to R3 as registers usable for definitions and references (operations) between registers, and R4 and R5 as registers reserved for saving to and restoring from memory. B(2) holds R6 to R7 as registers usable for definitions and references between registers, and R10 and R11 as reserved registers. B(3) holds all the registers R0 to R11; of these, R10 and R11 are set as reserved registers, and the usable registers are R0 to R9.

FIG. 12 is a flowchart of the register allocation processing.

In the register allocation process 73 shown in FIG. 3, the register allocation processing of FIG. 12 is executed after the preprocessing of FIG. 8.

First, register allocation is performed on the premise that a variable v contained in S(n) (see FIG. 9) may use only the registers of B(n) (see FIG. 11) (step S31). The registers for saving and restoring are reserved separately, so allocation here uses only the "usable registers" shown in FIG. 11.

Step S32 determines whether every variable could be given one of the "usable registers". If so, the register allocation process ends. If the "usable registers" alone were not enough for all variables, it is next determined whether the total number of references to the variables that could not be allocated is sufficiently small relative to the program as a whole (step S33). Specifically, the references to the unallocated variables are counted across all of expansion numbers 1 to 4, the ratio of that count to the number of steps in the entire program is compared with a threshold, and it is determined whether the ratio is below the threshold or not. If the ratio is below the threshold, that is, if the total number of references to the unallocated variables is sufficiently small relative to the size of the entire program, the process proceeds to step S34; otherwise it proceeds to step S35.

In step S34, instructions that save the unallocated variables to memory and restore them from memory are generated using the reserved registers (see FIG. 11), and the allocation process ends.

In step S35, by contrast, register allocation is performed on the premise that any variable v may use any register. As in step S31, however, the registers for saving and restoring are reserved separately.

Step S36 determines whether registers could be allocated to all variables in step S35. If so, the register allocation process ends. If not, instructions that save the unallocated variables to memory and restore them are generated using the reserved registers (step S37).
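The decision flow of FIG. 12 can be summarized in a short driver. This is only a sketch under assumptions: the two allocators and the reference counter are passed in as callables, each allocator is assumed to return a pair (assignment, set of unallocated variables), and the returned labels are invented for the example. The text does not give a concrete threshold value, so it is a parameter here.

```python
def register_allocation(allocate_grouped, allocate_ungrouped,
                        count_refs, program_steps, threshold):
    """Return (strategy label, assignment, variables needing spill code)."""
    assignment, spill = allocate_grouped()           # step S31
    if not spill:                                    # step S32
        return "grouped", assignment, set()

    # Step S33: references to unallocated variables relative to program size.
    ratio = count_refs(spill) / program_steps
    if ratio < threshold:
        return "grouped+spill", assignment, spill    # step S34

    assignment, spill = allocate_ungrouped()         # step S35
    if not spill:                                    # step S36
        return "ungrouped", assignment, set()
    return "ungrouped+spill", assignment, spill      # step S37
```

The ordering of the checks reflects the trade-off described in the text: a few memory accesses are accepted to keep the register groups separate, and grouping is abandoned only when spilling would be too frequent.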

FIG. 13 is a flowchart of the register allocation step. This step is executed in both step S31 and step S35 of FIG. 12, and the processing flow is the same in both; the two differ only in the number of registers available for allocation.

Following the flow of FIG. 13, the instructions unrolled as shown in FIG. 7 are examined in order from the top (step S41), and allocation of reference operands (step S42) and allocation of definition operands (step S43) are performed for every instruction (step S44).

A reference operand is a variable on the left-hand side of an instruction. For example, in the instruction
ADD R1, R0 ⇒ R2
in expansion number 1, the variables (virtual registers) R1 and R0 are reference operands. A variable on the right-hand side of an instruction, here R2, is a definition operand.

FIG. 14 is a flowchart of the reference operand allocation step (step S42 of FIG. 13).

Here, the reference operands of the instruction are examined (step S51), and it is determined whether each reference is the last one, with no further reference after this instruction; if it is, the register allocated to that reference operand is released (step S53). This is done for all reference operands (step S54).

FIG. 15 is a flowchart of the definition operand allocation step (step S43 of FIG. 13).

Here, the definition operands of the instruction are examined (step S61), and it is determined whether the definition operand is already allocated to a register (step S62). If it is, the next definition operand is examined (step S66). If it is not, it is checked whether any register not allocated to a variable remains (step S63). If one remains, it is allocated to the definition operand (step S64); if none remains, the definition operand (variable) is added to the set SPILL (step S65). The set SPILL is the set of variables (virtual registers) that could not be allocated to registers. This processing is performed for all definition operands.
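The single pass of FIGS. 13 to 15 can be sketched as below. The function `allocate`, the instruction shape (a pair of reference and definition operand tuples), and the `pool_of` callback are assumptions for illustration. Passing a `pool_of` that returns the registers of B(n) for a variable in S(n) would correspond to step S31, and one that returns every usable register would correspond to step S35. Picking the first free register is also an assumption; the text does not say which free register is chosen.

```python
def allocate(instrs, pool_of):
    """Allocate registers in one top-to-bottom pass.

    instrs: list of (refs, defs) tuples of variable names, in program order.
    pool_of: function mapping a variable to the registers it may use.
    Returns (assignment, spill).
    """
    # Precompute the index of the last reference to each variable.
    last_use = {}
    for idx, (refs, _) in enumerate(instrs):
        for v in refs:
            last_use[v] = idx

    assignment = {}   # variable -> register it was given
    live = {}         # variable -> register, for variables still in use
    busy = set()      # registers currently held by a live variable
    spill = set()     # the set SPILL of FIG. 15

    for idx, (refs, defs) in enumerate(instrs):      # steps S41, S44
        for v in refs:                               # FIG. 14
            if last_use[v] == idx:                   # last reference?
                reg = live.pop(v, None)
                if reg is not None:
                    busy.discard(reg)                # step S53: release
        for v in defs:                               # FIG. 15
            if v in live or v in spill:              # step S62
                continue
            free = [r for r in pool_of(v) if r not in busy]  # step S63
            if free:
                assignment[v] = free[0]              # step S64
                live[v] = free[0]
                busy.add(free[0])
            else:
                spill.add(v)                         # step S65
    return assignment, spill
```

Because reference operands are processed before definition operands, a register freed by the last reference in an instruction can be reused for that same instruction's result.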

FIG. 16 is a flowchart of the save/restore instruction generation step (steps S34 and S37 of FIG. 12).

Here, an arbitrary variable v is taken from the set SPILL of variables to which no register could be allocated (step S71).

Next, a temporary area M in memory is chosen for the variable v (step S72).

Then, for each definition of the variable v, an instruction that stores to the temporary area M is generated. The register used for the store in this instruction is one of the registers reserved for saving and restoring (see FIG. 11).

Next, for each reference to the variable v, an instruction that loads from the temporary area M into a register is generated (step S74). Here too, the register used for the load is one of the registers reserved for saving and restoring.

This processing is performed for all variables in the set SPILL (step S75).
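A rough sketch of the save/restore generation of FIG. 16 follows. The function `insert_spill_code`, the instruction shape (op, refs, defs), and the slot names `M0`, `M1`, ... are hypothetical. The sketch assumes that enough reserved registers exist to hold every spilled operand of a single instruction, which is what the reservation in step S28 is meant to guarantee.

```python
def insert_spill_code(instrs, spill, reserved):
    """Insert LOAD before references and STORE after definitions.

    instrs: list of (op, refs, defs) tuples in program order.
    spill: set of variables that were not given a register.
    reserved: list of registers reserved for saving and restoring.
    """
    # Step S72: pick a temporary memory area for each spilled variable.
    slot = {v: f"M{i}" for i, v in enumerate(sorted(spill))}
    out = []
    for op, refs, defs in instrs:
        spilled_here = [v for v in dict.fromkeys(refs + defs) if v in slot]
        if len(spilled_here) > len(reserved):
            raise ValueError("not enough reserved registers")
        tmp = dict(zip(spilled_here, reserved))
        for v in dict.fromkeys(refs):
            if v in slot:                      # step S74: restore before use
                out.append(("LOAD", (slot[v],), (tmp[v],)))
        out.append((op,
                    tuple(tmp.get(v, v) for v in refs),
                    tuple(tmp.get(v, v) for v in defs)))
        for v in defs:
            if v in slot:                      # save right after definition
                out.append(("STORE", (tmp[v],), (slot[v],)))
    return out
```

Only the reserved registers ever hold a spilled value, so the allocation already made for the other variables is left untouched.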

FIG. 17 shows the result of allocation to real registers: the variables appearing in the unrolled instructions of FIG. 7, allocated on the premise that the number of registers shown in FIG. 10 is available.

In FIG. 17, for each expansion number, all variables appearing only in the instruction group of that expansion number are allocated to registers of a single register group. The two instructions marked × in FIG. 17, however, refer to both group 1 and group 2 of FIG. 10. As the number of such ×-marked instructions grows, parallelization may be hindered and processing speed may drop accordingly, but in the example of FIG. 17 only two instructions are marked.

Next, an example in which there are not enough registers is described.

FIG. 18 shows an example of register groups that replaces FIG. 10.

In FIG. 10 each group has six registers, whereas in FIG. 18 each group has only five.

FIG. 19 shows the contents of the array B(n) and replaces FIG. 11. Compared with FIG. 11, FIG. 19 has fewer "usable registers".

FIG. 20 shows the result of allocation to real registers (step S31 of FIG. 12) under the premises of FIGS. 18 and 19.

There are not enough registers to allocate to the variables (virtual registers), and the variables (virtual registers) marked ★ in the figure remain unallocated.

FIG. 21 shows the result of step S34 of FIG. 12 when step S31 of FIG. 12 has produced the state of FIG. 20.

Instructions for saving to and restoring from memory have been inserted at the places marked △ in the figure.

Saving to and restoring from memory takes far longer than moving data between registers, but when it is infrequent, inserting save and restore operations avoids adding factors that hinder parallelization.

Next, an example of step S35 of FIG. 12 is described. In step S35 the registers are not divided into groups but gathered into a single group, and register allocation is performed on the premise that the registers of this single group can be used by the instructions of every expansion number.

FIG. 22 shows the registers gathered into a single group.

FIG. 23 shows an example in which registers are used without grouping for the instructions unrolled as in FIG. 7.

Here registers are allocated to all variables without any saving to or restoring from memory. This means that, when there would be too many save and restore operations, parallel processing is given up in favor of allocating registers. The same registers are allocated across expansion numbers 1 to 4, which makes parallel processing on multiple arithmetic units difficult. For example, the eighth instruction, a LOAD, has a register dependency on the STORE immediately before it, so these instructions cannot be executed in parallel.

In FIG. 23 the same registers are reused in order to keep the number of registers used as small as possible. Alternatively, different registers may be assigned wherever possible to improve parallelism even slightly.

Like FIG. 23, FIG. 24 shows an example in which registers are used without grouping for the instructions unrolled as in FIG. 7, but here different registers are used wherever possible. For example, for the definition in the fourth instruction, SUB, R2 has already been released and could be used again as in FIG. 23, but FIG. 24 allocates R4 instead. As a result, the eighth instruction, a LOAD, in FIG. 24 can be executed at the same time as the STORE and DIV above it. Because register groups are not considered here, however, many instructions depend on more than one register group (see FIG. 18), as the × marks in FIG. 24 show. Parallelism is therefore lower than when the save and restore instructions of FIG. 21 are inserted.

Next, the advantages of the embodiment described above are explained.

FIG. 25 is an explanatory diagram of a processor.

The processor shown here has three arithmetic units A, B, and C and three register groups A, B, and C. Arithmetic unit A can refer directly to register group A but not to register groups B and C; referring to them takes extra time. The same holds for arithmetic units B and C and their register groups. In such a processor, an instruction that depends on registers belonging to more than one register group takes longer to execute.

FIG. 26 shows an example that causes no delay in instruction execution.

Here the registers of register group A are denoted a1, a2, a3, ... and the registers of register group B are denoted b1, b2, b3, ....

The instruction sequence of FIG. 26 uses only registers of register group A, so no instruction is delayed.

FIG. 27 shows an example that causes a delay in instruction execution.

The instruction sequence of FIG. 27 mixes registers of register group A with registers of register group B, so instruction execution is delayed.
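The condition that distinguishes FIG. 26 from FIG. 27 is simply whether one instruction touches registers of more than one group. A minimal check, with the hypothetical helper names `groups_touched` and `crosses_groups` and a hand-written register-to-group table, could look like this:

```python
def groups_touched(operands, group_of):
    """Return the set of register groups used by one instruction."""
    return {group_of[reg] for reg in operands}


def crosses_groups(operands, group_of):
    """True if the instruction mixes registers of different groups."""
    return len(groups_touched(operands, group_of)) > 1


# Hypothetical table following the naming used for FIGS. 26 and 27.
GROUP_OF = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
```

Counting the instructions for which `crosses_groups` is true gives the number of × marks in an allocation such as those of FIG. 17 or FIG. 24.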

Because the embodiment described above uses the grouped registers as far as possible, the factors that delay instruction execution are kept to a minimum and high-speed processing becomes possible.

Another advantage of the embodiment is parallelism. Even when register groups such as those of FIG. 25 do not exist in hardware and all registers can be referenced equally, grouping the registers as in the embodiment increases parallelism and improves processing speed.

FIG. 28 shows an example of low parallelism.

Here allocation is performed without grouping the registers. As a result, the LOAD instruction on line (5) cannot be executed until register r1 has been released on line (3); that is, the LOAD on line (5) can be executed no earlier than at the same time as the STORE instruction on line (4).

FIG. 29 shows an example of high parallelism.

Here the registers are grouped, so the instructions on lines (1) and (5) can be executed in parallel.

Because the embodiment described above uses the grouped registers as far as possible, parallelism improves. Processing speed therefore improves through parallel processing on multiple arithmetic units even on a processor whose registers are not grouped in hardware as in FIG. 25.

10, 40 Compiling device
20 Hardware
21, 50 Processor
22 Memory
23 Input device
24 Output device
30 Operating system (OS)
40 Compiling program
41 Source code acquisition unit
42 Unrolling unit
43 Register allocation unit
44 Object code generation unit
45 Object code output unit
51a, 51b Arithmetic unit
52a, 52b, 52n Register
60 Source file
70 Compile process
71 Parsing process
72 Unrolling process
73 Register allocation process
74 Instruction scheduling process
75 Code generation process
80 Object file
431 First allocation unit
432 Comparison unit
433 Memory save/restore code insertion unit
434 Second allocation unit

Claims

A compiling device that obtains source code and creates object code executable by a processor having a plurality of registers and a plurality of arithmetic units that perform arithmetic processing using the plurality of registers, the compiling device comprising:
a source code acquisition unit that acquires source code;
an unrolling unit that unrolls a loop described in the source code into a plurality of instruction groups;
a register allocation unit that allocates registers to variables appearing in the plurality of instruction groups generated by the unrolling unit;
an object code generation unit that generates object code based on the register allocation result of the register allocation unit; and
an object code output unit that outputs the object code generated by the object code generation unit,
wherein the register allocation unit includes a first allocation unit that assigns, to each of the plurality of instruction groups, one of the register groups obtained by dividing the plurality of registers into a plurality of register groups, and that allocates, for each of the plurality of instruction groups, all of the individual variables appearing in that instruction group, excluding common variables used in common by the plurality of instruction groups, only to registers of the one register group assigned to that instruction group.

The compiling device according to claim 1, wherein the register allocation unit further includes a memory save/restore code insertion unit that, when the first allocation unit cannot allocate all of the individual variables to registers, inserts, for each variable to which the first allocation unit could not allocate a register, code instructing that the variable be saved to and restored from memory instead of allocating the variable to a register.

The compiling device according to claim 1, wherein the register allocation unit further includes a second allocation unit that, when the first allocation unit cannot allocate all of the individual variables to registers, allocates all variables appearing in the plurality of instruction groups to registers on the premise that any of the plurality of registers, as they were before being grouped, may be used in any of the plurality of instruction groups.

The compiling device according to claim 1, wherein the register allocation unit further includes:
a comparison unit that, when the first allocation unit cannot allocate all of the individual variables to registers, compares with a threshold the ratio of the number of references to all variables to which the first allocation unit could not allocate registers, among all variables appearing in the plurality of instruction groups, to the length of the entire program;
a memory save/restore code insertion unit that, when the comparison unit determines that the ratio is less than the threshold, inserts, for each variable to which the first allocation unit could not allocate a register, code instructing that the variable be saved to and restored from memory instead of allocating the variable to a register; and
a second allocation unit that, when the comparison unit determines that the ratio is equal to or greater than the threshold, allocates all variables appearing in the plurality of instruction groups to registers on the premise that any of the plurality of registers, as they were before being grouped, may be used in any of the plurality of instruction groups.

A compiling program that is executed in an arithmetic processing device that executes programs, and that causes the arithmetic processing device to operate as a compiling device that obtains source code and creates object code executable by a processor having a plurality of registers and a plurality of arithmetic units that perform arithmetic processing using the plurality of registers, the compiling program causing the arithmetic processing device to operate as a compiling device comprising:
a source code acquisition unit that acquires source code;
an unrolling unit that unrolls a loop described in the source code into a plurality of instruction groups;
a register allocation unit that allocates registers to variables appearing in the plurality of instruction groups generated by the unrolling unit;
an object code generation unit that generates object code based on the register allocation result of the register allocation unit; and
an object code output unit that outputs the object code generated by the object code generation unit,
wherein the register allocation unit includes a first allocation unit that assigns, to each of the plurality of instruction groups, one of the register groups obtained by dividing the plurality of registers into a plurality of register groups, and that allocates, for each of the plurality of instruction groups, all of the individual variables appearing in that instruction group, excluding common variables used in common by the plurality of instruction groups, only to registers of the one register group assigned to that instruction group.