JP2017016570A

JP2017016570A - Compiler device, compiler method and compiler program

Info

Publication number: JP2017016570A
Application number: JP2015135456A
Authority: JP
Inventors: 真駒形; Makoto Komagata
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-07-06
Filing date: 2015-07-06
Publication date: 2017-01-19
Anticipated expiration: 2035-07-06
Also published as: JP6547466B2

Abstract

PROBLEM TO BE SOLVED: To provide a compiler device that can generate an object capable of efficiently using the arithmetic units of a processor. SOLUTION: The compiler device comprises an extraction unit, a first generation unit, a second generation unit and a third generation unit. When the extraction unit detects, in an input source file, a repeat instruction that instructs repetition of an instruction sequence, it extracts the instruction sequence repeated by the repeat instruction. The first generation unit converts the instructions included in the extracted instruction sequence into integer instructions that perform integer operations, and generates an integer instruction sequence. The second generation unit converts the instructions included in the extracted instruction sequence into floating-point instructions that perform floating-point operations, and generates a floating-point instruction sequence. The third generation unit generates an output instruction sequence including the integer instruction sequence and the floating-point instruction sequence, based on the number of integer arithmetic units and the number of floating-point arithmetic units of the processor that serves as the execution environment of the object compiled from the source file. SELECTED DRAWING: Figure 12

Description

The present invention relates to a compiler apparatus, a compiling method, and a compiler program.

Recent processors have a plurality of arithmetic units that execute input instructions, thereby realizing superscalar execution, in which a plurality of instructions are executed in parallel. Recent processors also realize out-of-order execution, in which the order of instructions can be changed when there is no dependency or branch instruction between them. The arithmetic units include an integer arithmetic unit that performs integer operations and a floating-point arithmetic unit that performs floating-point operations. That is, an integer instruction, which performs an integer operation, is executed by an integer arithmetic unit, and a floating-point instruction, which performs a floating-point operation, is executed by a floating-point arithmetic unit.

Japanese Patent Laid-Open No. 4-307624
Japanese Patent Laid-Open No. 11-110215

Therefore, if the ratio of integer instructions to floating-point instructions in an object executed by the processor is biased, processing may concentrate on one of the integer arithmetic unit and the floating-point arithmetic unit. As a result, the other arithmetic unit may not be used efficiently. Accordingly, an object of one aspect of the disclosed technique is to provide a compiler apparatus that can generate an object capable of efficiently using the arithmetic units of a processor.

One aspect of the disclosed technique is exemplified by the following compiler apparatus. The compiler apparatus includes an extraction unit, a first generation unit, a second generation unit, and a third generation unit. When the extraction unit detects, in the input source file, a repeat instruction that instructs repetition of an instruction sequence, it extracts the instruction sequence repeated by the repeat instruction. The first generation unit converts the instructions included in the extracted instruction sequence into integer instructions that perform integer operations, and generates an integer instruction sequence. The second generation unit converts the instructions included in the extracted instruction sequence into floating-point instructions that perform floating-point operations, and generates a floating-point instruction sequence. The third generation unit generates an output instruction sequence including the integer instruction sequence and the floating-point instruction sequence, based on the number of integer arithmetic units and the number of floating-point arithmetic units of the processor that serves as the execution environment of the object compiled from the source file.

This compiler apparatus can generate an object capable of efficiently using the arithmetic units of a processor.

FIG. 1 is a diagram illustrating a hardware configuration of the information processing apparatus.
FIG. 2 is a diagram illustrating an example of a configuration of a processor included in the information processing apparatus.
FIG. 3 is a diagram illustrating an example of the intermediate code.
FIG. 4 is a diagram illustrating an example of the relationship between blocks and instructions.
FIG. 5 is a diagram illustrating an example of a loop instruction sequence.
FIG. 6 is a diagram illustrating an example of an instruction sequence after loop expansion.
FIG. 7 is a diagram illustrating an example of processing in which each instruction included in the instruction sequence input to the processor is executed by a plurality of arithmetic units.
FIG. 8 is a diagram illustrating an example of a state in which the instructions included in the instruction sequence are distributed across a plurality of arithmetic units for execution.
FIG. 9 is a diagram illustrating an example of a state in which execution of the instructions included in the instruction sequence is concentrated on one arithmetic unit.
FIG. 10 is a diagram illustrating an example of a flow of processing for compiling a program.
FIG. 11 is a diagram illustrating an example of compilation and execution of the compiled program.
FIG. 12 is a diagram illustrating an example of processing blocks of the compiler apparatus according to the first embodiment.
FIG. 13 is a diagram illustrating an example of a loop instruction sequence input to the compiler apparatus.
FIG. 14 is a diagram illustrating an example of a source file that is not suitable for loop expansion.
FIG. 15 is a diagram illustrating an example of the flow of loop expansion processing by the compiler apparatus according to the first embodiment.
FIG. 16 is a diagram illustrating an example of a loop structure.
FIG. 17 is a diagram illustrating an example of a loop instruction sequence input in the first embodiment.
FIG. 18 is a diagram illustrating an example of a detailed flow of the processes of F4 and F5 in FIG. 15.
FIG. 19 is a diagram illustrating an example of the FU conversion table.
FIG. 20 is a diagram illustrating an example of the IU conversion table.
FIG. 21 is a diagram illustrating an example of the correspondence before and after conversion of each instruction included in the input loop operation instruction sequence into an integer instruction.
FIG. 22 is a diagram illustrating an example of the correspondence before and after conversion of each instruction included in the input loop operation instruction sequence into a floating-point instruction.
FIG. 23 is a diagram illustrating an example of expansion processing of a loop operation instruction sequence.
FIG. 24 is a diagram illustrating an example of a virtual register map.
FIG. 25 is a diagram illustrating an example of virtual register conversion processing for reference list operands.
FIG. 26 is a diagram illustrating an example of the correspondence of memory operands before and after loop expansion.
FIG. 27 is a diagram illustrating conversion of memory operands at each expansion number.
FIG. 28 is a diagram illustrating an example of virtual register operand conversion processing for definition list operands.
FIG. 29 is a diagram illustrating an example of an instruction sequence stored in the output loop instruction sequence storage unit.
FIG. 30 is a diagram illustrating an example of a loop instruction sequence in which the number of rotations is corrected.
FIG. 31 is a diagram illustrating an example of the instruction sequence of a loop after loop expansion.
FIG. 32 is a diagram illustrating an example of the usage state of the arithmetic units when an instruction sequence is executed.
FIG. 33 is a diagram illustrating an example of the usage state of the arithmetic units when an instruction sequence is executed.
FIG. 34 is an example of processing blocks that perform loop expansion in the compiler apparatus according to the first comparative example.
FIG. 35 is a diagram illustrating a loop instruction sequence in pseudo form.
FIG. 36 is a diagram illustrating an example of an instruction sequence obtained when the loop illustrated in FIG. 35 is expanded by the compiler apparatus according to the first comparative example.
FIG. 37 is a diagram schematically illustrating loop expansion according to the second comparative example.
FIG. 38 is a diagram illustrating FIG. 37 in pseudo code.
FIG. 39 is an example of a diagram comparing the loop expansion according to the first comparative example with the loop expansion according to the first embodiment.
FIG. 40 is a diagram illustrating an example of a loop included in a source file input to the compiler.
FIG. 41 is a diagram illustrating an example of an instruction sequence obtained by performing loop expansion according to the first comparative example on the loop illustrated in FIG. 40.
FIG. 42 is a diagram illustrating an example of an instruction sequence obtained by performing loop expansion according to the first embodiment on the loop illustrated in FIG. 40.
FIG. 43 is a diagram illustrating an example of the regression calculation.
FIG. 44 is a diagram illustrating an example of an instruction including a reference operand and a definition operand.
FIG. 45 is a diagram illustrating an example of a loop instruction sequence input to the compiler apparatus.
FIG. 46 is a diagram illustrating an example of an instruction sequence in which the instruction sequence after loop expansion illustrated in FIG. 45 is expressed in intermediate code.
FIG. 47 is a diagram illustrating an example of the flow of loop expansion processing performed by the compiler apparatus according to the first modification.
FIG. 48 is a diagram illustrating an example of a detailed flow of the processes of R1 and R2 in FIG. 47.
FIG. 49 is an example of a diagram illustrating initialization processing for regression calculation.
FIG. 50 is a diagram illustrating an example of expansion processing of a loop operation instruction sequence.
FIG. 51 is a diagram illustrating an example of a regression calculation instruction.
FIG. 52 is a diagram illustrating an example of rewrite processing of a regression calculation instruction.
FIG. 53 is a diagram illustrating an example of rewrite processing of a regression calculation instruction.
FIG. 54 is a diagram illustrating an example of rewrite processing of a regression calculation instruction.
FIG. 55 is a diagram illustrating an example of rewrite processing of a regression calculation instruction.
FIG. 56 is a diagram illustrating an example of rewrite processing of a regression calculation instruction.
FIG. 57 is a diagram illustrating an example of a loop operation instruction sequence after loop expansion.
FIG. 58 is a diagram illustrating an example of an initialization instruction sequence after loop expansion.
FIG. 59 is a diagram illustrating an example of a convergence instruction sequence after loop expansion.
FIG. 60 is a diagram illustrating an example of a loop instruction sequence after loop expansion.
FIG. 61 is a diagram illustrating an example of the instruction sequence of a loop after loop expansion.

Hereinafter, a compiler apparatus according to an embodiment will be described with reference to the drawings. The configuration of the embodiment described below is an example, and the disclosed technique is not limited to the configuration of the embodiment.

<First Embodiment>
FIG. 1 is a diagram illustrating a hardware configuration of the information processing apparatus 100. The information processing apparatus 100 includes a processor 101, a main storage unit 102, an auxiliary storage unit 103, a communication unit 104, and a connection bus B1. The processor 101, the main storage unit 102, the auxiliary storage unit 103, and the communication unit 104 are connected to each other by the connection bus B1. The information processing apparatus 100 can be used, for example, as the compiler apparatus 10 according to the first embodiment. The compiler apparatus 10 compiles the source file of an input program to generate an object executable by a processor.

In the information processing apparatus 100, the processor 101 loads a program stored in the auxiliary storage unit 103 into the work area of the main storage unit 102 and controls peripheral devices through execution of the program. The information processing apparatus 100 can thereby execute processing that matches a predetermined purpose. The main storage unit 102 and the auxiliary storage unit 103 are recording media readable by the information processing apparatus 100.

The main storage unit 102 is exemplified as a storage unit that is directly accessed by the processor 101. The main storage unit 102 includes a Random Access Memory (RAM) and a Read Only Memory (ROM).

The auxiliary storage unit 103 stores various programs and various data in a recording medium in a readable and writable manner. The auxiliary storage unit 103 is also called an external storage device. The auxiliary storage unit 103 stores an operating system (OS), various programs, various tables, and the like. The OS includes a communication interface program that exchanges data with external devices and the like connected via the communication unit 104. The external devices and the like include, for example, other information processing apparatuses and external storage devices connected via a computer network or the like. The auxiliary storage unit 103 may be, for example, part of a cloud system, which is a group of computers on a network.

The auxiliary storage unit 103 is, for example, an Erasable Programmable ROM (EPROM), a solid state drive (SSD), or a hard disk drive (HDD). The auxiliary storage unit 103 may also be, for example, a Compact Disc (CD) drive device, a Digital Versatile Disc (DVD) drive device, or a Blu-ray (registered trademark) Disc (BD) drive device. Further, the auxiliary storage unit 103 may be provided by Network Attached Storage (NAS) or a Storage Area Network (SAN).

A recording medium readable by the information processing apparatus 100 is a recording medium that stores information such as data and programs by electrical, magnetic, optical, mechanical, or chemical action and from which the information processing apparatus 100 can read that information. Among such recording media, those removable from the information processing apparatus 100 include, for example, a flexible disk, a magneto-optical disk, a CD-ROM, a CD-R/W, a DVD, a Blu-ray disc, a DAT, an 8 mm tape, and a memory card such as a flash memory. Recording media fixed to the information processing apparatus 100 include a hard disk, an SSD, and a ROM.

The communication unit 104 is, for example, an interface with a computer network. The communication unit 104 communicates with external devices via the computer network.

The information processing apparatus 100 may further include, for example, an input unit that receives operation instructions from a user or the like. Examples of such an input unit include input devices such as a keyboard, a pointing device, a touch panel, an acceleration sensor, and a voice input device.

The information processing apparatus 100 may include, for example, an output unit that outputs data processed by the processor 101 and data stored in the main storage unit 102. Examples of such an output unit include output devices such as a Cathode Ray Tube (CRT) display, a Liquid Crystal Display (LCD), a Plasma Display Panel (PDP), an Electroluminescence (EL) panel, an organic EL panel, and a printer.

FIG. 2 is a diagram illustrating an example of the configuration of the processor 101 included in the information processing apparatus 100. The processor 101 has one integer arithmetic unit 101a and two floating-point arithmetic units 101b and 101b. The integer arithmetic unit 101a is an arithmetic unit that performs integer operations. The floating-point arithmetic units 101b and 101b are arithmetic units that perform floating-point operations. Because the processor 101 has a plurality of arithmetic units, it can execute a plurality of instructions in parallel. In the drawing, the integer arithmetic unit 101a is denoted “IU”, and the floating-point arithmetic unit 101b is denoted “FU”.

Here, the intermediate code, which is the representation of instructions inside the compiler apparatus 10, will be described. FIG. 3 is a diagram illustrating an example of the intermediate code. The intermediate code will be described with reference to FIG. 3.

“ld”, “mult”, and “st” illustrated in FIG. 3 are examples of instruction codes written in intermediate code. There are two types of instruction codes: integer instructions of the integer type and floating-point instructions of the floating-point type. An integer instruction performs an integer operation. A floating-point instruction performs a floating-point operation. That is, an integer instruction is executed by the integer arithmetic unit 101a, and a floating-point instruction is executed by the floating-point arithmetic unit 101b. “ld”, “mult”, and “st” illustrated in FIG. 3 are all examples of integer instructions. The “[a+$g1], $g2”, “$g2, 5, $g3”, and “$g3, [b+$g1]” that follow the instruction codes are arguments passed to the instructions and are called operands. A plurality of operands can be specified, for example, by separating them with commas (,). A virtual register, a memory operand, a constant, or the like is specified as an operand. Integer-type operands are specified for an integer instruction. Floating-point-type operands are specified for a floating-point instruction.

A virtual register is a virtual register used in the intermediate code. In FIG. 3, “$g1”, “$g2”, and “$g3” are examples of virtual registers. A virtual register is a register internal to the compiler that is allocated when a source file is compiled by the compiler apparatus 10. Virtual registers include integer-type virtual registers, which handle integers, and floating-point-type virtual registers, which handle floating-point numbers. In this specification, the integer type is also referred to as the int type, and the floating-point type is also referred to as the float type. An int-type virtual register is specified by the combination of the virtual register name “$g” and a number that is the virtual register number. A float-type virtual register is specified by the combination of the virtual register name “$f” and a number that is the virtual register number. For both int-type and float-type virtual registers, the virtual register number starts from 1 and increases by 1 each time a new virtual register is created.
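The naming and numbering rule described above can be sketched as follows. This is a minimal illustration, not the compiler apparatus's actual implementation: the class name `VirtualRegisterAllocator` and its method `new` are hypothetical, and the sketch assumes that the int-type and float-type numbers are counted independently, which the description does not state explicitly.

```python
from itertools import count


class VirtualRegisterAllocator:
    """Hypothetical helper that issues virtual register names.

    int-type registers are named $g<n>, float-type registers $f<n>.
    """

    PREFIX = {"int": "$g", "float": "$f"}

    def __init__(self):
        # Assumption: one independent counter per type, each starting at 1.
        self._counters = {kind: count(1) for kind in self.PREFIX}

    def new(self, kind):
        """Create a new virtual register of the given type and return its name."""
        return f"{self.PREFIX[kind]}{next(self._counters[kind])}"


alloc = VirtualRegisterAllocator()
print(alloc.new("int"), alloc.new("int"), alloc.new("float"))
```

Each call to `new` increases the number for that type by 1, matching the rule that the virtual register number grows whenever a new virtual register is created.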

A memory operand is specified by an address constant, a virtual register, and an offset. In FIG. 3, “[a+$g1]” and “[b+$g1]” are examples of memory operands. An address constant is specified by a character string that serves as a label pointing to an address. In FIG. 3, the “a” in “[a+$g1]” and the “b” in “[b+$g1]” are address constants. The offset is a numerical value indicating the distance from the address specified by the address constant and the virtual register. In FIG. 3, the offset of both “[a+$g1]” and “[b+$g1]” is “0”. That is, “[a+$g1]” illustrated in FIG. 3 specifies the memory at the address obtained by adding the value of the virtual register “$g1” to the address indicated by the address constant “a”. Similarly, “[b+$g1]” illustrated in FIG. 3 specifies the memory at the address obtained by adding the value of the virtual register “$g1” to the address indicated by the address constant “b”.
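The three components of a memory operand can be sketched as a small data structure. The names `MemoryOperand`, `parse_memory_operand`, and `effective_address` are hypothetical, and the textual form with an explicit offset (for example “[a+$g1+4]”) is an assumption made for illustration; the description above only shows operands whose offset is 0.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryOperand:
    address_constant: str  # label that points to an address, e.g. "a"
    register: str          # int-type virtual register, e.g. "$g1"
    offset: int = 0        # distance from (address constant + register)


# Assumed syntax: [<label>+$g<n>] or [<label>+$g<n>+<offset>]
_PATTERN = re.compile(r"\[(\w+)\+(\$g\d+)(?:\+(\d+))?\]")


def parse_memory_operand(text):
    """Split a memory operand string into its three components."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a memory operand: {text!r}")
    label, register, offset = match.groups()
    return MemoryOperand(label, register, int(offset or 0))


def effective_address(operand, symbols, registers):
    """Address = address constant + register value + offset."""
    return symbols[operand.address_constant] + registers[operand.register] + operand.offset


op = parse_memory_operand("[a+$g1]")
print(op, effective_address(op, {"a": 1000}, {"$g1": 8}))
```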

“label” in FIG. 3 is a label indicating the head position of a block. A block includes a plurality of instructions. When a label is specified in a branch instruction or a jump instruction, processing proceeds to the head of the corresponding block. A basic block is a block that contains no branch. Because a basic block contains no branch, once its first instruction is executed, execution continues to its last instruction without branching partway.

FIG. 4 is a diagram illustrating an example of the relationship between blocks and instructions. FIG. 4 shows an example of this relationship based on the instruction sequence illustrated in FIG. 3. In the first “ld” instruction, the value in memory at the address given by the sum of the address indicated by the address constant “a” and the value stored in the virtual register “$g1” is loaded into the virtual register “$g2”. In the next “mult” instruction, the result of multiplying the value loaded into the virtual register “$g2” by the number “5” is stored in the virtual register “$g3”. In the last “st” instruction, the value stored in the virtual register “$g3” is stored in memory at the address given by the sum of the address constant “b” and the virtual register “$g1”.
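The effect of the three instructions described above can be traced with a short simulation. This is only a sketch under stated assumptions: memory is modelled as a Python dictionary from address to value, and the function name `run_block` and the concrete addresses are hypothetical.

```python
def run_block(memory, symbols, regs):
    """Simulate: ld [a+$g1], $g2 ; mult $g2, 5, $g3 ; st $g3, [b+$g1]."""
    # ld: load memory[a + $g1] into $g2
    regs["$g2"] = memory[symbols["a"] + regs["$g1"]]
    # mult: $g3 = $g2 * 5
    regs["$g3"] = regs["$g2"] * 5
    # st: store $g3 to memory[b + $g1]
    memory[symbols["b"] + regs["$g1"]] = regs["$g3"]


symbols = {"a": 100, "b": 200}   # assumed addresses for the labels a and b
memory = {104: 7}                # value stored at a + 4
regs = {"$g1": 4}
run_block(memory, symbols, regs)
print(memory[204])               # value written at b + 4
```

Because the three instructions form a basic block, they run from first to last with no branch in between.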

FIG. 5 is a diagram illustrating an example of a loop instruction sequence. The instruction sequence of a loop includes an initialization instruction sequence, a loop operation instruction sequence, and a loop instruction sequence. The initialization instruction sequence initializes the loop. Initialization of the loop includes initialization of the loop counter used by the loop. In FIG. 5, the loop counter is initialized by assigning 0 to the virtual register “$g1”, which is designated as the loop counter. The label “Label” specifies the head position of the loop operation instruction sequence. The loop operation instruction sequence includes the instructions that are repeatedly executed by the loop. The loop instruction sequence includes an addition or subtraction instruction and a compare-and-branch instruction. The addition or subtraction instruction in the loop instruction sequence changes the value of the loop counter by addition or subtraction. In FIG. 5, the addition instruction “add” adds “4” to the loop counter “$g1”. The compare-and-branch instruction branches processing according to a specified condition. In FIG. 5, when the value of the loop counter “$g1” is less than “90×4”, processing proceeds to the head position of the loop operation instruction sequence specified by the label “Label”. That is, the loop repeatedly executes the instructions of the loop operation instruction sequence according to the conditions specified by the initialization instruction sequence and the loop instruction sequence. When the loop operation instruction sequence is a basic block, the loop is referred to as an innermost loop.
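The control flow of such a loop can be sketched as follows, using the values given in the description (initial value 0, increment 4, limit 90×4). The function name `run_loop` and the callback `body` are hypothetical; the sketch assumes the loop operation instruction sequence runs once before the first comparison, as the ordering of the three parts suggests.

```python
def run_loop(body, step=4, limit=90 * 4):
    """Run a loop shaped like the one described and return the number of rotations."""
    g1 = 0                    # initialization instruction sequence: $g1 = 0
    rotations = 0
    while True:               # Label:
        body(g1)              # loop operation instruction sequence
        rotations += 1
        g1 += step            # loop instruction sequence: add 4 to $g1
        if not g1 < limit:    # compare-and-branch: jump back while $g1 < 90*4
            break
    return rotations


print(run_loop(lambda g1: None))
```

With these values the loop counter takes the values 0, 4, ..., 356, so the body is executed 90 times.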

図６は、ループ展開後の命令列の一例を示す図である。図６は、図５に例示されたループの命令列をループ展開した命令列の一例である。ループ展開は、ループアンローリングとも称する。ループ展開では、ループ演算命令列の命令を展開することで、ループの回転数を減少させる。図６では、図５に例示されたループ演算命令列の３回転分が１回のループで実行されるようにループ展開されている。展開後の１回のループで実行される、展開前のループ演算命令列の数をループの展開数と称する。図６の場合、展開数は「３」となる。すなわち、ループ展開によって、ループの回転数が減少し、１回のループで実行される命令列の数が増加する。図６に例示されるＡ、Ｂ、Ｃのそれぞれが、図５に例示されたループ命令列のループ１回転分の命令列に相当する。Ａ、Ｂ、Ｃそれぞれによって例示される命令列には、この並び順に展開番号が割り当てられる。すなわち、Ａに例示される命令列の展開番号は１であり、Ｂに例示される命令列の展開番号は２であり、Ｃに例示される命令列の展開番号は３である。例えば、変換後のメモリオペランドのオフセット値は、変換前のオフセット値に（展開番号−１）の値を乗算することで算出する事が可能である。例えば、図５に例示される変換前の命令列では、加算命令によって、１回転ごとに仮想レジスタ「$g1」の値が「４」加算されている。したがって、図６に例示されるループ展
開後の命令列では、例えば、Ｂの命令列（展開番号２）のメモリオペランドでは、「（２−１）×４＝４」のオフセット値が加算されている。また、ループ展開に伴い、ループ命令列におけるループカウンタの１回転当たりの増分が補正されている。図６に例示されるループでは展開数が「３」であるため、ループ命令列における加算命令ではループカウンタ「$g1」に「４×３」が加算されている。ループ展開を行う事で、Ａ、Ｂ、Ｃによって
例示される各命令列の間の分岐命令を排除できる。分岐命令が排除された結果、Ａ、Ｂ、Ｃによって例示される各命令列を複数の演算器で並列して実行可能となる。したがって、ループ展開により、命令の並列実行の効率を高めることが可能である。 FIG. 6 is a diagram illustrating an example of an instruction sequence after loop expansion. FIG. 6 shows the instruction sequence obtained by loop expansion of the loop instruction sequence illustrated in FIG. 5. Loop expansion is also called loop unrolling. Loop expansion reduces the number of rotations of the loop by expanding the instructions of the loop operation instruction sequence. In FIG. 6, the loop is expanded so that three rotations of the loop operation instruction sequence illustrated in FIG. 5 are executed in one rotation. The number of pre-expansion loop operation instruction sequences executed in one rotation after expansion is referred to as the number of expansions. In the case of FIG. 6, the number of expansions is “3”. That is, loop expansion reduces the number of loop rotations and increases the number of instructions executed in one rotation. Each of A, B, and C illustrated in FIG. 6 corresponds to the instruction sequence for one rotation of the loop illustrated in FIG. 5. Expansion numbers are assigned to the instruction sequences A, B, and C in this order. That is, the expansion number of instruction sequence A is 1, that of B is 2, and that of C is 3. For example, the offset value of a memory operand after conversion can be calculated by multiplying the offset value before conversion by the value of (expansion number − 1). In the instruction sequence before conversion illustrated in FIG. 5, the addition instruction increments the value of the virtual register “$g1” by “4” on every rotation. Therefore, in the instruction sequence after loop expansion illustrated in FIG. 6, an offset value of “(2 − 1) × 4 = 4” is added to the memory operands of instruction sequence B (expansion number 2). Further, along with loop expansion, the increment of the loop counter per rotation in the loop instruction sequence is corrected. Since the number of expansions is “3” in the loop illustrated in FIG. 6, the addition instruction in the loop instruction sequence adds “4 × 3” to the loop counter “$g1”. Loop expansion eliminates the branch instructions between the instruction sequences A, B, and C. As a result, the instruction sequences A, B, and C can be executed in parallel by a plurality of arithmetic units. Loop expansion can therefore increase the efficiency of parallel instruction execution.
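
The offset and loop-counter arithmetic described for FIG. 6 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the embodiment; the function name `unroll_offsets` and its parameters `stride` and `unroll_count` are hypothetical names introduced only for this example.

```python
def unroll_offsets(stride, unroll_count):
    """Return (per-copy memory offsets, corrected loop-counter increment).

    stride:       amount added to the loop counter per rotation before
                  expansion (4 in the FIG. 5 example).
    unroll_count: number of expansions (3 in the FIG. 6 example).
    """
    # The copy with expansion number n (1-based) gets offset (n - 1) * stride.
    offsets = [(n - 1) * stride for n in range(1, unroll_count + 1)]
    # One rotation after expansion covers unroll_count original rotations.
    counter_increment = stride * unroll_count
    return offsets, counter_increment


offsets, increment = unroll_offsets(stride=4, unroll_count=3)
print(offsets)    # [0, 4, 8]  (offsets for A, B, C)
print(increment)  # 12         (the "4 x 3" added to $g1)
```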

FIG. 7 is a diagram illustrating an example of processing in which each instruction included in an instruction sequence input to the processor 101 is executed by a plurality of arithmetic units. In FIG. 7, FU0 corresponds to one of the floating-point arithmetic units 101b, and FU1 corresponds to the other. IU0 corresponds to the integer arithmetic unit 101a. FIG. 7 illustrates processing from the first cycle to the third cycle. The instruction “fadd” in the instruction sequence of FIG. 7 is an example of a floating-point instruction. Being a floating-point instruction, “fadd” is executed by FU0 or FU1, which are the floating-point arithmetic units 101b. As a result, in the first cycle, “fadd $f0,$f1,$f2” on the first line of the instruction sequence is executed by FU0, and “fadd $f3,$f4,$f5” on the second line is executed by FU1. In the second cycle, “fadd $f6,$f7,$f8” on the third line is executed by FU0, and “fadd $f9,$f10,$f11” on the fourth line is executed by FU1. In the third cycle, “fadd $f12,$f13,$f14” on the fifth line is executed by FU0. This instruction sequence contains no integer instructions. Therefore, IU0, the integer arithmetic unit 101a that executes integer instructions, performs no processing while the processor 101 executes the instruction sequence of FIG. 7.

FIG. 8 is a diagram illustrating an example of a state in which the instructions included in an instruction sequence are distributed across a plurality of arithmetic units for execution. FIG. 8 illustrates processing from the first cycle to the third cycle. The instruction “add” in the instruction sequence of FIG. 8 is an example of an integer instruction. Being an integer instruction, “add” is executed by IU0, which is the integer arithmetic unit 101a. As a result, in the first cycle, “fadd $f0,$f1,$f2” on the first line of the instruction sequence is executed by FU0, “fadd $f3,$f4,$f5” on the second line is executed by FU1, and “add $g0,$g1,$g2” on the third line is executed by IU0. In the second cycle, “fadd $f6,$f7,$f8” on the fourth line is executed by FU0, and “fadd $f9,$f10,$f11” on the fifth line is executed by FU1.

FIG. 9 is a diagram illustrating an example of a state in which execution of the instructions included in an instruction sequence is concentrated on one arithmetic unit. FIG. 9 illustrates processing from the first cycle to the third cycle. All instructions in the instruction sequence of FIG. 9 are integer instructions. Therefore, all of them are executed by IU0, which is the integer arithmetic unit 101a. As a result, FU0 and FU1, which are the floating-point arithmetic units, perform no processing while the processor 101 executes the instruction sequence illustrated in FIG. 9.

As a comparison of FIG. 8 and FIG. 9 shows, if the instructions in the instruction sequence input to the processor 101 are biased toward either integer instructions or floating-point instructions, the arithmetic units of the processor 101 may not be used effectively.
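
The effect of instruction bias on arithmetic-unit usage can be illustrated with a small cycle-count model. This is only a sketch under simplifying assumptions that are not part of the description: there are no dependencies between instructions, each instruction takes one cycle, and any opcode starting with "f" is treated as a floating-point instruction. The function name `cycles_needed` is hypothetical.

```python
import math


def cycles_needed(opcodes, n_int_units=1, n_fp_units=2):
    """Estimate cycles for independent one-cycle instructions.

    Assumption: opcodes starting with "f" are floating-point instructions,
    all others are integer instructions.
    """
    n_fp = sum(1 for op in opcodes if op.startswith("f"))
    n_int = len(opcodes) - n_fp
    # Each kind of unit works through its own instructions in parallel.
    return max(math.ceil(n_fp / n_fp_units), math.ceil(n_int / n_int_units))


fp_only = ["fadd"] * 5                            # like FIG. 7: IU0 stays idle
mixed = ["fadd", "fadd", "add", "fadd", "fadd"]   # like FIG. 8
int_only = ["add"] * 5                            # hypothetical all-integer case

print(cycles_needed(fp_only))   # 3
print(cycles_needed(mixed))     # 2
print(cycles_needed(int_only))  # 5
```

With one integer unit and two floating-point units, the same number of instructions finishes sooner when the mix lets all three units work in the same cycle.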

FIG. 10 is a diagram illustrating an example of the flow of processing for compiling a program. The program is compiled by the compiler apparatus 10. In T1, the compiler apparatus 10 performs syntax analysis of the source file of the input program. The syntax analysis checks whether the source file conforms to the specification of the programming language. In T2, the compiler apparatus 10 performs optimization. The optimization includes loop expansion, which expands the instructions contained in a loop. In T3, the compiler apparatus 10 performs register allocation. Here, the compiler apparatus 10 assigns virtual registers to the registers of the processor 101. In T4, the compiler apparatus 10 performs instruction scheduling. In instruction scheduling, the execution order of the instructions is rearranged so that the integer arithmetic unit 101a and the two floating-point arithmetic units 101b of the processor 101 can be used efficiently. In T5, the compiler apparatus 10 generates code that can be executed by the processor 101. The executable code is also referred to as an object.

FIG. 11 is a diagram illustrating an example of compilation and of execution of the compiled program. T11 corresponds to T2 in FIG. 10. In T11, the optimization process described in this embodiment is performed. T12 and T13 are processes corresponding to T4 and T5 in FIG. 10, respectively. In T14, the processor 101 executes the compiled program. The processor 101 detects instructions that can be executed simultaneously in the executable object generated in T13, and executes the detected instructions in parallel by out-of-order and superscalar execution.

<Processing Blocks of Compiler Apparatus 10>
FIG. 12 is a diagram illustrating an example of the processing blocks of the compiler apparatus 10 according to the first embodiment. FIG. 12 illustrates the following processing blocks: the optimization unit 200, the intermediate code unit 260, the machine model 201, the NFU number storage unit 202, and the NIU number storage unit 203. For example, the processor 101 of FIG. 1 operates as each processing block of FIG. 12 by executing a computer program loaded into the main storage unit 102. However, at least a part of any of the processing blocks of FIG. 12 may include a hardware circuit, a dedicated processor, or a digital signal processor (DSP).

The machine model 201 stores information for each processor type in advance. For example, the machine model 201 stores the number of integer arithmetic units and the number of floating-point arithmetic units for each processor type. The machine model 201 is an example of a “processor information storage unit”.

The optimization unit 200 performs, for example, the optimization process illustrated as T2 in FIG. 10. The optimization unit 200 includes an arithmetic unit number acquisition unit 210, an instruction conversion unit 220, and an instruction expansion unit 230. The arithmetic unit number acquisition unit 210 acquires the number of arithmetic units of the processor 101. It has an NIU acquisition unit 211 that acquires the number of integer arithmetic units and an NFU acquisition unit 212 that acquires the number of floating-point arithmetic units. As the NIU acquisition unit 211, the processor 101 refers to the machine model 201 and acquires the number of integer arithmetic units 101a of the processor 101. The NIU acquisition unit 211 stores the acquired number of integer arithmetic units 101a in the NIU number storage unit 203. As the NFU acquisition unit 212, the processor 101 refers to the machine model 201 and acquires the number of floating-point arithmetic units 101b of the processor 101. The NFU acquisition unit 212 stores the acquired number of floating-point arithmetic units 101b in the NFU number storage unit 202. The arithmetic unit number acquisition unit 210 is an example of an “arithmetic unit number acquisition unit”.
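
The relationship between the machine model and the acquisition of arithmetic unit counts can be sketched as a simple lookup. This is a hypothetical illustration: the names `ProcessorInfo`, `MACHINE_MODEL`, `get_unit_counts`, and the processor type "example-cpu" are assumptions made for this example, and the counts used (one integer unit, two floating-point units) follow the processor 101 described in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorInfo:
    n_int_units: int  # number of integer arithmetic units (NIU)
    n_fp_units: int   # number of floating-point arithmetic units (NFU)


# Hypothetical machine model: one entry per processor type.
MACHINE_MODEL = {
    "example-cpu": ProcessorInfo(n_int_units=1, n_fp_units=2),
}


def get_unit_counts(processor_type, machine_model=MACHINE_MODEL):
    """Look up (NIU, NFU) for a processor type in the machine model."""
    info = machine_model[processor_type]
    return info.n_int_units, info.n_fp_units


niu, nfu = get_unit_counts("example-cpu")
print(niu, nfu)  # 1 2
```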

The instruction conversion unit 220 calculates the number of integer instructions and the number of floating-point instructions included in the loop operation instruction sequence. Further, the instruction conversion unit 220 converts integer instructions into floating-point instructions and floating-point instructions into integer instructions. The instruction conversion unit 220 has an IU instruction conversion unit 221 and an FU instruction conversion unit 222. As the IU instruction conversion unit 221, the processor 101 converts the instructions included in the loop operation instruction sequence stored in the loop operation instruction sequence storage unit 261 into integer instructions. The IU instruction conversion unit 221 stores the converted instructions in the IU conversion instruction sequence storage unit 263. As the FU instruction conversion unit 222, the processor 101 converts the instructions included in the loop operation instruction sequence stored in the loop operation instruction sequence storage unit 261 into floating-point instructions. The FU instruction conversion unit 222 stores the converted instructions in the FU conversion instruction sequence storage unit 262. The instruction conversion unit 220 is an example of an “extraction unit”.

The instruction expansion unit 230 expands the loop operation instruction sequence included in the loop instruction sequence. The instruction expansion unit 230 has an IU instruction expansion unit 231 and an FU instruction expansion unit 232. As the IU instruction expansion unit 231, the processor 101 expands the loop operation instruction sequence that the IU instruction conversion unit 221 has converted into integer instructions. As the FU instruction expansion unit 232, the processor 101 expands the loop operation instruction sequence that the FU instruction conversion unit 222 has converted into floating-point instructions. The IU instruction expansion unit 231 is an example of a “first generation unit”. The FU instruction expansion unit 232 is an example of a “second generation unit”. The instruction expansion unit 230 is an example of a “third generation unit”.

The loop instruction correction unit 240 corrects the loop instruction in accordance with the loop expansion performed by the instruction expansion unit 230. For example, the loop instruction correction unit 240 corrects the value added to the loop counter by the loop instruction. The loop instruction correction unit 240 is an example of a “determination unit”.

The intermediate code unit 260 stores the intermediate code input to the compiler apparatus 10. The intermediate code unit 260 has a loop operation instruction sequence storage unit 261, an FU conversion instruction sequence storage unit 262, an IU conversion instruction sequence storage unit 263, an FU output instruction sequence storage unit 264, an IU output instruction sequence storage unit 265, and an output instruction sequence storage unit 266.

The loop operation instruction sequence storage unit 261 stores the loop operation instruction sequence included in the source file input to the compiler apparatus 10. The FU conversion instruction sequence storage unit 262 stores the instructions converted into floating-point instructions by the FU instruction conversion unit 222. The IU conversion instruction sequence storage unit 263 stores the instructions converted into integer instructions by the IU instruction conversion unit 221. The instructions stored in the FU conversion instruction sequence storage unit 262 are appended to the FU output instruction sequence storage unit 264 as many times as the number of floating-point arithmetic units 101b acquired by the NFU acquisition unit 212. The instructions stored in the IU conversion instruction sequence storage unit 263 are appended to the IU output instruction sequence storage unit 265 as many times as the number of integer arithmetic units 101a acquired by the NIU acquisition unit 211. The instruction sequences stored in the FU output instruction sequence storage unit 264 and the IU output instruction sequence storage unit 265 are appended to the output instruction sequence storage unit 266.

The optimization process performed during compilation by the compiler apparatus 10 having the above configuration will now be described with reference to the drawings.

FIG. 13 is a diagram illustrating an example of a loop instruction sequence input to the compiler apparatus 10. The left side of FIG. 13 is an example of a source file before loop expansion, and the right side of FIG. 13 is an example of the source file after loop expansion. The compiler apparatus 10 determines whether the loop instruction sequence included in the input source file can be loop-expanded, and expands the loop instruction sequences determined to be expandable. FIG. 14 is a diagram illustrating an example of a source file that is not suitable for loop expansion. The left side of FIG. 14 is an example of the source file before loop expansion; in it, there is a dependency between “a[i+1]” and “a[i]”. Here, “i” is the loop counter. That is, in the source file illustrated in FIG. 14, there is a dependency between rotations of the loop. Therefore, the instructions “a[i+1]=b[i]*c[i]+a[i];” and “a[i+2]=b[i+1]*c[i+1]+a[i+1];” cannot be executed in parallel. For this reason, when there is a dependency between rotations of a loop, the compiler apparatus 10 of the first embodiment can exclude that loop from loop expansion.
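
The kind of dependency between rotations described for FIG. 14 can be checked with a simplified test on array index offsets. The sketch below is an assumption-laden illustration, not the determination method of the embodiment: it only considers a loop over `i` with step 1 whose body writes `a[i + write_offset]` and reads `a[i + r]`, and it only detects read-after-write dependencies. The function name `flow_dependence_distance` is hypothetical.

```python
def flow_dependence_distance(write_offset, read_offsets):
    """Return the smallest number of rotations after which a written element
    of the array is read again, or None if no later rotation reads it.

    The body is assumed to write a[i + write_offset] and to read a[i + r]
    for each r in read_offsets, with i increasing by 1 per rotation.
    """
    distances = [write_offset - r for r in read_offsets if write_offset - r > 0]
    return min(distances) if distances else None


# FIG. 14 style body: a[i+1] = b[i]*c[i] + a[i]
print(flow_dependence_distance(write_offset=1, read_offsets=[0]))  # 1

# Hypothetical independent body: a[i] = b[i]*c[i] (array a is never read)
print(flow_dependence_distance(write_offset=0, read_offsets=[]))   # None
```

A distance of 1 means the next rotation needs the value produced by the current one, so the two rotations cannot run in parallel.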

FIG. 15 is a diagram illustrating an example of the flow of loop expansion processing by the compiler apparatus 10 according to the first embodiment. The loop expansion processing will be described with reference to FIG. 15.

In F1, the compiler apparatus 10 detects the innermost loop in the input source file. The compiler apparatus 10 extracts the loop operation instruction sequence from the instruction sequence of the detected innermost loop. Various known methods can be applied to detect the innermost loop.

FIG. 16 is a diagram illustrating an example of the structure of a loop. As described above, in a loop, the loop counter is initialized by the initialization instruction sequence, and the instructions of the loop operation instruction sequence are executed repeatedly by means of the label and the loop instruction sequence. That is, the compiler apparatus 10 can detect a loop in a source file by, for example, detecting a block sandwiched between a label and a loop instruction sequence.
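
Detecting a block sandwiched between a label and a branch back to that label can be sketched as follows. The textual instruction format used here (labels ending in ":", a `branch` mnemonic, and the other mnemonics) is hypothetical and chosen only for illustration; the function name `find_loop_blocks` is likewise an assumption. The sketch does not separate the loop operation instructions from the counter update and comparison.

```python
def find_loop_blocks(lines):
    """Return (label, block) pairs for every branch back to an earlier label.

    block is the list of lines between the label and the branch.
    """
    label_pos = {}
    loops = []
    for idx, line in enumerate(lines):
        text = line.strip()
        if text.endswith(":"):
            # Remember where each label was defined.
            label_pos[text[:-1]] = idx
        elif text.startswith("branch "):
            target = text.split()[-1]
            if target in label_pos:
                # Backward branch: everything since the label is the loop block.
                loops.append((target, lines[label_pos[target] + 1:idx]))
    return loops


code = [
    "mov $g1, 0",          # initialization of the loop counter
    "Label0:",
    "load $g2, [$g1]",
    "add $g2, $g2, $g3",
    "store [$g1], $g2",
    "add $g1, $g1, 4",     # loop counter update
    "cmp $g1, 400",
    "branch Label0",
]
for label, block in find_loop_blocks(code):
    print(label, len(block))  # Label0 5
```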

FIG. 17 is a diagram illustrating an example of a loop instruction sequence input in the first embodiment. The instruction sequence illustrated in FIG. 17 is expressed in intermediate code. In this instruction sequence, the loop operation instruction sequence sandwiched between the label “Label0” and the loop instruction sequence is executed repeatedly. The following description applies the processing to the instruction sequence illustrated in FIG. 17.

Returning to FIG. 15, in F2, the instruction conversion unit 220 calculates the number of integer instructions and the number of floating-point instructions included in the extracted loop operation instruction sequence. In F3, the instruction conversion unit 220 determines whether the instructions in the loop operation instruction sequence are biased toward integer instructions or floating-point instructions. Whether there is a bias can be determined, for example, by whether either “the number of floating-point instructions in the loop operation instruction sequence divided by the number of instructions in the loop operation instruction sequence” or “the number of integer instructions in the loop operation instruction sequence divided by the number of instructions in the loop operation instruction sequence” is equal to or greater than a predetermined value. The predetermined value is determined, for example, based on the numbers of integer arithmetic units 101a and floating-point arithmetic units 101b of the processor 101. The determined value is stored in the machine model 201. The predetermined value is, for example, “0.9”. If it is determined that there is a bias (YES in F3), the process proceeds to F4 and F5. If it is determined that there is no bias (NO in F3), the process ends.
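
The bias determination of F2 and F3 can be sketched as a ratio test. This is a minimal illustration assuming that every instruction is either an integer or a floating-point instruction and that opcodes starting with "f" are floating-point instructions; the function name `is_biased` and its parameters are hypothetical, and the default threshold 0.9 is the example value given in the text.

```python
def is_biased(opcodes, threshold=0.9, is_fp=lambda op: op.startswith("f")):
    """Return True if integer or floating-point instructions make up at
    least `threshold` of the loop operation instruction sequence."""
    total = len(opcodes)
    if total == 0:
        return False
    n_fp = sum(1 for op in opcodes if is_fp(op))  # F2: count each kind
    n_int = total - n_fp
    # F3: compare each ratio with the predetermined value.
    return n_fp / total >= threshold or n_int / total >= threshold


print(is_biased(["add", "mult", "add"]))           # True  (all integer)
print(is_biased(["fadd", "add", "fadd", "mult"]))  # False (evenly mixed)
```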

In F4, the FU instruction conversion unit 222 converts the instructions included in the loop operation instruction sequence extracted in F1 into floating-point instructions and stores them in the FU conversion instruction sequence storage unit 262. In F5, the IU instruction conversion unit 221 converts the instructions included in the loop operation instruction sequence extracted in F1 into integer instructions and stores them in the IU conversion instruction sequence storage unit 263.

FIG. 18 is a diagram illustrating an example of the detailed flow of the processes of F4 and F5 in FIG. 15. The processes of F4 and F5 in FIG. 15 will be described below with reference to FIG. 18.

In G1, the instruction conversion unit 220 creates a result list. The result list temporarily stores the instruction sequence being converted and is prepared, for example, in the main storage unit 102. In G2, the instruction conversion unit 220 takes out instructions in order from the beginning of the loop operation instruction sequence. In G3, the instruction conversion unit 220 duplicates the instruction taken out, for example into the main storage unit 102. The duplicated instruction is referred to here as instruction A. The process of G3 is performed to prevent the input instruction sequence from being destroyed by a malfunction of the compiler apparatus 10 or the like. In G4, the instruction conversion unit 220 adds instruction A to the result list. In G5, the instruction conversion unit 220 takes out the instruction code of instruction A. In G6, the instruction conversion unit 220 determines whether to perform conversion into an integer instruction or conversion into a floating-point instruction. In the case of the process of F4 in FIG. 15, the conversion is into a floating-point instruction, so the process proceeds to G7. In the case of the process of F5 in FIG. 15, the conversion is into an integer instruction, so the process proceeds to G8.

In G7, the FU instruction conversion unit 222 refers to the FU conversion table and takes out the instruction code and instruction type after conversion. FIG. 19 is a diagram illustrating an example of the FU conversion table 301. The FU conversion table 301 shows the correspondence between instructions before and after conversion when instructions are converted into floating-point instructions. The FU conversion table 301 is stored, for example, in the main storage unit 102 or the auxiliary storage unit 103 of FIG. 1. FIG. 19 illustrates the instruction code and instruction type before conversion and the instruction code and instruction type after conversion. As FIG. 19 shows, an instruction that is of int type before conversion is converted into a float-type instruction, and an instruction that is of float type before conversion remains of float type after conversion. In G7 of FIG. 18, the FU instruction conversion unit 222 refers to the FU conversion table 301 illustrated in FIG. 19 and takes out the instruction code and instruction type that result from converting instruction A into a floating-point instruction. For example, when the instruction code of instruction A before conversion is “mult”, “fumult” is taken out as the instruction code after conversion and “float” as the instruction type after conversion. Note that “fuadd” and “fumult” are integer instructions executed by the floating-point arithmetic unit 101b, and their operands are, for example, float-type virtual registers. That is, “fuadd” and “fumult” operate on the bit string stored in a float-type virtual register by treating it as an integer.

In G8, the IU instruction conversion unit 221 refers to the IU conversion table and takes out the instruction code and instruction type after conversion. FIG. 20 is a diagram illustrating an example of the IU conversion table 302. The IU conversion table 302 shows the correspondence between instructions before and after conversion when instructions are converted into integer instructions. The IU conversion table 302 is stored, for example, in the main storage unit 102 or the auxiliary storage unit 103 of FIG. 1. FIG. 20 illustrates the instruction code and instruction type before conversion and the instruction code and instruction type after conversion. As FIG. 20 shows, an instruction that is of float type before conversion is converted into an int-type instruction, and an instruction that is of int type before conversion remains of int type after conversion. In G8 of FIG. 18, the IU instruction conversion unit 221 refers to the IU conversion table 302 illustrated in FIG. 20 and takes out the instruction code and instruction type that result from converting instruction A into an integer instruction. For example, when the instruction code of instruction A before conversion is “fmult”, “iufmult” is taken out as the instruction code after conversion and “int” as the instruction type after conversion. Note that “iufadd” and “iufmult” are floating-point instructions executed by the integer arithmetic unit 101a, and their operands are, for example, int-type virtual registers. That is, “iufadd” and “iufmult” operate on the bit string stored in an int-type virtual register by treating it as a floating-point number.
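
The idea of operating on a register's bit string as a different type can be illustrated with Python's `struct` module. This sketch only models the interpretation of bits; it makes no claim about how the arithmetic units implement these instructions. The 32-bit little-endian register width and the function names `int_add_on_bits` and `float_add_on_bits` are assumptions for this example.

```python
import struct


def int_add_on_bits(bits_a, bits_b):
    """Treat two 4-byte register contents as integers and add them
    (the kind of interpretation described for "fuadd")."""
    (a,) = struct.unpack("<i", bits_a)
    (b,) = struct.unpack("<i", bits_b)
    return struct.pack("<i", a + b)


def float_add_on_bits(bits_a, bits_b):
    """Treat two 4-byte register contents as floating-point numbers and
    add them (the kind of interpretation described for "iufadd")."""
    (a,) = struct.unpack("<f", bits_a)
    (b,) = struct.unpack("<f", bits_b)
    return struct.pack("<f", a + b)


# Register contents are just bytes; the instruction decides how to read them.
r = int_add_on_bits(struct.pack("<i", 3), struct.pack("<i", 4))
print(struct.unpack("<i", r)[0])  # 7

r = float_add_on_bits(struct.pack("<f", 1.5), struct.pack("<f", 2.25))
print(struct.unpack("<f", r)[0])  # 3.75
```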

FIG. 21 is a diagram illustrating an example of the correspondence before and after conversion of each instruction in the input loop operation instruction sequence into an integer instruction. In the input loop operation instruction sequence, all instructions are integer instructions. Therefore, conversion into integer instructions changes neither the instruction codes nor the instruction types. FIG. 22 is a diagram illustrating an example of the correspondence before and after conversion of each instruction in the input loop operation instruction sequence into a floating-point instruction. All instructions were integer instructions before conversion, and all are floating-point instructions after conversion.

Returning to FIG. 18, in G9, the instruction code and instruction type obtained in G7 or G8 are set in instruction A. That is, instruction A, which was added to the result list in G4, is rewritten in the result list with the instruction code and instruction type obtained in G7 or G8. In G10, the instruction conversion unit 220 determines whether any unconverted instruction remains in the loop operation instruction sequence. When all instructions in the loop operation instruction sequence have been converted (NO in G10), the process ends. When an unconverted instruction remains (YES in G10), the process returns to G2.
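
The G1 to G10 flow can be summarized as a table-driven loop. The sketch below is hypothetical: only the pairs mult → fumult and fmult → iufmult are stated in the text, the remaining table entries are assumed by analogy, and the names `Instruction`, `FU_TABLE`, `IU_TABLE`, and `convert` are introduced for this example. Renaming of operand registers is not modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instruction:
    opcode: str
    dtype: str       # "int" or "float"
    operands: tuple


# Assumed table contents (only "mult" and "fmult" rows come from the text).
FU_TABLE = {"add": ("fuadd", "float"), "mult": ("fumult", "float"),
            "fadd": ("fadd", "float"), "fmult": ("fmult", "float")}
IU_TABLE = {"fadd": ("iufadd", "int"), "fmult": ("iufmult", "int"),
            "add": ("add", "int"), "mult": ("mult", "int")}


def convert(sequence, to_float):
    table = FU_TABLE if to_float else IU_TABLE   # G6: choose G7 or G8
    result = []                                  # G1: result list
    for insn in sequence:                        # G2 / G10: next instruction
        opcode, dtype = table[insn.opcode]       # G5, then G7 or G8
        # G3, G4, G9: work on a copy so the input sequence stays intact.
        result.append(replace(insn, opcode=opcode, dtype=dtype))
    return result


body = [Instruction("mult", "int", ("$g0", "$g1", "$g2"))]
print(convert(body, to_float=True)[0].opcode)   # fumult
print(convert(body, to_float=False)[0].opcode)  # mult
```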

Returning to FIG. 15, in F6 the NIU acquisition unit 211 refers to the machine model 201 and acquires the NIU count, which is the number of integer arithmetic units 101a of the processor 101. Likewise, the NFU acquisition unit 212 refers to the machine model 201 and acquires the NFU count, which is the number of floating-point arithmetic units 101b of the processor 101. Here, "1" is acquired as the NIU count and "2" as the NFU count.

In F7, the FU instruction expansion unit 232 expands the instruction sequence stored in the FU conversion instruction sequence storage unit 262 and stores the expanded sequence in the FU output instruction sequence storage unit 264. In F8, the IU instruction expansion unit 231 expands the instruction sequence stored in the IU conversion instruction sequence storage unit 263 and stores the expanded sequence in the IU output instruction sequence storage unit 265. In F9, it is determined whether the process of F7 has been repeated NFU times. If it has (YES in F9), the process proceeds to F11. If it has not (NO in F9), the process returns to F7. In F10, it is determined whether the process of F8 has been repeated NIU times. If it has (YES in F10), the process proceeds to F11. If it has not (NO in F10), the process returns to F8.
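As a rough sketch of the repetition in F7 to F10, the following assumes that each expansion simply produces one numbered copy of the converted sequence. The function name, the tuple layout, and the example opcodes are hypothetical.

```python
def expand_sequences(fu_sequence, iu_sequence, nfu, niu):
    """Return (expansion_number, kind, instructions) for every expanded copy."""
    output = []
    expansion_number = 1
    for _ in range(nfu):  # F7, repeated until the F9 check answers YES
        output.append((expansion_number, "FU", list(fu_sequence)))
        expansion_number += 1
    for _ in range(niu):  # F8, repeated until the F10 check answers YES
        output.append((expansion_number, "IU", list(iu_sequence)))
        expansion_number += 1
    return output


# NIU = 1 and NFU = 2, as acquired from the machine model in the example.
copies = expand_sequences(["load.fu", "add.fu"], ["load", "add"], nfu=2, niu=1)
```

Under these assumptions, expansion numbers 1 and 2 are floating-point copies and expansion number 3 is an integer copy.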

FIG. 23 shows an example of the expansion process for a loop operation instruction sequence. The process of FIG. 23 is an example showing the details of F7 to F10 in FIG. 15. The expansion process for the loop operation instruction sequence is described below with reference to FIG. 23.

In H1, the instruction expansion unit 230 receives the loop operation instruction sequence that is the target of loop expansion. The instruction expansion unit 230 extracts the virtual registers contained in the reference list operands of the input loop operation instruction sequence and creates a virtual register map.

FIG. 24 shows an example of the virtual register map. The virtual register map associates each virtual register before loop expansion with a virtual register after loop expansion, on a one-to-one basis. The virtual registers after loop expansion are newly created by the instruction expansion unit 230. The correspondence between the virtual registers before and after loop expansion is established by, for example, a hash function. That is, the virtual register map can be implemented as a hash table.
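Because the map can be a hash table, a Python `dict` is a natural way to sketch it. The naming scheme for the newly created registers (`$u1`, `$u2`, ...) and the convention that virtual registers start with `$` are assumptions made only for this illustration.

```python
import itertools


def build_virtual_register_map(reference_operands, new_prefix="u"):
    """Map each virtual register before expansion to one fresh register."""
    fresh_ids = itertools.count(1)
    register_map = {}
    for operand in reference_operands:
        # Assumption: operands that start with "$" are virtual registers.
        if operand.startswith("$") and operand not in register_map:
            register_map[operand] = f"${new_prefix}{next(fresh_ids)}"
    return register_map


register_map = build_virtual_register_map(["$g1", "a", "$g2", "$g1"])
```

Each register before expansion receives exactly one new register, so the mapping stays one-to-one even when a register appears several times in the operands.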

Returning to FIG. 23, in H2 the instruction expansion unit 230 takes instructions one at a time from the head of the input loop operation instruction sequence. In H3, the instruction expansion unit 230 takes the reference list operands of the instruction obtained in H2, in order. In H4, the instruction expansion unit 230 determines whether the operand obtained in H3 is a virtual register. If the operand is a virtual register (YES in H4), the process proceeds to H5. If it is not (NO in H4), the process proceeds to H6. In H5, the instruction expansion unit 230 converts the virtual register based on the virtual register map created in H1.

FIG. 25 shows an example of the virtual register conversion process for a reference list operand. In J1, the instruction expansion unit 230 refers to the virtual register map created in H1 of FIG. 23 and retrieves the converted virtual register that corresponds to the virtual register before conversion. In J2, the virtual register before conversion is rewritten with the converted virtual register retrieved in J1. For example, referring to FIG. 24, the virtual register "$g1" before loop expansion is converted to "$f2" after loop expansion. The value that was stored in virtual register "$g1" is stored in virtual register "$f2" without any change to its bit string. That is, the process of J2 does not distinguish the sign, mantissa, radix, exponent, and so on of the value stored in the virtual register before rewriting; it stores the bit string held in the virtual register before rewriting, as is, in the virtual register after rewriting.
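The "bit string unchanged" behaviour can be illustrated with a small reinterpretation helper. This sketch assumes 32-bit registers and IEEE 754 single precision, neither of which is stated in the text; the function names are hypothetical.

```python
import struct


def int_bits_as_float(value):
    """Reinterpret the 32 bits of a signed integer as a float, bit for bit."""
    return struct.unpack("<f", struct.pack("<i", value))[0]


def float_bits_as_int(value):
    """Reinterpret the 32 bits of a float as a signed integer, bit for bit."""
    return struct.unpack("<i", struct.pack("<f", value))[0]


# The integer 12345 moved to a float register keeps the same bits,
# so moving it back yields the original integer again.
round_trip = float_bits_as_int(int_bits_as_float(12345))
```

No numeric conversion takes place here: the integer 12345 does not become 12345.0, it only changes the register class that holds the same bits.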

Returning to FIG. 23, in H6 the instruction expansion unit 230 determines whether the operand obtained in H3 is a memory operand, that is, a memory address. If the operand is a memory operand (YES in H6), the process proceeds to H7. If it is not (NO in H6), the process proceeds to H8. In H7, the instruction expansion unit 230 converts the memory operand specified as the operand.

FIG. 26 shows an example of the correspondence of a memory operand before and after loop expansion. FIG. 26 illustrates that the memory operand "a+$g1" is converted to "a+$g1+(expansion number - 1) × (increment of $g1)". Here, the "increment of $g1" is the increment per iteration of the loop before expansion. In other words, it is the value added to the loop counter "$g1" by the add instruction of the loop instruction sequence. FIG. 27 illustrates the conversion of the memory operand for each expansion number. For expansion number 1, the memory operand "a+$g1" becomes "a+$g1+(1-1)×4", that is, "a+$g1". For expansion number 2, it becomes "a+$g1+(2-1)×4", that is, "a+$g1+4". For expansion number 3, it becomes "a+$g1+(3-1)×4", that is, "a+$g1+8". The value that was stored in the area specified by the memory operand "a+$g1" is stored in the area specified by the memory operand "a+$g1+8" without any change to its bit string. That is, the processes of FIG. 26 and FIG. 27 do not distinguish the sign, mantissa, radix, exponent, and so on of the value stored in the area specified by the memory operand before rewriting; they store the bit string held in that area, as is, in the area specified by the memory operand after rewriting.
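The offset rule of FIG. 26 and FIG. 27 amounts to one line of arithmetic. The sketch below represents a memory operand as a plain string, which is an assumption made for illustration; the function names are hypothetical.

```python
def expanded_offset(expansion_number, increment):
    """Offset added for a given expansion number: (k - 1) * increment."""
    return (expansion_number - 1) * increment


def expand_memory_operand(operand, expansion_number, increment):
    """Rewrite a memory operand such as "a+$g1" for one expanded copy."""
    offset = expanded_offset(expansion_number, increment)
    return operand if offset == 0 else f"{operand}+{offset}"


# Increment of $g1 is 4, expansion numbers 1 to 3, as in the example.
operands = [expand_memory_operand("a+$g1", k, 4) for k in (1, 2, 3)]
# operands == ["a+$g1", "a+$g1+4", "a+$g1+8"]
```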

Returning to FIG. 23, in H8 the instruction expansion unit 230 determines whether any unprocessed reference list operand remains. If one remains (YES in H8), the process proceeds to H3. If none remains (NO in H8), the process proceeds to H9.

In H9, the instruction expansion unit 230 takes the definition list operands in order. In H10, the instruction expansion unit 230 determines whether the operand obtained in H9 is a virtual register. If the operand is a virtual register (YES in H10), the process proceeds to H11. If it is not (NO in H10), the process proceeds to H12.

FIG. 28 shows an example of the conversion process for a virtual register operand in the definition list operands, and illustrates the details of H11 in FIG. 23. In K1, the instruction expansion unit 230 determines whether the type of the instruction whose operand is to be converted is the int type. If it is the int type (YES in K1), the process proceeds to K2. If it is not (NO in K1), the process proceeds to K3. In K2, the instruction expansion unit 230 newly creates an int-type virtual register. In K3, the instruction expansion unit 230 newly creates a float-type virtual register. In K4, the instruction expansion unit 230 adds the correspondence between the operand before conversion and the virtual register created in K2 or K3 to the virtual register map. In K5, the instruction expansion unit 230 converts the operand to the virtual register created in K2 or K3. The value that was stored in the virtual register before conversion is stored in the converted virtual register without any change to its bit string. That is, the process of K5 does not distinguish the sign, mantissa, radix, exponent, and so on of the value stored in the virtual register before rewriting; it stores the bit string held in the virtual register before rewriting, as is, in the virtual register after rewriting.
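Steps K1 to K5 can be sketched as a small helper that creates a register of the matching class and records it in the map. The `RegisterAllocator` class, the `$g`/`$f` prefixes for int and float registers, and the `_new` naming are assumptions for illustration only.

```python
import itertools


class RegisterAllocator:
    """Hands out fresh virtual register names (hypothetical naming scheme)."""

    def __init__(self):
        self._ids = itertools.count(1)

    def new(self, kind):
        prefix = "g" if kind == "int" else "f"
        return f"${prefix}_new{next(self._ids)}"


def convert_definition_register(operand, instruction_type, register_map, allocator):
    kind = "int" if instruction_type == "int" else "float"  # K1: check the type
    new_register = allocator.new(kind)   # K2 for int, K3 for float
    register_map[operand] = new_register  # K4: record the correspondence
    return new_register                   # K5: the operand is rewritten to this


allocator = RegisterAllocator()
register_map = {}
first = convert_definition_register("$g2", "int", register_map, allocator)
second = convert_definition_register("$g3", "float", register_map, allocator)
```

Recording the new register in the map at definition time means that later reference list operands naming the same register can be rewritten by a simple lookup.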

Returning to FIG. 23, in H12 the instruction expansion unit 230 determines whether the operand obtained in H9 is a memory operand. If it is a memory operand (YES in H12), the process proceeds to H13. If it is not (NO in H12), the process proceeds to H14. The process of H13 is the same as that of H7. That is, in H13 the processes illustrated in FIG. 26 and FIG. 27 are executed. In H14, the instruction expansion unit 230 determines whether any unprocessed definition list operand remains. If one remains (YES in H14), the process proceeds to H9. If none remains (NO in H14), the process ends.

Returning to FIG. 15, in F11 the instruction expansion unit 230 stores the instructions held in the FU output instruction sequence storage unit 264 and the IU output instruction sequence storage unit 265 in the output loop instruction sequence storage unit 266.

FIG. 29 shows an example of the instruction sequence stored in the output loop instruction sequence storage unit 266. FIG. 29 shows an example of the instruction sequence obtained by loop-expanding the loop operation instruction sequence contained in the loop instruction sequence illustrated in FIG. 17. In the instruction sequence illustrated in FIG. 29, the sequences for expansion numbers 1 and 2 consist of floating-point instructions, and the sequence for expansion number 3 consists of integer instructions. As a result, the output loop instruction sequence storage unit 266 holds two instruction sequences of floating-point instructions and one instruction sequence of integer instructions. That is, the number of floating-point instruction sequences and the number of integer instruction sequences stored in the output loop instruction sequence storage unit 266 match the number of floating-point arithmetic units 101b and the number of integer arithmetic units 101a of the processor 101, respectively.

Returning to FIG. 15, in F12 the instruction expansion unit 230 corrects the iteration count of the loop. The iteration count is corrected by correcting the increment of the loop counter per iteration. The corrected increment is calculated, for example, as "(number of loop expansions) × (loop counter increment before loop expansion)". Here, the number of loop expansions is "(number of integer arithmetic units 101a) + (number of floating-point arithmetic units 101b)". Therefore, by correcting the loop counter increment, the iteration count after loop expansion becomes "(iteration count before loop expansion) ÷ (number of integer arithmetic units 101a + number of floating-point arithmetic units 101b)". The number of expansions of the loop operation instruction sequence illustrated in FIG. 29 is "3". That is, the processing executed by one iteration of the loop after loop expansion corresponds to three iterations of the loop before loop expansion. Referring to FIG. 17, the loop counter "$g1" before loop expansion is incremented by "4" per iteration. Therefore, after loop expansion, the increment of the loop counter "$g1" is calculated as "12" from "(number of expansions) 3 × (loop counter increment before loop expansion) 4". FIG. 30 shows an example of the loop instruction sequence with the corrected iteration count. Referring to FIG. 30, the increment of the loop counter "$g1" in the add instruction of the loop instruction sequence is "12".
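The correction in F12 is plain arithmetic and can be checked with a short sketch. The function names are hypothetical, and the sketch assumes the original iteration count is a multiple of the number of expansions, since handling of leftover iterations is not described here.

```python
def corrected_increment(niu, nfu, original_increment):
    """Loop counter increment after expansion: (NIU + NFU) * original."""
    number_of_expansions = niu + nfu
    return number_of_expansions * original_increment


def corrected_iteration_count(original_iterations, niu, nfu):
    """Iteration count after expansion, assuming it divides evenly."""
    return original_iterations // (niu + nfu)


increment = corrected_increment(niu=1, nfu=2, original_increment=4)  # 3 * 4
iterations = corrected_iteration_count(30, niu=1, nfu=2)             # 30 / 3
```

With one integer unit and two floating-point units, the increment of 4 becomes 12, matching the value given for FIG. 30.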

FIG. 31 shows an example of the loop instruction sequence after loop expansion. In FIG. 31, the initialization instruction sequence is followed by the expanded loop operation instruction sequence and the corrected loop instruction sequence. The instruction sequence expanded in this way is executed by the arithmetic units of the processor 101 after its execution order has been rearranged, for example by the instruction scheduling illustrated in T4 of FIG. 10, so that the instructions are easier to execute in parallel on multiple arithmetic units.

FIG. 32 and FIG. 33 show examples of how the arithmetic units are used when an instruction sequence is executed. FIG. 32 shows an example of arithmetic unit usage when an instruction sequence in which every instruction has been converted to a floating-point instruction is executed. FIG. 33 shows an example of arithmetic unit usage when an instruction sequence that mixes integer instructions and floating-point instructions, in accordance with the numbers of integer arithmetic units 101a and floating-point arithmetic units 101b of the processor 101, is executed. That is, FIG. 33 illustrates the state in which an instruction sequence loop-expanded according to the first embodiment is executed by the arithmetic units. Comparing FIG. 32 with FIG. 33 shows that the case illustrated in FIG. 33 uses the arithmetic units of the processor 101 more efficiently.

(First comparative example)
The first comparative example describes a compiler apparatus that loop-expands the instruction sequence of an input loop by a predetermined number of expansions and does not convert instruction types. FIG. 34 is an example of the processing blocks that perform loop expansion in the compiler apparatus 500 according to the first comparative example. The information processing apparatus 100 can also be used as the compiler apparatus 500. FIG. 34 illustrates the following processing blocks: an expansion number determination unit 501, an instruction expansion unit 502, a loop instruction correction unit 503, an input loop instruction sequence storage unit 504, an expanded instruction sequence storage unit 505, and an output loop instruction sequence storage unit 506. For example, the processor 101 of FIG. 1 executes a computer program loaded into the main storage unit 102 to act as each processing block of FIG. 34. However, at least part of any of the processing blocks of FIG. 34 may include a hardware circuit, a dedicated processor, or a digital signal processor (DSP).

The input loop instruction sequence storage unit 504 stores the instruction sequence of the loop that is the target of loop expansion. The processor 101 of the compiler apparatus 500, acting as the expansion number determination unit 501, determines the number of loop expansions. The number of expansions is determined based on, for example, the number of registers of the processor 101 and the number of instructions contained in the loop operation instruction sequence stored in the input loop instruction sequence storage unit 504. The processor 101 of the compiler apparatus 500, acting as the instruction expansion unit 502, expands the instructions contained in the loop operation instruction sequence. The expanded instruction sequence is stored in the expanded instruction sequence storage unit 505. The processor 101 of the compiler apparatus 500, acting as the loop instruction correction unit 503, corrects the loop counter increment and the loop termination condition in the loop instructions after expansion. The loop instruction sequence expanded by the above processing is stored in the output loop instruction sequence storage unit 506.

FIG. 35 illustrates the instruction sequence of a loop in pseudo form. In the loop illustrated in FIG. 35, every instruction contained in the loop operation instruction sequence is an integer instruction. FIG. 36 shows an example of the instruction sequence obtained when the loop illustrated in FIG. 35 is loop-expanded by the compiler apparatus 500 according to the first comparative example. The expanded instruction sequence illustrated in FIG. 36 also consists entirely of integer instructions. As illustrated in FIG. 2, the processor 101 does not have multiple integer arithmetic units 101a. Therefore, when the processor 101 executes the instruction sequence illustrated in FIG. 36, the processing concentrates on the single integer arithmetic unit 101a, and the two floating-point arithmetic units 101b are not used. The multiple arithmetic units of the processor 101 can therefore hardly be said to be used efficiently.

(Second comparative example)
In the second comparative example, loop expansion additionally converts integer instructions into floating-point instructions and floating-point instructions into integer instructions. FIG. 37 schematically illustrates loop expansion according to the second comparative example. FIG. 38 illustrates FIG. 37 in pseudocode. In the second comparative example, loop expansion generates an IU instruction sequence of integer instructions and an FU instruction sequence of floating-point instructions. That is, from the original instruction sequence before loop expansion, the second comparative example generates an IU instruction sequence converted to integer instructions and an FU instruction sequence converted to floating-point instructions. The generated instruction sequences are included in the object generated by the compiler apparatus according to the second comparative example. When the object is executed, either the IU instruction sequence or the FU instruction sequence is executed, depending on the usage status of the arithmetic units of the processor 101. The instruction sequences executed on the respective arithmetic units are separate threads. That is, the second comparative example executes the IU instruction sequence and the FU instruction sequence by multithreading. The usage status of the arithmetic units of the processor 101 can be obtained through an OS system call. This system call returns which of integer operations and floating-point operations is being executed more frequently at the time the object is running, and which of the integer arithmetic unit 101a and the two floating-point arithmetic units 101b is free.

The second comparative example uses the arithmetic units efficiently by obtaining information on free arithmetic units through a system call. However, because it needs the usage status of the arithmetic units by other threads, the second comparative example obtains that usage status dynamically while the object is running. As a result, the second comparative example incurs the overhead of function calls such as system calls.

(Comparison between the comparative examples and the first embodiment)
FIG. 39 is an example of a diagram comparing loop expansion according to the first comparative example with loop expansion according to the first embodiment. In FIG. 39, the instruction sequence before expansion is labeled "original instruction sequence". In the loop expansion of the first comparative example, the "original instruction sequence" is expanded for two iterations. In the loop expansion of the first embodiment, in accordance with the number of arithmetic units of the processor 101, the instruction sequence of floating-point instructions (labeled "FU-converted instruction sequence" in the figure) is expanded for two iterations and the instruction sequence of integer instructions (labeled "IU-converted instruction sequence" in the figure) is expanded for one iteration. For example, if the "original instruction sequence" consists of floating-point instructions, the instructions can be executed in parallel by the two floating-point arithmetic units 101b even with the loop expansion of the first comparative example. However, the integer arithmetic unit 101a performs no processing during this time. In the loop expansion of the first embodiment, as described above, loop expansion is performed in accordance with the number of arithmetic units of the processor 101. Therefore, in the instruction sequence after loop expansion according to the first embodiment, the floating-point instruction sequences are executed by the two floating-point arithmetic units 101b and the integer instruction sequence is executed by the integer arithmetic unit 101a. That is, loop expansion according to the first embodiment uses the arithmetic units of the processor 101 more efficiently than the first comparative example.

FIG. 40 shows an example of a loop contained in a source file input to the compiler. The loop illustrated in FIG. 40 contains three integer instructions. FIG. 41 shows an example of the instruction sequence obtained by applying loop expansion according to the first comparative example to the loop illustrated in FIG. 40. FIG. 41 illustrates two iterations' worth of integer instruction sequences. The instructions in the instruction sequence after loop expansion include no floating-point instruction. Therefore, the instruction sequence loop-expanded by the first comparative example is executed by the integer arithmetic unit 101a. During this time, the two floating-point arithmetic units 101b perform no processing.

FIG. 42 shows an example of the instruction sequence obtained by applying loop expansion according to the first embodiment to the loop illustrated in FIG. 40. In the first embodiment, the numbers of integer instruction sequences and floating-point instruction sequences contained in the expanded instruction sequence are determined according to the numbers of integer arithmetic units 101a and floating-point arithmetic units 101b of the processor 101. That is, as illustrated in FIG. 42, two floating-point instruction sequences obtained by converting the integer instructions into floating-point instructions (labeled "FU integer instruction" in the figure) are generated, and one integer instruction sequence of integer instructions is generated. As a result, the one integer instruction sequence is executed by the integer arithmetic unit 101a, and the two floating-point instruction sequences are executed by the two floating-point arithmetic units 101b, respectively. According to the first embodiment, the arithmetic units of the processor 101 can therefore be used more efficiently.
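The difference in arithmetic unit usage can be made concrete with a deliberately simple toy model. It is not a measurement and not part of the described apparatus: it assumes every instruction takes one cycle, expanded copies are independent, each copy runs entirely on one unit of its own kind, and copies are handed to units round-robin. All names are hypothetical.

```python
from collections import defaultdict


def cycles_needed(sequences, units):
    """sequences: list of (kind, instruction_count); units: kind -> unit count."""
    loads = {kind: [0] * count for kind, count in units.items()}
    next_slot = defaultdict(int)
    for kind, length in sequences:
        slot = next_slot[kind] % len(loads[kind])  # round-robin over units of this kind
        loads[kind][slot] += length
        next_slot[kind] += 1
    # The busiest unit determines how long the expanded loop body takes.
    return max(max(per_unit) for per_unit in loads.values())


UNITS = {"IU": 1, "FU": 2}  # one integer unit, two floating-point units

# Three iterations of a 3-instruction loop body, all as integer instructions.
all_integer = cycles_needed([("IU", 3)] * 3, UNITS)
# The same three iterations split as two FU copies and one IU copy.
mixed = cycles_needed([("FU", 3), ("FU", 3), ("IU", 3)], UNITS)
```

Under these assumptions the all-integer version keeps the single integer unit busy for 9 cycles while both floating-point units idle, whereas the mixed version finishes in 3 cycles with all three units busy.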

Furthermore, in the first embodiment, the numbers of integer instruction sequences and floating-point instruction sequences contained in the loop operation instruction sequence after loop expansion are determined at compile time according to the numbers of integer arithmetic units 101a and floating-point arithmetic units 101b of the processor 101. The loop-expanded integer instruction sequences and floating-point instruction sequences are assigned to the integer arithmetic unit 101a and the floating-point arithmetic units 101b by the instruction scheduler of the processor 101. Therefore, according to the first embodiment, the usage status of the arithmetic units need not be obtained through a system call while the object is running. As a result, compared with the second comparative example, the overhead of system calls and the like during object execution is suppressed.

(Effects of the First Embodiment)
In the first embodiment, the compiler apparatus 10 selects, as the target of loop expansion, an innermost loop whose loop operation instruction sequence contains no branch instruction. As a result, the compiler apparatus 10 does not need to expand branch-target instructions and the like as part of the loop expansion.

In the first embodiment, when the numbers of integer instructions and floating-point instructions included in the loop operation instruction sequence are not biased toward either type, the compiler apparatus 10 skips the processing from F4 and F5 of FIG. 15 onward. The compiler apparatus 10 can therefore keep compile time from growing.

In the first embodiment, the compiler apparatus 10 converts the type of each instruction included in the loop operation instruction sequence based on the FU conversion table 301 and the IU conversion table 302. As a result, even if the loop operation instruction sequence before conversion contains both integer instructions and floating-point instructions, the compiler apparatus 10 can generate, after conversion, an instruction sequence consisting only of integer instructions and an instruction sequence consisting only of floating-point instructions.

In the first embodiment, after converting the instruction types, the compiler apparatus 10 outputs instruction sequences made up of integer instructions and instruction sequences made up of floating-point instructions according to the numbers of integer arithmetic units 101a and floating-point arithmetic units 101b of the processor 101. As a result, the compiler apparatus 10 can generate an object that uses the arithmetic units of the processor 101 more efficiently.
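The sizing of the output to the arithmetic units can be sketched as follows. This is a minimal illustration rather than the implementation described in this document: `plan_unroll`, its parameters, and the returned dictionary are hypothetical names. The division rule (original repetition count divided by the sum of the unit counts) follows Appendix 3, and the sketch assumes the repetition count is an exact multiple of the number of units.

```python
def plan_unroll(iterations: int, int_units: int, fp_units: int) -> dict:
    """Decide how many integer / floating-point copies of the loop body to emit."""
    copies = int_units + fp_units  # one copy of the loop body per arithmetic unit
    if copies <= 0:
        raise ValueError("processor must have at least one arithmetic unit")
    if iterations % copies != 0:
        # Remainder iterations are outside the scope of this sketch.
        raise ValueError("sketch assumes iterations is a multiple of the unit count")
    return {
        "integer_sequences": int_units,
        "floating_point_sequences": fp_units,
        "output_iterations": iterations // copies,
    }


# Example matching FIG. 42: one integer unit and two floating-point units.
plan = plan_unroll(iterations=300, int_units=1, fp_units=2)
print(plan)
```

With one integer unit and two floating-point units, the body is emitted three times per output iteration, so the output loop runs one third as many times as the original loop.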

In the first embodiment, the compiler apparatus 10 obtains the numbers of integer arithmetic units 101a and floating-point arithmetic units 101b of the processor 101 from the machine model 201. As a result, by registering the numbers of integer arithmetic units and floating-point arithmetic units for each processor type in the machine model 201, the compiler apparatus 10 can generate objects for a variety of processors.

In the first embodiment, the compiler apparatus 10 performs, for example, the conversion to floating-point instructions in F4 of FIG. 15 and the conversion to integer instructions in F5 of FIG. 15 in parallel. However, the compiler apparatus 10 is not limited to a configuration in which the two conversions are performed in parallel. The compiler apparatus 10 may perform either the conversion to floating-point instructions or the conversion to integer instructions first and then perform the other.

In the first embodiment, the compiler apparatus 10 compiles for the processor 101 that the compiler apparatus 10 itself has. However, the compiler apparatus 10 may be a cross-compiler apparatus that accepts the designation of a processor type and compiles for the designated processor. In this case, the compiler apparatus 10 may obtain the numbers of integer arithmetic units and floating-point arithmetic units of the designated processor from the machine model 201.
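A per-processor lookup of this kind could look like the following sketch. The table contents, the processor type names `cpu_a` and `cpu_b`, and the function `unit_counts` are all hypothetical; the document does not specify how the machine model 201 is stored.

```python
from typing import Dict, Tuple

# Hypothetical machine model: processor type -> (integer units, floating-point units).
MACHINE_MODEL: Dict[str, Tuple[int, int]] = {
    "cpu_a": (1, 2),
    "cpu_b": (2, 2),
}


def unit_counts(processor_type: str) -> Tuple[int, int]:
    """Return (integer units, floating-point units) for the designated processor type."""
    try:
        return MACHINE_MODEL[processor_type]
    except KeyError:
        raise ValueError(f"processor type not registered: {processor_type}") from None


int_units, fp_units = unit_counts("cpu_a")
print(int_units, fp_units)
```

Supporting another target then only requires adding an entry to the table; the rest of the compilation flow is unchanged.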

<First Modification>
In the first embodiment, loops with no dependency between iterations were the target of loop expansion. In the first modification, loops in which a recurrence operation (回帰演算) creates a dependency between iterations are also targets of loop expansion. Components common to the first embodiment are given the same reference numerals, and their description is omitted. The first modification is described below with reference to the drawings.

FIG. 43 illustrates an example of a recurrence operation. An expression in which the same variable appears both in the definition (the value being assigned) and in a reference (an operand being read) is called a recurrence operation. In FIG. 43, the variable "A" appears in both the definition and the reference of the expression. That is, the expression illustrated in FIG. 43 is a recurrence operation on the variable "A".

FIG. 44 illustrates an example of an instruction having reference operands and a definition operand, using an "add" instruction. This "add" instruction has "op1" and "op2" as reference operands and "op3" as its definition operand. The compiler apparatus 10 of the first modification determines that an instruction is a recurrence operation subject to loop expansion when the definition operand "op3" is equal to one of the reference operands "op1" or "op2" and "op1" differs from "op2".
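The operand test described above can be written as a small predicate. This is a sketch under assumptions: the `Instr` tuple and the function name `is_recurrence` are hypothetical, and operands are represented as plain register-name strings.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    opcode: str
    op1: str  # reference (source) operand
    op2: str  # reference (source) operand
    op3: str  # definition (destination) operand


def is_recurrence(instr: Instr) -> bool:
    """True when op3 equals op1 or op2, and op1 differs from op2."""
    defined_is_referenced = instr.op3 in (instr.op1, instr.op2)
    return defined_is_referenced and instr.op1 != instr.op2


print(is_recurrence(Instr("add", "a", "t", "a")))  # destination is also a source
print(is_recurrence(Instr("add", "x", "y", "z")))  # no operand is reused
```

The first call prints `True` and the second prints `False`. An instruction such as `add a, a, a`, where both reference operands are the same, is rejected by the `op1 != op2` condition.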

FIG. 45 illustrates an example of a loop instruction sequence input to the compiler apparatus 10. The left side of FIG. 45 is an example of the loop instruction sequence before loop expansion, and the right side is an example of the loop instruction sequence after loop expansion. In the loop operation instruction sequence of the loop on the left side of FIG. 45, the expression "a=a+b[i]*c[i]" contains the variable "a" in both its definition and its reference, so this expression is a recurrence operation. Because a recurrence operation creates a dependency between loop iterations, it is difficult to expand such a loop as it is.

Therefore, when the loop operation instruction sequence includes a recurrence operation, the loop operation instructions are transformed so that no dependency arises between loop iterations. Specifically, a separate definition operand is created for each of the instruction sequences with expansion numbers 1, 2, and 3 illustrated in FIG. 45, so that the instruction sequence of each expansion number can be computed independently of the instruction sequences of the other expansion numbers. After the loop ends, adding together the values computed by the instruction sequences of the expansion numbers yields the same value as the instruction sequence before loop expansion. The instructions that add these values after the loop ends are called convergence instructions. FIG. 46 illustrates an example in which the instruction sequence after loop expansion illustrated in FIG. 45 is expressed in intermediate code. In FIG. 46, "add $g5, $g2, $g2" is the recurrence operation. The processing for loop expansion of the intermediate code illustrated in FIG. 46 is described below.
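The effect of giving each expanded copy its own accumulator can be checked with a short sketch. It mirrors the expression "a=a+b[i]*c[i]" from FIG. 45; the function names, the `copies` parameter, and the assumption that the array length is a multiple of `copies` are illustrative and not taken from the document.

```python
def dot_rolled(b, c, a=0):
    """Original loop: every iteration depends on the previous value of a."""
    for i in range(len(b)):
        a = a + b[i] * c[i]
    return a


def dot_expanded(b, c, a=0, copies=3):
    """Expanded loop: each copy accumulates into its own variable."""
    assert len(b) % copies == 0, "sketch assumes no remainder iterations"
    partial = [0] * copies                 # initialization instructions
    for i in range(0, len(b), copies):     # loop body, `copies` independent chains
        for k in range(copies):
            partial[k] = partial[k] + b[i + k] * c[i + k]
    for p in partial:                      # convergence instructions
        a = a + p
    return a


b = [1, 2, 3, 4, 5, 6]
c = [6, 5, 4, 3, 2, 1]
print(dot_rolled(b, c), dot_expanded(b, c))
```

Both functions return 56 for this integer input. For floating-point data the regrouped additions can round differently from the original order, a general property of this kind of transformation that the sketch does not address.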

FIG. 47 illustrates an example of the flow of loop expansion processing by the compiler apparatus 10 according to the first modification. The loop expansion processing of the first modification is described below with reference to FIG. 47.

The processing from F1 to F3 in FIG. 47 is the same as the processing from F1 to F3 in FIG. 15, so its description is omitted. In R1, the FU instruction conversion unit 222 of the compiler apparatus 10 converts the instructions included in the loop operation instruction sequence extracted in F1 into floating-point instructions and stores them in the FU conversion instruction sequence storage unit 262. In R2, the IU instruction conversion unit 221 of the compiler apparatus 10 converts the instructions included in the loop operation instruction sequence extracted in F1 into integer instructions and stores them in the IU conversion instruction sequence storage unit 263.

FIG. 48 illustrates an example of the detailed flow of the processing of R1 and R2 in FIG. 47, which is described below with reference to FIG. 48. In U1, the instruction conversion unit 220 creates an initialization instruction sequence result list that stores the initialization instruction sequence after loop expansion. In U2, the instruction conversion unit 220 creates an operation instruction sequence result list that stores the loop operation instruction sequence after loop expansion. In U3, the instruction conversion unit 220 creates a convergence instruction sequence result list that stores the convergence instructions after loop expansion. The lists created in U1 to U3 are provided, for example, in the main storage unit 102 or the auxiliary storage unit 103 of the compiler apparatus 10. The processing from G2 to G10 in FIG. 48 is the same as the processing from G2 to G10 in FIG. 18, so its description is omitted.

Returning to FIG. 47, the processing of F6 is the same as the processing of F6 in FIG. 15, so its description is omitted. In R3, the instruction conversion unit 220 executes initialization processing for recurrence operations.

FIG. 49 illustrates an example of the initialization processing for recurrence operations, that is, the detailed flow of the processing of R3 in FIG. 47. The flow of this initialization processing is described below with reference to FIG. 49.

In C1, the instruction conversion unit 220 creates a virtual register map. As described with reference to FIG. 24, the virtual register map shows the correspondence between virtual registers before loop conversion and virtual registers after loop conversion. The created virtual register map is stored, for example, in the main storage unit 102 or the auxiliary storage unit 103.

In C2, the instruction conversion unit 220 adds, to the virtual register map created in C1, the correspondence between the definition of the recurrence operation before loop expansion and its definition after loop expansion. Taking the intermediate code illustrated in FIG. 46 as an example, the instruction conversion unit 220 executes the processing of C2 for the definition "$g2" of the recurrence operation "add $g1, $g2, $g2".

In C3, the instruction conversion unit 220 determines whether the loop operation instruction sequence contains a further recurrence instruction. If there is a next recurrence instruction (YES in C3), the processing returns to C2. If there is no next recurrence instruction (NO in C3), the processing ends. That is, the initialization processing for recurrence operations illustrated in FIG. 49 is executed for every recurrence instruction included in the loop operation instruction sequence.

Returning to FIG. 47, in R4, the FU instruction expansion unit 232 expands the instruction sequence stored in the FU conversion instruction sequence storage unit 262 and stores the expanded instruction sequence in the FU output instruction sequence storage unit 264. In R5, the IU instruction expansion unit 231 expands the instruction sequence stored in the IU conversion instruction sequence storage unit 263 and stores the expanded instruction sequence in the IU output instruction sequence storage unit 265.

FIG. 50 illustrates an example of the expansion processing of the loop operation instruction sequence, that is, the detailed flow of the processing of R4 and R5 in FIG. 47. The expansion processing of the loop operation instruction sequence is described below with reference to FIG. 50.

The processing of H2 is the same as the processing of H2 in FIG. 15, so its description is omitted. In D1, the instruction expansion unit 230 determines whether the instruction fetched in H2 is a recurrence operation. If it is a recurrence operation (YES in D1), the processing proceeds to D2. If it is not a recurrence operation (NO in D1), the processing proceeds to H3.

FIG. 51 illustrates an example of a recurrence instruction. For convenience of explanation, FIG. 51 marks the extent of the reference list and the definition list on the instruction illustrated in FIG. 44. That is, the addition instruction "add" illustrated in FIG. 44 has "op1" and "op2" as reference operands and "op3" as its definition operand. In the addition instruction "add" illustrated in FIG. 51, the reference operand "op1" and the definition operand "op3" are the same variable, and "op1" and "op2" are different variables.

FIG. 52 illustrates an example of the rewriting processing of a recurrence instruction, that is, the detailed flow of the processing of D2 in FIG. 50. The rewriting processing is described below with reference to FIG. 52. In E1, the instruction expansion unit 230 determines whether the reference operand "op1" is of integer type (int). If "op1" is of integer type, the processing proceeds to E2. If "op1" is not of integer type, the processing proceeds to E5.

In E2, the instruction expansion unit 230 determines whether the instruction to be rewritten is an integer instruction. If it is an integer instruction, the processing proceeds to E3. If it is not an integer instruction, the processing proceeds to E4.

In E3, the instruction expansion unit 230 rewrites the recurrence instruction. FIG. 53 illustrates an example of this rewriting processing, that is, the detailed flow of the processing of E3 in FIG. 52. The processing of E3 in FIG. 52 is described below with reference to FIG. 53.

In E31, the instruction expansion unit 230 newly creates an integer-type (int) virtual register. In FIG. 53, the created virtual register is labeled "new". In E32, the instruction expansion unit 230 adds an instruction that initializes the virtual register created in E31 to the initialization instruction sequence result list created in U1 of FIG. 48. Here, the instruction "mov 0, new" is added. That is, the instruction added to the initialization instruction sequence result list in E32 initializes the virtual register "new" created in E31 to 0.

In E33, the instruction expansion unit 230 rewrites the recurrence instruction using the virtual register created in E31. The rewritten instruction is "add new, op2, new". In E34, the instruction expansion unit 230 adds an addition instruction to be executed after the loop ends to the convergence instruction sequence result list. Here, "add new, op3, op3" is added to the convergence instruction sequence result list.

In E35, the instruction expansion unit 230 refers to the virtual register map created in C1 of FIG. 49 and retrieves the virtual register after loop expansion that corresponds to the virtual register "op2" (labeled "newOP2" in FIG. 53). In E36, the instruction expansion unit 230 rewrites the recurrence operation as "add new, newOP2, new".
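Steps E32 to E36 can be sketched as a function that emits instruction strings into the result lists. This is only an illustration: the function name, the use of strings for instructions, the dictionary used as the virtual register map, and the example register names `a`, `t`, and `t_1` are assumptions. The register created in E31 is passed in as `new`.

```python
def rewrite_int_recurrence(op2, op3, new, reg_map, init_list, conv_list):
    """Rewrite the recurrence 'add op1, op2, op3' (op1 == op3) for one expanded copy.

    new      : name of the integer virtual register created in E31
    reg_map  : virtual register map (register before expansion -> after expansion)
    Returns the rewritten loop-body instruction.
    """
    init_list.append(f"mov 0, {new}")             # E32: initialize the new register
    conv_list.append(f"add {new}, {op3}, {op3}")  # E34: fold the partial sum back after the loop
    new_op2 = reg_map.get(op2, op2)               # E35: look up op2 after expansion
    return f"add {new}, {new_op2}, {new}"         # E33 and E36: recurrence now runs on `new`


init_list, conv_list = [], []
body = rewrite_int_recurrence("t", "a", "new", {"t": "t_1"}, init_list, conv_list)
print(init_list, body, conv_list)
```

For the recurrence `add a, t, a`, the loop body becomes `add new, t_1, new`, the initialization list receives `mov 0, new`, and the convergence list receives `add new, a, a`.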

Returning to FIG. 52, in E4, the instruction expansion unit 230 rewrites the recurrence instruction. FIG. 54 illustrates an example of this rewriting processing, that is, the detailed flow of the processing of E4 in FIG. 52. The processing of E4 in FIG. 52 is described below with reference to FIG. 54.

In E51, the instruction expansion unit 230 newly creates a floating-point-type (float) virtual register. In E51 of FIG. 54, the created virtual register is labeled "new1". In E52, the instruction expansion unit 230 adds an instruction that initializes the virtual register created in E51 to the initialization instruction sequence result list created in U1 of FIG. 48. Here, the instruction "mov 0, new1" is added. That is, the instruction added to the initialization instruction sequence result list in E52 initializes the virtual register "new1" created in E51 to 0.

In E53, the instruction expansion unit 230 rewrites the recurrence instruction using the virtual register created in E51. The rewritten instruction is "fadd new1, op2, new1". In E54, the instruction expansion unit 230 newly creates an integer-type (int) virtual register. In E54 of FIG. 54, the created virtual register is labeled "new2". In E55, the instruction expansion unit 230 adds an instruction that changes the type of the variable to the convergence instruction sequence result list. Here, "movftoi new1, new2" is added. This instruction assigns the value of the floating-point-type virtual register "new1" to the integer-type virtual register "new2" with the bit pattern left unchanged, that is, without converting the numeric value to an integer.
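The difference between a bit-preserving move such as "movftoi" and a numeric conversion can be illustrated as follows. This sketch assumes 32-bit single-precision registers, which the document does not state, and the Python function `movftoi` is only a stand-in for the intermediate-code instruction of the same name.

```python
import struct


def movftoi(value: float) -> int:
    """Reinterpret the 32-bit pattern of a float as an unsigned integer.

    No numeric conversion takes place; only the type attached to the bits changes.
    """
    return struct.unpack("<I", struct.pack("<f", value))[0]


print(hex(movftoi(1.0)))  # bit pattern of 1.0 in IEEE 754 single precision
print(int(1.0))           # numeric conversion, for contrast
```

The first line prints `0x3f800000`, the IEEE 754 single-precision encoding of 1.0, while the numeric conversion prints `1`.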

In E56, the instruction expansion unit 230 adds an addition instruction to be executed after the loop ends to the convergence instruction sequence result list. Here, "add new2, op3, op3" is added. In E57, the instruction expansion unit 230 refers to the virtual register map created in C1 of FIG. 49 and retrieves the virtual register after loop expansion that corresponds to the virtual register "op2" (labeled "newOP2" in FIG. 53). In E58, the instruction expansion unit 230 rewrites the recurrence operation as "fadd new2, newOP2, new2".

Returning to FIG. 52, in E5, the instruction expansion unit 230 determines whether the instruction to be rewritten is an integer instruction. If it is an integer instruction, the processing proceeds to E6. If it is not an integer instruction, the processing proceeds to E7.
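The branching of E1, E2, and E5 amounts to a two-way classification on the type of "op1" and on the kind of instruction. A minimal sketch, in which the function name and boolean parameters are hypothetical and the returned strings are the step labels of FIG. 52:

```python
def select_rewrite(op1_is_int: bool, instr_is_int: bool) -> str:
    """Return the rewriting step of FIG. 52 that handles a recurrence instruction."""
    if op1_is_int:                               # E1: is op1 of integer type?
        return "E3" if instr_is_int else "E4"    # E2: is it an integer instruction?
    return "E6" if instr_is_int else "E7"        # E5: is it an integer instruction?


for op1_is_int in (True, False):
    for instr_is_int in (True, False):
        print(op1_is_int, instr_is_int, select_rewrite(op1_is_int, instr_is_int))
```

Each of the four combinations maps to exactly one rewriting routine, which is why the routines of FIG. 53 to FIG. 56 can share steps such as E31 to E33 and E51 to E53.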

FIG. 55 illustrates an example of the rewriting processing of a recurrence instruction, that is, the detailed flow of the processing of E6 in FIG. 52. The processing of E6 in FIG. 52 is described below with reference to FIG. 55.

The processing from E31 to E33 is the same as the processing from E31 to E33 in FIG. 53, so its description is omitted. In E61, the instruction expansion unit 230 newly creates a floating-point-type (float) virtual register. In E61 of FIG. 55, the created virtual register is labeled "new2". In E62, the instruction expansion unit 230 adds an instruction that changes the type of the variable to the convergence instruction sequence result list. Here, "movftoi new2, new" is added. This instruction assigns the value of the floating-point-type virtual register "new2" to the integer-type virtual register "new" with the bit pattern left unchanged, that is, without converting the numeric value to an integer.

In E63, the instruction expansion unit 230 adds an addition instruction to be executed after the loop ends to the convergence instruction sequence result list. Here, "fadd new, op3, op3" is added. The processing from E35 to E36 is the same as the processing from E35 to E36 in FIG. 53, so its description is omitted.

FIG. 56 illustrates an example of the rewriting processing of a recurrence instruction, that is, the detailed flow of the processing of E7 in FIG. 52. The processing of E7 in FIG. 52 is described below with reference to FIG. 56.

The processing from E51 to E53 is the same as the processing from E51 to E53 in FIG. 54, so its description is omitted. In E71, the instruction expansion unit 230 adds an addition instruction to be executed after the loop ends to the convergence instruction sequence result list. Here, "fadd new1, op3, op3" is added. The processing from E35 to E36 is the same as the processing from E35 to E36 in FIG. 53, so its description is omitted.

Returning to FIG. 50, the processing from H3 to H15 is the same as the processing from H3 to H15 in FIG. 23, so its description is omitted.

Returning to FIG. 47, the processing from F9 to F12 is the same as the processing from F9 to F12 in FIG. 15, so its description is omitted.

FIG. 57 illustrates an example of the loop operation instruction sequence after loop expansion. The instruction sequences of expansion numbers 1 and 3 are expanded by the loop expansion method illustrated in FIG. 54. The instruction sequence of expansion number 2 is expanded by the loop expansion method illustrated in FIG. 53.

FIG. 58 illustrates an example of the initialization instruction sequence after loop expansion. The initialization instruction sequences of expansion numbers 1 and 2 are instructions added by the processing of E52 in FIG. 54. The initialization instruction sequence of expansion number 3 is an instruction added by the processing of E32 in FIG. 53.

FIG. 59 illustrates an example of the convergence instruction sequence after loop expansion. The convergence instruction sequences of expansion numbers 1 and 2 are instructions added by the processing of E55 and E56 in FIG. 54. The convergence instruction sequence of expansion number 3 is an instruction added by the processing of E34 in FIG. 53.

FIG. 60 illustrates an example of the loop instruction sequence after loop expansion. As in the first embodiment, the loop counter in the loop instruction sequence is corrected. FIG. 61 illustrates an example of the instruction sequence of the whole loop after loop expansion, as produced by the processing of FIG. 47. The instruction sequence illustrated in FIG. 61 combines the initialization instruction sequence illustrated in FIG. 58, the loop operation instruction sequence illustrated in FIG. 57, the loop instruction sequence illustrated in FIG. 60, and the convergence instruction sequence illustrated in FIG. 59. That is, the virtual registers are initialized by the initialization instruction sequence illustrated in FIG. 58. The loop operation instruction sequence illustrated in FIG. 57 is executed repeatedly under the control of the loop instruction sequence illustrated in FIG. 60. After the loop ends, the convergence instruction sequence illustrated in FIG. 59 is executed.
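The way the four parts fit together can be sketched by concatenating them in execution order. Everything concrete here is hypothetical: the function name, the `LOOP` label, the counter update `add i, 3, i`, and the branch mnemonic `blt` are placeholders, since the actual contents of FIG. 57 to FIG. 61 are not reproduced in the text.

```python
def assemble_loop(init, body, loop_control, convergence, label="LOOP"):
    """Lay out the four instruction lists in the order they execute."""
    return [*init, f"{label}:", *body, *loop_control, *convergence]


program = assemble_loop(
    init=["mov 0, new"],                             # initialization instruction sequence
    body=["add new, t_1, new"],                      # expanded loop operation instruction sequence
    loop_control=["add i, 3, i", "blt i, n, LOOP"],  # hypothetical counter update and branch
    convergence=["add new, a, a"],                   # convergence instruction sequence
)
print("\n".join(program))
```

Only the body and the loop control sit between the label and the branch, so the initialization runs once before the loop and the convergence runs once after it.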

In the first modification, instructions containing a recurrence operation are classified by instruction type and operand type, and for each class, loop expansion is performed so that no dependency arises between loop iterations. As a result, according to the first modification, the loop expansion of the first embodiment can also be applied to the instruction sequence of a loop that contains a recurrence operation.

The embodiments and modifications disclosed above can be combined with one another.

＜その他＞
以上の第１変形例を含む実施形態に関し、更に以下の付記を開示する。
（付記１）
入力されたソースファイルにおいて命令列の繰り返しを指示する繰り返し命令を検出すると、前記繰り返し命令によって繰り返される前記命令列を抽出する抽出部（２２０）と、
抽出された前記命令列に含まれる命令を整数演算を行う整数命令に変換して整数命令列を生成する第１の生成部と、
抽出された前記命令列に含まれる命令を浮動小数点演算を行う浮動小数点命令に変換して浮動小数点命令列を生成する第２の生成部と、
前記ソースファイルをコンパイルしたオブジェクトの実行環境となるプロセッサの有する整数演算器の数および浮動小数点演算器の数に基づいて、前記整数命令列と前記浮動小数点命令列とを含む出力命令列を生成する第３の生成部と、を備える、
コンパイラ装置。
（付記２）
前記繰り返し命令による前記命令列の繰り返し回数、前記整数演算器の数および前記浮動小数点演算器の数に基づいて、前記繰り返し命令による前記出力命令列の繰り返し回数を決定する決定部をさらに備える、
付記１に記載のコンパイラ装置。
（付記３）
前記決定部は、前記繰り返し命令による前記命令列の繰り返し回数を前記整数演算器の数および前記浮動小数点演算器の数の和で除算した値を前記繰り返し命令による前記出力命令列の繰り返し回数とする、
付記２に記載のコンパイラ装置。
（付記４）
整数演算器と浮動小数点演算器の数をプロセッサ種別毎に記憶するプロセッサ情報記憶部と、
前記オブジェクトの実行環境となるプロセッサの有する整数演算器と浮動小数点演算器の数を前記プロセッサ情報記憶部から取得する演算器数取得部と、をさらに備える、
付記１から３のいずれか一項に記載のコンパイラ装置。
（付記５）
前記抽出部によって抽出される前記命令列は分岐を含まない命令列である、
付記１から４のいずれか一項に記載のコンパイラ装置。
（付記６）
コンピュータが、
入力されたソースファイルにおいて命令列の繰り返しを指示する繰り返し命令を検出すると、前記繰り返し命令によって繰り返される前記命令列を抽出し、
抽出された前記命令列に含まれる命令を整数演算を行う整数命令に変換して整数命令列を生成し、
抽出された前記命令列に含まれる命令を浮動小数点演算を行う浮動小数点命令に変換して浮動小数点命令列を生成し、
前記ソースファイルをコンパイルしたオブジェクトの実行環境となるプロセッサの有する整数演算器の数および浮動小数点演算器の数に基づいて、前記整数命令列と前記浮動小数点命令列とを含む出力命令列を生成する、
コンパイル方法。
（付記７）
前記繰り返し命令による前記命令列の繰り返し回数、前記整数演算器の数および前記浮動小数点演算器の数に基づいて、前記繰り返し命令による前記出力命令列の繰り返し回数を決定する処理をさらに実行する、
付記６に記載のコンパイル方法。
（付記８）
前記決定する処理は、前記繰り返し命令による前記命令列の繰り返し回数を前記整数演算器の数および前記浮動小数点演算器の数の和で除算した値を前記繰り返し命令による前記出力命令列の繰り返し回数とする処理を含む、
付記７に記載のコンパイル方法。
（付記９）
前記コンピュータは、整数演算器と浮動小数点演算器の数をプロセッサ種別毎に記憶するプロセッサ情報記憶部を備え、
前記オブジェクトの実行環境となるプロセッサの有する整数演算器と浮動小数点演算器の数を前記プロセッサ情報記憶部から取得する処理をさらに実行する、
付記６から８のいずれか一項に記載のコンパイル方法。
（付記１０）
前記抽出される前記命令列は分岐を含まない命令列である、
付記６から９のいずれか一項に記載のコンパイル方法。
（付記１１）
コンピュータに、
入力されたソースファイルにおいて命令列の繰り返しを指示する繰り返し命令を検出すると、前記繰り返し命令によって繰り返される前記命令列を抽出させ、
抽出された前記命令列に含まれる命令を整数演算を行う整数命令に変換して整数命令列を生成させ、
抽出された前記命令列に含まれる命令を浮動小数点演算を行う浮動小数点命令に変換して浮動小数点命令列を生成させ、
前記ソースファイルをコンパイルしたオブジェクトの実行環境となるプロセッサの有する整数演算器の数および浮動小数点演算器の数に基づいて、前記整数命令列と前記浮動小数点命令列とを含む出力命令列を生成させる、
コンパイラプログラム。
（付記１２）
前記繰り返し命令による前記命令列の繰り返し回数、前記整数演算器の数および前記浮動小数点演算器の数に基づいて、前記繰り返し命令による前記出力命令列の繰り返し回数を決定させる処理をさらに実行させる、
付記１１に記載のコンパイラプログラム。
（付記１３）
前記決定させる処理は、前記繰り返し命令による前記命令列の繰り返し回数を前記整数演算器の数および前記浮動小数点演算器の数の和で除算した値を前記繰り返し命令による前記出力命令列の繰り返し回数とする処理を含む、
付記１２に記載のコンパイラプログラム。
（付記１４）
前記コンピュータは、整数演算器と浮動小数点演算器の数をプロセッサ種別毎に記憶するプロセッサ情報記憶部を備え、
前記オブジェクトの実行環境となるプロセッサの有する整数演算器と浮動小数点演算器の数を前記プロセッサ情報記憶部から取得する処理をさらに実行させる、
付記１１から１３のいずれか一項に記載のコンパイラプログラム。
（付記１５）
前記抽出される前記命令列は分岐を含まない命令列である、
付記１１から１４のいずれか一項に記載のコンパイラプログラム。
<Others>
With regard to the embodiments, including the first modification described above, the following appendices are further disclosed.
(Appendix 1)
A compiler device comprising:
an extraction unit (220) that, upon detecting in an input source file a repetition instruction that instructs repetition of an instruction sequence, extracts the instruction sequence repeated by the repetition instruction;
a first generation unit that converts the instructions included in the extracted instruction sequence into integer instructions that perform integer operations, thereby generating an integer instruction sequence;
a second generation unit that converts the instructions included in the extracted instruction sequence into floating-point instructions that perform floating-point operations, thereby generating a floating-point instruction sequence; and
a third generation unit that generates an output instruction sequence including the integer instruction sequence and the floating-point instruction sequence, based on the number of integer arithmetic units and the number of floating-point arithmetic units of a processor serving as the execution environment of an object compiled from the source file.
(Appendix 2)
The compiler device according to appendix 1, further comprising a determination unit that determines the number of repetitions of the output instruction sequence by the repetition instruction, based on the number of repetitions of the instruction sequence by the repetition instruction, the number of integer arithmetic units, and the number of floating-point arithmetic units.
(Appendix 3)
The compiler device according to appendix 2, wherein the determination unit sets, as the number of repetitions of the output instruction sequence by the repetition instruction, the value obtained by dividing the number of repetitions of the instruction sequence by the repetition instruction by the sum of the number of integer arithmetic units and the number of floating-point arithmetic units.
(Appendix 4)
The compiler device according to any one of appendices 1 to 3, further comprising:
a processor information storage unit that stores the number of integer arithmetic units and the number of floating-point arithmetic units for each processor type; and
an arithmetic unit number acquisition unit that acquires, from the processor information storage unit, the number of integer arithmetic units and the number of floating-point arithmetic units of the processor serving as the execution environment of the object.
(Appendix 5)
The compiler device according to any one of appendices 1 to 4, wherein the instruction sequence extracted by the extraction unit is an instruction sequence that does not include a branch.
(Appendix 6)
A compiling method in which a computer:
upon detecting in an input source file a repetition instruction that instructs repetition of an instruction sequence, extracts the instruction sequence repeated by the repetition instruction;
converts the instructions included in the extracted instruction sequence into integer instructions that perform integer operations, thereby generating an integer instruction sequence;
converts the instructions included in the extracted instruction sequence into floating-point instructions that perform floating-point operations, thereby generating a floating-point instruction sequence; and
generates an output instruction sequence including the integer instruction sequence and the floating-point instruction sequence, based on the number of integer arithmetic units and the number of floating-point arithmetic units of a processor serving as the execution environment of an object compiled from the source file.
(Appendix 7)
The compiling method according to appendix 6, wherein the computer further executes a process of determining the number of repetitions of the output instruction sequence by the repetition instruction, based on the number of repetitions of the instruction sequence by the repetition instruction, the number of integer arithmetic units, and the number of floating-point arithmetic units.
(Appendix 8)
The compiling method according to appendix 7, wherein the determining process includes a process of setting, as the number of repetitions of the output instruction sequence by the repetition instruction, the value obtained by dividing the number of repetitions of the instruction sequence by the repetition instruction by the sum of the number of integer arithmetic units and the number of floating-point arithmetic units.
(Appendix 9)
The compiling method according to any one of appendices 6 to 8, wherein the computer includes a processor information storage unit that stores the number of integer arithmetic units and the number of floating-point arithmetic units for each processor type, and
the computer further executes a process of acquiring, from the processor information storage unit, the number of integer arithmetic units and the number of floating-point arithmetic units of the processor serving as the execution environment of the object.
(Appendix 10)
The compiling method according to any one of appendices 6 to 9, wherein the extracted instruction sequence is an instruction sequence that does not include a branch.
(Appendix 11)
A compiler program that causes a computer to:
upon detecting in an input source file a repetition instruction that instructs repetition of an instruction sequence, extract the instruction sequence repeated by the repetition instruction;
convert the instructions included in the extracted instruction sequence into integer instructions that perform integer operations, thereby generating an integer instruction sequence;
convert the instructions included in the extracted instruction sequence into floating-point instructions that perform floating-point operations, thereby generating a floating-point instruction sequence; and
generate an output instruction sequence including the integer instruction sequence and the floating-point instruction sequence, based on the number of integer arithmetic units and the number of floating-point arithmetic units of a processor serving as the execution environment of an object compiled from the source file.
(Appendix 12)
The compiler program according to appendix 11, which further causes the computer to execute a process of determining the number of repetitions of the output instruction sequence by the repetition instruction, based on the number of repetitions of the instruction sequence by the repetition instruction, the number of integer arithmetic units, and the number of floating-point arithmetic units.
(Appendix 13)
The compiler program according to appendix 12, wherein the determining process includes a process of setting, as the number of repetitions of the output instruction sequence by the repetition instruction, the value obtained by dividing the number of repetitions of the instruction sequence by the repetition instruction by the sum of the number of integer arithmetic units and the number of floating-point arithmetic units.
(Appendix 14)
The compiler program according to any one of appendices 11 to 13, wherein the computer includes a processor information storage unit that stores the number of integer arithmetic units and the number of floating-point arithmetic units for each processor type, and
the program further causes the computer to execute a process of acquiring, from the processor information storage unit, the number of integer arithmetic units and the number of floating-point arithmetic units of the processor serving as the execution environment of the object.
(Appendix 15)
The compiler program according to any one of appendices 11 to 14, wherein the extracted instruction sequence is an instruction sequence that does not include a branch.
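
The flow described in Appendices 1 to 5 can be illustrated with a minimal Python sketch. It is not the implementation disclosed in the embodiments. The processor types, unit counts, operation names, and mnemonics (`PROCESSOR_INFO`, `IU_TABLE`, `FU_TABLE`, `BRANCH_OPS`) are hypothetical, and the sketch assumes that the output instruction sequence holds one copy of the integer instruction sequence per integer arithmetic unit and one copy of the floating-point instruction sequence per floating-point arithmetic unit. The appendices do not state how a remainder of the division is handled, so the sketch only reports it.

```python
# Hypothetical processor information storage: (integer units, floating-point
# units) per processor type. Names and counts are made up for illustration.
PROCESSOR_INFO = {
    "cpu_a": (2, 2),
    "cpu_b": (1, 3),
}

# Hypothetical conversion tables from a neutral operation name to an integer
# or floating-point mnemonic; real tables depend on the target instruction set.
IU_TABLE = {"add": "iadd", "mul": "imul"}
FU_TABLE = {"add": "fadd", "mul": "fmul"}

# Hypothetical set of operations treated as branches (Appendix 5 excludes them).
BRANCH_OPS = {"branch", "jump"}


def get_unit_counts(processor_type):
    """Look up the unit counts for the target processor type (Appendix 4)."""
    return PROCESSOR_INFO[processor_type]


def build_output_loop(body, repetitions, processor_type):
    """Return (output instruction sequence, output repetitions, leftover)."""
    # Appendix 5: only branch-free loop bodies are handled.
    if any(op in BRANCH_OPS for op in body):
        raise ValueError("loop body contains a branch")

    niu, nfu = get_unit_counts(processor_type)

    # Appendix 1: convert the same loop body into an integer instruction
    # sequence and into a floating-point instruction sequence.
    int_seq = [IU_TABLE[op] for op in body]
    fp_seq = [FU_TABLE[op] for op in body]

    # Assumption: one copy per arithmetic unit of each kind.
    output_body = int_seq * niu + fp_seq * nfu

    # Appendix 3: divide the original repetition count by the sum of the
    # unit counts. The leftover iterations are reported, not scheduled.
    output_reps, leftover = divmod(repetitions, niu + nfu)
    return output_body, output_reps, leftover


if __name__ == "__main__":
    print(build_output_loop(["add", "mul"], 100, "cpu_a"))
```

With the hypothetical `cpu_a` entry (two units of each kind), a loop of 100 repetitions becomes a loop of 25 repetitions whose body contains four copies of the original work, two in integer form and two in floating-point form.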

１００・・・情報処理装置
１０１・・・プロセッサ
１０１ａ・・・整数演算器
１０１ｂ・・・浮動小数点演算器
１０２・・・主記憶部
１０３・・・補助記憶部
１０４・・・通信部
１０、５００・・・コンパイラ装置
２０１・・・マシンモデル
２０２・・・ＮＦＵ数格納部
２０３・・・ＮＩＵ数格納部
２１１・・・ＮＩＵ取得部
２１２・・・ＮＦＵ取得部
２２０・・・命令変換部
２２１・・・ＩＵ命令変換部
２２２・・・ＦＵ命令変換部
２３０・・・命令展開部
２３１・・・ＩＵ命令展開部
２３２・・・ＦＵ命令展開部
２４０・・・ループ命令補正部
２６１・・・ループ演算命令列格納部
２６２・・・ＦＵ変換命令列格納部
２６３・・・ＩＵ変換命令列格納部
２６４・・・ＦＵ出力命令列格納部
２６５・・・ＩＵ出力命令列格納部
２６６・・・出力ループ命令列格納部
３０１・・・ＦＵ変換テーブル
３０２・・・ＩＵ変換テーブル
DESCRIPTION OF SYMBOLS
100 ... Information processing apparatus
101 ... Processor
101a ... Integer arithmetic unit
101b ... Floating-point arithmetic unit
102 ... Main memory unit
103 ... Auxiliary memory unit
104 ... Communication unit
10, 500 ... Compiler device
201 ... Machine model
202 ... NFU number storage unit
203 ... NIU number storage unit
211 ... NIU acquisition unit
212 ... NFU acquisition unit
220 ... Instruction conversion unit
221 ... IU instruction conversion unit
222 ... FU instruction conversion unit
230 ... Instruction expansion unit
231 ... IU instruction expansion unit
232 ... FU instruction expansion unit
240 ... Loop instruction correction unit
261 ... Loop operation instruction sequence storage unit
262 ... FU conversion instruction sequence storage unit
263 ... IU conversion instruction sequence storage unit
264 ... FU output instruction sequence storage unit
265 ... IU output instruction sequence storage unit
266 ... Output loop instruction sequence storage unit
301 ... FU conversion table
302 ... IU conversion table

Claims

A compiler device comprising:
an extraction unit that, upon detecting in an input source file a repetition instruction that instructs repetition of an instruction sequence, extracts the instruction sequence repeated by the repetition instruction;
a first generation unit that converts the instructions included in the extracted instruction sequence into integer instructions that perform integer operations, thereby generating an integer instruction sequence;
a second generation unit that converts the instructions included in the extracted instruction sequence into floating-point instructions that perform floating-point operations, thereby generating a floating-point instruction sequence; and
a third generation unit that generates an output instruction sequence including the integer instruction sequence and the floating-point instruction sequence, based on the number of integer arithmetic units and the number of floating-point arithmetic units of a processor serving as the execution environment of an object compiled from the source file.

The compiler device according to claim 1, further comprising a determination unit that determines the number of repetitions of the output instruction sequence by the repetition instruction, based on the number of repetitions of the instruction sequence by the repetition instruction, the number of integer arithmetic units, and the number of floating-point arithmetic units.

The compiler device according to claim 2, wherein the determination unit sets, as the number of repetitions of the output instruction sequence by the repetition instruction, the value obtained by dividing the number of repetitions of the instruction sequence by the repetition instruction by the sum of the number of integer arithmetic units and the number of floating-point arithmetic units.

The compiler device according to any one of claims 1 to 3, further comprising:
a processor information storage unit that stores the number of integer arithmetic units and the number of floating-point arithmetic units for each processor type; and
an arithmetic unit number acquisition unit that acquires, from the processor information storage unit, the number of integer arithmetic units and the number of floating-point arithmetic units of the processor serving as the execution environment of the object.

The compiler device according to any one of claims 1 to 4, wherein the instruction sequence extracted by the extraction unit is an instruction sequence that does not include a branch.

A compiling method in which a computer:
upon detecting in an input source file a repetition instruction that instructs repetition of an instruction sequence, extracts the instruction sequence repeated by the repetition instruction;
converts the instructions included in the extracted instruction sequence into integer instructions that perform integer operations, thereby generating an integer instruction sequence;
converts the instructions included in the extracted instruction sequence into floating-point instructions that perform floating-point operations, thereby generating a floating-point instruction sequence; and
generates an output instruction sequence including the integer instruction sequence and the floating-point instruction sequence, based on the number of integer arithmetic units and the number of floating-point arithmetic units of the processor.

A compiler program that causes a computer to:
upon detecting in an input source file a repetition instruction that instructs repetition of an instruction sequence, extract the instruction sequence repeated by the repetition instruction;
convert the instructions included in the extracted instruction sequence into integer instructions that perform integer operations, thereby generating an integer instruction sequence;
convert the instructions included in the extracted instruction sequence into floating-point instructions that perform floating-point operations, thereby generating a floating-point instruction sequence; and
generate an output instruction sequence including the integer instruction sequence and the floating-point instruction sequence, based on the number of integer arithmetic units and the number of floating-point arithmetic units of the processor.